CN113610366A - Risk warning generation method and device and electronic equipment - Google Patents

Risk warning generation method and device and electronic equipment

Info

Publication number
CN113610366A
Authority
CN
China
Prior art keywords
information
risk
user
historical
model
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202110836040.XA
Other languages
Chinese (zh)
Inventor
李心宇
聂婷婷
沈赟
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Shanghai Qiyue Information Technology Co Ltd
Original Assignee
Shanghai Qiyue Information Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Shanghai Qiyue Information Technology Co Ltd
Priority to CN202110836040.XA
Publication of CN113610366A
Legal status: Pending

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q10/00 Administration; Management
    • G06Q10/06 Resources, workflows, human or project management; Enterprise or organisation planning; Enterprise or organisation modelling
    • G06Q10/063 Operations research, analysis or management
    • G06Q10/0635 Risk analysis of enterprise or organisation activities
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/21 Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/214 Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/24 Classification techniques
    • G06F18/241 Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
    • G06F18/2415 Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches based on parametric or probabilistic models, e.g. based on likelihood ratio or false acceptance rate versus a false rejection rate
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N20/00 Machine learning

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Business, Economics & Management (AREA)
  • Physics & Mathematics (AREA)
  • Human Resources & Organizations (AREA)
  • General Physics & Mathematics (AREA)
  • Evolutionary Computation (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • General Engineering & Computer Science (AREA)
  • Artificial Intelligence (AREA)
  • Evolutionary Biology (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Strategic Management (AREA)
  • Entrepreneurship & Innovation (AREA)
  • Software Systems (AREA)
  • Economics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Computing Systems (AREA)
  • Development Economics (AREA)
  • Mathematical Physics (AREA)
  • Educational Administration (AREA)
  • Medical Informatics (AREA)
  • Game Theory and Decision Science (AREA)
  • Probability & Statistics with Applications (AREA)
  • Marketing (AREA)
  • Operations Research (AREA)
  • Quality & Reliability (AREA)
  • Tourism & Hospitality (AREA)
  • General Business, Economics & Management (AREA)
  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)

Abstract

The disclosure relates to a risk warning generation method, a risk warning generation device, an electronic device and a computer-readable medium. The method comprises the following steps: acquiring user information of a user, wherein the user information comprises basic information and behavior information; generating multi-dimensional feature information based on the user information and a feature policy; inputting the multi-dimensional feature information into a risk model to generate at least one risk score, wherein the risk model is generated from the user information of historical users and a machine learning model, and sample labels are assigned to the historical users, by way of a regularization strategy, according to their corresponding user information; and generating risk warning information when the at least one risk score satisfies a preset strategy. The risk warning generation method and device can mitigate the over-fitting problem caused by over-sampling or under-sampling during machine learning model training, yield an accurate calculation model, quickly identify users with financial risk, and improve the security of user resource allocation.

Description

Risk warning generation method and device and electronic equipment
Technical Field
The present disclosure relates to the field of computer information processing, and in particular, to a risk warning generation method and apparatus, an electronic device, and a computer-readable medium.
Background
Individual users or enterprise users often borrow resources from resource service organizations, and such borrowing is likely to expose those organizations to risk. In practical risk control, it is often necessary and valuable to predict the corresponding risk patterns in advance. Currently, resource risk is usually determined by analyzing the basic information and behavior information of the user. Different risk patterns call for corresponding risk control measures. Take malicious default as an example: the behaviors and characteristics of malicious defaulters can be observed in malicious default cases, and variables and strategies modeled on those characteristics can play a positive role in risk prevention and control.
For example, when identifying fraudulent users, the characteristics of known fraudulent users are often learned by a predictive model, which is then used to discover new fraudulent users. However, when training models on these users, practitioners have found that the fraud labels themselves are not very accurate. The labels marking fraudulent users depend largely on manual review and after-the-fact investigation, so many fraudulent users go unidentified; that is, the users labeled as non-fraudulent may include both genuinely non-fraudulent clients and fraudulent clients who were not caught by manual review and investigation. If the sample labels are inaccurate and, as is usual for a classification problem, the labels are one-hot encoded and the model is fitted with cross entropy as the loss function, the model is pushed to fit the inaccurate labels with full confidence, which tends to cause over-fitting.
The above information disclosed in this background section is only for enhancement of understanding of the background of the disclosure and therefore it may contain information that does not constitute prior art that is already known to a person of ordinary skill in the art.
Disclosure of Invention
In view of this, the present disclosure provides a risk warning generation method, an apparatus, an electronic device, and a computer readable medium, which can solve the over-fitting problem caused by over-sampling or under-sampling during machine model training, obtain an accurate calculation model, and further quickly determine a user with a financial risk, thereby improving the security of user resource allocation.
Additional features and advantages of the disclosure will be set forth in the detailed description which follows, or in part will be obvious from the description, or may be learned by practice of the disclosure.
According to an aspect of the present disclosure, a risk alert generation method is proposed, the method comprising: acquiring user information of a user, wherein the user information comprises basic information and behavior information; generating multi-dimensional feature information based on the user information and a feature policy; inputting the multi-dimensional feature information into a risk model to generate at least one risk score, wherein the risk model is generated from the user information of historical users and a machine learning model, and sample labels are assigned to the historical users, by way of a regularization strategy, according to their corresponding user information; and generating risk warning information when the at least one risk score satisfies a preset strategy.
Optionally, the method further comprises: obtaining multi-dimensional characteristic information of a plurality of historical users; respectively distributing sample labels to the plurality of historical users based on the multi-dimensional characteristic information; determining label parameters for the sample labels based on a regularization policy; training a machine learning model based on the plurality of historical users and their corresponding sample labels, label parameters to generate the risk model.
Optionally, the risk model includes a plurality of sub-risk models, and the inputting the multidimensional feature information into the risk model to generate at least one risk score includes: determining at least one sub-risk model from a plurality of sub-risk models of the risk model according to the user information; and inputting the multi-dimensional characteristic information into at least one sub-risk model to generate at least one risk score.
Optionally, when the at least one risk score satisfies a preset policy, generating risk warning information, including: randomly combining the at least one risk score to generate at least one joint score; and generating the risk warning information when the at least one joint score meets a preset strategy.
Optionally, acquiring multi-dimensional feature information of a plurality of historical users includes: acquiring a plurality of pieces of historical user information meeting preset conditions; performing data cleaning and data fusion on the plurality of historical user information to generate a plurality of historical characteristic information; determining a plurality of historical multidimensional feature information from the plurality of historical feature information; generating a feature policy based on a relationship between the plurality of historical multi-dimensional feature information and the historical user information.
Optionally, allocating sample labels to the plurality of historical users respectively based on the user information includes: comparing the user information of the historical user with a plurality of discrimination strategies; and allocating sample labels to the historical users based on a discrimination strategy met by the user information, wherein the sample labels are represented by discrete positive integers.
Optionally, determining label parameters for the sample labels based on a regularization policy includes: generating a deviation coefficient based on the regularization policy; and generating the label parameters for the sample labels based on the deviation coefficient.
Optionally, training a machine learning model based on the plurality of historical users and their corresponding sample labels, label parameters to generate the risk model comprises: inputting a plurality of historical users with sample labels and label parameters into a machine learning model for training; generating a cross entropy loss function based on the label parameters in the training process; and when the cross entropy loss function obtains an optimal solution, determining the risk model based on the model parameters of the current machine learning model.
Optionally, when the cross entropy loss function obtains an optimal solution, the method includes: solving the cross entropy loss function based on a gradient descent mode; and taking the stable solution of the cross entropy loss function as the optimal solution.
According to an aspect of the present disclosure, a risk alert generating device is proposed, the device comprising: the information module is used for acquiring user information of a user, wherein the user information comprises basic information and behavior information; the characteristic module is used for generating multi-dimensional characteristic information based on the user information and the characteristic strategy; the scoring module is used for inputting the multi-dimensional characteristic information into a risk model to generate at least one risk score, the risk model is generated through user information of a historical user and a machine learning model, and the historical user distributes a sample label in a regularization strategy mode according to the corresponding user information of the historical user; and the warning module is used for generating risk warning information when the at least one risk score meets a preset strategy.
According to an aspect of the present disclosure, an electronic device is provided, the electronic device including: one or more processors; and storage means for storing one or more programs; wherein the one or more programs, when executed by the one or more processors, cause the one or more processors to implement the method as above.
According to an aspect of the disclosure, a computer-readable medium is proposed, on which a computer program is stored, which program, when being executed by a processor, carries out the method as above.
According to the risk warning generation method, the risk warning generation device, the electronic equipment and the computer readable medium, user information of a user is obtained, wherein the user information comprises basic information and behavior information; generating multi-dimensional feature information based on the user information and the feature policy; inputting the multi-dimensional feature information into a risk model to generate at least one risk score, wherein the risk model is generated through user information of a historical user and a machine learning model, and the historical user distributes a sample label in a regularization strategy mode according to the corresponding user information of the historical user; when the at least one risk score meets a preset strategy, a risk warning message is generated, so that the over-fitting problem caused by over-sampling or under-sampling during machine model training can be solved, an accurate calculation model is obtained, users with financial risks are rapidly determined, and the safety of user resource allocation is improved.
It is to be understood that both the foregoing general description and the following detailed description are exemplary and explanatory only and are not restrictive of the disclosure.
Drawings
The above and other objects, features and advantages of the present disclosure will become more apparent by describing in detail exemplary embodiments thereof with reference to the attached drawings. The drawings described below are merely some embodiments of the present disclosure, and other drawings may be derived from those drawings by those of ordinary skill in the art without inventive effort.
FIG. 1 is a system block diagram illustrating a risk alert generation method and apparatus according to an example embodiment.
FIG. 2 is a flow diagram illustrating a risk alert generation method according to an example embodiment.
FIG. 3 is a flow chart illustrating a risk alert generation method according to another exemplary embodiment.
FIG. 4 is a flow chart illustrating a risk alert generation method according to another exemplary embodiment.
FIG. 5 is a block diagram illustrating a risk alert generation apparatus according to an exemplary embodiment.
FIG. 6 is a block diagram illustrating an electronic device in accordance with an example embodiment.
FIG. 7 is a block diagram illustrating a computer-readable medium in accordance with an example embodiment.
Detailed Description
Example embodiments will now be described more fully with reference to the accompanying drawings. Example embodiments may, however, be embodied in many different forms and should not be construed as limited to the embodiments set forth herein; rather, these embodiments are provided so that this disclosure will be thorough and complete, and will fully convey the concept of example embodiments to those skilled in the art. The same reference numerals denote the same or similar parts in the drawings, and thus, a repetitive description thereof will be omitted.
Furthermore, the described features, structures, or characteristics may be combined in any suitable manner in one or more embodiments. In the following description, numerous specific details are provided to give a thorough understanding of embodiments of the disclosure. One skilled in the relevant art will recognize, however, that the subject matter of the present disclosure can be practiced without one or more of the specific details, or with other methods, components, devices, steps, and so forth. In other instances, well-known methods, devices, implementations, or operations have not been shown or described in detail to avoid obscuring aspects of the disclosure.
The block diagrams shown in the figures are functional entities only and do not necessarily correspond to physically separate entities. I.e. these functional entities may be implemented in the form of software, or in one or more hardware modules or integrated circuits, or in different networks and/or processor means and/or microcontroller means.
The flow charts shown in the drawings are merely illustrative and do not necessarily include all of the contents and operations/steps, nor do they necessarily have to be performed in the order described. For example, some operations/steps may be decomposed, and some operations/steps may be combined or partially combined, so that the actual execution sequence may be changed according to the actual situation.
It will be understood that, although the terms first, second, third, etc. may be used herein to describe various components, these components should not be limited by these terms. These terms are used to distinguish one element from another. Thus, a first component discussed below may be termed a second component without departing from the teachings of the disclosed concept. As used herein, the term "and/or" includes any and all combinations of one or more of the associated listed items.
It is to be understood by those skilled in the art that the drawings are merely schematic representations of exemplary embodiments, and that the blocks or processes shown in the drawings are not necessarily required to practice the present disclosure and are, therefore, not intended to limit the scope of the present disclosure.
In this disclosure, resources refer to any material, information, or time that can be utilized; information resources include computing resources and various types of data resources. The data resources include various private data in various domains. The innovation of the present disclosure lies in how to use information interaction technology between a server and a client to make the generation of risk warning information more automated and efficient while reducing labor costs. Thus, in essence, the present disclosure can be applied to the allocation of various types of resources, including physical goods, water, electricity, and meaningful data. For convenience, resource allocation is illustrated in this disclosure using financial data resources as an example, but those skilled in the art will understand that the disclosure can also be used for the allocation of other resources.
FIG. 1 is a system block diagram illustrating a risk alert generation method and apparatus according to an example embodiment.
As shown in fig. 1, the system architecture 10 may include terminal devices 101, 102, 103, a network 104, and a server 105. The network 104 serves as a medium for providing communication links between the terminal devices 101, 102, 103 and the server 105. Network 104 may include various connection types, such as wired, wireless communication links, or fiber optic cables, to name a few.
The user may use the terminal devices 101, 102, 103 to interact with the server 105 via the network 104 to receive or send messages or the like. The terminal devices 101, 102, 103 may have various communication client applications installed thereon, such as a financial services application, a shopping application, a web browser application, an instant messaging tool, a mailbox client, social platform software, and the like.
The terminal devices 101, 102, 103 may be various electronic devices having a display screen and supporting web browsing, including but not limited to smart phones, tablet computers, laptop portable computers, desktop computers, and the like.
The server 105 may be a server that provides various services, such as a background management server that supports financial services websites browsed by users with the terminal devices 101, 102, 103. The background management server may analyze the received user data and feed back the processing result (e.g., risk warning information) to the administrator of the financial services website.
The server 105 may, for example, obtain user information of the user, the user information including basic information and behavior information; the server 105 may, for example, generate multi-dimensional feature information based on the user information and a feature policy; the server 105 may, for example, input the multi-dimensional feature information into a risk model to generate at least one risk score, wherein the risk model is generated from the user information of historical users and a machine learning model, and sample labels are assigned to the historical users, by way of a regularization strategy, according to their corresponding user information; the server 105 may, for example, generate risk warning information when the at least one risk score satisfies a preset policy.
The server 105 may also, for example, obtain multi-dimensional feature information for a plurality of historical users; respectively distributing sample labels to the plurality of historical users based on the multi-dimensional characteristic information; determining label parameters for the sample labels based on a regularization policy; training a machine learning model based on the plurality of historical users and their corresponding sample labels, label parameters to generate the risk model.
The server 105 may also, for example, acquire a plurality of pieces of historical user information that satisfy preset conditions; performing data cleaning and data fusion on the plurality of historical user information to generate a plurality of historical characteristic information; determining a plurality of historical multidimensional feature information from the plurality of historical feature information; generating a feature policy based on a relationship between the plurality of historical multi-dimensional feature information and the historical user information.
The server 105 may also set the trained risk model and preset policy in the terminal devices 101, 102, and 103, for example, so that the user information of the users of the terminal devices 101, 102, and 103 generates multidimensional feature information based on the user information and the feature policy; inputting the multi-dimensional feature information into a risk model to generate at least one risk score, wherein the risk model is generated through user information of a historical user and a machine learning model, and the historical user distributes a sample label in a regularization strategy mode according to the corresponding user information of the historical user; when the at least one risk score meets a preset policy, the terminal devices 101, 102, 103 generate and send risk warning information to the server 105.
The server 105 may be a server of one entity, and may also be composed of a plurality of servers, for example, and a part of the server 105 may be used as a risk warning system in the present disclosure, for generating risk warning information; a portion of the servers 105 may be, for example, a predictive policy generation system in the present disclosure, for generating a preset policy; and a portion of the server 105 may also be, for example, a model training system in the present disclosure, for training a machine learning model based on the plurality of historical users and their corresponding sample labels, label parameters to generate the risk model.
It should be noted that the risk warning generation method provided by the embodiments of the present disclosure may be executed by the server 105 and/or the terminal devices 101, 102, 103, and accordingly, the risk warning generation apparatus may be disposed in the server 105 and/or the terminal devices 101, 102, 103. The web page front end through which the user browses the financial service platform is generally located in the terminal devices 101, 102, 103.
FIG. 2 is a flow diagram illustrating a risk alert generation method according to an example embodiment. The risk alert generation method 20 includes at least steps S202 to S208.
As shown in fig. 2, in S202, user information of a user is acquired, the user information including basic information and behavior information. In the embodiments of the present disclosure, the user may be an individual user or an enterprise user, and the resource allocation may be an adjustment of a financial resource amount or an allocation of electric power or water resources. The user information may include basic information, which may be, for example, service account information, page operation data of the user, service access duration of the user, service access frequency of the user, terminal device identification information of the user, and information on the region where the user is located; the specific content may be determined according to the actual application scenario and is not limited here. The user information may also include behavior information, which may be, for example, page operation data of the user, service access duration of the user, service access frequency of the user, and the like; the specific content may likewise be determined according to the actual application scenario and is not limited here. More specifically, with the user's authorization, the user information of the current user can be obtained through event tracking points embedded in the web page.
More specifically, behavior information of a user on a web page can be acquired with the Fiddler tool. Fiddler works as a web proxy server: the client first sends request data, the Fiddler proxy intercepts the data packet, and the proxy then sends the data to the server on the client's behalf; similarly, when the server returns response data, the proxy intercepts it and passes it back to the client. In this way Fiddler can capture browsing data such as the user's dwell time, the pages visited, and click operations.
In S204, multi-dimensional feature information is generated based on the user information and the feature policy. A feature policy may be generated, for example, based on a relationship between the plurality of historical multi-dimensional feature information and the historical user information.
Data cleaning and data fusion may be performed on the user information to convert it into multi-dimensional data. More specifically, missing-rate analysis and handling and outlier handling may be performed on the variables in the user information; WOE conversion may be applied to discretized continuous variables and to discrete variables; and text variables may be processed, for example with word2vec.
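The missing-rate and outlier handling mentioned above can be sketched as follows. This is a minimal illustration only: the field names, the 0.5 missing-rate cut-off, the median imputation, and the clipping bounds are all assumptions made for the example and are not specified in the disclosure.

```python
from statistics import median


def missing_rate(records, field):
    """Fraction of records in which `field` is absent or None."""
    if not records:
        return 0.0
    return sum(1 for r in records if r.get(field) is None) / len(records)


def clean_records(records, fields, max_missing=0.5, bounds=None):
    """Drop sparse fields, impute remaining gaps with the median, clip outliers.

    `max_missing` and `bounds` are illustrative assumptions, not patent values.
    """
    bounds = bounds or {}
    # Keep only fields whose missing rate is acceptable.
    kept = [f for f in fields if missing_rate(records, f) <= max_missing]
    medians = {
        f: median(r[f] for r in records if r.get(f) is not None) for f in kept
    }
    cleaned = []
    for r in records:
        row = {}
        for f in kept:
            value = r.get(f)
            if value is None:
                value = medians[f]  # simple median imputation
            lo, hi = bounds.get(f, (float("-inf"), float("inf")))
            row[f] = min(max(value, lo), hi)  # clip to the allowed range
        cleaned.append(row)
    return cleaned


# Hypothetical raw user records.
raw = [
    {"visit_minutes": 12, "visits_per_week": 3, "income": None},
    {"visit_minutes": None, "visits_per_week": 5, "income": None},
    {"visit_minutes": 900, "visits_per_week": 4, "income": 10},
]
cleaned = clean_records(
    raw,
    fields=["visit_minutes", "visits_per_week", "income"],
    bounds={"visit_minutes": (0, 120)},
)
```

In this toy data the `income` field is dropped because two of three values are missing, and the implausible 900-minute visit is clipped to the upper bound.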
Here, WOE stands for "Weight of Evidence". WOE is a form of encoding applied to the original features; to WOE-encode a feature, the variable must first be grouped (binned). Word2vec is a group of related models used to generate word vectors. These models are shallow, two-layer neural networks trained to reconstruct the linguistic contexts of words. A word2vec model can map each word to a vector, and these vectors can be used to represent word-to-word relationships.
In S206, the multi-dimensional feature information is input into a risk model to generate at least one risk score, wherein the risk model is generated from the user information of historical users and a machine learning model, and sample labels are assigned to the historical users, by way of a regularization strategy, according to their corresponding user information.
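As a rough sketch of WOE encoding, the example below bins a continuous variable and computes one weight per bin. The sign convention used here, WOE = ln(share of good samples / share of bad samples), is one common choice and is an assumption, as are the age bins, the sample data, and the additive correction `eps` that guards against empty cells.

```python
import math
from collections import Counter


def woe_table(bins, labels, eps=0.5):
    """Return {bin: WOE}, with WOE = ln(share of goods / share of bads).

    `labels` uses 1 for a bad (risky) sample and 0 for a good one.
    `eps` is a small additive correction (an assumption) for empty cells.
    """
    good = Counter(b for b, y in zip(bins, labels) if y == 0)
    bad = Counter(b for b, y in zip(bins, labels) if y == 1)
    groups = set(bins)
    good_total = sum(good.values()) + eps * len(groups)
    bad_total = sum(bad.values()) + eps * len(groups)
    return {
        g: math.log(((good[g] + eps) / good_total) / ((bad[g] + eps) / bad_total))
        for g in groups
    }


def bin_age(age):
    """Hypothetical grouping of a continuous variable into three bins."""
    if age < 30:
        return "<30"
    if age < 50:
        return "30-49"
    return "50+"


ages = [22, 25, 28, 35, 41, 47, 52, 60, 63, 67]
is_bad = [1, 1, 0, 0, 1, 0, 0, 0, 0, 0]
bins = [bin_age(a) for a in ages]
table = woe_table(bins, is_bad)
encoded = [table[b] for b in bins]  # each sample replaced by its bin's WOE
```

Under this convention a bin with relatively more bad samples gets a lower WOE, so in the toy data the "<30" bin has the lowest value and the "50+" bin the highest.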
In one embodiment, the risk model includes a plurality of sub-risk models, and the inputting the multi-dimensional feature information into the risk model generates at least one risk score, including: determining at least one sub-risk model from a plurality of sub-risk models of the risk model according to the user information; and inputting the multi-dimensional characteristic information into at least one sub-risk model to generate at least one risk score.
More specifically, each sub-risk model may represent the user's risk in a particular respect. For example, sub-risk model A may represent the risk that the user returns resources overdue; sub-risk model B may represent the risk that the user does not plan the use of resources; and sub-risk model C may represent the risk of intentional fraud by the user.
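The selection of sub-risk models by user information can be sketched as below. The disclosure does not state the selection criteria or the model form, so the rule in `select_sub_models`, the feature names, and the logistic stand-in models with their weights are all hypothetical.

```python
import math


def _sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


# Stand-in sub-risk models: each maps a feature dict to a score in (0, 1).
# The weights are invented for illustration and carry no meaning.
SUB_MODELS = {
    "A": lambda f: _sigmoid(0.8 * f["late_payments"] - 2.0),   # overdue return
    "B": lambda f: _sigmoid(1.5 * f["utilization"] - 1.0),     # resource planning
    "C": lambda f: _sigmoid(2.0 * f["device_changes"] - 3.0),  # intentional fraud
}


def select_sub_models(user_info):
    """Hypothetical rule deciding which sub-models apply to this user."""
    selected = ["C"]  # assume the fraud check always runs
    if user_info.get("has_open_loan"):
        selected += ["A", "B"]
    return sorted(selected)


def score_user(user_info, features):
    """Run only the selected sub-models and collect their risk scores."""
    return {
        name: SUB_MODELS[name](features) for name in select_sub_models(user_info)
    }


features = {"late_payments": 3, "utilization": 0.9, "device_changes": 1}
scores_borrower = score_user({"has_open_loan": True}, features)
scores_new_user = score_user({"has_open_loan": False}, features)
```

Here a user with an open loan is scored by all three sub-models, while a user without one is scored only by the fraud sub-model.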
In S208, when the at least one risk score satisfies a preset policy, generating risk warning information. At least one joint score may be generated, for example, by randomly combining the at least one risk score; and generating the risk warning information when the at least one joint score meets a preset strategy.
For example, the risk scores may be combined, and the combined values compared against the preset strategy to determine whether to generate risk warning information. More specifically, it may be determined to generate warning information when risk score A is greater than 0.5 and risk score B is greater than 0.3; it may also be determined to generate warning information when, for example, risk score C is greater than 0.8.
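The threshold example above can be expressed as a small rule table. The thresholds 0.5, 0.3, and 0.8 come from the text; treating the two conditions as alternatives (either one triggers a warning), the rule-list representation, and the shape of the returned warning record are assumptions made for this sketch.

```python
# Each rule maps score names to thresholds that must ALL be exceeded.
PRESET_POLICY = [
    {"A": 0.5, "B": 0.3},  # joint condition on scores A and B
    {"C": 0.8},            # condition on score C alone
]


def matched_rules(scores, policy=PRESET_POLICY):
    """Return the rules whose every threshold is exceeded by the scores."""
    hits = []
    for rule in policy:
        if all(name in scores and scores[name] > t for name, t in rule.items()):
            hits.append(rule)
    return hits


def generate_warning(user_id, scores, policy=PRESET_POLICY):
    """Return a warning record if any rule matches, otherwise None."""
    hits = matched_rules(scores, policy)
    if not hits:
        return None
    return {"user_id": user_id, "scores": scores, "matched_rules": hits}


warning_joint = generate_warning("u1", {"A": 0.6, "B": 0.4, "C": 0.1})
warning_none = generate_warning("u2", {"A": 0.6, "B": 0.2, "C": 0.1})
warning_fraud = generate_warning("u3", {"C": 0.9})
```

Keeping the policy as data rather than hard-coded conditions makes it straightforward to swap in a different preset strategy without touching the checking logic.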
According to the risk warning generation method, user information of a user is acquired, wherein the user information comprises basic information and behavior information; generating multi-dimensional feature information based on the user information and the feature policy; inputting the multi-dimensional feature information into a risk model to generate at least one risk score, wherein the risk model is generated through user information of a historical user and a machine learning model, and the historical user distributes a sample label in a regularization strategy mode according to the corresponding user information of the historical user; when the at least one risk score meets a preset strategy, a risk warning message is generated, so that the over-fitting problem caused by over-sampling or under-sampling during machine model training can be solved, an accurate calculation model is obtained, users with financial risks are rapidly determined, and the safety of user resource allocation is improved.
It should be clearly understood that this disclosure describes how to make and use particular examples, but the principles of this disclosure are not limited to any details of these examples. Rather, these principles can be applied to many other embodiments based on the teachings of the present disclosure.
FIG. 3 is a flow chart illustrating a risk alert generation method according to another exemplary embodiment. The process 30 shown in fig. 3 is a supplementary description of the process shown in fig. 2.
As shown in fig. 3, in S302, multi-dimensional feature information of a plurality of historical users is acquired. The multi-dimensional feature information is generated from the user information of the plurality of historical users based on a preset strategy.
In S304, sample labels are respectively assigned to the plurality of historical users based on the multi-dimensional feature information. Specifically, the user information of each historical user is compared with a plurality of discrimination strategies, and a sample label is allocated to the historical user based on the discrimination strategy that the user information satisfies, wherein the sample labels are represented by discrete positive integers.
The number of sample labels can be determined according to the number of sub-models in the risk model to be trained. For example, if there are three risk sub-models A, B and C, the corresponding sample labels may be the values 1, 2, 3 and 4, where label 1 represents having risk A, label 2 represents having risk B, label 3 represents having risk C, and label 4 represents no risk.
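As a rough sketch of S304, the following Python snippet assigns the integer labels 1 to 4 by checking hypothetical discrimination strategies in order. The field names (`days_overdue`, `repayment_ratio`, `fraud_flags`), the cut-off values, and the first-match-wins rule are all assumptions made for illustration; the source does not specify them.

```python
from typing import Callable, Dict, List, Tuple

UserInfo = Dict[str, float]

# Hypothetical discrimination strategies: (sample label, predicate).
# Field names and cut-offs are illustrative only.
DISCRIMINATION_STRATEGIES: List[Tuple[int, Callable[[UserInfo], bool]]] = [
    (1, lambda u: u.get("days_overdue", 0) > 30),        # risk A
    (2, lambda u: u.get("repayment_ratio", 1.0) < 0.1),  # risk B
    (3, lambda u: u.get("fraud_flags", 0) > 0),          # risk C
]
NO_RISK_LABEL = 4


def assign_label(user: UserInfo) -> int:
    """Return the label of the first strategy the user satisfies, else 4."""
    for label, matches in DISCRIMINATION_STRATEGIES:
        if matches(user):
            return label
    return NO_RISK_LABEL
```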
In S306, label parameters are determined for the sample labels based on a regularization policy. Specifically, a deviation coefficient is determined based on the regularization strategy, and label parameters are generated for the sample labels based on the deviation coefficient. The label values determined above may be smoothed so that each label takes the form of a vector of probability values, in which the probability value at the real label is the largest and the probability values at the other positions are very small numbers. In this way, the distance between different classes during training is increased, the intra-class distance is reduced, over-fitting is reduced, and the robustness of prediction is improved.
The deviation coefficient may, for example, take the following label-smoothing form:

$$q_i = \begin{cases} 1-\varepsilon, & i = y \\ \dfrac{\varepsilon}{K-1}, & i \neq y \end{cases}$$

where $\varepsilon$ is a hyper-parameter, $K$ is the total number of categories (that is, the number of sub-models in the present application), $i$ represents one of the plurality of categories, and $y$ denotes the real label. The formula means that the probability assigned to the category of the real label is $1-\varepsilon$, and the probability assigned to each other category is $\varepsilon/(K-1)$, so that the probabilities over the $K$ categories sum to 1.
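A minimal Python sketch of the label smoothing described for S306 is given below. It assumes the common form in which the real label receives probability 1 - epsilon and the remaining epsilon is shared equally among the other K - 1 categories; the function name `smooth_label` and the default value of `epsilon` are hypothetical.

```python
from typing import List


def smooth_label(label: int, num_classes: int, epsilon: float = 0.1) -> List[float]:
    """Turn a 1-based integer label into a smoothed probability vector.

    Assumption: the real label gets 1 - epsilon and every other class
    gets epsilon / (K - 1), so the vector sums to 1.
    """
    if num_classes < 2:
        raise ValueError("need at least two classes")
    if not 1 <= label <= num_classes:
        raise ValueError("label must lie in 1..num_classes")
    off_value = epsilon / (num_classes - 1)
    return [
        1.0 - epsilon if k == label else off_value
        for k in range(1, num_classes + 1)
    ]
```

For example, with four categories and epsilon = 0.1, label 2 becomes a vector whose second entry is 0.9 and whose other three entries are each 0.1 / 3.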
In S308, a machine learning model is trained based on the plurality of historical users and their corresponding sample labels and label parameters to generate the risk model. The plurality of historical users, together with their sample labels and label parameters, can be input into the machine learning model for training; a cross entropy loss function is generated based on the label parameters during training; and when the cross entropy loss function reaches an optimal solution, the risk model is determined based on the model parameters of the current machine learning model.
In one embodiment, the cross entropy loss function may be solved, for example, based on a gradient descent approach; and taking the stable solution of the cross entropy loss function as the optimal solution.
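The training step can be illustrated with a small, dependency-free softmax classifier trained by batch gradient descent on smoothed labels. This is a sketch under assumptions, not the patented implementation: the model type, the learning rate `lr`, the tolerance `tol`, and the stopping rule (stop once the loss changes by less than `tol` between iterations, taken here as the "stable solution") are all illustrative choices.

```python
import math
from typing import List, Sequence, Tuple


def softmax(logits: Sequence[float]) -> List[float]:
    m = max(logits)  # subtract the maximum for numerical stability
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [v / total for v in exps]


def predict(weights: List[List[float]], x: Sequence[float]) -> List[float]:
    xb = list(x) + [1.0]  # append a constant 1 for the bias term
    return softmax([sum(w * v for w, v in zip(row, xb)) for row in weights])


def train(features: List[List[float]], soft_labels: List[List[float]],
          lr: float = 0.5, tol: float = 1e-6,
          max_iter: int = 5000) -> Tuple[List[List[float]], float]:
    """Minimise the mean cross entropy against soft labels by gradient descent."""
    n, d, k = len(features), len(features[0]), len(soft_labels[0])
    weights = [[0.0] * (d + 1) for _ in range(k)]
    prev_loss, loss = float("inf"), float("inf")
    for _ in range(max_iter):
        grad = [[0.0] * (d + 1) for _ in range(k)]
        loss = 0.0
        for x, q in zip(features, soft_labels):
            xb = list(x) + [1.0]
            p = predict(weights, x)
            loss -= sum(t * math.log(pi + 1e-12) for t, pi in zip(q, p)) / n
            for c in range(k):
                for j in range(d + 1):
                    # d(loss)/d(logit_c) = p_c - q_c when the targets sum to 1
                    grad[c][j] += (p[c] - q[c]) * xb[j] / n
        if abs(prev_loss - loss) < tol:
            break  # loss has stabilised; treat the current weights as the solution
        prev_loss = loss
        for c in range(k):
            for j in range(d + 1):
                weights[c][j] -= lr * grad[c][j]
    return weights, loss
```

Because the targets are smoothed rather than one-hot, the loss cannot be driven to zero by pushing predicted probabilities to 1, which is what limits over-confident predictions.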
Specifically, a sub-model is constructed for the sample set of each label. The user information of each historical user in the sample set is input into the sub-model to obtain a predicted label, and the predicted label is compared with the corresponding real label to judge whether the two are consistent. The number of predicted labels consistent with the real labels is counted, and the ratio of this number to the total number of predicted labels is calculated. If the ratio is greater than or equal to a preset ratio, the sub-model has converged and the trained sub-model is obtained. If the ratio is smaller than the preset ratio, the parameters of the sub-model are adjusted and the label of each historical user is predicted again with the adjusted sub-model, until the ratio is greater than or equal to the preset ratio. The parameters may be adjusted using a stochastic gradient descent algorithm, a gradient descent algorithm, or the normal equation.
In the present application, the machine learning model may be a classification model, and specifically may be one or a combination of classification algorithms such as logistic regression, naive Bayes, decision trees, support vector machines, random forests, gradient boosting trees, and the like. If the number of times the model parameters have been adjusted exceeds a preset number, the type of machine learning model used to construct the model may be changed to improve model training efficiency.
According to the risk warning generation method, the labels are smoothed so that the classification hyperplane of the model does not lie too close to the original data; the weight of the real label's class probability in the loss value is reduced, while the weight of the predicted probabilities of the other classes in the final loss function is increased. As a result, the gap between the probability of the real class and the mean probability of the other classes is reduced, the model's over-confidence is reduced, and risky users are identified effectively.
FIG. 4 is a flow chart illustrating a risk alert generation method according to another exemplary embodiment. The flow 40 shown in fig. 4 is a detailed description of "generating a preset policy".
As shown in fig. 4, in S402, a plurality of pieces of historical user information satisfying a preset condition are acquired. In the present embodiment, financial resource borrowing is taken as an example for illustration; it is understood that the method of the present application can also be applied to other allocation scenarios. Based on real business data of a financial service platform, and through index analyses such as vintage analysis and roll-rate (migration) analysis, historical users who were more than 30 days overdue within the first 3 periods (namely MOB3 30+) are defined as the target samples for this modeling; the proportion of overdue samples is less than 5%.
In S404, data cleaning and data fusion are performed on the plurality of historical user information to generate a plurality of historical feature information. After the information is fused into a wide table of variables with tens of thousands of dimensions, the data need to be further cleaned and processed to ensure the stability and accuracy of the subsequent model. The data cleaning steps include, but are not limited to, variable missing-rate analysis and processing, outlier processing, discretization and WOE conversion of continuous variables, WOE conversion of discrete variables, text variable processing, and the like.
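The WOE (weight of evidence) conversion mentioned in S404 can be sketched as follows. The sign convention used here, ln(share of good samples / share of bad samples) per bin, is one common choice and is an assumption, as are the function name `woe_table` and the additive `smoothing` constant that avoids division by zero for bins with no good or no bad samples.

```python
import math
from collections import Counter
from typing import Dict, Hashable, Sequence


def woe_table(bins: Sequence[Hashable], targets: Sequence[int],
              smoothing: float = 0.5) -> Dict[Hashable, float]:
    """Map each bin to its WOE value.

    targets: 1 marks a bad (e.g. overdue) sample, 0 a good one.
    """
    good, bad = Counter(), Counter()
    for b, y in zip(bins, targets):
        (bad if y == 1 else good)[b] += 1
    keys = set(good) | set(bad)
    total_good = sum(good.values()) + smoothing * len(keys)
    total_bad = sum(bad.values()) + smoothing * len(keys)
    return {
        k: math.log(((good[k] + smoothing) / total_good)
                    / ((bad[k] + smoothing) / total_bad))
        for k in keys
    }
```

A continuous variable would first be discretized into bins (for instance by quantiles), after which each raw value is replaced by the WOE of its bin.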
In S406, a plurality of historical multidimensional feature information is determined from the plurality of historical feature information. Variable parameters, discrimination parameters, information values and model characteristic parameters of the historical characteristic information can be calculated; and extracting a plurality of historical multidimensional characteristic information from the plurality of historical characteristic information based on the variable parameter, the discrimination parameter, the information value and the model characteristic parameter.
Several aspects can be considered together, such as variable coverage, single-value coverage, correlation and significance with respect to the target variable, the degree of discrimination (KS) and Information Value (IV) with respect to the target variable, and the feature importance given by tree models (such as XGBoost, RF and the like), and the features with high coverage and an obvious discriminating effect on the target variable are screened out as the multi-dimensional features.
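Two of the screening criteria named above, KS and IV, can be computed with the standard library alone. The sketch below is illustrative: the function names, the smoothing constant, and the thresholds `min_ks` and `min_iv` are assumptions and are not taken from the source.

```python
import math
from collections import Counter
from typing import Dict, Hashable, List, Sequence


def ks_statistic(values: Sequence[float], targets: Sequence[int]) -> float:
    """Largest gap between the cumulative distributions of bad (1) and good (0) samples."""
    n_bad = sum(targets)
    n_good = len(targets) - n_bad
    if n_bad == 0 or n_good == 0:
        raise ValueError("both classes must be present")
    pairs = sorted(zip(values, targets))
    cum_bad = cum_good = 0
    ks, i = 0.0, 0
    while i < len(pairs):
        v = pairs[i][0]
        while i < len(pairs) and pairs[i][0] == v:  # consume ties together
            if pairs[i][1] == 1:
                cum_bad += 1
            else:
                cum_good += 1
            i += 1
        ks = max(ks, abs(cum_bad / n_bad - cum_good / n_good))
    return ks


def information_value(bins: Sequence[Hashable], targets: Sequence[int],
                      smoothing: float = 0.5) -> float:
    """IV = sum over bins of (good share - bad share) * ln(good share / bad share)."""
    good, bad = Counter(), Counter()
    for b, y in zip(bins, targets):
        (bad if y == 1 else good)[b] += 1
    keys = set(good) | set(bad)
    total_good = sum(good.values()) + smoothing * len(keys)
    total_bad = sum(bad.values()) + smoothing * len(keys)
    iv = 0.0
    for k in keys:
        g = (good[k] + smoothing) / total_good
        b = (bad[k] + smoothing) / total_bad
        iv += (g - b) * math.log(g / b)
    return iv


def screen_features(columns: Dict[str, List[float]], targets: Sequence[int],
                    min_ks: float = 0.2, min_iv: float = 0.02) -> List[str]:
    """Keep features (already binned) whose KS and IV both reach the hypothetical thresholds."""
    return [
        name for name, values in columns.items()
        if ks_statistic(values, targets) >= min_ks
        and information_value(values, targets) >= min_iv
    ]
```

In practice the other criteria listed in the text (coverage, correlation, tree-model feature importance) would be combined with these two before a feature is kept.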
In S408, a feature policy is generated based on a relationship between the plurality of historical multi-dimensional feature information and the historical user information.
Those skilled in the art will appreciate that all or part of the steps implementing the above embodiments may be implemented as computer programs executed by a CPU. When executed by the CPU, the programs perform the functions defined by the above-described methods provided by the present disclosure. The programs may be stored in a computer readable storage medium, which may be a read-only memory, a magnetic disk, an optical disk, or the like.
Furthermore, it should be noted that the above-mentioned figures are only schematic illustrations of the processes involved in the methods according to exemplary embodiments of the present disclosure, and are not intended to be limiting. It will be readily understood that the processes shown in the above figures are not intended to indicate or limit the chronological order of the processes. In addition, it is also readily understood that these processes may be performed synchronously or asynchronously, e.g., in multiple modules.
The following are embodiments of the disclosed apparatus that may be used to perform embodiments of the disclosed methods. For details not disclosed in the embodiments of the apparatus of the present disclosure, refer to the embodiments of the method of the present disclosure.
FIG. 5 is a block diagram illustrating a risk alert generation apparatus according to an exemplary embodiment. As shown in fig. 5, the risk alert generating device 50 includes: an information module 502, a characteristics module 504, a scoring module 506, and an alert module 508.
The information module 502 is configured to obtain user information of a user, where the user information includes basic information and behavior information;
the feature module 504 is configured to generate multi-dimensional feature information based on the user information and a feature policy;
the scoring module 506 is configured to input the multi-dimensional feature information into a risk model to generate at least one risk score, where the risk model is generated from user information of historical users and a machine learning model, and sample labels are assigned to the historical users, according to their corresponding user information, by way of a regularization strategy;
the warning module 508 is configured to generate risk warning information when the at least one risk score satisfies a preset policy.
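The four modules above can be wired together as in the following sketch. It is a simplified illustration, assuming that each module can be modeled as a plain callable; the class name `RiskAlertDevice`, its field names, and the format of the returned warning string are hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional


@dataclass
class RiskAlertDevice:
    # Each field stands in for one module of fig. 5 (illustrative mapping).
    get_user_info: Callable[[str], Dict[str, float]]           # information module 502
    build_features: Callable[[Dict[str, float]], List[float]]  # feature module 504
    score: Callable[[List[float]], Dict[str, float]]           # scoring module 506
    policy: Callable[[Dict[str, float]], bool]                 # policy used by warning module 508

    def run(self, user_id: str) -> Optional[str]:
        """Return a warning message, or None when the policy is not satisfied."""
        info = self.get_user_info(user_id)
        features = self.build_features(info)
        scores = self.score(features)
        if self.policy(scores):
            return f"risk warning for user {user_id}: {scores}"
        return None
```

Keeping the modules as injected callables makes it easy to swap the risk model or the preset policy without touching the rest of the flow.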
According to the risk warning generation device of the present disclosure, user information of a user is acquired, wherein the user information comprises basic information and behavior information; multi-dimensional feature information is generated based on the user information and the feature policy; the multi-dimensional feature information is input into a risk model to generate at least one risk score, wherein the risk model is generated from user information of historical users and a machine learning model, and sample labels are assigned to the historical users, according to their corresponding user information, by way of a regularization strategy; and risk warning information is generated when the at least one risk score satisfies a preset policy. In this way, the over-fitting problem caused by over-sampling or under-sampling during machine learning model training can be addressed, an accurate calculation model is obtained, users with financial risks are determined rapidly, and the safety of user resource allocation is improved.
FIG. 6 is a block diagram illustrating an electronic device in accordance with an example embodiment.
An electronic device 600 according to this embodiment of the disclosure is described below with reference to fig. 6. The electronic device 600 shown in fig. 6 is only an example and should not bring any limitations to the function and scope of use of the embodiments of the present disclosure.
As shown in fig. 6, the electronic device 600 is embodied in the form of a general purpose computing device. The components of the electronic device 600 may include, but are not limited to: at least one processing unit 610, at least one storage unit 620, a bus 630 that connects the various system components (including the storage unit 620 and the processing unit 610), a display unit 640, and the like.
The storage unit stores program code that is executable by the processing unit 610, such that the processing unit 610 performs the steps according to the various exemplary embodiments of the present disclosure described in this specification. For example, the processing unit 610 may perform the steps shown in figs. 2, 3 and 4.
The storage unit 620 may include readable media in the form of volatile memory units, such as a random access memory unit (RAM) 6201 and/or a cache memory unit 6202, and may further include a read-only memory unit (ROM) 6203.
The memory unit 620 may also include a program/utility 6204 having a set (at least one) of program modules 6205, such program modules 6205 including, but not limited to: an operating system, one or more application programs, other program modules, and program data, each of which, or some combination thereof, may comprise an implementation of a network environment.
Bus 630 may be one or more of several types of bus structures, including a memory unit bus or memory unit controller, a peripheral bus, an accelerated graphics port, a processing unit, or a local bus using any of a variety of bus architectures.
The electronic device 600 may also communicate with one or more external devices 600' (e.g., keyboard, pointing device, Bluetooth device, etc.), with one or more devices that enable a user to interact with the electronic device 600, and/or with any device (e.g., router, modem, etc.) that enables the electronic device 600 to communicate with one or more other computing devices. Such communication may occur via an input/output (I/O) interface 650. Also, the electronic device 600 may communicate with one or more networks (e.g., a Local Area Network (LAN), a Wide Area Network (WAN), and/or a public network such as the Internet) via the network adapter 660. The network adapter 660 may communicate with other modules of the electronic device 600 via the bus 630. It should be appreciated that although not shown in the figures, other hardware and/or software modules may be used in conjunction with the electronic device 600, including but not limited to: microcode, device drivers, redundant processing units, external disk drive arrays, RAID systems, tape drives, and data backup storage systems, among others.
Through the above description of the embodiments, those skilled in the art will readily understand that the exemplary embodiments described herein may be implemented by software, or by software in combination with necessary hardware. Therefore, as shown in fig. 7, the technical solution according to the embodiments of the present disclosure may be embodied in the form of a software product, which may be stored in a non-volatile storage medium (which may be a CD-ROM, a USB disk, a removable hard disk, etc.) or on a network, and which includes several instructions to enable a computing device (which may be a personal computer, a server, or a network device, etc.) to execute the above method according to the embodiments of the present disclosure.
The software product may employ any combination of one or more readable media. The readable medium may be a readable signal medium or a readable storage medium. A readable storage medium may be, for example, but not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any combination of the foregoing. More specific examples (a non-exhaustive list) of the readable storage medium include: an electrical connection having one or more wires, a portable disk, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing.
The computer readable signal medium may include a propagated data signal with readable program code embodied therein, for example, in baseband or as part of a carrier wave. Such a propagated data signal may take many forms, including, but not limited to, electromagnetic, optical, or any suitable combination thereof. A readable signal medium may also be any readable medium that is not a readable storage medium and that can communicate, propagate, or transport a program for use by or in connection with an instruction execution system, apparatus, or device. Program code embodied on a readable medium may be transmitted using any appropriate medium, including but not limited to wireless, wireline, optical fiber cable, RF, etc., or any suitable combination of the foregoing.
Program code for carrying out operations of the present disclosure may be written in any combination of one or more programming languages, including object oriented programming languages such as Java, C++ or the like, and conventional procedural programming languages, such as the "C" programming language or similar programming languages. The program code may execute entirely on the user's computing device, partly on the user's device, as a stand-alone software package, partly on the user's computing device and partly on a remote computing device, or entirely on the remote computing device or server. In the case of a remote computing device, the remote computing device may be connected to the user computing device through any kind of network, including a Local Area Network (LAN) or a Wide Area Network (WAN), or may be connected to an external computing device (e.g., through the internet using an internet service provider).
The computer readable medium carries one or more programs which, when executed by a device, cause the device to perform the following functions: acquiring user information of a user, wherein the user information comprises basic information and behavior information; generating multi-dimensional feature information based on the user information and the feature policy; inputting the multi-dimensional feature information into a risk model to generate at least one risk score, wherein the risk model is generated from user information of historical users and a machine learning model, and sample labels are assigned to the historical users, according to their corresponding user information, by way of a regularization strategy; and generating risk warning information when the at least one risk score satisfies a preset policy. The computer readable medium may also implement the following functions: obtaining multi-dimensional feature information of a plurality of historical users; respectively assigning sample labels to the plurality of historical users based on the multi-dimensional feature information; determining label parameters for the sample labels based on a regularization policy; and training a machine learning model based on the plurality of historical users and their corresponding sample labels and label parameters to generate the risk model.
The computer readable medium may also implement the following functions: acquiring a plurality of pieces of historical user information meeting preset conditions; performing data cleaning and data fusion on the plurality of historical user information to generate a plurality of historical characteristic information; determining a plurality of historical multidimensional feature information from the plurality of historical feature information; generating a feature policy based on a relationship between the plurality of historical multi-dimensional feature information and the historical user information.
Those skilled in the art will appreciate that the modules described above may be distributed in the apparatus according to the description of the embodiments, or may be modified accordingly and located in one or more apparatuses different from those of the embodiments. The modules of the above embodiments may be combined into one module, or further split into multiple sub-modules.
Through the above description of the embodiments, those skilled in the art will readily understand that the exemplary embodiments described herein may be implemented by software, or by software in combination with necessary hardware. Therefore, the technical solution according to the embodiments of the present disclosure may be embodied in the form of a software product, which may be stored in a non-volatile storage medium (which may be a CD-ROM, a USB disk, a removable hard disk, etc.) or on a network, and which includes several instructions to enable a computing device (which may be a personal computer, a server, a mobile terminal, or a network device, etc.) to execute the method according to the embodiments of the present disclosure.
Exemplary embodiments of the present disclosure are specifically illustrated and described above. It is to be understood that the present disclosure is not limited to the precise arrangements or instrumentalities described herein; on the contrary, the disclosure is intended to cover various modifications and equivalent arrangements included within the spirit and scope of the appended claims.

Claims (12)

1. A risk alert generation method, comprising:
acquiring user information of a user, wherein the user information comprises basic information and behavior information;
generating multi-dimensional feature information based on the user information and the feature policy;
inputting the multi-dimensional feature information into a risk model to generate at least one risk score, wherein the risk model is generated from user information of historical users and a machine learning model, and sample labels are assigned to the historical users, according to their corresponding user information, by way of a regularization strategy;
and generating risk warning information when the at least one risk score meets a preset strategy.
2. The method of claim 1, further comprising:
obtaining multi-dimensional characteristic information of a plurality of historical users;
respectively distributing sample labels to the plurality of historical users based on the multi-dimensional characteristic information;
determining label parameters for the sample labels based on a regularization policy;
training a machine learning model based on the plurality of historical users and their corresponding sample labels, label parameters to generate the risk model.
3. The method of claim 1, wherein the risk model comprises a plurality of sub-risk models,
inputting the multi-dimensional feature information into a risk model, and generating at least one risk score, including:
determining at least one sub-risk model from a plurality of sub-risk models of the risk model according to the user information;
and inputting the multi-dimensional characteristic information into at least one sub-risk model to generate at least one risk score.
4. The method of claim 1, wherein generating risk warning information when the at least one risk score satisfies a preset policy comprises:
randomly combining the at least one risk score to generate at least one joint score;
and generating the risk warning information when the at least one joint score meets a preset strategy.
5. The method of claim 2, wherein obtaining multi-dimensional feature information for a plurality of historical users comprises:
acquiring a plurality of pieces of historical user information meeting preset conditions;
performing data cleaning and data fusion on the plurality of historical user information to generate a plurality of historical characteristic information;
determining a plurality of historical multidimensional feature information from the plurality of historical feature information;
generating a feature policy based on a relationship between the plurality of historical multi-dimensional feature information and the historical user information.
6. The method of claim 2, wherein assigning sample labels to the plurality of historical users based on the user information, respectively, comprises:
comparing the user information of the historical user with a plurality of discrimination strategies;
and allocating sample labels to the historical users based on a discrimination strategy met by the user information, wherein the sample labels are represented by discrete positive integers.
7. The method of claim 2, wherein determining label parameters for the sample labels based on a regularization policy comprises:
generating a determined deviation coefficient based on a regularization strategy;
generating label parameters for the sample label based on the deviation factor.
8. The method of claim 2, wherein training a machine learning model to generate the risk model based on the plurality of historical users and their corresponding sample labels, label parameters comprises:
inputting a plurality of historical users with sample labels and label parameters into a machine learning model for training;
generating a cross entropy loss function based on the label parameters in the training process;
and when the cross entropy loss function obtains an optimal solution, determining the risk model based on the model parameters of the current machine learning model.
9. The method of claim 8, wherein when the cross entropy loss function obtains an optimal solution, comprising:
solving the cross entropy loss function based on a gradient descent mode;
and taking the stable solution of the cross entropy loss function as the optimal solution.
10. A risk alert generating device, comprising:
the information module is used for acquiring user information of a user, wherein the user information comprises basic information and behavior information;
the characteristic module is used for generating multi-dimensional characteristic information based on the user information and the characteristic strategy;
the scoring module is used for inputting the multi-dimensional characteristic information into a risk model to generate at least one risk score, wherein the risk model is generated from user information of historical users and a machine learning model, and sample labels are assigned to the historical users, according to their corresponding user information, by way of a regularization strategy;
and the warning module is used for generating risk warning information when the at least one risk score meets a preset strategy.
11. An electronic device, comprising:
one or more processors;
storage means for storing one or more programs;
wherein the one or more programs, when executed by the one or more processors, cause the one or more processors to implement the method of any one of claims 1-9.
12. A computer-readable medium, on which a computer program is stored, which, when being executed by a processor, carries out the method according to any one of claims 1-9.
CN202110836040.XA 2021-07-23 2021-07-23 Risk warning generation method and device and electronic equipment Pending CN113610366A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110836040.XA CN113610366A (en) 2021-07-23 2021-07-23 Risk warning generation method and device and electronic equipment

Publications (1)

Publication Number Publication Date
CN113610366A true CN113610366A (en) 2021-11-05

Family

ID=78338188

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110836040.XA Pending CN113610366A (en) 2021-07-23 2021-07-23 Risk warning generation method and device and electronic equipment

Country Status (1)

Country Link
CN (1) CN113610366A (en)

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20180033009A1 (en) * 2016-07-27 2018-02-01 Intuit Inc. Method and system for facilitating the identification and prevention of potentially fraudulent activity in a financial system
CN109034658A (en) * 2018-08-22 2018-12-18 重庆邮电大学 A kind of promise breaking consumer's risk prediction technique based on big data finance
CN111080440A (en) * 2019-12-18 2020-04-28 上海良鑫网络科技有限公司 Big data wind control management system
CN112037009A (en) * 2020-08-06 2020-12-04 百维金科(上海)信息科技有限公司 Risk assessment method for consumption credit scene based on random forest algorithm

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115460059A (en) * 2022-07-28 2022-12-09 浪潮通信信息***有限公司 Risk early warning method and device
CN115460059B (en) * 2022-07-28 2024-03-08 浪潮通信信息***有限公司 Risk early warning method and device
CN117521042A (en) * 2024-01-05 2024-02-06 创旗技术有限公司 High-risk authorized user identification method based on ensemble learning
CN117521042B (en) * 2024-01-05 2024-05-14 创旗技术有限公司 High-risk authorized user identification method based on ensemble learning

Similar Documents

Publication Publication Date Title
CN112348660B (en) Method and device for generating risk warning information and electronic equipment
CN111210335B (en) User risk identification method and device and electronic equipment
CN112529702B (en) User credit granting strategy allocation method and device and electronic equipment
CN110705719A (en) Method and apparatus for performing automatic machine learning
CN111583018A (en) Credit granting strategy management method and device based on user financial performance analysis and electronic equipment
CN111145009A (en) Method and device for evaluating risk after user loan and electronic equipment
CN112348321A (en) Risk user identification method and device and electronic equipment
CN113610366A (en) Risk warning generation method and device and electronic equipment
CN111612635A (en) User financial risk analysis method and device and electronic equipment
CN111178687A (en) Financial risk classification method and device and electronic equipment
CN111198967A (en) User grouping method and device based on relational graph and electronic equipment
CN112348659A (en) User risk identification strategy allocation method and device and electronic equipment
CN112017062B (en) Resource quota distribution method and device based on guest group subdivision and electronic equipment
CN111191677B (en) User characteristic data generation method and device and electronic equipment
CN111582314A (en) Target user determination method and device and electronic equipment
CN111190967B (en) User multidimensional data processing method and device and electronic equipment
CN113610625A (en) Overdue risk warning method and device and electronic equipment
CN113570207B (en) User policy allocation method and device and electronic equipment
CN113568739B (en) User resource quota allocation method and device and electronic equipment
CN114742645B (en) User security level identification method and device based on multi-stage time sequence multitask
CN112348661B (en) Service policy distribution method and device based on user behavior track and electronic equipment
CN113610536A (en) User strategy distribution method and device for transaction rejection user and electronic equipment
CN113902545A (en) Resource limit distribution method and device and electronic equipment
CN111626438B (en) Model migration-based user policy allocation method and device and electronic equipment
CN113902543A (en) Resource quota adjusting method and device and electronic equipment

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
CB02 Change of applicant information

Country or region after: China

Address after: Room 1109, No. 4, Lane 800, Tongpu Road, Putuo District, Shanghai, 200062

Applicant after: Shanghai Qiyue Information Technology Co.,Ltd.

Address before: Room a2-8914, 58 Fumin Branch Road, Hengsha Township, Chongming District, Shanghai, 201500

Applicant before: Shanghai Qiyue Information Technology Co.,Ltd.

Country or region before: China