CN114782161A - Method, device, storage medium and electronic device for identifying risky users - Google Patents

Publication number
CN114782161A
Authority
CN
China
Prior art keywords
user
target
feature
data
risk
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202210336071.3A
Other languages
Chinese (zh)
Inventor
李寒
许贤铭
严澄
杨青
单黎平
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Du Xiaoman Technology Beijing Co Ltd
Original Assignee
Du Xiaoman Technology Beijing Co Ltd
Application filed by Du Xiaoman Technology Beijing Co Ltd
Priority to CN202210336071.3A
Publication of CN114782161A

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q40/00 Finance; Insurance; Tax strategies; Processing of corporate or income taxes
    • G06Q40/03 Credit; Loans; Processing thereof
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/045 Combinations of networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods

Abstract

The invention discloses a method, a device, a storage medium and an electronic device for identifying a risky user. The method comprises the following steps: acquiring target data; preprocessing the target data to obtain a preprocessing result; constructing an association network based on the preprocessing result, wherein the vertices of the association network are target entities associated with a target user, and the edges of the association network are the relationships between the target entities; performing feature extraction on the association network to obtain a first user feature, a second user feature and a third user feature; and performing risk identification on the first user feature, the second user feature and the third user feature by using a first model to obtain a risk identification result corresponding to the target user, wherein the risk identification result is used for formulating a target risk strategy. The invention solves the technical problems of high rule-making cost and poor interpretability in prior-art methods for identifying risky users.

Description

Method, device, storage medium and electronic device for identifying risky users
Technical Field
The invention relates to the technical field of computers, in particular to a method, a device, a storage medium and an electronic device for identifying a risk user.
Background
Credit loans have long been a main form of lending in China. Credit business carries two main risks: fraud risk and credit risk. Fraud risk refers to the possibility of fraud by the borrower or the lender in obtaining or repaying a loan. Credit risk is the possibility that the borrower defaults because, for whatever reason, the borrower fails to repay a bank loan on time and in full. Credit anti-fraud and credit risk prediction mainly rely on the following two kinds of methods.
In the prior art, either a traditional expert anti-fraud method or a method that builds anti-fraud and credit risk models can be used for credit anti-fraud and credit risk prediction, so as to identify risky users. However, the traditional expert anti-fraud method incurs a high rule-making cost when used for credit anti-fraud prediction, and the method that builds anti-fraud and credit risk models suffers from a certain lag and poor interpretability when used for credit anti-fraud and credit risk prediction.
In view of the above problems, no effective solution has been proposed.
Disclosure of Invention
The embodiment of the invention provides a method, a device, a storage medium and an electronic device for identifying a risk user, which are used for at least solving the technical problems of high rule making cost and poor interpretability of the method for identifying the risk user in the prior art.
According to an aspect of the embodiments of the present invention, there is provided a method for identifying a risky user, including: acquiring target data; preprocessing the target data to obtain a preprocessing result, wherein the preprocessing result comprises entity information and attribute information of a target user and relationship information between the target user and an associated user; constructing an association network based on the preprocessing result, wherein the vertices of the association network are target entities associated with the target user, and the edges of the association network are the relationships between the target entities; performing feature extraction on the association network to obtain a first user feature, a second user feature and a third user feature, wherein the first user feature is used for determining group loan application data corresponding to the target user, the second user feature is used for determining overdue loan data corresponding to the associated user, and the third user feature is used for determining deep learning features corresponding to the associated user; and performing risk identification on the first user feature, the second user feature and the third user feature by using a first model to obtain a risk identification result corresponding to the target user, wherein the risk identification result is used for formulating a target risk strategy.
According to another aspect of the embodiments of the present invention, there is also provided a device for identifying a risky user, including: an acquisition module for acquiring target data; a first processing module for preprocessing the target data to obtain a preprocessing result, wherein the preprocessing result comprises entity information and attribute information of a target user and relationship information between the target user and an associated user; a construction module for constructing an association network based on the preprocessing result, wherein the vertices of the association network are target entities associated with the target user, and the edges of the association network are the relationships between the target entities; a second processing module for performing feature extraction on the association network to obtain a first user feature, a second user feature and a third user feature, wherein the first user feature is used for determining group loan application data corresponding to the target user, the second user feature is used for determining overdue loan data corresponding to the associated user, and the third user feature is used for determining deep learning features corresponding to the associated user; and a third processing module for performing risk identification on the first user feature, the second user feature and the third user feature by using a first model to obtain a risk identification result corresponding to the target user, wherein the risk identification result is used for formulating a target risk strategy.
According to another aspect of the embodiments of the present invention, there is also provided a non-volatile storage medium, wherein the storage medium stores a computer program, and the computer program is configured to execute the method for identifying a risky user according to the present invention when the computer program is executed.
According to another aspect of the embodiments of the present invention, there is also provided a processor configured to run a program, wherein the program, when run, executes the method for identifying a risky user according to the present invention.
According to another aspect of the embodiments of the present invention, there is also provided an electronic device, including a memory and a processor, wherein the memory stores a computer program, and the processor is configured to execute the computer program to perform the method for identifying an at-risk user according to the present invention.
In the embodiment of the invention, the target data is preprocessed to obtain a preprocessing result, wherein the preprocessing result comprises entity information and attribute information of a target user and relationship information between the target user and an associated user. An association network is then constructed based on the preprocessing result, wherein the vertices of the association network are target entities associated with the target user and the edges of the association network are the relationships between the target entities. Feature extraction is performed on the association network to obtain a first user feature, a second user feature and a third user feature, wherein the first user feature is used for determining group loan application data corresponding to the target user, the second user feature is used for determining overdue loan data corresponding to the associated user, and the third user feature is used for determining deep learning features corresponding to the associated user. Finally, a first model is used to perform risk identification on the first user feature, the second user feature and the third user feature to obtain a risk identification result corresponding to the target user, so that a target risk strategy can be formulated according to the risk identification result. This achieves the aim of making full use of the information of the target user and the associated user for risk identification, achieves the technical effects of reducing the rule-making cost and improving the interpretability of the method for identifying risky users, and thereby solves the technical problems of high rule-making cost and poor interpretability in prior-art methods for identifying risky users.
It should be understood that the statements in this section do not necessarily identify key or critical features of the embodiments of the present disclosure, nor do they limit the scope of the present disclosure. Other features of the present disclosure will become apparent from the following description.
Drawings
The accompanying drawings, which are included to provide a further understanding of the invention and are incorporated in and constitute a part of this application, illustrate embodiment(s) of the invention and together with the description serve to explain the invention and do not constitute a limitation of the invention. In the drawings:
Fig. 1 is a flowchart of a method for identifying a risky user according to an embodiment of the present invention;
Fig. 2 is a diagram illustrating an optional multi-factor association heterogeneous network structure according to an embodiment of the present invention;
Fig. 3 is a schematic diagram of an optional multi-factor association network computing framework provided in accordance with an embodiment of the present invention;
Fig. 4 is a schematic diagram of the data flow of a method for identifying a risky user according to an embodiment of the present invention;
Fig. 5 is a block diagram of a device for identifying a risky user according to an embodiment of the present invention.
Detailed Description
In order to make the technical solutions of the present invention better understood, the technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are only a part of the embodiments of the present invention, and not all of the embodiments. All other embodiments, which can be obtained by a person skilled in the art without making any creative effort based on the embodiments in the present invention, shall fall within the protection scope of the present invention.
It should be noted that, in the technical solution of the present disclosure, the acquisition, storage, application and other handling of the personal information of the users involved all comply with the relevant laws and regulations and do not violate public order and good morals.
It should be noted that the terms "first," "second," and the like in the description and claims of the present invention and in the drawings described above are used for distinguishing between similar elements and not necessarily for describing a particular sequential or chronological order. It is to be understood that the data so used is interchangeable under appropriate circumstances such that the embodiments of the invention described herein are capable of operation in sequences other than those illustrated or described herein. Furthermore, the terms "comprises," "comprising," and "having," and any variations thereof, are intended to cover a non-exclusive inclusion, such that a process, method, system, article, or apparatus that comprises a list of steps or elements is not necessarily limited to those steps or elements expressly listed, but may include other steps or elements not expressly listed or inherent to such process, method, article, or apparatus.
In accordance with an embodiment of the present invention, a method of identifying risky users is provided. It should be noted that the steps illustrated in the flowcharts of the figures may be performed in a computer system, for example as a set of computer-executable instructions, and that, although a logical order is illustrated in the flowcharts, in some cases the steps illustrated or described may be performed in an order different from the one shown here.
Fig. 1 is a flowchart of a method for identifying an at-risk user according to an embodiment of the present invention. As shown in fig. 1, the method may include the steps of:
Step S11: acquire target data.
The target data may be personal information that the user has authorized for credit approval, emergency contact information, historical credit records such as credit reports, behavior information left in the loan application software, internet behavior and the like, and may also include the user's collection behavior records on the credit platform, fraud lists, risk lists, transaction data and the like. In addition, in accordance with compliance requirements and the requirements of the relevant national authorities, the target data must be desensitized before it is used, where data desensitization may be a technique that provides users with fictitious data in place of real data in order to prevent the abuse of sensitive data.
Step S12: preprocess the target data to obtain a preprocessing result, wherein the preprocessing result comprises entity information and attribute information of the target user, and relationship information between the target user and an associated user.
Preprocessing refers to processing performed on the data before the main processing, and may include data cleaning, data integration, data transformation, data reduction and the like, where data cleaning may include handling missing data. The entity information may be the user's mobile phone number, computer terminal device information, network connection status (such as a wireless local area network) and the like. The attribute information may be information such as the number of times the user has applied for a loan, the loan amount, the application history, and the user's age. The associated user may be a relative, friend, neighbor or other person who is in contact with the target user.
For example, a lending system collects and summarizes the information of users who applied for loans in the current year and arranges it into a table. The user information may include the user's gender, age, mobile phone number, loan amount, time of the loan application and the like, and, in addition to the user information, the table may include the average age of the users, the average loan amount of male users in the current year, the average loan amount of female users in the current year and the like. The lending system can generate the table automatically through a corresponding program. When the table is found to lack the loan amount of some users, the missing data can be handled by class-mean imputation: the average loan amount of male users in the current year and the average loan amount of female users in the current year are calculated separately, the missing loan amounts of male users are replaced with the male average, and the missing loan amounts of female users are replaced with the female average, which completes the handling of the missing data.
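As an illustration only, the class-mean imputation described above can be sketched in Python as follows. The record layout, the field names (`user`, `gender`, `amount`) and the function name `impute_by_group_mean` are assumptions made for this sketch and do not come from the disclosure.

```python
from statistics import mean

# Hypothetical loan-application records; None marks a missing loan amount.
records = [
    {"user": "u1", "gender": "M", "amount": 1000.0},
    {"user": "u2", "gender": "M", "amount": 3000.0},
    {"user": "u3", "gender": "M", "amount": None},
    {"user": "u4", "gender": "F", "amount": 2000.0},
    {"user": "u5", "gender": "F", "amount": None},
]


def impute_by_group_mean(rows, group_key, value_key):
    """Return copies of rows with missing values replaced by the group mean."""
    observed = {}
    for row in rows:
        if row[value_key] is not None:
            observed.setdefault(row[group_key], []).append(row[value_key])
    group_means = {group: mean(values) for group, values in observed.items()}

    filled = []
    for row in rows:
        new_row = dict(row)  # leave the input untouched
        if new_row[value_key] is None:
            # A group with no observed value at all stays missing (None).
            new_row[value_key] = group_means.get(new_row[group_key])
        filled.append(new_row)
    return filled


cleaned = impute_by_group_mean(records, "gender", "amount")
```

Here the missing amount of the male user is replaced by the mean of the observed male amounts, and likewise for the female user; the original records are not modified.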
Step S13: construct an association network based on the preprocessing result, wherein the vertices of the association network are target entities associated with the target user, and the edges of the association network are the relationships between the target entities.
The association network may be a multi-factor association heterogeneous network constructed with a distributed graph processing framework (such as Spark GraphX). Spark is an open-source cluster computing environment, and GraphX is the Spark component for graphs and graph computation; as a computing engine it provides a powerful computing interface that makes it convenient to handle complex business logic.
Step S14: perform feature extraction on the association network to obtain a first user feature, a second user feature and a third user feature, wherein the first user feature is used for determining group loan application data corresponding to the target user, the second user feature is used for determining overdue loan data corresponding to the associated user, and the third user feature is used for determining deep learning features corresponding to the associated user.
Feature extraction refers to the method and process of reducing the dimensionality of the original input data, or of recombining the original features, for subsequent use.
Step S15: perform risk identification on the first user feature, the second user feature and the third user feature by using the first model to obtain a risk identification result corresponding to the target user, wherein the risk identification result is used for formulating a target risk strategy.
Risk identification may be the process of identifying the nature of existing and potential risks by judging and classifying them.
According to steps S11 to S15 above, the target data is first preprocessed to obtain a preprocessing result, and an association network is constructed based on the preprocessing result. Feature extraction is then performed on the association network to obtain a first user feature, a second user feature and a third user feature, and the first model is used to perform risk identification on these three features to obtain a risk identification result corresponding to the target user. This achieves the aim of making full use of the information of the target user and the associated user for risk identification, achieves the technical effects of reducing the rule-making cost and improving the interpretability of the method for identifying risky users, and thereby solves the technical problems of high rule-making cost and poor interpretability in prior-art methods for identifying risky users.
The above-described method of this embodiment is further described below.
As an optional implementation, in step S14, performing feature extraction on the association network to obtain the first user feature may include the following method steps:
Step S141: apply a target algorithm to the association network to obtain a group identification result, wherein the group identification result comprises the group associated with the target user;
Step S142: perform statistical feature extraction on the group identification result to obtain the first user feature.
Fig. 2 is a diagram of an optional multi-factor association heterogeneous network structure according to an embodiment of the present invention. As shown in Fig. 2, the vertices in the figure may represent entities, where the entities may be the mobile phone numbers, bank cards, identification numbers, devices and the like of different users. The edges in the figure may represent relationships between vertices, for example: mobile phone number 1 uses device 1; mobile phone number 1 is associated with identification number 1 through a loan application; and the user of mobile phone number 1 and the user of mobile phone number 2 are contacts. The multi-factor association heterogeneous network is constructed in this way based on the preprocessing result.
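The vertex-and-edge structure described for Fig. 2 can be illustrated with a small in-memory sketch. It uses plain Python dictionaries rather than Spark GraphX, and the vertex types, relation names and the function `build_association_network` are hypothetical names chosen for illustration.

```python
from collections import defaultdict

# Hypothetical preprocessed relations: (source vertex, relation, target vertex).
# Each vertex is a (type, identifier) pair, so one graph can hold several
# kinds of entity (a heterogeneous network).
relations = [
    (("phone", "phone_1"), "uses_device", ("device", "device_1")),
    (("phone", "phone_1"), "applied_with", ("id_number", "id_1")),
    (("phone", "phone_1"), "contact_of", ("phone", "phone_2")),
    (("phone", "phone_2"), "uses_device", ("device", "device_1")),
]


def build_association_network(triples):
    """Build an undirected adjacency map: vertex -> {(relation, neighbor)}."""
    adjacency = defaultdict(set)
    for source, relation, target in triples:
        adjacency[source].add((relation, target))
        adjacency[target].add((relation, source))
    return dict(adjacency)


network = build_association_network(relations)
```

In this sketch `device_1` ends up connected to both phone numbers, which is the kind of shared-entity link that the association network is meant to expose.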
The first user feature is used for determining group loan application data corresponding to the target user, where the group loan application data may be group data such as the number of nodes in the group, the number of loans and the like. The target algorithm may be the Louvain community discovery algorithm.
The Louvain community discovery algorithm is a graph algorithm based on modularity. Specifically, the algorithm first treats each vertex in the graph as an independent community, so that the number of communities equals the number of vertices. It then tries, for each vertex in turn, to merge the vertex with an adjacent vertex and computes the resulting modularity gain; if the gain is greater than 0, the vertex is placed in the community of that adjacent vertex. This is iterated until the community to which each vertex belongs no longer changes. Finally, all vertices of each community are compressed into a single new node, the weights of the edges between vertices inside the community are converted into the weight of a self-loop on the new node, and the weights of the edges between communities are converted into the weights of the edges between the new nodes. These steps are repeated until the algorithm is stable. Modularity can be used to measure the quality of a community partition quantitatively: the closer the modularity value is to 1, the better the partition. A graph is a data structure composed of a set of vertices and a set of relationships between the vertices.
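The modularity that drives the merge decisions above can be computed directly for a given partition. The following is a minimal sketch for an unweighted, undirected graph using the standard definition, in which each community contributes (internal edges / m) minus (total degree / 2m) squared; it only evaluates a partition and is not a full implementation of the Louvain algorithm. The function name and the example graph are assumptions.

```python
def modularity(edges, community_of):
    """Modularity of a partition of an unweighted, undirected graph.

    edges: list of (u, v) pairs, each undirected edge listed once.
    community_of: dict mapping every vertex to its community id.
    """
    m = len(edges)
    if m == 0:
        return 0.0
    internal_edges = {}  # community -> number of edges inside it
    degree_sum = {}      # community -> sum of degrees of its vertices
    for u, v in edges:
        cu, cv = community_of[u], community_of[v]
        degree_sum[cu] = degree_sum.get(cu, 0) + 1
        degree_sum[cv] = degree_sum.get(cv, 0) + 1
        if cu == cv:
            internal_edges[cu] = internal_edges.get(cu, 0) + 1
    return sum(
        internal_edges.get(c, 0) / m - (d / (2 * m)) ** 2
        for c, d in degree_sum.items()
    )


# Two triangles joined by a single bridge edge (hypothetical example graph).
edges = [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5), (2, 3)]
singletons = {v: v for v in range(6)}            # every vertex on its own
two_groups = {0: 0, 1: 0, 2: 0, 3: 1, 4: 1, 5: 1}
one_group = {v: 0 for v in range(6)}

q_singletons = modularity(edges, singletons)
q_two_groups = modularity(edges, two_groups)
q_one_group = modularity(edges, one_group)
```

On this example the two-triangle partition scores higher than both the all-singletons start and the single-community extreme, which is why moving vertices into a neighbor's community yields a positive modularity gain up to that point.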
In the process of extracting features from the multi-factor association heterogeneous network, the Louvain community discovery algorithm may first be applied to the network to obtain the group identification result associated with the target user, and statistical features are then extracted from the group identification result to obtain group data corresponding to the target user, such as the number of nodes in the group and the number of loans.
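Once a group identification result is available, the statistical extraction step is simple aggregation. The sketch below assumes the community assignment has already been produced by some community discovery algorithm; the dictionaries, the feature names and the function `group_statistics` are hypothetical.

```python
from collections import defaultdict

# Hypothetical output of community discovery: vertex -> community id.
community_of = {"user_a": 0, "user_b": 0, "user_c": 0, "user_d": 1}
# Hypothetical number of loan applications per user.
loan_counts = {"user_a": 2, "user_b": 5, "user_c": 0, "user_d": 1}


def group_statistics(community_of, loan_counts, target_user):
    """Summarize the group that contains target_user."""
    members = defaultdict(list)
    for vertex, community in community_of.items():
        members[community].append(vertex)
    group = members[community_of[target_user]]
    total_loans = sum(loan_counts.get(vertex, 0) for vertex in group)
    return {
        "group_node_count": len(group),
        "group_loan_count": total_loans,
        "group_mean_loan_count": total_loans / len(group),
    }


first_user_feature = group_statistics(community_of, loan_counts, "user_a")
```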
As an optional implementation, in step S14, performing feature extraction on the association network to obtain the second user feature may include the following method step:
Step S143: perform feature extraction on the association network by using a statistical algorithm to obtain the second user feature.
The second user feature is used for determining overdue loan data corresponding to the associated user, where the overdue loan data corresponding to the associated user may be data such as the number and proportion of first-degree neighbors with an overdue history. The statistical algorithm may be the K-nearest neighbor algorithm.
The K-nearest neighbor algorithm is a machine learning method that classifies data according to the distance between feature values. Specifically, data is first collected to form a training sample set in which every sample carries a label, that is, the class to which it belongs. New, unlabeled data is then input, and each feature of the new data is compared with the corresponding feature of the samples in the set. The K samples whose features are most similar to the new data are selected from the training sample set, and the class that occurs most often among these K samples is taken as the class of the new data. The value of K is generally less than 20.
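The classification procedure just described can be sketched in a few lines of Python. The sample points, the labels `low` and `high`, and the function name `knn_classify` are assumptions for illustration; Euclidean distance is used as the similarity measure.

```python
import math
from collections import Counter


def knn_classify(samples, query, k=3):
    """Classify query by majority vote among its k nearest labeled samples.

    samples: list of (feature_vector, label) pairs.
    """
    nearest = sorted(samples, key=lambda s: math.dist(s[0], query))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


# Hypothetical labeled training set with two features per sample.
training_set = [
    ((1.0, 1.0), "low"),
    ((1.2, 0.8), "low"),
    ((0.9, 1.1), "low"),
    ((5.0, 5.0), "high"),
    ((5.2, 4.8), "high"),
]

label_near_low = knn_classify(training_set, (1.1, 1.0), k=3)
label_near_high = knn_classify(training_set, (5.1, 5.0), k=3)
```

For the second query the three nearest samples are two `high` points and one `low` point, so the majority vote returns `high`.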
In the process of extracting features from the multi-factor association heterogeneous network, the K-nearest neighbor algorithm may be applied to the network to obtain data such as the number and proportion of first-degree neighbors of the associated user that have an overdue history.
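The neighbor statistics named above (number and proportion of first-degree neighbors with an overdue history) can be illustrated with plain counting over an adjacency map. This sketch does not use the K-nearest neighbor algorithm mentioned in the text; it only shows what the resulting features might look like. The adjacency map, the set of overdue users and the function name are hypothetical.

```python
def neighbor_overdue_features(adjacency, overdue_users, user):
    """Count first-degree neighbors of user that have an overdue history."""
    neighbors = adjacency.get(user, set())
    overdue_count = sum(1 for n in neighbors if n in overdue_users)
    overdue_ratio = overdue_count / len(neighbors) if neighbors else 0.0
    return {
        "neighbor_count": len(neighbors),
        "overdue_count": overdue_count,
        "overdue_ratio": overdue_ratio,
    }


# Hypothetical user-to-user adjacency and overdue history.
adjacency = {
    "target": {"a", "b", "c", "d"},
    "a": {"target"},
    "isolated": set(),
}
overdue_users = {"a", "c"}

second_user_feature = neighbor_overdue_features(adjacency, overdue_users, "target")
```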
As an optional implementation, in step S14, performing feature extraction on the association network to obtain the third user feature may include the following method steps:
Step S144: acquire first format data of the target user, wherein the first format data comprises node features of the target user and node features of the associated user;
Step S145: perform feature extraction on the first format data by using a second model to obtain the third user feature, wherein the second model may be obtained through machine learning training on a plurality of groups of data, and each group of data comprises the node features of a sample user and the node features of a sample associated user.
The third user feature is used for determining the deep learning features corresponding to the associated user. The first format data may be standard-format data for the lender. The second model may be a deep learning model such as a Graph Convolutional Network (GCN) model, a Graph Attention Network (GAT) model, or a Node2Vec model.
For example, in the process of extracting features from the multi-factor association heterogeneous network, a GCN model may be used to mine the deep learning features of neighbors. Specifically, the node features of the target user and the node features of the associated user are obtained, and the GCN model is then used to extract features from them, so that the deep learning features corresponding to the associated user are obtained. The GCN is a multi-layer graph convolutional neural network composed of one or more convolutional layers and a fully connected layer on top, together with the associated weights and pooling layers. Each convolutional layer in the GCN processes first-order neighborhood information, and information from multi-order neighborhoods is propagated by stacking several convolutional layers.
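The statement that each layer handles first-order neighbors and stacked layers reach further can be illustrated with the widely used GCN propagation rule, H' = ReLU(D^(-1/2) (A + I) D^(-1/2) H W). The sketch below is pure Python with untrained, hypothetical weights and a three-node path graph; it is not the model of this disclosure, only an illustration of the propagation behavior.

```python
import math


def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [
        [sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
        for i in range(len(a))
    ]


def gcn_layer(adjacency, features, weights):
    """One graph convolution: ReLU(D^-1/2 (A + I) D^-1/2 H W)."""
    n = len(adjacency)
    # Add self-loops so that a node keeps part of its own features.
    a_hat = [
        [adjacency[i][j] + (1.0 if i == j else 0.0) for j in range(n)]
        for i in range(n)
    ]
    degree = [sum(row) for row in a_hat]
    normalized = [
        [a_hat[i][j] / math.sqrt(degree[i] * degree[j]) for j in range(n)]
        for i in range(n)
    ]
    hidden = matmul(matmul(normalized, features), weights)
    return [[max(0.0, value) for value in row] for row in hidden]


# Path graph 0 - 1 - 2; only node 0 starts with a non-zero feature.
adjacency = [[0.0, 1.0, 0.0], [1.0, 0.0, 1.0], [0.0, 1.0, 0.0]]
features = [[1.0], [0.0], [0.0]]
weights = [[1.0]]  # hypothetical, untrained weight matrix

h1 = gcn_layer(adjacency, features, weights)  # information travels one hop
h2 = gcn_layer(adjacency, h1, weights)        # information travels two hops
```

After one layer, node 2 (two hops from node 0) still has a zero feature; after the second layer it becomes non-zero, which matches the description of multi-order propagation through stacked layers.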
As an optional implementation, in step S15, performing risk identification on the first user feature, the second user feature and the third user feature by using the first model to obtain the risk identification result corresponding to the target user may include the following method steps:
Step S151: sample the associated users to obtain a sampling result;
Step S152: perform feature aggregation on the first user feature, the second user feature and the third user feature based on a target derivation algorithm to obtain an aggregation result;
Step S153: merge the sampling result and the aggregation result to obtain second format data;
Step S154: perform risk identification on the second format data by using the first model to obtain the risk identification result corresponding to the target user.
Sampling may be the process of drawing individuals or samples from a population. The target derivation algorithm may be the GraphSAGE inductive learning algorithm. The second format data may be data in the model's standard input format, formed by combining the user node feature data and the neighbor node data. The first model may be a traditional logistic regression model or an ensemble tree model such as Extreme Gradient Boosting (XGBoost for short).
For example, feature aggregation is performed on the neighbor statistical features, the clustering features and the neighbor deep learning features through the GraphSAGE inductive learning algorithm. The associated user nodes are first sampled to obtain a sampling result; the features of the neighbor nodes of the associated users are then aggregated based on the sampling result, and the aggregation is updated iteratively until the aggregated information of each neighbor node is almost global information, at which point the iteration terminates and the aggregated information is obtained; finally, the neighbor nodes are learned according to the aggregated information to obtain the aggregation result. In addition, the GraphSAGE inductive learning algorithm may be regarded as an implementation of GCN in the spatial domain.
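The sample-then-aggregate pattern can be sketched as follows. This is a simplified, single-step illustration in the style of a GraphSAGE mean aggregator: it samples a fixed number of neighbors and concatenates the node's own features with the mean of the sampled neighbors' features. It omits the learned weight matrices and the iterative updates, and all names and data are hypothetical.

```python
import random


def sample_neighbors(adjacency, node, sample_size, rng):
    """Draw at most sample_size neighbors of node without replacement."""
    neighbors = sorted(adjacency.get(node, ()))
    if len(neighbors) <= sample_size:
        return neighbors
    return rng.sample(neighbors, sample_size)


def aggregate_mean(adjacency, features, node, sample_size, rng):
    """Concatenate node's features with the mean of sampled neighbor features."""
    sampled = sample_neighbors(adjacency, node, sample_size, rng)
    dim = len(features[node])
    if sampled:
        neighbor_mean = [
            sum(features[n][d] for n in sampled) / len(sampled) for d in range(dim)
        ]
    else:
        neighbor_mean = [0.0] * dim  # no neighbors: use a zero vector
    return list(features[node]) + neighbor_mean


# Hypothetical graph and per-node feature vectors.
adjacency = {"t": {"a", "b"}, "a": {"t"}, "b": {"t"}}
features = {"t": [1.0, 0.0], "a": [0.0, 2.0], "b": [2.0, 4.0]}
rng = random.Random(0)  # fixed seed so the sampling is reproducible

full = aggregate_mean(adjacency, features, "t", sample_size=5, rng=rng)
partial = aggregate_mean(adjacency, features, "t", sample_size=1, rng=rng)
```

With a sample size at least as large as the neighborhood, every neighbor is used and the result is deterministic; with a smaller sample size only the drawn neighbors contribute, which bounds the cost per node on large graphs.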
Based on steps S151 to S154, the associated users are first sampled to obtain a sampling result. The GraphSage inductive learning algorithm is then used to perform feature aggregation processing on the neighbor statistical features, the clustering features and the neighbor deep learning features to obtain an aggregation result. Next, the features derived from the target user's personal information are combined with the neighbor statistical features, the clustering features and the neighbor deep learning features to obtain target user node feature data and neighbor node feature data. Finally, an XGBoost model is used to perform risk identification on the target user node feature data and the neighbor node feature data, so that a risk identification result corresponding to the target user is obtained.
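Steps S151 to S153 can be illustrated with a minimal sketch. It assumes a toy adjacency dictionary and three-element feature vectors whose columns stand in for the neighbor statistical, clustering and deep learning features; the function names, node identifiers and values are all hypothetical. It shows only a mean aggregator without the learned weights a full GraphSage layer would apply, and the final model call of step S154 is left out.

```python
import random

def sample_neighbors(graph, node, k, rng):
    """Step S151 (sketch): draw up to k neighbors of `node` without replacement."""
    neighbors = sorted(graph.get(node, ()))
    if len(neighbors) <= k:
        return neighbors
    return rng.sample(neighbors, k)

def mean_aggregate(features, nodes, dim):
    """Step S152 (sketch): element-wise mean of the sampled neighbors' features."""
    if not nodes:
        return [0.0] * dim
    return [sum(features[n][i] for n in nodes) / len(nodes) for i in range(dim)]

def build_model_input(graph, features, target, k=2, seed=0):
    """Step S153 (sketch): concatenate the target's features with the aggregate."""
    rng = random.Random(seed)
    dim = len(features[target])
    sampled = sample_neighbors(graph, target, k, rng)
    aggregated = mean_aggregate(features, sampled, dim)
    return sampled, features[target] + aggregated

# Hypothetical toy data; each vector is [statistical, clustering, deep-learning].
graph = {"u0": {"u1", "u2", "u3"}, "u1": {"u0"}, "u2": {"u0"}, "u3": {"u0"}}
features = {
    "u0": [1.0, 0.0, 0.2],
    "u1": [0.0, 1.0, 0.4],
    "u2": [1.0, 1.0, 0.6],
    "u3": [0.5, 0.0, 0.8],
}

sampled, row = build_model_input(graph, features, "u0", k=2, seed=0)
all_neighbors, full_row = build_model_input(graph, features, "u0", k=3)
```

The resulting `row` plays the role of the second format data: one fixed-length vector per target user, which could then be passed to a classifier such as logistic regression or XGBoost for step S154.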
As an alternative embodiment, the method for identifying an at-risk user further comprises the following method steps:
step S16, acquiring a target group label;
and step S17, acquiring a target list from the associated network according to the target group label, wherein the target list is used for formulating a target risk strategy.
The target group label may be, for example, black industry, malicious complaint or fraud, where black industry refers to illegal activity that takes the internet as its medium and network technology as its main means, and that poses potential threats to the security of computer information systems and to the orderly management of cyberspace. The target list may be a list of black-industry participants, a list of malicious complaint agents, a list of fraud groups, and so on. The target risk policy may be an anti-fraud policy.
When an anti-fraud policy is being formulated, group labels such as black industry, malicious complaint and fraud can be obtained first, and the corresponding list of black-industry participants, list of malicious complaint agents, list of fraud groups and the like can then be obtained from the multi-factor associated heterogeneous network according to those group labels, so that the requirements of anti-fraud scenarios are met.
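As an illustration of steps S16 and S17, the sketch below pulls a target list out of a small association network by group label. It is only one possible reading: it assumes labels are stored per node and that a "group" is a connected set of nodes sharing the label. The graph, the label names and the function `groups_with_label` are hypothetical.

```python
from collections import deque

def groups_with_label(graph, labels, group_label):
    """Return connected groups of nodes that all carry `group_label`."""
    tagged = {n for n, tags in labels.items() if group_label in tags}
    seen, groups = set(), []
    for start in sorted(tagged):
        if start in seen:
            continue
        seen.add(start)
        queue, group = deque([start]), []
        while queue:
            node = queue.popleft()
            group.append(node)
            # Only walk edges that stay inside the tagged node set.
            for neighbor in sorted(graph.get(node, ())):
                if neighbor in tagged and neighbor not in seen:
                    seen.add(neighbor)
                    queue.append(neighbor)
        groups.append(sorted(group))
    return groups

# Hypothetical association network and node labels.
graph = {"a": {"b"}, "b": {"a", "c"}, "c": {"b"}, "d": {"e"}, "e": {"d"}}
labels = {
    "a": {"fraud"},
    "b": {"fraud"},
    "c": set(),
    "d": {"fraud"},
    "e": {"malicious_complaint"},
}

fraud_groups = groups_with_label(graph, labels, "fraud")
```

Each inner list is one candidate entry of a fraud group list; a production system would more likely issue the equivalent query against its graph database.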
FIG. 3 is a schematic diagram of an alternative multi-factor association network computing framework provided in accordance with an embodiment of the present invention. As shown in fig. 3, the computing framework may include an offline modeling part and an online service part. The offline modeling part inserts and updates each node, then generates the corresponding node features, and finally calls the node features according to task requirements. The online service part only needs to construct a sub-graph for a target user, then combines the features derived from the target user's personal information with the neighbor statistical features, clustering features and neighbor deep learning features to obtain target user node feature data and neighbor node feature data, and finally performs risk identification on the target user node feature data and the neighbor node feature data by using an XGBoost model, thereby predicting the fraud and credit risks that the user may present.
Specifically, the offline modeling part may first acquire standard format data of the target user from the underlying graph database through a Spark computing environment, where the data includes the target user's node features and the node features of randomly sampled 1-degree and 2-degree neighbors. The standard format data of the target user is then used for training through the multi-factor association network computing framework, so as to output a corresponding model that can be called according to task requirements. In addition, model modules for deep learning feature processing, such as GCN, GAT and N2V, are embedded in the multi-factor association network computing framework.
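The random sampling of 1-degree and 2-degree neighbors mentioned above can be sketched as follows. This is a minimal in-memory illustration, not the Spark job itself: the adjacency dictionary, the sample sizes `k1` and `k2`, and the shape of the returned record are all assumptions made for the example.

```python
import random

def sample_two_hop(graph, target, k1, k2, seed=0):
    """Randomly sample up to k1 1-degree and up to k2 2-degree neighbors per branch."""
    rng = random.Random(seed)

    def pick(node, k, exclude):
        candidates = sorted(set(graph.get(node, ())) - exclude)
        if len(candidates) <= k:
            return candidates
        return rng.sample(candidates, k)

    hop1 = pick(target, k1, {target})
    # 2-degree candidates exclude the target and its sampled 1-degree neighbors.
    hop2 = {n: pick(n, k2, {target, *hop1}) for n in hop1}
    return {"target": target, "hop1": hop1, "hop2": hop2}

# Hypothetical toy graph.
graph = {
    "u0": ["u1", "u2"],
    "u1": ["u0", "u3", "u4"],
    "u2": ["u0", "u4"],
    "u3": ["u1"],
    "u4": ["u1", "u2"],
}

record = sample_two_hop(graph, "u0", k1=2, k2=1, seed=0)
```

Fixing `k1` and `k2` keeps every record the same size, which is what lets the sampled neighborhood be flattened into a fixed-width training row.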
Fig. 4 is a schematic diagram of the data flow of a method for identifying an at-risk user according to an embodiment of the present invention. As shown in fig. 4, the data flow may start from source data such as the personal information, financial data and non-financial data of a target user. The source data is preprocessed to obtain underlying data such as entity data, attribute data and relationship data. A relationship network is then constructed from the underlying data, and the underlying data is stored. Finally, a credit risk model and an anti-fraud model are established by mining the risk characteristics of the user; the anti-fraud model is applied to the formulation of an anti-fraud policy, and the credit risk model is applied to the formulation of a credit granting policy, thereby meeting the requirements of the anti-fraud scenario and the credit scenario.
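The first stages of this data flow, from source records to a relationship network, can be sketched as below. The record fields (`user_id`, `age`, `phone`, `device`), the relation names and both function names are hypothetical; the source does not specify a schema.

```python
from collections import defaultdict

def preprocess(records):
    """Split raw records into entity, attribute and relationship data."""
    entities, attributes, relations = set(), {}, []
    for rec in records:
        user = ("user", rec["user_id"])
        entities.add(user)
        attributes[user] = {"age": rec.get("age")}
        for kind in ("phone", "device"):
            value = rec.get(kind)
            if value:
                entity = (kind, value)
                entities.add(entity)
                relations.append((user, f"uses_{kind}", entity))
    return entities, attributes, relations

def build_network(relations):
    """Build an undirected adjacency map: vertices are entities, edges are relations."""
    adjacency = defaultdict(set)
    for src, _relation, dst in relations:
        adjacency[src].add(dst)
        adjacency[dst].add(src)
    return adjacency

# Hypothetical source data: two users who share one device.
records = [
    {"user_id": "u1", "age": 30, "phone": "p1", "device": "d1"},
    {"user_id": "u2", "age": 41, "phone": "p2", "device": "d1"},
]

entities, attributes, relations = preprocess(records)
network = build_network(relations)
```

In this toy network the shared device vertex links the two users, which is the kind of indirect association the later feature extraction steps rely on.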
Through the above description of the embodiments, those skilled in the art can clearly understand that the method according to the above embodiments can be implemented by software together with a necessary general-purpose hardware platform, and can of course also be implemented by hardware, although in many cases the former is the better implementation. Based on this understanding, the technical solution of the present invention, in essence, or the part of it that contributes over the prior art, may be embodied in the form of a software product that is stored in a storage medium and includes several instructions for causing a terminal device (which may be a mobile phone, a computer, a server, a network device, or the like) to execute the methods according to the embodiments of the present invention.
The present invention further provides a device for identifying a risky user. The device is used to implement the above embodiments and preferred embodiments, and what has already been described is not repeated here. As used below, the term "module" may be a combination of software and/or hardware that implements a predetermined function. Although the device described in the embodiments below is preferably implemented in software, an implementation in hardware, or in a combination of software and hardware, is also possible and contemplated.
Fig. 5 is a block diagram of a device for identifying a risky user according to an embodiment of the present invention. As shown in fig. 5, the device 500 for identifying a risky user includes: an acquisition module 501, a first processing module 502, a construction module 503, a second processing module 504 and a third processing module 505.
The acquisition module 501 is configured to acquire target data; the first processing module 502 is configured to preprocess the target data to obtain a preprocessing result, where the preprocessing result includes entity information and attribute information of a target user, and relationship information between the target user and an associated user; the construction module 503 is configured to construct an association network based on the preprocessing result, where a vertex of the association network is a target entity associated with the target user, and an edge of the association network is a relationship between the target entities; the second processing module 504 is configured to perform feature extraction processing according to the association network to obtain a first user feature, a second user feature and a third user feature, where the first user feature is used to determine group loan application data corresponding to the target user, the second user feature is used to determine overdue loan data corresponding to the associated user, and the third user feature is used to determine a deep learning feature corresponding to the associated user; and the third processing module 505 is configured to perform risk identification processing on the first user feature, the second user feature and the third user feature by using the first model to obtain a risk identification result corresponding to the target user, where the risk identification result is used to formulate a target risk policy.
Optionally, the second processing module 504 is further configured to perform identification processing on the associated network by using a target algorithm to obtain a group identification result, where the group identification result includes a group associated with the target user; and performing statistical feature extraction processing on the group identification result to obtain a first user feature.
Optionally, the second processing module 504 is further configured to perform feature extraction processing on the associated network by using a statistical algorithm, so as to obtain a second user feature.
Optionally, the second processing module 504 is further configured to obtain first format data of the target user, where the first format data includes node features of the target user and node features of the associated user; and to perform feature extraction processing on the first format data by using the second model to obtain the third user feature, where the second model can be obtained through machine learning training on multiple groups of data, and each group of data in the multiple groups of data includes: the node features of a sample user and the node features of a sample associated user.
Optionally, the third processing module 505 is further configured to perform sampling processing on the associated user to obtain a sampling result; performing feature aggregation processing on the first user feature, the second user feature and the third user feature based on a target derivation algorithm to obtain an aggregation result; merging the sampling result and the aggregation result to obtain second format data; and carrying out risk identification processing on the second format data by using the first model to obtain a risk identification result corresponding to the target user.
Optionally, the acquisition module 501 is further configured to acquire a target group label, and to acquire a target list from the association network according to the target group label, where the target list is used for formulating a target risk policy.
It should be noted that the above modules may be implemented by software or by hardware. For the latter, this may be achieved in, but is not limited to, the following ways: the modules are all located in the same processor; or the modules are located in different processors in any combination.
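For a software implementation, the division into modules 501 to 505 can be pictured as five interchangeable callables chained in order. The sketch below is an assumption about one possible arrangement, not the device's actual code: the class name, the stub functions and the degree-threshold rule standing in for the first model are all hypothetical.

```python
from dataclasses import dataclass
from typing import Any, Callable

@dataclass
class RiskIdentificationDevice:
    acquire: Callable[[], Any]              # acquisition module 501
    preprocess: Callable[[Any], Any]        # first processing module 502
    build_network: Callable[[Any], Any]     # construction module 503
    extract_features: Callable[[Any], Any]  # second processing module 504
    identify_risk: Callable[[Any], Any]     # third processing module 505

    def run(self):
        """Chain the five modules in the order described above."""
        data = self.acquire()
        preprocessed = self.preprocess(data)
        network = self.build_network(preprocessed)
        features = self.extract_features(network)
        return self.identify_risk(features)

def edges_to_adjacency(edges):
    """Hypothetical construction step: undirected adjacency from edge pairs."""
    adjacency = {}
    for a, b in edges:
        adjacency.setdefault(a, set()).add(b)
        adjacency.setdefault(b, set()).add(a)
    return adjacency

# Stub modules; the degree threshold is a placeholder for the first model.
device = RiskIdentificationDevice(
    acquire=lambda: [("u1", "u2"), ("u1", "u3"), ("u3", "u3")],
    preprocess=lambda pairs: [p for p in pairs if p[0] != p[1]],
    build_network=edges_to_adjacency,
    extract_features=lambda adj: {n: len(nbs) for n, nbs in adj.items()},
    identify_risk=lambda feats: {n: degree >= 2 for n, degree in feats.items()},
)

result = device.run()
```

Because each module is just a callable, the same chain works whether the modules run in one processor or are dispatched to several.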
According to an embodiment of the present invention, there is also provided a non-volatile storage medium storing a computer program, wherein the computer program is arranged to perform the steps of any of the method embodiments described above when executed.
Optionally, in this embodiment, the above-mentioned non-volatile storage medium may be configured to store a computer program for executing the following steps:
step S1, acquiring target data;
step S2, preprocessing the target data to obtain a preprocessing result, wherein the preprocessing result comprises entity information and attribute information of the target user, and relationship information between the target user and the associated user;
step S3, constructing an association network based on the preprocessing result, wherein the vertex of the association network is a target entity associated with the target user, and the edge of the association network is the relationship between the target entities;
step S4, performing feature extraction processing according to the associated network to obtain a first user feature, a second user feature and a third user feature, wherein the first user feature is used for determining group loan application data corresponding to a target user, the second user feature is used for determining overdue loan data corresponding to the associated user, and the third user feature is used for determining deep learning features corresponding to the associated user;
and step S5, performing risk identification processing on the first user characteristic, the second user characteristic and the third user characteristic by using the first model to obtain a risk identification result corresponding to the target user, wherein the risk identification result is used for formulating a target risk strategy.
Optionally, in this embodiment, the non-volatile storage medium may include, but is not limited to, various media capable of storing computer programs, such as a USB flash drive, a Read-Only Memory (ROM), a Random Access Memory (RAM), a removable hard disk, a magnetic disk or an optical disk.
According to an embodiment of the present disclosure, there is also provided an electronic device including a memory having a computer program stored therein and at least one processor configured to execute the computer program to perform the steps in any of the above method embodiments.
Optionally, the electronic apparatus may further include a transmission device and an input/output device, wherein the transmission device is connected to the processor, and the input/output device is connected to the processor.
Optionally, in this embodiment, the processor may be configured to execute the following steps by a computer program:
step S1, acquiring target data;
step S2, preprocessing the target data to obtain a preprocessing result, wherein the preprocessing result comprises entity information and attribute information of the target user and relationship information between the target user and the associated user;
step S3, constructing an association network based on the preprocessing result, wherein the vertex of the association network is a target entity associated with the target user, and the edge of the association network is the relationship between the target entities;
step S4, performing feature extraction processing according to the associated network to obtain a first user feature, a second user feature and a third user feature, wherein the first user feature is used for determining group loan application data corresponding to a target user, the second user feature is used for determining overdue loan data corresponding to the associated user, and the third user feature is used for determining deep learning features corresponding to the associated user;
and step S5, performing risk identification processing on the first user characteristic, the second user characteristic and the third user characteristic by using the first model to obtain a risk identification result corresponding to the target user, wherein the risk identification result is used for formulating a target risk strategy.
Optionally, the specific examples in this embodiment may refer to the examples described in the above embodiments and optional implementation manners, and this embodiment is not described herein again.
The serial numbers of the above embodiments of the present invention are for description only and do not indicate the relative merits of the embodiments.
In the above embodiments of the present invention, the description of each embodiment has its own emphasis, and reference may be made to the related description of other embodiments for parts that are not described in detail in a certain embodiment.
In the embodiments provided in the present application, it should be understood that the disclosed technology can be implemented in other ways. The above-described apparatus embodiments are merely illustrative, and for example, the division of the units may be a logical division, and in actual implementation, there may be another division, for example, multiple units or components may be combined or may be integrated into another system, or some features may be omitted, or may not be executed. In addition, the shown or discussed coupling or direct coupling or communication connection between each other may be an indirect coupling or communication connection through some interfaces, units or modules, and may be electrical or in other forms.
The units described as separate parts may or may not be physically separate, and parts displayed as units may or may not be physical units, may be located in one place, or may be distributed on a plurality of units. Some or all of the units can be selected according to actual needs to achieve the purpose of the solution of the embodiment.
In addition, functional units in the embodiments of the present invention may be integrated into one processing unit, or each unit may exist alone physically, or two or more units are integrated into one unit. The integrated unit can be realized in a form of hardware, and can also be realized in a form of a software functional unit.
The integrated unit, if implemented in the form of a software functional unit and sold or used as a stand-alone product, may be stored in a computer-readable storage medium. Based on this understanding, the technical solution of the present invention, in essence, or the part of it that contributes over the prior art, or all or part of the technical solution, may be embodied in the form of a software product that is stored in a storage medium and includes several instructions for causing a computer device (which may be a personal computer, a server, a network device, or the like) to perform all or part of the steps of the methods according to the embodiments of the present invention. The aforementioned storage medium includes various media capable of storing program code, such as a USB flash drive, a Read-Only Memory (ROM), a Random Access Memory (RAM), a removable hard disk, a magnetic disk or an optical disk.
The foregoing is only a preferred embodiment of the present invention. It should be noted that those skilled in the art can make various improvements and refinements without departing from the principle of the present invention, and these improvements and refinements should also be regarded as falling within the protection scope of the present invention.

Claims (10)

1. A method of identifying an at-risk user, comprising:
acquiring target data;
preprocessing the target data to obtain a preprocessing result, wherein the preprocessing result comprises entity information and attribute information of a target user, and relationship information between the target user and an associated user;
constructing an association network based on the preprocessing result, wherein the vertex of the association network is a target entity associated with the target user, and the edge of the association network is the relation between the target entities;
performing feature extraction processing according to the associated network to obtain a first user feature, a second user feature and a third user feature, wherein the first user feature is used for determining group borrowing application data corresponding to the target user, the second user feature is used for determining overdue borrowing data corresponding to the associated user, and the third user feature is used for determining deep learning features corresponding to the associated user;
and performing risk identification processing on the first user characteristic, the second user characteristic and the third user characteristic by using a first model to obtain a risk identification result corresponding to the target user, wherein the risk identification result is used for formulating a target risk strategy.
2. The method of claim 1, wherein performing feature extraction processing according to the association network to obtain the first user feature comprises:
adopting a target algorithm to identify the association network to obtain a group identification result, wherein the group identification result comprises a group associated with the target user;
and performing statistical feature extraction processing on the group identification result to obtain the first user feature.
3. The method of claim 1, wherein performing feature extraction processing according to the association network to obtain the second user feature comprises:
and performing feature extraction processing on the associated network by using a statistical algorithm to obtain the second user feature.
4. The method of claim 1, wherein performing feature extraction processing according to the association network to obtain the third user feature comprises:
acquiring first format data of the target user, wherein the first format data comprises node characteristics of the target user and node characteristics of the associated user;
and performing feature extraction processing on the first format data by using a second model to obtain the third user features, wherein the second model can be obtained by using multiple groups of data through machine learning training, and each group of data in the multiple groups of data comprises: the node characteristics of the sample user and the node characteristics of the sample associated user.
5. The method for identifying a risky user according to claim 1, wherein the processing the first user characteristic, the second user characteristic, and the third user characteristic using the first model to obtain the risk identification result corresponding to the target user comprises:
sampling the associated user to obtain a sampling result;
performing feature aggregation processing on the first user feature, the second user feature and the third user feature based on a target derivation algorithm to obtain an aggregation result;
merging the sampling result and the aggregation result to obtain second format data;
and carrying out risk identification processing on the second format data by using the first model to obtain a risk identification result corresponding to the target user.
6. The method of identifying at-risk users of claim 1, further comprising:
obtaining a target group label;
and acquiring a target list from the associated network according to the target group label, wherein the target list is used for formulating the target risk strategy.
7. An apparatus for identifying an at-risk user, comprising:
the acquisition module is used for acquiring target data;
the first processing module is used for preprocessing the target data to obtain a preprocessing result, wherein the preprocessing result comprises entity information and attribute information of a target user and relationship information between the target user and an associated user;
a construction module, configured to construct an association network based on the preprocessing result, where a vertex of the association network is a target entity associated with the target user, and an edge of the association network is a relationship between the target entities;
the second processing module is used for performing feature extraction processing according to the associated network to obtain a first user feature, a second user feature and a third user feature, wherein the first user feature is used for determining group borrowing application data corresponding to the target user, the second user feature is used for determining overdue borrowing data corresponding to the associated user, and the third user feature is used for determining deep learning features corresponding to the associated user;
and the third processing module is used for carrying out risk identification processing on the first user characteristic, the second user characteristic and the third user characteristic by utilizing a first model to obtain a risk identification result corresponding to the target user, wherein the risk identification result is used for formulating a target risk strategy.
8. A non-volatile storage medium, in which a computer program is stored, wherein the computer program is arranged to perform the method of identifying an at-risk user of any of claims 1 to 6 when executed.
9. A processor for running a program, wherein the program is arranged to perform the method of identifying at risk users of any of claims 1 to 6 when running.
10. An electronic device comprising a memory and a processor, wherein the memory has stored therein a computer program, and the processor is configured to execute the computer program to perform the method of identifying a risky user as claimed in any one of claims 1 to 6.
CN202210336071.3A 2022-03-31 2022-03-31 Method, device, storage medium and electronic device for identifying risky users Pending CN114782161A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210336071.3A CN114782161A (en) 2022-03-31 2022-03-31 Method, device, storage medium and electronic device for identifying risky users

Publications (1)

Publication Number Publication Date
CN114782161A true CN114782161A (en) 2022-07-22

Family

ID=82427120

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210336071.3A Pending CN114782161A (en) 2022-03-31 2022-03-31 Method, device, storage medium and electronic device for identifying risky users

Country Status (1)

Country Link
CN (1) CN114782161A (en)

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115203487A (en) * 2022-09-15 2022-10-18 深圳市洞见智慧科技有限公司 Data processing method based on multi-party security graph and related device
CN115203487B (en) * 2022-09-15 2022-12-20 深圳市洞见智慧科技有限公司 Data processing method based on multi-party security graph and related device
CN115423542A (en) * 2022-11-07 2022-12-02 中邮消费金融有限公司 Old belt new activity anti-fraud identification method and system
CN116205664A (en) * 2023-04-28 2023-06-02 成都新希望金融信息有限公司 Intermediary fraud identification method and device, electronic equipment and storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination