CN114596111A - Risk identification model generation method, device, equipment and storage medium - Google Patents

Risk identification model generation method, device, equipment and storage medium

Info

Publication number
CN114596111A
Authority
CN
China
Prior art keywords
sample
account
identification model
risk identification
target
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202210211156.9A
Other languages
Chinese (zh)
Inventor
李玉柱
史彬
凌国沈
史何富
田舟贤
黎勇
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Zhejiang Geely Holding Group Co Ltd
Hangzhou Youxing Technology Co Ltd
Original Assignee
Zhejiang Geely Holding Group Co Ltd
Hangzhou Youxing Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Zhejiang Geely Holding Group Co Ltd, Hangzhou Youxing Technology Co Ltd
Priority to CN202210211156.9A
Publication of CN114596111A

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q30/00 Commerce
    • G06Q30/02 Marketing; Price estimation or determination; Fundraising
    • G06Q30/0207 Discounts or incentives, e.g. coupons or rebates
    • G06Q30/0225 Avoiding frauds
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q30/00 Commerce
    • G06Q30/06 Buying, selling or leasing transactions
    • G06Q30/0601 Electronic shopping [e-shopping]
    • G06Q30/0609 Buyer or seller confidence or verification

Landscapes

  • Business, Economics & Management (AREA)
  • Accounting & Taxation (AREA)
  • Finance (AREA)
  • Strategic Management (AREA)
  • Engineering & Computer Science (AREA)
  • Development Economics (AREA)
  • General Physics & Mathematics (AREA)
  • Economics (AREA)
  • Marketing (AREA)
  • Physics & Mathematics (AREA)
  • General Business, Economics & Management (AREA)
  • Theoretical Computer Science (AREA)
  • Game Theory and Decision Science (AREA)
  • Entrepreneurship & Innovation (AREA)
  • Financial Or Insurance-Related Operations Such As Payment And Settlement (AREA)
  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)

Abstract

The application provides a risk identification model generation method, apparatus, device and storage medium. The method first collects a sample set comprising the historical orders of a plurality of sample accounts within a historical duration, each sample account being an account registered through an invitation. A sample feature set is then determined from the sample set, the sample feature set comprising individual sample feature data and group sample feature data corresponding to each sample account. A risk identification model is then generated from the sample feature set and is used to identify whether the order-placing account of a target order has a false invitation relationship. Identifying such accounts with the constructed model improves the efficiency of identifying false invitation relationships, reduces labor cost, and can improve the identification coverage of false invitation relationships in complex scenarios.

Description

Risk identification model generation method, device, equipment and storage medium
Technical Field
The present application relates to the field of computer technologies, and in particular, to a method, an apparatus, a device, and a storage medium for generating a risk identification model.
Background
With the rapid development of internet and mobile communication technologies, more and more users choose online ride-hailing for travel, and competition in the online ride-hailing industry has become increasingly fierce. To acquire more new users, platforms run activities in which passengers invite other passengers, and the generous invitation rewards have led to a large number of false invitations that defraud the platform of rewards. Cracking down on false invitation relationships in the online ride-hailing scenario is therefore very important.
At present, false invitation relationships are usually identified by manual review or by single rules formulated from historical experience, which judge whether an invitation relationship is genuine. For example, if the inviter and the invitee use the same mobile phone number, the same device number and the same payment account, the invitation is usually judged to be false; or, if the same inviter continuously invites a large number of users within a certain time period, and the registration times, login locations, login IPs and the like of those users are strongly clustered, the inviter's invitation relationships are probably false.
However, manual review of false invitation relationships is inefficient and costly, while identification based on a single rule formulated from historical experience covers only a single scenario and generalizes poorly, so only a small number of false invitations can be identified. Neither single rules nor manual review can effectively cover false invitations in complex scenarios.
Disclosure of Invention
The application provides a risk identification model generation method, apparatus, device and storage medium for constructing a risk identification model, so as to solve the problems that arise in the prior art when false invitation relationships are identified by single rules or by manual review.
In a first aspect, the present application provides a risk identification model generation method, including:
collecting a sample set, wherein the sample set comprises historical orders of a plurality of sample accounts in a historical duration, and the sample accounts are registered accounts of an invited user;
determining a sample feature set according to the sample set, wherein the sample feature set comprises individual sample feature data and group sample feature data corresponding to each sample account;
and generating a risk identification model according to the sample feature set, wherein the risk identification model is used for identifying whether a false invitation relation exists in an order placing account of the target order.
In one possible design, the determining a set of sample features from the set of samples includes:
and determining the individual sample characteristic data and the group sample characteristic data corresponding to each sample account according to the historical order of each sample account.
In one possible design, the individual sample characteristic data for each sample account includes at least: whether the first-login city differs from the first-ride city, the distance between the boarding location and the order start location, the interval between the passenger payment time and the driver settlement time, the order discount amount, and the order total amount;
the group sample characteristic data of each sample account at least comprises: the total number of accounts invited by the inviting account of the sample account, the total number of those invited accounts that have completed payment, and the invited mean value data corresponding to the sample account;
the invited mean value data comprises the mean time interval between the order formation time and the invited time of each invited account, the mean actual payment amount of each invited account, the mean ride mileage of each invited account, and the number of payment accounts corresponding to each invited account, wherein the invited accounts are the registered accounts invited by the inviting account of the sample account.
In one possible design, the generating a risk identification model from the sample feature set includes:
dividing the sample feature set corresponding to each sample account into a training feature set and a verification feature set according to a preset proportion;
training a preset learning model by using the individual sample characteristic data and the group sample characteristic data included in the training characteristic set;
evaluating an output result of the trained preset learning model by using the individual sample characteristic data and the group sample characteristic data included in the verification characteristic set;
and repeating the steps until the evaluation result meets a preset threshold value, finishing the training, and determining the preset learning model after finishing the training as the risk identification model.
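The split, train, evaluate and repeat steps above can be sketched as follows. This is a minimal illustration rather than the applicant's implementation: it assumes a hand-written logistic regression as the preset learning model, features already scaled to comparable ranges, and hypothetical names and defaults (generate_model, train_ratio, min_precision, min_recall, max_rounds) that do not appear in the application.

```python
import math
import random


def predict(weights, features):
    """Estimated probability that a sample account carries the risk mark."""
    z = weights[0] + sum(w * v for w, v in zip(weights[1:], features))
    z = max(-30.0, min(30.0, z))  # clamp so math.exp stays well-behaved
    return 1.0 / (1.0 + math.exp(-z))


def train_one_round(weights, rows, learning_rate):
    """One stochastic-gradient pass over the log loss; updates weights in place."""
    for features, label in rows:
        error = predict(weights, features) - label
        weights[0] -= learning_rate * error
        for i, value in enumerate(features):
            weights[i + 1] -= learning_rate * error * value


def evaluate(weights, rows, cutoff=0.5):
    """Precision and recall for the risk-marked class (label 1)."""
    tp = fp = fn = 0
    for features, label in rows:
        flagged = predict(weights, features) >= cutoff
        if flagged and label == 1:
            tp += 1
        elif flagged:
            fp += 1
        elif label == 1:
            fn += 1
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


def generate_model(samples, train_ratio=0.8, min_precision=0.9,
                   min_recall=0.9, max_rounds=500, seed=0):
    """samples: list of (feature_vector, label) pairs, label 1 = risk mark."""
    rows = list(samples)
    random.Random(seed).shuffle(rows)
    split = int(len(rows) * train_ratio)  # the preset proportion (assumed 80/20)
    train_rows, validation_rows = rows[:split], rows[split:]
    weights = [0.0] * (len(rows[0][0]) + 1)  # bias followed by one weight per feature
    for _ in range(max_rounds):
        train_one_round(weights, train_rows, learning_rate=0.5)
        precision, recall = evaluate(weights, validation_rows)
        if precision >= min_precision and recall >= min_recall:
            return weights, precision, recall  # training finished
    raise RuntimeError("evaluation result never met the preset threshold")
```

The max_rounds cap is an addition for safety; the application only states that the steps repeat until the evaluation result meets the preset threshold.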
In one possible design, the evaluation result includes a target precision rate and/or a target recall rate;
the target precision rate and the target recall rate are, respectively, the precision rate and the recall rate of a target output result, the target output result being the output of the trained preset learning model for the sample accounts carrying the risk mark.
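The application does not spell out formulas for these two rates. Assuming the standard definitions, with the risk-marked sample accounts treated as the positive class, they can be written as follows; the symbols TP, FP and FN are introduced here for illustration only.

```latex
% TP: risk-marked sample accounts that the model flags as risky
% FP: non-risk-marked sample accounts that the model flags as risky
% FN: risk-marked sample accounts that the model does not flag
\[
\text{precision} = \frac{TP}{TP + FP},
\qquad
\text{recall} = \frac{TP}{TP + FN}
\]
```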
In one possible design, before the determining the set of sample features from the set of samples, the method further includes:
and marking the sample accounts according to a preset risk index, wherein the sample accounts with the false invitation relation obtain the risk mark, and the sample accounts without the false invitation relation obtain a non-risk mark.
In one possible design, the risk identification model identifies whether the order placing account of the target order has the false invitation relationship, including:
acquiring a target order of an order-placing account;
determining target characteristic data of the order-placing account according to the target order, wherein the target characteristic data comprises individual target characteristic data and group target characteristic data;
and determining whether the target order is a false order according to the target characteristic data and the risk identification model, and if so, determining that the order-placing account has the false invitation relation.
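The identification steps above can be sketched as follows. The feature names, the fixed feature order and the 0.5 cutoff are assumptions made for illustration, and the scoring function stands in for whatever risk identification model has been generated.

```python
from typing import Callable, Dict, List, Sequence

# Hypothetical feature order; it must match the order used during training.
FEATURE_ORDER = ("city_mismatch", "boarding_gap_km", "invited_total", "mean_paid_amount")


def to_vector(individual: Dict[str, float], group: Dict[str, float]) -> List[float]:
    """Merge individual and group target feature data into one model input."""
    merged = {**individual, **group}
    return [float(merged[name]) for name in FEATURE_ORDER]


def has_false_invitation(individual: Dict[str, float],
                         group: Dict[str, float],
                         score: Callable[[Sequence[float]], float],
                         cutoff: float = 0.5) -> bool:
    """True if the model scores the target order as a false order.

    Per the text above, a false target order means the order-placing
    account is treated as having a false invitation relation.
    """
    return score(to_vector(individual, group)) >= cutoff
```

Here score may be any callable that maps a feature vector to a risk score in the range 0 to 1.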
In one possible design, the preset learning model is any one of a logistic regression model, an extreme gradient boosting model, and a gradient boosting decision tree.
In a second aspect, the present application provides a risk identification model generation apparatus, including:
an acquisition module, configured to collect a sample set, wherein the sample set comprises historical orders of a plurality of sample accounts in a historical duration, and the sample accounts are registered accounts of invited users;
a feature extraction module, configured to determine a sample feature set according to the sample set, wherein the sample feature set comprises individual sample feature data and group sample feature data corresponding to each sample account;
and a generating module, configured to generate a risk identification model according to the sample feature set, wherein the risk identification model is used for identifying whether the order-placing account of a target order has a false invitation relation.
In one possible design, the feature extraction module is specifically configured to:
and determining the individual sample characteristic data and the group sample characteristic data corresponding to each sample account according to the historical order of each sample account.
In one possible design, the individual sample characteristic data for each sample account includes at least: whether the first-login city differs from the first-ride city, the distance between the boarding location and the order start location, the interval between the passenger payment time and the driver settlement time, the order discount amount, and the order total amount;
the group sample characteristic data of each sample account at least comprises: the total number of accounts invited by the inviting account of the sample account, the total number of those invited accounts that have completed payment, and the invited mean value data corresponding to the sample account;
the invited mean value data comprises the mean time interval between the order formation time and the invited time of each invited account, the mean actual payment amount of each invited account, the mean ride mileage of each invited account, and the number of payment accounts corresponding to each invited account, wherein the invited accounts are the registered accounts invited by the inviting account of the sample account.
In one possible design, the generating module is specifically configured to:
dividing the sample feature set corresponding to each sample account into a training feature set and a verification feature set according to a preset proportion;
training a preset learning model by using the individual sample characteristic data and the group sample characteristic data included in the training characteristic set;
evaluating an output result of the trained preset learning model by using the individual sample characteristic data and the group sample characteristic data included in the verification characteristic set;
and repeating the steps until the evaluation result meets a preset threshold value, finishing the training, and determining the preset learning model after finishing the training as the risk identification model.
In one possible design, the evaluation result includes a target precision rate and/or a target recall rate;
the target precision rate and the target recall rate are, respectively, the precision rate and the recall rate of a target output result, the target output result being the output of the trained preset learning model for the sample accounts carrying the risk mark.
In one possible design, the risk identification model generation apparatus further includes: a marking module; the marking module is configured to:
and marking the sample accounts according to a preset risk index, wherein the sample accounts with the false invitation relation obtain the risk mark, and the sample accounts without the false invitation relation obtain a non-risk mark.
In one possible design, the risk identification model generation apparatus further includes: an identification module; the identification module is configured to:
acquiring a target order of an order-placing account;
determining target characteristic data of the order-placing account according to the target order, wherein the target characteristic data comprises individual target characteristic data and group target characteristic data;
and determining whether the target order is a false order according to the target characteristic data and the risk identification model, and if so, determining that the order-placing account has the false invitation relation.
In one possible design, the preset learning model is any one of a logistic regression model, an extreme gradient boosting model, and a gradient boosting decision tree.
In a third aspect, the present application provides an electronic device, comprising: a processor, and a memory communicatively coupled to the processor;
the memory stores computer-executable instructions;
the processor executes computer-executable instructions stored by the memory to implement any one of the possible risk identification model generation methods as provided by the first aspect.
In a fourth aspect, the present application provides a computer-readable storage medium having stored therein computer-executable instructions for implementing any one of the possible risk identification model generation methods as provided in the first aspect when executed by a processor.
In a fifth aspect, the present application provides a computer program product comprising computer executable instructions for implementing any one of the possible risk identification model generation methods provided in the first aspect when executed by a processor.
The application provides a risk identification model generation method, apparatus, device and storage medium. The method first collects a sample set comprising the historical orders of a plurality of sample accounts within a historical duration, each sample account being an account registered through an invitation. A sample feature set is then determined from the sample set, the sample feature set comprising individual sample feature data and group sample feature data corresponding to each sample account. A risk identification model is then generated from the sample feature set and is used to identify whether the order-placing account of a target order has a false invitation relationship. Identifying such accounts with the constructed model improves the efficiency of identifying false invitation relationships, reduces labor cost, and can improve the identification coverage of false invitation relationships in complex scenarios.
Drawings
In order to more clearly illustrate the embodiments of the present application or the technical solutions in the prior art, the drawings needed to be used in the description of the embodiments or the prior art will be briefly introduced below, and it is obvious that the drawings in the following description are some embodiments of the present application, and for those skilled in the art, other drawings can be obtained according to these drawings without inventive exercise.
Fig. 1 is a schematic view of an application scenario provided in an embodiment of the present application;
Fig. 2 is a schematic flowchart of a risk identification model generation method according to an embodiment of the present application;
Fig. 3 is a schematic flowchart of another risk identification model generation method according to an embodiment of the present application;
Fig. 4 is a schematic flowchart of another risk identification model generation method according to an embodiment of the present application;
Fig. 5 is a schematic structural diagram of a risk identification model generation apparatus according to an embodiment of the present application;
Fig. 6 is a schematic structural diagram of another risk identification model generation apparatus provided in an embodiment of the present application;
Fig. 7 is a schematic structural diagram of another risk identification model generation apparatus according to an embodiment of the present application;
Fig. 8 is a schematic structural diagram of an electronic device according to an embodiment of the present application.
Detailed Description
Reference will now be made in detail to the exemplary embodiments, examples of which are illustrated in the accompanying drawings. When the following description refers to the accompanying drawings, like numbers in different drawings represent the same or similar elements unless otherwise indicated. The embodiments described in the following exemplary embodiments do not represent all embodiments consistent with the present application. Rather, they are merely examples of methods and apparatus consistent with certain aspects of the present application, as detailed in the appended claims.
The terms "first," "second," "third," "fourth," and the like in the description and in the claims of the present application and in the above-described drawings (if any) are used for distinguishing between similar elements and not necessarily for describing a particular sequential or chronological order. It is to be understood that the data so used is interchangeable under appropriate circumstances such that the embodiments of the application described herein are, for example, capable of operation in sequences other than those illustrated or otherwise described herein. Furthermore, the terms "comprises," "comprising," and "having," and any variations thereof, are intended to cover a non-exclusive inclusion, such that a process, method, system, article, or apparatus that comprises a list of steps or elements is not necessarily limited to those steps or elements expressly listed, but may include other steps or elements not expressly listed or inherent to such process, method, article, or apparatus.
At present, false invitation relationships are usually identified by manual review or by single rules formulated from historical experience, which judge whether an invitation relationship is genuine. For example, if the inviter and the invitee use the same mobile phone number, the same device number and the same payment account, the invitation is usually judged to be false; or, if the same inviter continuously invites a large number of users within a certain time period, and the registration times, login locations, login IPs and the like of those users are strongly clustered, the inviter's invitation relationships are probably false. However, manual review of false invitation relationships is inefficient and costly, while identification based on a single rule formulated from historical experience covers only a single scenario and generalizes poorly, so only a small number of false invitations can be identified. Neither single rules nor manual review can effectively cover false invitations in complex scenarios.
In view of the above problems in the prior art, the present application provides a method, an apparatus, a device and a storage medium for generating a risk identification model. The inventive concept of the risk identification model generation method provided by the application is as follows: based on the historical orders of a plurality of sample accounts within a historical duration, the individual sample feature data and the group sample feature data corresponding to each sample account are characterized to form a sample feature set; a preset learning model, such as a machine learning algorithm, is trained with the sample feature set to construct a risk identification model; and the risk identification model is then used to identify whether the order-placing account of a target order has a false invitation relationship. This can improve the efficiency of identifying false invitation relationships, reduce the cost of manual review, improve the identification coverage of false invitations in complex scenarios, and strengthen the risk control capability of the online ride-hailing platform.
An exemplary application scenario of the embodiments of the present application is described below.
Fig. 1 is a schematic view of an application scenario provided by an embodiment of the present application. As shown in Fig. 1, in order to acquire more new users, passengers invite other passengers to register as new users of the online ride-hailing platform 10, and the online ride-hailing platform 10 offers generous invitation rewards for new users. As a result, user accounts on the online ride-hailing platform 10 may have false invitation relationships, which need to be identified in order to strengthen the risk control capability of the online ride-hailing platform 10.
The electronic device 20 may be configured to execute the risk identification model generation method provided in the embodiment of the present application and to construct the risk identification model based on the historical orders on the online ride-hailing platform 10, so as to effectively identify whether an order-placing account, that is, a user account, on the online ride-hailing platform 10 has a false invitation relationship.
It should be noted that the online ride-hailing platform 10 is provided by an online ride-hailing developer, and the embodiment of the present application does not limit the specific contents of the online ride-hailing platform 10. The electronic device 20 may be a smart phone, a computer, a server, or a server cluster, and the type of the electronic device is not limited in this embodiment. The electronic device 20 in Fig. 1 is illustrated as a computer.
It should be noted that the above application scenario is only illustrative. For example, the online ride-hailing platform 10 may also be any other platform on which new users register through rewarded invitations, and the risk identification model generation method, apparatus, device and storage medium provided in the embodiments of the present application include, but are not limited to, the above application scenario.
Fig. 2 is a schematic flow chart of a risk identification model generation method according to an embodiment of the present application.
As shown in fig. 2, the risk identification model generation method provided in the embodiment of the present application includes:
S101: A sample set is collected.
The sample set comprises historical orders of a plurality of sample accounts in historical duration, and the sample accounts are registered accounts of invited users.
A sample account is an account registered by a user who was invited by another user; in other words, it is the registered account of an invited user. The historical orders of a plurality of sample accounts within the historical duration are collected to form the sample set.
Optionally, the data in a historical order includes, but is not limited to, the user's historical login time, login location, order calling time, location of the user when the order is called, estimated order mileage, estimated order amount, order payment time, actual order mileage, actual order payment amount, account used for order payment, and the like. It is understood that the user refers to a passenger who places orders through the online ride-hailing platform.
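One way to picture the sample set is as a mapping from each sample account to its historical orders within the chosen window. The sketch below assumes a flat record per order; the field names, the HistoricalOrder type and the collect_sample_set function are hypothetical and simply mirror the data items listed above.

```python
from collections import defaultdict
from dataclasses import dataclass
from datetime import datetime
from typing import Dict, Iterable, List, Tuple


@dataclass(frozen=True)
class HistoricalOrder:
    account_id: str                      # sample account (registered account of an invited user)
    login_time: datetime                 # historical login time
    login_city: str                      # login location
    call_time: datetime                  # order calling time
    call_location: Tuple[float, float]   # (latitude, longitude) when the order was called
    estimated_km: float                  # estimated order mileage
    estimated_amount: float              # estimated order amount
    pay_time: datetime                   # order payment time
    actual_km: float                     # actual order mileage
    paid_amount: float                   # actual order payment amount
    payment_account: str                 # account used for order payment


def collect_sample_set(orders: Iterable[HistoricalOrder],
                       start: datetime,
                       end: datetime) -> Dict[str, List[HistoricalOrder]]:
    """Group the orders called within [start, end) by sample account."""
    sample_set: Dict[str, List[HistoricalOrder]] = defaultdict(list)
    for order in orders:
        if start <= order.call_time < end:
            sample_set[order.account_id].append(order)
    return dict(sample_set)
```

Filtering on the order calling time is one possible reading of "within the historical duration"; the application does not say which timestamp defines the window.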
S102: a set of sample features is determined from the set of samples.
The sample feature set comprises individual sample feature data and group sample feature data corresponding to each sample account.
Feature extraction is performed on the sample set to determine the sample feature set, which comprises individual sample feature data and group sample feature data corresponding to each sample account. The individual sample feature data can be understood as features describing the ordering behavior of the sample account itself, and the group sample feature data can be understood as features describing the ordering behavior of the registered accounts in the invitation group of the user who invited the sample account.
Optionally, feature extraction is performed on data in the historical order of each sample account to obtain individual sample feature data and group sample feature data corresponding to each sample account.
For example, the individual sample feature data for each sample account includes, but is not limited to, whether the first-login city differs from the first-ride city, the distance between the boarding location and the order start location, the interval between the passenger payment time and the driver settlement time, the order discount amount, and the order total amount.
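The individual features listed above could be derived from one order record roughly as follows. This is a sketch under assumptions: the dictionary keys are hypothetical, locations are taken to be (latitude, longitude) pairs, and the distance is computed with the haversine formula, which the application does not prescribe.

```python
import math


def haversine_km(a, b):
    """Great-circle distance in kilometres between two (lat, lon) points in degrees."""
    lat1, lon1, lat2, lon2 = map(math.radians, (a[0], a[1], b[0], b[1]))
    h = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * 6371.0 * math.asin(math.sqrt(h))


def individual_features(order):
    """Build the individual sample feature data from one order (a dict with assumed keys)."""
    pay_gap = order["driver_settled_at"] - order["passenger_paid_at"]
    return {
        # 1 if the first-login city differs from the first-ride city
        "city_mismatch": int(order["first_login_city"] != order["first_ride_city"]),
        # distance between the boarding location and the order start location
        "boarding_gap_km": haversine_km(order["boarding_location"],
                                        order["order_start_location"]),
        # interval between passenger payment and driver settlement, in seconds
        "pay_settle_gap_s": abs(pay_gap.total_seconds()),
        "discount_amount": order["discount_amount"],
        "total_amount": order["total_amount"],
    }
```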
The group sample feature data for each sample account includes, but is not limited to: the total number of accounts invited by the inviting account of the sample account, the total number of those invited accounts that have completed payment, and the invited mean value data corresponding to the sample account. Optionally, the invited mean value data may include, but is not limited to, the mean time interval between the order formation time and the invited time of each invited account, the mean payment amount of each invited account, the mean ride mileage of each invited account, and the number of payment accounts corresponding to each invited account. The invited accounts are the registered accounts invited by the inviting account that invited the sample account to register; these registered accounts, together with the sample account, belong to the invitation group of that inviting account.
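The group features can be computed once per inviting account and shared by every sample account in its invitation group. The sketch below assumes one summary record per invited account with hypothetical keys, and it interprets "the number of payment accounts" as the count of distinct payment accounts used across the group, which is only one possible reading of the text.

```python
from statistics import mean


def group_features(invited_accounts):
    """Aggregate features over all accounts invited by one inviting account.

    invited_accounts: non-empty list of dicts with the assumed keys
    invited_at, order_formed_at, has_paid, paid_amount, ride_km, payment_account.
    """
    if not invited_accounts:
        raise ValueError("an invitation group needs at least one invited account")
    delays = [(a["order_formed_at"] - a["invited_at"]).total_seconds()
              for a in invited_accounts]
    return {
        "invited_total": len(invited_accounts),
        "invited_paid_total": sum(1 for a in invited_accounts if a["has_paid"]),
        # mean interval between invited time and order formation time, in seconds
        "mean_order_delay_s": mean(delays),
        "mean_paid_amount": mean([a["paid_amount"] for a in invited_accounts]),
        "mean_ride_km": mean([a["ride_km"] for a in invited_accounts]),
        # distinct payment accounts across the group (an assumed interpretation)
        "payment_account_count": len({a["payment_account"] for a in invited_accounts}),
    }
```

Under this reading, a large invitation group that pays through very few payment accounts is the kind of pattern the group features are meant to expose.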
The individual and group sample feature data of a sample account with a false invitation relationship exhibit corresponding false characteristics. For example, the first-login city differs from the first-ride city, the boarding location is far from the order start location, the total number of accounts invited by the inviting account of the sample account is large (for example, exceeding a preset number), or the mean actual payment amount of the invited accounts is very small. A historical order exhibiting such false characteristics is a false order.
It will be appreciated that the invited account is a registered account of the invited user, and the inviting account is a registered account of the inviting user.
The feature extraction process for determining the sample feature set from the sample set can be understood as a process of organizing the data in the historical orders of each sample account and computing statistics over it; the specific manner of organizing and computing statistics is not limited in the embodiment of the present application.
S103: A risk identification model is generated according to the sample feature set.
The risk identification model is used for identifying whether a false invitation relation exists in an order placing account of a target order.
For example, a machine learning algorithm is trained with the sample feature set, and the machine learning model that has been trained to meet a preset requirement is determined to be the risk identification model, which completes the construction of the risk identification model. The risk identification model is then used to identify whether the order-placing account of a target order has a false invitation relationship.
The machine learning algorithm is defined as a preset learning model, which may be any one of various models such as a logistic regression model, an extreme gradient boosting (XGBoost) model, a gradient boosting decision tree (GBDT), a decision tree, and the like, and is not limited in this embodiment.
According to the risk identification model generation method provided by the embodiment of the application, the individual sample feature data and the group sample feature data corresponding to each sample account are characterized on the basis of the historical orders of a plurality of sample accounts within a historical duration, so as to form a sample feature set. A preset learning model, such as a machine learning algorithm, is then trained with the sample feature set to construct a risk identification model, which is used to identify whether the order-placing account of a target order has a false invitation relationship. Because the risk identification model identifies false invitation relationships directly, identification efficiency can be significantly improved and labor cost reduced compared with the manual review of the prior art. Because the risk identification model is obtained by machine learning training on the historical orders generated by the registered accounts of invited users, it applies to a wider range of scenarios than the prior-art identification means based on a single rule formulated from historical experience, so the identification coverage of false invitations in complex scenarios can be improved and the risk control capability of the online ride-hailing platform is further strengthened.
Fig. 3 is a schematic flowchart of another risk identification model generation method according to an embodiment of the present application. As shown in fig. 3, the risk identification model generation method provided in the embodiment of the present application includes:
S201: Collect a sample set.
The sample set comprises historical orders of a plurality of sample accounts in historical duration, and the sample accounts are registered accounts of invited users.
The implementation manner, principle and technical effect of step S201 are similar to those of step S101, and the detailed content can be described with reference to the foregoing embodiments, and will not be described herein again.
S202: Mark the sample accounts according to a preset risk index.
A sample account with a false invitation relationship receives a risk mark, and a sample account without a false invitation relationship receives a non-risk mark.
The preset risk index may be a manually set index for judging whether a false invitation relationship exists. The plurality of sample accounts are marked according to the preset risk index: if a sample account meets the preset risk index, it is determined to have a false invitation relationship and receives a risk mark. Otherwise, if the sample account does not meet the preset risk index, it is determined to have no false invitation relationship and receives a non-risk mark.
For example, a column of data is added to the sample set formed by the historical orders of the plurality of sample accounts, and this column indicates whether each sample account has a risk mark or a non-risk mark. If the risk mark is represented by "1" and the non-risk mark by "0", then, according to the preset risk index, the column is set to "1" for sample accounts with a false invitation relationship and to "0" for sample accounts without one.
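The marking step can be pictured with the following minimal sketch. The application does not specify the content of the preset risk index, so `example_indicator` and its thresholds are purely hypothetical, as are the record keys.

```python
def label_accounts(accounts, is_risky):
    """Return copies of the account records with an added 'label' column.

    label is 1 (risk mark) when is_risky(account) is true, else 0 (non-risk mark).
    """
    return [{**acc, "label": 1 if is_risky(acc) else 0} for acc in accounts]


def example_indicator(acc):
    # Hypothetical preset risk index: many invited accounts that each paid very little.
    return acc["total_invited"] > 50 and acc["mean_paid_amount"] < 1.0
```

Passing the indicator in as a function keeps the manually set rule separate from the bookkeeping of the label column.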
It should be noted that the embodiment of the present application does not limit the specific content of the manually set preset risk index.
S203: a set of sample features is determined from the set of samples.
The sample feature set comprises individual sample feature data and group sample feature data corresponding to each sample account.
The implementation manner, principle and technical effect of step S203 are similar to those of step S102, and the details can be described with reference to the foregoing embodiment, which are not repeated herein.
S204: Divide the sample feature set corresponding to each sample account into a training feature set and a verification feature set according to a preset proportion.
S205: Train the preset learning model using the individual sample characteristic data and the group sample characteristic data included in the training feature set.
S206: Evaluate the output result of the trained preset learning model using the individual sample characteristic data and the group sample characteristic data included in the verification feature set.
S207: Repeat the training and evaluation steps until the evaluation result meets a preset threshold, then end the training and determine the trained preset learning model as the risk identification model.
The sample feature set corresponding to the sample accounts is divided into a training feature set and a verification feature set according to a preset proportion. For example, 90% of the sample feature set is used as the training feature set and 10% as the verification feature set, in which case the preset proportion is 9:1. In practice, the specific value of the preset proportion used to divide the sample feature set can be set according to actual conditions, and is not limited in the embodiment of the application.
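A minimal sketch of such a division is shown below. The text only fixes the proportion; shuffling before the split and the fixed `seed` are assumptions added here so that the split is reproducible and not biased by the original order.

```python
import random


def split_feature_set(samples, train_ratio=0.9, seed=0):
    """Split samples into (training set, verification set) by train_ratio."""
    shuffled = list(samples)               # copy so the caller's list is untouched
    random.Random(seed).shuffle(shuffled)  # assumption: shuffle before splitting
    cut = round(len(shuffled) * train_ratio)
    return shuffled[:cut], shuffled[cut:]
```

With 100 samples and the 9:1 proportion from the text, this yields 90 training samples and 10 verification samples.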
The preset learning model is trained using the individual sample characteristic data and group sample characteristic data included in the training feature set, and the training effect of the trained preset learning model is evaluated using the individual sample characteristic data and group sample characteristic data included in the verification feature set. For example, the individual and group sample characteristic data in the verification feature set are used as the input of the trained preset learning model to obtain a corresponding output result, namely the trained model's identification of whether each sample account assigned to the verification feature set has a false invitation relationship. The output result is then evaluated. The training process and the evaluation process are repeated until the evaluation result meets the preset threshold, which indicates that the training effect has reached the preset effect; training then ends, and the trained preset learning model is determined as the risk identification model.
It can be understood that training and evaluating the preset learning model is a process of optimizing the parameters of the preset learning model. When the evaluation result meets the preset threshold, the optimization has achieved the preset effect; the preset learning model with the optimized parameters is determined as the risk identification model, and construction of the risk identification model is complete.
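The repeat-until-threshold loop can be sketched as follows. This is an illustrative toy, not the implementation of the application: it assumes logistic regression (one of the model types the text lists) trained by plain stochastic gradient descent, and the learning rate, the 0.5 decision cutoff, the `max_rounds` cap and all function names are assumptions.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def score(weights, bias, x):
    return bias + sum(w * xi for w, xi in zip(weights, x))


def predict(weights, bias, x):
    """Output 1 (false invitation relationship) or 0."""
    return 1 if sigmoid(score(weights, bias, x)) >= 0.5 else 0


def train_one_round(weights, bias, data, lr=0.5):
    """One pass of stochastic gradient descent on the log loss."""
    for x, y in data:
        err = sigmoid(score(weights, bias, x)) - y
        weights = [w - lr * err * xi for w, xi in zip(weights, x)]
        bias -= lr * err
    return weights, bias


def precision_recall(weights, bias, data):
    """TP/(TP+FP) and TP/(TP+FN) on labelled (features, label) pairs."""
    outs = [(y, predict(weights, bias, x)) for x, y in data]
    tp = sum(1 for y, o in outs if y == 1 and o == 1)
    fp = sum(1 for y, o in outs if y == 0 and o == 1)
    fn = sum(1 for y, o in outs if y == 1 and o == 0)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


def fit_until_threshold(train, val, threshold=0.9, max_rounds=200):
    """Alternate training and evaluation until both metrics reach threshold."""
    weights, bias = [0.0] * len(train[0][0]), 0.0
    rounds = 0
    while rounds < max_rounds:
        weights, bias = train_one_round(weights, bias, train)
        rounds += 1
        precision, recall = precision_recall(weights, bias, val)
        if precision >= threshold and recall >= threshold:
            break  # evaluation result meets the preset threshold
    return weights, bias, rounds
```

The `max_rounds` cap is a practical safeguard not mentioned in the text; without it, a threshold that the data cannot support would make the loop run forever.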
In one possible design, the evaluation results may include a target accuracy rate and/or a target recall rate. The target accuracy rate and the target recall rate are respectively the accuracy rate and the recall rate for obtaining a target output result, and the target output result is an output result of the trained preset learning model for the risk marked sample account.
As described above, the risk label is represented by "1", and the target output result is an output result output by inputting the individual sample feature data and the group sample feature data corresponding to the sample account labeled by "1" in the verification feature set to the trained preset learning model. And calculating the accuracy and the recall rate of the target output result to respectively determine the accuracy and the recall rate of the target output result as the target accuracy and the target recall rate, and comparing the target accuracy and/or the target recall rate with a preset threshold value by taking the target accuracy and/or the target recall rate as an evaluation result.
Alternatively, the calculation formulas of the target accuracy rate and the target recall rate may be, for example, as shown in the following formulas (1) and (2):
Target accuracy rate = TP / (TP + FP)  (1)
Target recall rate = TP / (TP + FN)  (2)
Here, TP represents the number of sample accounts in the verification feature set that are marked "1" and for which the output result indicates a false invitation relationship, that is, the output result is "1";
FP represents the number of sample accounts in the verification feature set that are marked "0" but for which the output result indicates a false invitation relationship, that is, the output result is "1";
FN represents the number of sample accounts in the verification feature set that are marked "1" but for which the output result indicates no false invitation relationship, that is, the output result is "0".
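Formulas (1) and (2) can be computed directly from the marks and the model outputs, as in the sketch below. The function names are hypothetical, and returning 0.0 when a denominator is zero is an assumed convention that the text does not address. Note that the quantity in formula (1) is what is commonly called precision.

```python
def confusion_counts(labels, outputs):
    """Count TP, FP and FN for marks and outputs encoded as 1 or 0."""
    pairs = list(zip(labels, outputs))
    tp = sum(1 for y, o in pairs if y == 1 and o == 1)
    fp = sum(1 for y, o in pairs if y == 0 and o == 1)
    fn = sum(1 for y, o in pairs if y == 1 and o == 0)
    return tp, fp, fn


def target_accuracy_and_recall(labels, outputs):
    """Formula (1): TP / (TP + FP); formula (2): TP / (TP + FN)."""
    tp, fp, fn = confusion_counts(labels, outputs)
    accuracy_rate = tp / (tp + fp) if tp + fp else 0.0
    recall_rate = tp / (tp + fn) if tp + fn else 0.0
    return accuracy_rate, recall_rate
```

For instance, with marks `[1, 1, 1, 0, 0, 0]` and outputs `[1, 1, 0, 1, 0, 0]`, the counts are TP = 2, FP = 1 and FN = 1, so both rates equal 2/3.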
The evaluation result satisfying the preset threshold may be understood as determining whether the evaluation result reaches the preset threshold, and a specific value of the preset threshold may be set according to an actual working condition, which is not limited in the embodiment of the present application.
Alternatively, the evaluation result may also be represented by an index such as a receiver operating characteristic curve (ROC curve), F1 score, and the like, which is not limited in this embodiment of the present application.
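For reference, the F1 score mentioned above is the harmonic mean of precision and recall, which simplifies to 2TP / (2TP + FP + FN). The helper below is a small sketch with a hypothetical name; returning 0.0 for an empty denominator is an assumed convention.

```python
def f1_score(tp, fp, fn):
    """F1 = 2*TP / (2*TP + FP + FN), the harmonic mean of precision and recall."""
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0
```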
In the risk identification model generation method provided by this embodiment of the application, individual sample characteristic data and group sample characteristic data are derived for each sample account from the historical orders of a plurality of sample accounts over a historical duration, forming a sample feature set. The sample feature set is then divided into a training feature set and a verification feature set. A preset learning model, such as a machine learning algorithm, is trained with the training feature set, and the training effect is evaluated with the verification feature set. Training and evaluation are repeated until the evaluation result meets a preset threshold, at which point the trained preset learning model is determined as the risk identification model and construction of the risk identification model is complete. Because a risk identification model that meets the expected effect is constructed through training and evaluation, using it to identify the false invitation relationship directly can markedly improve identification efficiency and reduce labor cost compared with manual review in the prior art, and can also effectively improve identification accuracy. In addition, the risk identification model is obtained by machine learning on historical orders generated by the registered accounts of invited users. Compared with prior-art identification means built from a single rule formulated from historical experience, it broadens the scenarios in which identification can be applied, improves the identification coverage of false invitations in complex scenarios, and further strengthens the risk control capability of the online ride-hailing platform.
The embodiments described above describe possible implementation manners for constructing a risk identification model, and the constructed risk identification model is used for identifying whether a false invitation relationship exists in an order placing account of a target order. Fig. 4 is a schematic flowchart of another risk identification model generation method according to an embodiment of the present application. As shown in fig. 4, the embodiment of the present application includes:
S301: Acquire a target order of the order-placing account.
For example, a real-time order of an order-placing account on the online ride-hailing platform is obtained, and this real-time order is the target order.
The data in the target order may include, but is not limited to, the historical login time, login location, order calling time, location of the user when the order is called, estimated order mileage, estimated order amount, order payment time, actual order mileage, actual order payment amount, and actual order payment account of the user who places the order with the order-placing account. Here, the user refers to the passenger who places an order using the order-placing account on the online ride-hailing platform.
S302: Determine target characteristic data of the order-placing account according to the target order.
The target feature data comprises individual target feature data and group target feature data.
Feature extraction is performed on the data in the target order to obtain the target characteristic data of the order-placing account. The individual target characteristic data may be understood as characteristics describing the ordering behavior of the order-placing account itself, and the group target characteristic data may be understood as characteristics describing the ordering behavior of each registered account in the invitation group to which the order-placing account belongs. It should be noted that the order-placing account may be either a registered account created through an invitation or a self-registered account.
Specifically, the types of data included in the individual target characteristic data and the group target characteristic data are the same as those included in the individual sample characteristic data and the group sample characteristic data, respectively; for details, refer to the foregoing embodiments, which are not repeated here. If the order-placing account is a self-registered account, the group target characteristic data is empty, and accordingly the result of using the risk identification model to identify the order-placing account is that no false invitation relationship exists.
S303: Determine whether the target order is a false order according to the target characteristic data and the risk identification model, and if so, determine that the order-placing account has a false invitation relationship.
The target characteristic data is used as the input of the risk identification model to obtain a corresponding identification result, and whether the target order is a false order is determined according to the identification result.
For example, if the target characteristic data is identified by the risk identification model and then the target order is determined to be a false order, it is indicated that the order placing account corresponding to the target order has a false invitation relationship, and identification of whether the order placing account has the false invitation relationship is achieved.
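Steps S301 to S303 might be wired together as in the following sketch. It is illustrative only: `model` stands for any trained risk identification model exposed as a callable that returns 1 for a false order and 0 otherwise, and the feature dictionaries and function name are hypothetical. The early return mirrors the statement in the text that a self-registered account has empty group target characteristic data and therefore no false invitation relationship.

```python
def identify_false_invitation(individual, group, model):
    """Return True if the order-placing account is judged to have a
    false invitation relationship.

    individual: dict of individual target characteristic data
    group: dict of group target characteristic data (empty if self-registered)
    model: callable mapping a feature dict to 1 (false order) or 0
    """
    if not group:
        # Self-registered account: no invitation group, hence no false invitation.
        return False
    features = {**individual, **group}  # combine both kinds of characteristic data
    return model(features) == 1
```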
In the risk identification model generation method provided by this embodiment of the application, the generated risk identification model is used to identify whether the order-placing account of the target order has a false invitation relationship. Target orders of order-placing accounts on the online ride-hailing platform are thus identified in real time, and whether an order-placing account has a false invitation relationship is identified directly, which can markedly improve identification efficiency and reduce labor cost compared with manual review in the prior art. In addition, the risk identification model is obtained by machine learning on historical orders generated by the registered accounts of invited users. Compared with prior-art identification means built from a single rule formulated from historical experience, it broadens the scenarios in which identification can be applied, improves the identification coverage of false invitations in complex scenarios, and further strengthens the risk control capability of the online ride-hailing platform.
Fig. 5 is a schematic structural diagram of a risk identification model generation apparatus according to an embodiment of the present application. As shown in fig. 5, a risk identification model generation apparatus 400 provided in an embodiment of the present application includes:
A collecting module 401, configured to collect a sample set.
The sample set comprises historical orders of a plurality of sample accounts in historical duration, and the sample accounts are registered accounts of invited users.
A feature extraction module 402, configured to determine a sample feature set according to the sample set.
The sample feature set comprises individual sample feature data and group sample feature data corresponding to each sample account.
A generating module 403, configured to generate a risk identification model according to the sample feature set.
The risk identification model is used for identifying whether a false invitation relation exists in an order placing account of a target order.
In one possible design, the feature extraction module 402 is specifically configured to:
and determining individual sample characteristic data and group sample characteristic data corresponding to each sample account according to the historical order of each sample account.
In one possible design, the individual sample characteristic data of each sample account includes at least: whether the first login city is inconsistent with the first car city, the distance between the car taking position and the order starting position, the interval between the passenger payment time and the driver settlement time, the order discount amount and the order total amount;
the group sample characteristic data of each sample account at least comprises the following data: the total account number invited by the invitation account of the sample account, the total account number of the completed payment invited by the invitation account of the sample account and the invited mean value data corresponding to the sample account;
the invited mean value data comprises a mean value of time intervals between the order-forming time and the invited time of each invited account, a mean value of the actual payment amount of each invited account, a mean value of the riding mileage of each invited account and the number of payment accounts corresponding to each invited account, wherein each invited account refers to each registered account invited by the inviting account of the sample account.
In one possible design, the generating module 403 is specifically configured to:
dividing a sample feature set corresponding to each sample account into a training feature set and a verification feature set according to a preset proportion;
training a preset learning model by using the individual sample characteristic data and the group sample characteristic data included in the training characteristic set;
evaluating an output result of the trained preset learning model by using the individual sample characteristic data and the group sample characteristic data included in the verification characteristic set;
and repeating the steps until the evaluation result meets a preset threshold value, finishing the training, and determining the preset learning model after finishing the training as a risk identification model.
In one possible design, the evaluation result includes a target accuracy rate and/or a target recall rate;
the target accuracy rate and the target recall rate are respectively the accuracy rate and the recall rate for obtaining a target output result, and the target output result is an output result of the trained preset learning model for the risk marked sample account.
On the basis of fig. 5, fig. 6 is a schematic structural diagram of another risk identification model generation apparatus provided in the embodiment of the present application. As shown in fig. 6, the risk identification model generation apparatus 400 according to the embodiment of the present application further includes: a marking module 404. The marking module 404 is configured to:
and marking the sample accounts according to a preset risk index, wherein the sample accounts with the false invitation relation obtain a risk mark, and the sample accounts without the false invitation relation obtain a non-risk mark.
On the basis of fig. 6, fig. 7 is a schematic structural diagram of another risk identification model generation apparatus provided in the embodiment of the present application. As shown in fig. 7, the risk identification model generation apparatus 400 according to the embodiment of the present application further includes: an identification module 405. The identification module 405 is configured to:
acquiring a target order of an order-placing account;
determining target characteristic data of an order-placing account according to a target order, wherein the target characteristic data comprises individual target characteristic data and group target characteristic data;
and determining whether the target order is a false order according to the target characteristic data and the risk identification model, and if so, determining that a false invitation relation exists in the order-placing account.
In one possible design, the preset learning model is any one of a logistic regression model, an extreme gradient boosting model, and a gradient boosting decision tree.
The risk identification model generation device provided in the embodiment of the application can execute corresponding steps of the risk identification model generation method in the above method embodiments, and the implementation principle and the technical effect are similar, and are not described herein again.
Fig. 8 is a schematic structural diagram of an electronic device according to an embodiment of the present application. As shown in fig. 8, the electronic device 500 may include: a processor 501, and a memory 502 communicatively coupled to the processor 501.
The memory 502 is used for storing programs. In particular, the program may include program code comprising computer-executable instructions.
Memory 502 may comprise high-speed RAM, and may also include non-volatile memory, such as at least one disk memory.
Processor 501 is configured to execute computer-executable instructions stored in memory 502 to implement a risk identification model generation method.
The processor 501 may be a Central Processing Unit (CPU), an Application Specific Integrated Circuit (ASIC), or one or more integrated circuits configured to implement the embodiments of the present application.
Alternatively, the memory 502 may be separate or integrated with the processor 501. When the memory 502 is a device independent of the processor 501, the electronic device 500 may further include:
A bus 503, configured to connect the processor 501 and the memory 502. The bus may be an Industry Standard Architecture (ISA) bus, a Peripheral Component Interconnect (PCI) bus, an Extended Industry Standard Architecture (EISA) bus, or the like. Buses may be classified as address buses, data buses, control buses, and so on, but this does not mean that there is only one bus or only one type of bus.
Alternatively, in a specific implementation, if the memory 502 and the processor 501 are integrated on a chip, the memory 502 and the processor 501 may communicate through an internal interface.
The present application also provides a computer-readable storage medium, which may include various media capable of storing program code, such as a USB flash drive, a removable hard disk, a read-only memory (ROM), a random access memory (RAM), a magnetic disk, or an optical disc. Specifically, the computer-readable storage medium stores computer-executable instructions, and the computer-executable instructions, when executed, implement the risk identification model generation method in the foregoing embodiments.
The present application also provides a computer program product comprising computer executable instructions that, when executed by a processor, implement the risk identification model generation method in the above embodiments.
Other embodiments of the present application will be apparent to those skilled in the art from consideration of the specification and practice of the invention disclosed herein. This application is intended to cover any variations, uses, or adaptations of the invention following, in general, the principles of the application and including such departures from the present disclosure as come within known or customary practice within the art to which the invention pertains. It is intended that the specification and examples be considered as exemplary only, with a true scope and spirit of the application being indicated by the following claims.
It will be understood that the present application is not limited to the precise arrangements described above and shown in the drawings and that various modifications and changes may be made without departing from the scope thereof. The scope of the application is limited only by the appended claims.

Claims (11)

1. A method for generating a risk identification model, comprising:
collecting a sample set, wherein the sample set comprises historical orders of a plurality of sample accounts in a historical duration, and the sample accounts are registered accounts of an invited user;
determining a sample feature set according to the sample set, wherein the sample feature set comprises individual sample feature data and group sample feature data corresponding to each sample account;
and generating a risk identification model according to the sample feature set, wherein the risk identification model is used for identifying whether a false invitation relation exists in an order placing account of the target order.
2. The method of generating a risk identification model according to claim 1, wherein the determining a set of sample features from the set of samples comprises:
and determining the individual sample characteristic data and the group sample characteristic data corresponding to each sample account according to the historical order of each sample account.
3. The risk identification model generation method of claim 2, wherein the individual sample characteristic data of each sample account at least comprises: whether the first login city is inconsistent with the first car city, the distance between the car taking position and the order starting position, the interval between the passenger payment time and the driver settlement time, the order discount amount and the order total amount;
the group sample characteristic data of each sample account at least comprises: the total account number invited by the invitation account of the sample account, the total account number of payment completion invited by the invitation account of the sample account and the invited mean value data corresponding to the sample account;
the invited mean value data comprises a mean value of time intervals between the order-forming time and the invited time of each invited account, a mean value of the actual payment amount of each invited account, a mean value of the riding mileage of each invited account and the number of payment accounts corresponding to each invited account, wherein each invited account refers to each registered account invited by the inviting account of the sample account.
4. The method of generating a risk identification model according to claim 3, wherein the generating a risk identification model from the sample feature set comprises:
dividing the sample feature set corresponding to each sample account into a training feature set and a verification feature set according to a preset proportion;
training a preset learning model by using the individual sample characteristic data and the group sample characteristic data included in the training characteristic set;
evaluating an output result of the trained preset learning model by using the individual sample characteristic data and the group sample characteristic data included in the verification characteristic set;
and repeating the steps until the evaluation result meets a preset threshold value, finishing the training, and determining the preset learning model after finishing the training as the risk identification model.
5. The risk identification model generation method of claim 4, wherein the assessment results comprise a target accuracy rate and/or a target recall rate;
the target accuracy rate and the target recall rate are respectively the accuracy rate and the recall rate of a target output result, and the target output result is an output result of the trained preset learning model for the sample accounts with the risk mark.
6. The method of generating a risk identification model according to claim 5, further comprising, prior to said determining a set of sample features from the set of samples:
and marking the sample accounts according to a preset risk index, wherein the sample accounts with the false invitation relation obtain the risk mark, and the sample accounts without the false invitation relation obtain a non-risk mark.
7. The risk identification model generation method of any one of claims 1 to 6, wherein the risk identification model identifies whether the order placing account of the target order has the false invitation relationship, and comprises:
acquiring a target order of an order-placing account;
determining target characteristic data of the order-placing account according to the target order, wherein the target characteristic data comprises individual target characteristic data and group target characteristic data;
and determining whether the target order is a false order according to the target characteristic data and the risk identification model, and if so, determining that the order placing account number has the false invitation relation.
8. The risk identification model generation method according to any one of claims 4 to 6, wherein the preset learning model is any one of a logistic regression model, an extreme gradient boosting model, and a gradient boosting decision tree.
9. A risk identification model generation apparatus, comprising:
a collecting module, configured to collect a sample set, wherein the sample set comprises historical orders of a plurality of sample accounts in a historical duration, and the sample accounts are registered accounts of an invited user;
a feature extraction module, configured to determine a sample feature set according to the sample set, wherein the sample feature set comprises individual sample feature data and group sample feature data corresponding to each sample account;
and a generating module, configured to generate a risk identification model according to the sample feature set, wherein the risk identification model is used for identifying whether the order-placing account of the target order has a false invitation relationship.
10. An electronic device, comprising: a processor, and a memory communicatively coupled to the processor;
the memory stores computer-executable instructions;
the processor executes computer-executable instructions stored in the memory to implement the risk identification model generation method of any of claims 1 to 8.
11. A computer-readable storage medium having stored thereon computer-executable instructions for implementing the risk identification model generation method of any one of claims 1 to 8 when executed by a processor.
CN202210211156.9A 2022-03-03 2022-03-03 Risk identification model generation method, device, equipment and storage medium Pending CN114596111A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210211156.9A CN114596111A (en) 2022-03-03 2022-03-03 Risk identification model generation method, device, equipment and storage medium

Publications (1)

Publication Number Publication Date
CN114596111A true CN114596111A (en) 2022-06-07

Family

ID=81814717

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210211156.9A Pending CN114596111A (en) 2022-03-03 2022-03-03 Risk identification model generation method, device, equipment and storage medium

Country Status (1)

Country Link
CN (1) CN114596111A (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115423323A (en) * 2022-09-05 2022-12-02 浙江口碑网络技术有限公司 Security management method and device, electronic equipment and computer storage medium
CN116452006A (en) * 2023-06-08 2023-07-18 北京龙驹易行科技有限公司 Wind control method, device, computer equipment and medium for new activity of driver

Citations (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109409918A (en) * 2018-08-24 2019-03-01 深圳壹账通智能科技有限公司 The recognition methods of wool party, device, equipment and storage medium based on user behavior
US20190244115A1 (en) * 2017-05-16 2019-08-08 Tsinghua University Invitation behavior prediction method and apparatus, and storage medium
CN110428291A (en) * 2019-08-07 2019-11-08 上海观安信息技术股份有限公司 A method of Hei Chan clique is identified using directed acyclic graph
CN111489190A (en) * 2020-03-16 2020-08-04 上海趣蕴网络科技有限公司 Anti-cheating method and system based on user relationship
CN111681027A (en) * 2020-04-16 2020-09-18 上海淇玥信息技术有限公司 Risk management method and device in information popularization, electronic equipment and storage medium
CN111881991A (en) * 2020-08-03 2020-11-03 联仁健康医疗大数据科技股份有限公司 Method and device for identifying fraud and electronic equipment
CN112100452A (en) * 2020-09-17 2020-12-18 京东数字科技控股股份有限公司 Data processing method, device, equipment and computer readable storage medium
CN112685610A (en) * 2020-12-24 2021-04-20 中国平安人寿保险股份有限公司 False registration account identification method and related device
CN113205129A (en) * 2021-04-28 2021-08-03 五八有限公司 Cheating group identification method and device, electronic equipment and storage medium

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
邹斌: "复杂网络关键数据虚假特征准确识别仿真", 计算机仿真, no. 11, 15 November 2018 (2018-11-15), pages 452 - 455 *

Similar Documents

Publication Publication Date Title
TWI784941B (en) A multi-sampling model training method and device
CN114596111A (en) Risk identification model generation method, device, equipment and storage medium
CN111275491B (en) Data processing method and device
CN108665159A (en) Risk assessment method, device, terminal device and storage medium
CN107730131B (en) Capability prediction and recommendation method and device for crowdsourced software developers
CN106127505A (en) Method and device for identifying fake (brushed) orders
CN106557955A (en) Method and system for identifying abnormal online ride-hailing orders
KR20190032495A (en) METHOD AND DEVICE FOR MODELING EVALUATION MODELS
CN105931068A (en) Cardholder consumption figure generation method and device
CN108764375B (en) Highway goods stock transprovincially matching process and device
CN111260102A (en) User satisfaction prediction method and device, electronic equipment and storage medium
CN109816409A (en) Used car pricing method, device, equipment and computer-readable medium
CN111861643A (en) Riding position recommendation method and device, electronic equipment and storage medium
CN108182448B (en) Selection method of marking strategy and related device
CN110784435B (en) Abnormal service identification method and device, electronic equipment and storage medium
CN116028702A (en) Learning resource recommendation method and system and electronic equipment
CN111754261B (en) Method and device for evaluating taxi willingness and terminal equipment
CN113420789A (en) Method, device, storage medium and computer equipment for predicting risk account
CN111626828B (en) Wind control detection method, system and device for network appointment vehicle order and storage medium
CN111091401A (en) Network appointment pricing method, system, storage medium and equipment
CN111835730B (en) Service account processing method and device, electronic equipment and readable storage medium
CN113744070A (en) New energy automobile insurance cost prediction method and device and computer equipment
CN113256368B (en) Product pushing method and device, computer equipment and storage medium
CN112801750B (en) Order allocation method and device
CN115204972A (en) Method, device and medium for processing volume order based on sharing of volume masters

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination