WO2022199185A1 - User operation detection method and program product - Google Patents

User operation detection method and program product

Info

Publication number
WO2022199185A1
Authority
WO
WIPO (PCT)
Prior art keywords
preset
feature
cost
model
data
Prior art date
Application number
PCT/CN2021/142701
Inventor
郭旭阳
Original Assignee
深圳前海微众银行股份有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 深圳前海微众银行股份有限公司
Publication of WO2022199185A1

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q40/00 Finance; Insurance; Tax strategies; Processing of corporate or income taxes
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G06V10/94 Hardware or software architectures specially adapted for image or video understanding
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D10/00 Energy efficient computing, e.g. low power processors, power management or thermal management

Definitions

  • The present application relates to the technical field of financial technology (Fintech), and in particular to a user operation detection method and a program product.
  • Human experience is usually relied on to process the data generated by users' online operation behavior and extract corresponding features, and a detection model is then built from the extracted features to identify financial fraud risks. Because feature selection depends on human experience, it may be unbalanced, which causes the model to fail. In addition, detection models built from manually selected features usually have a low iteration frequency and cannot identify new and emerging fraudulent behaviors. Furthermore, owing to the particularity of anti-fraud scenarios, fraudulent behavior samples make up a much smaller proportion than non-fraudulent behavior samples.
  • Commonly used sample training methods include under-sampling the majority class, oversampling the minority class, synthesizing minority class data (for example, the Synthetic Minority Oversampling Technique, SMOTE), and training on the sample data with a cost-sensitive learning algorithm. All of these mechanically change the sample ratio of the original data, which biases the training results and makes it easy to overlook the loss cost of misjudging fraudulent samples as normal samples, harming the interests of users and financial institutions.
  • SMOTE: Synthetic Minority Oversampling Technique
  • The present application provides a user operation detection method and a program product to solve the following problems of prior-art fraud detection: manual feature selection is unbalanced and the detection model has a low iteration frequency, which leads to detection failure, and the loss cost is easily overlooked when the sample ratio is uneven.
  • The present application provides a user operation detection method, which includes:
  • acquiring multiple pieces of behavior data, where the behavior data represents the data generated by historical users' online operations on accounts;
  • determining an original feature list according to the behavior data and each preset time granularity;
  • determining target features according to the original feature list and a preset feature screening strategy, where the preset feature screening strategy is used to adaptively screen the target features according to the original feature list;
  • generating a detection model according to a preset training model, the target features, and the result data contained in the behavior data, so as to detect through the detection model whether a user operation is the user's own behavior, where the result data represents whether the online operation is the historical user's own behavior.
  • Detecting through the detection model whether the user operation is the user's own behavior includes:
  • acquiring current behavior data of a current user, where the current behavior data represents the data generated by the current user's online operation on the account, and the operation result of the current behavior data is success;
  • The method further includes:
  • reporting the judgment result, and iterating the detection model according to the judgment result and the current behavior data.
  • Determining the original feature list according to the behavior data and each preset time granularity includes:
  • acquiring each item of dimension data mapped to the result data in each piece of behavior data, where the dimension data includes environment dimension data and user operation dimension data;
  • generating the original feature list according to the feature values corresponding to each item of dimension data.
  • Determining the target features according to the original feature list and the preset feature screening strategy includes:
  • specifying a preset feature from the feature set of the corresponding original feature list according to a preset business scenario, and determining the features in the feature set other than the preset feature as remaining features;
  • designating the selected candidate features as new preset features, and repeating the screening step of determining the correlation coefficient between the preset feature and each remaining feature according to a preset correlation algorithm and selecting candidate features from the remaining features according to the correlation coefficient and a preset correlation threshold, until the feature set of each item of dimension data is empty;
  • the preset feature screening strategy includes the preset correlation algorithm and the screening step.
  • Generating the detection model according to the preset training model, the target features, and the result data contained in the behavior data includes:
  • generating the detection model according to the cost model, the cost threshold and the target features.
  • The preset training model is a preset random forest model, and generating the detection model according to the cost model, the cost threshold and the target features includes:
  • repeating the splitting steps until all target features have been classified, so as to obtain the detection model from the decision tree formed by the sub-decision trees.
  • The preset training model is any one of a preset regression algorithm, a preset classification algorithm and a preset neural network model, and generating the detection model according to the cost model, the cost threshold and the target features includes:
  • using all target features and the result data corresponding to the target features as training samples, and obtaining training parameters through a preset algorithm such that the target loss function reaches its minimum value, where the preset algorithm includes a preset gradient descent algorithm or a preset maximum likelihood estimation algorithm;
  • obtaining a classifier according to the training parameters and the prediction function, so as to obtain the detection model from the classifier.
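The gradient-descent variant described above can be illustrated with a minimal sketch. The text does not state the loss function, the prediction function, or how the cost model enters training, so everything concrete here is an assumption: a sigmoid prediction function, a cross-entropy loss weighted per sample by hypothetical costs `cost_fn` and `cost_fp`, and the toy features "wrong password count" and "new device flag".

```python
import math

def sigmoid(z):
    # Numerically stable logistic function (assumed prediction function).
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def predict_proba(weights, bias, x):
    # Probability that sample x is a non-personal (fraudulent) operation.
    return sigmoid(sum(w * xi for w, xi in zip(weights, x)) + bias)

def train(samples, labels, cost_fn=5.0, cost_fp=1.0, lr=0.1, epochs=2000):
    # Batch gradient descent on a cost-weighted cross-entropy loss.
    # cost_fn and cost_fp are hypothetical per-class costs, not values from the source.
    weights = [0.0] * len(samples[0])
    bias = 0.0
    n = len(samples)
    for _ in range(epochs):
        grad_w = [0.0] * len(weights)
        grad_b = 0.0
        for x, y in zip(samples, labels):
            p = predict_proba(weights, bias, x)
            cost = cost_fn if y == 1 else cost_fp
            err = cost * (p - y)  # derivative of the weighted loss w.r.t. the logit
            for j, xj in enumerate(x):
                grad_w[j] += err * xj
            grad_b += err
        weights = [w - lr * g / n for w, g in zip(weights, grad_w)]
        bias -= lr * grad_b / n
    return weights, bias

# Toy training samples: [wrong password count, new device flag]; label 1 = fraud.
SAMPLES = [[0, 0], [1, 0], [0, 0], [4, 1], [5, 1], [1, 0]]
LABELS = [0, 0, 0, 1, 1, 0]
weights, bias = train(SAMPLES, LABELS)
```

The classifier is then the learned parameters combined with `predict_proba`; a threshold on the returned probability turns it into a decision.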
  • The method further includes:
  • determining a cost saving parameter according to the respective values of the first cost and the second cost, so that the cost saving parameter represents the cost saving level of the detection model.
  • The present application further provides a computer program product including a computer program which, when executed by a processor, implements any possible detection model generation method provided by the first aspect.
  • The present application provides a user operation detection method and a program product.
  • First, multiple pieces of behavior data representing historical users' online operations on accounts are obtained. The original feature list of each piece of behavior data is then determined according to the obtained behavior data and each preset time granularity.
  • Next, the target features required for generating the detection model are adaptively screened out according to the original feature list and the preset feature screening strategy. Finally, the detection model is generated based on the screened target features, the preset cost matrix and the result data contained in the behavior data, so that the detection model can detect whether a user operation is the user's own behavior.
  • The result data represents whether the historical user's online operation is the historical user's own behavior.
  • In the detection model generation method provided by the present application, the target features required for generating the model are obtained by adaptive screening according to the behavior data rather than by relying entirely on human experience. Determining the target features this way pays more attention to the behavior data itself, so the model does not fail because of unbalanced feature selection, and the generated detection model can be iterated at a high frequency and can effectively identify unknown, new operation behaviors.
  • In addition, a preset cost matrix is used to generate the detection model, which fully considers the loss cost of different types of samples.
  • The generated detection model aims to minimize the loss cost and thereby effectively protects the rights and interests of users and financial institutions.
  • FIG. 1 is a schematic diagram of an application scenario provided by an embodiment of the present application.
  • FIG. 2 is a schematic flowchart of a user operation detection method provided by an embodiment of the present application.
  • FIG. 3 is a schematic flowchart of another user operation detection method provided by an embodiment of the present application.
  • FIG. 4 is a schematic flowchart of still another user operation detection method provided by an embodiment of the present application.
  • FIG. 5 is a schematic flowchart of another user operation detection method provided by an embodiment of the present application.
  • FIG. 6 is a schematic flowchart of another user operation detection method provided by an embodiment of the present application.
  • FIG. 7 is a schematic flowchart of another user operation detection method provided by an embodiment of the present application.
  • FIG. 8 is a schematic flowchart of another user operation detection method provided by an embodiment of the present application.
  • FIG. 9 is a schematic flowchart of another user operation detection method provided by an embodiment of the present application.
  • FIG. 10 is a schematic structural diagram of a user operation detection device provided by an embodiment of the present application.
  • FIG. 11 is a schematic structural diagram of a feature screening module provided by an embodiment of the application.
  • FIG. 12 is a schematic structural diagram of a generation module provided by an embodiment of the present application.
  • FIG. 13 is a schematic structural diagram of an electronic device according to an embodiment of the present application.
  • Data generated by online operations is usually processed manually to extract corresponding features, and a detection model is then built from the extracted features.
  • Extracting these features relies on human experience, which can make feature selection unbalanced and in turn cause the model's recognition function to fail.
  • Manually selected features are usually used as fixed features, so the model is iterated infrequently and cannot identify new and unknown fraud patterns.
  • In addition, samples of fraudulent behavior are usually far fewer than samples of non-fraudulent behavior.
  • To train on such samples, the majority class is usually under-sampled and the minority class oversampled during model generation.
  • To address these problems, the present application provides a user operation detection method and a program product.
  • The inventive concept of the user operation detection method provided by the present application is as follows. In view of the technical defects of manual feature selection in the prior art, the present application sets a preset feature screening strategy that adaptively screens the target features required for generating a detection model according to the behavior data. The screening does not rely entirely on human experience and pays more attention to the behavior data itself, so the model does not fail because of unbalanced feature selection, and the generated detection model can be iterated at a high frequency to effectively identify unknown operation behaviors.
  • The present application also sets a preset cost matrix, which fully considers the loss costs of different types of samples.
  • The goal is to minimize the loss cost, so as to effectively protect the rights and interests of users and financial institutions.
  • FIG. 1 is a schematic diagram of an application scenario provided by an embodiment of the present application.
  • The network serves as the medium providing a communication link between the terminal device 11 and the server 12, and may include various connection types, such as wired links, wireless communication links, or fiber optic cables.
  • The terminal device 11 and the server 12 can interact through the network to receive or send messages.
  • The terminal device 11 may be any terminal configured on the user side, so that the user can perform online operations on the account through the terminal device 11.
  • The server 12 is the electronic device corresponding to the user operation detection device, and can execute the user operation detection method provided by the embodiments of the present application.
  • The server 12 can be configured in any terminal or server on the financial institution side, and the server 12 and the terminal device 11 can exchange information through the network, so that the server 12 can execute the user operation detection method provided by the embodiments of the present application, effectively identify the user operations performed on the terminal device 11, and effectively protect the relevant interests of the user and the financial institution.
  • The terminal device 11 may be a computer, a smartphone, smart glasses, a smart bracelet, a smart watch, a tablet computer, etc.
  • In FIG. 1, the terminal device 11 is shown as a smartphone by way of example.
  • The server 12 may also be a server cluster, which is not limited in this embodiment.
  • FIG. 2 is a schematic flowchart of a user operation detection method provided by an embodiment of the present application. As shown in FIG. 2 , the user operation detection method provided by this embodiment includes:
  • The behavior data represents the data generated by historical users' online operations on accounts.
  • When a user performs an online operation on an account, the generated data is defined as behavior data; that is, behavior data represents the data generated by historical users' online operations on accounts.
  • The data generated by one user's online operation on an account is regarded as one piece of behavior data. In this step, the data generated by multiple users' online operations on accounts within a preset past period can be obtained, that is, multiple pieces of behavior data are obtained; the users within the preset past period are the historical users.
  • The acquired behavior data can be desensitized by hashing, and corresponding verification tools can be embedded to ensure the integrity and accuracy of the behavior data.
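Desensitization by hashing could look like the following minimal sketch. The text only says that hashing is used; the keyed HMAC-SHA256 construction, the field names in `SENSITIVE_FIELDS`, and the `desensitize` helper are assumptions made for illustration.

```python
import hashlib
import hmac

# Hypothetical set of fields treated as sensitive in a behavior record.
SENSITIVE_FIELDS = ("name", "mobile", "bank_card")

def desensitize(record, secret):
    # Return a copy of the record with sensitive fields replaced by keyed hashes.
    # Equal inputs map to equal digests, so counts per value can still be computed.
    masked = dict(record)
    for field in SENSITIVE_FIELDS:
        if masked.get(field) is not None:
            raw = str(masked[field]).encode("utf-8")
            masked[field] = hmac.new(secret, raw, hashlib.sha256).hexdigest()
    return masked

SECRET = b"demo-secret"  # placeholder key for the example only
record = {"name": "Zhang San", "mobile": "13800000000", "device": "phone-A"}
masked = desensitize(record, SECRET)
```

Because the mapping is deterministic for a given key, the desensitized values can still be grouped and counted in later feature computation, while the original values are not stored.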
  • S102 Determine an original feature list according to the behavior data and each preset time granularity, and determine a target feature according to the original feature list and a preset feature screening strategy.
  • The preset feature screening strategy is used to adaptively screen target features according to the original feature list.
  • FIG. 3 is a schematic flowchart of another user operation detection method provided by an embodiment of the present application. As shown in FIG. 3, in the user operation detection method provided by this embodiment, determining the original feature list according to the behavior data and each preset time granularity includes:
  • The dimension data includes environment dimension data and user operation dimension data.
  • The behavior data generated by a historical user's online operation contains result data, which indicates whether the online operation on the account was performed by the historical user in person.
  • The result data takes one of two values: the operation is the historical user's own behavior, or it is not the historical user's own behavior.
  • Each piece of behavior data also includes dimension data of other dimensions mapped to the result data. This dimension data includes environment dimension data and user operation dimension data.
  • In other words, one item of result data corresponds to each item of environment dimension data and each item of user operation dimension data.
  • The environment dimension data is the data in the behavior data that represents the environment of the historical user's online operation.
  • The environment dimension data may include, but is not limited to, the time of the historical user's operation on the account, information about the operating device, the network information at the time of operation, and the geographic location.
  • The user operation dimension data is the data in the behavior data that represents the identity information used when the historical user performs online operations on the account, for example OCR (Optical Character Recognition) information and the name, mobile phone number, bank card number, login password, and number of wrong password entries used in the historical user's operations. Therefore, for each piece of behavior data, the environment dimension data and user operation dimension data corresponding to the result data in that piece of behavior data can be obtained.
  • The feature value of each item of dimension data at each preset time granularity is then determined. For example, for the user operation dimension data "name used in historical user operations", several time granularities can be set, such as yesterday, the last 7 days, the last month, and the last half year. At each time granularity, values such as the number of occurrences of the name and the proportion of operations whose final result was successful are determined; these values are the feature values of the dimension data at that preset time granularity.
  • For each item of environment dimension data and each item of user operation dimension data, the corresponding feature values at each preset time granularity are determined, which yields the feature values corresponding to each item of dimension data. The feature value corresponding to an item of dimension data is thus essentially the data representation of that dimension data at the preset time granularity.
  • The preset time granularity may be set according to the specific situation of the behavior data in actual working conditions, which is not limited in this embodiment.
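The per-granularity feature values described above can be sketched as follows. The record layout (`name`, `time`, `success`), the window labels, and the approximation of "yesterday" by a rolling one-day window are assumptions for illustration; the two computed values (occurrence count and success proportion) are the ones named in the text.

```python
from datetime import datetime, timedelta

# Hypothetical granularities mirroring "yesterday, last 7 days, last month, last half year".
GRANULARITIES = {
    "1d": timedelta(days=1),
    "7d": timedelta(days=7),
    "30d": timedelta(days=30),
    "180d": timedelta(days=180),
}

def feature_values(records, dimension, value, now):
    # Count occurrences of `value` in `dimension` and the share of successful
    # operations, for each time window ending at `now`.
    result = {}
    for label, window in GRANULARITIES.items():
        hits = [
            r for r in records
            if r[dimension] == value and now - window <= r["time"] <= now
        ]
        successes = sum(1 for r in hits if r["success"])
        result[label] = {
            "count": len(hits),
            "success_ratio": successes / len(hits) if hits else 0.0,
        }
    return result

NOW = datetime(2021, 12, 29, 12, 0)
RECORDS = [
    {"name": "Zhang San", "time": NOW - timedelta(hours=2), "success": True},
    {"name": "Zhang San", "time": NOW - timedelta(days=3), "success": False},
    {"name": "Zhang San", "time": NOW - timedelta(days=5), "success": True},
    {"name": "Zhang San", "time": NOW - timedelta(days=40), "success": True},
    {"name": "Li Si", "time": NOW - timedelta(days=1), "success": True},
]
values = feature_values(RECORDS, "name", "Zhang San", NOW)
```

With this toy data, the name "Zhang San" occurs three times in the 7-day window and four times in the 180-day window, which is the kind of value the original feature list records.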
  • S203 Generate an original feature list according to each feature value corresponding to each dimension data.
  • After the feature values corresponding to each item of dimension data are obtained, each item of dimension data and its feature values at the preset time granularities can be represented in a list, and this list is the original feature list. Thus, for each item of dimension data in each piece of behavior data, an original feature list is generated according to the feature values corresponding to that dimension data, and for each piece of behavior data, each item of dimension data and its feature values at each preset time granularity can be represented by the original feature list.
  • Each feature represented by the original feature list is an item of dimension data together with its feature value at a preset time granularity. For example, the name "Zhang San" is an item of dimension data and the last 7 days is a preset time granularity; if the dimension data occurs three times within that time granularity, the feature value is "3", and the feature is "the name Zhang San has appeared three times in the last 7 days".
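One possible in-memory shape for the original feature list is sketched below. The `Feature` tuple, its field names, and the `build_feature_list` helper are hypothetical; the text only requires that each feature pairs an item of dimension data with its feature value at a preset time granularity.

```python
from typing import NamedTuple

class Feature(NamedTuple):
    dimension: str      # e.g. "name"
    value: str          # e.g. "Zhang San"
    granularity: str    # e.g. "7d"
    metric: str         # e.g. "count"
    feature_value: float

def build_feature_list(values_by_dimension):
    # Flatten {(dimension, value): {granularity: {metric: number}}} into a list.
    features = []
    for (dimension, value), per_granularity in values_by_dimension.items():
        for granularity, metrics in per_granularity.items():
            for metric, number in metrics.items():
                features.append(Feature(dimension, value, granularity, metric, number))
    return features

def describe(feature):
    # Human-readable form of one feature.
    return (f"{feature.dimension} {feature.value}: "
            f"{feature.metric} = {feature.feature_value} over {feature.granularity}")

feature_list = build_feature_list({("name", "Zhang San"): {"7d": {"count": 3}}})
```

Here `describe(feature_list[0])` yields "name Zhang San: count = 3 over 7d", which corresponds to the "Zhang San has appeared three times in the last 7 days" example.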
  • In this embodiment, the original feature list is determined according to the behavior data and each preset time granularity as follows: first, each item of dimension data mapped to the result data in each piece of behavior data is obtained, where the dimension data includes environment dimension data and user operation dimension data; then the feature values of each item of environment dimension data and each item of user operation dimension data at each preset time granularity are determined; finally, for each item of dimension data in each piece of behavior data, the original feature list is generated according to the corresponding feature values.
  • The embodiment shown in FIG. 3 describes determining the original feature list according to the behavior data and the preset time granularity. Adaptive feature screening may then be performed based on the original feature list and the preset feature screening strategy, so that the screened target features can be used in the subsequent detection model generation process.
  • The preset feature screening strategy takes into account, as far as possible, the actual business scenario in which the behavior data was generated, and is based on the correlation between the features in the original feature list and on the correspondence between the features and the result data. It thereby screens features adaptively, pays more attention to the behavior data itself, and lets the target features change dynamically with the behavior data, so that the model can discover and effectively identify unknown, new non-personal operation behaviors.
  • A detection model is generated according to the preset training model, the target features and the result data contained in the behavior data, so as to detect through the detection model whether a user operation is the user's own behavior.
  • The result data characterizes whether the online operation is the historical user's own behavior.
  • That is, the result data represents one of two situations: the historical user's online operation on the account is the historical user's own behavior, or it is not.
  • The elements of the preset cost matrix represent the loss cost to users and financial institutions of correctly identifying, and of misidentifying, non-personal behaviors and personal behaviors.
  • If the detection model recognizes a non-personal behavior, the identified non-personal behavior is regarded as fraudulent behavior.
  • The loss cost of different types of samples is fully considered, and the goal is to minimize the loss cost rather than to pursue the recognition accuracy of the model, as the prior art does in application scenarios with unbalanced samples.
  • The prior-art approach mechanically alters the sample proportions, so the cost of misjudging fraudulent behaviors as non-fraudulent behaviors is easily overlooked.
  • The loss cost of identifying a non-fraudulent behavior as fraudulent is often smaller than the loss cost of identifying a fraudulent behavior as non-fraudulent.
  • In identifying fraudulent behaviors, the detection model fully considers the loss cost of different types of samples and aims to minimize the loss cost, which effectively protects the relevant rights and interests of users and financial institutions.
  • The selected target features and the corresponding result data are used as training samples for the preset training model, which is trained accordingly to obtain the detection model, completing the generation of the detection model.
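The role of the cost matrix can be illustrated with a small sketch that picks the label with the lowest expected loss cost. The numeric costs below are invented for illustration and only respect the relation stated in the text (a missed fraud costs more than a false alarm); the `expected_cost` and `decide` helpers are hypothetical names.

```python
# COST[actual][predicted]: hypothetical loss cost of each outcome.
COST = {
    "fraud": {"fraud": 0.0, "normal": 10.0},   # missed fraud is expensive
    "normal": {"fraud": 1.0, "normal": 0.0},   # false alarm is cheaper
}

def expected_cost(p_fraud, predicted):
    # Expected loss cost of predicting `predicted` when fraud has probability p_fraud.
    return (p_fraud * COST["fraud"][predicted]
            + (1.0 - p_fraud) * COST["normal"][predicted])

def decide(p_fraud):
    # Choose the label with the minimum expected loss cost.
    return min(("fraud", "normal"), key=lambda label: expected_cost(p_fraud, label))

decision_low = decide(0.05)
decision_mid = decide(0.2)
```

With these costs the break-even probability is 1/11, so an operation with a 20% fraud probability is already flagged, whereas an accuracy-driven rule with a 50% cut-off would let it pass.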
  • FIG. 4 is a schematic flowchart of still another user operation detection method provided by an embodiment of the present application. As shown in FIG. 4 , in the user operation detection method provided by this embodiment, the detection model is used to detect whether the user operation is the behavior of the user, including:
  • S301 Acquire current behavior data of the current user.
  • The current behavior data represents the data generated by the current user's online operation on the account, and the operation result of the current behavior data is successful.
  • The data generated by any current user's online operation on his or her account is acquired; that is, the current behavior data of the current user is acquired.
  • That the operation result of the current behavior data is successful can be understood as follows: if the business of the current user's online operation is an account opening application, the corresponding operation result is that the account opening application succeeded.
  • In this case, the actual situation is that the online operation performed by the current user is the user's own behavior, that is, a non-fraudulent behavior.
  • The judgment result is either that the current user's online operation on the account is the current user's own behavior or that it is not.
  • After the current behavior data of the current user is acquired, it is associated with the original feature list, so that the current behavior data is processed according to each item of dimension data and each preset time granularity corresponding to the original feature list.
  • The associated current behavior data is input to the detection model, which outputs the judgment result; that is, the judgment result is generated according to the associated current behavior data and the detection model.
  • The judgment result indicates the detection result of the user operation detection.
  • The generated judgment result is one of two outcomes, namely that the current user's online operation on the account is the current user's own behavior or that it is not, which detects whether the user operation is the user's own behavior.
  • If the judgment result of the detection model is that the operation is not the current user's own behavior, the detection model has determined that the current user's online operation is fraudulent; the judgment result is reported, and the detection model is iterated according to the judgment result and the current behavior data. Because the detection model has judged the current user's online operation to be fraudulent, the detection model may have made a judgment error, so the judgment result is reported to the corresponding analyst for further analysis, and the judgment result and the current behavior data are used for the next iteration of the detection model, which improves its recognition accuracy and increases its iteration frequency.
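The detect, report and iterate flow described above can be sketched as a small control loop. The `DetectionLoop` class, its queues, and the toy threshold model are hypothetical stand-ins; the text does not specify how reports reach the analyst or how the next iteration is scheduled.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Tuple

@dataclass
class DetectionLoop:
    # `model` returns True when the operation is judged to be the user's own behavior.
    model: Callable[[dict], bool]
    reports: List[dict] = field(default_factory=list)
    retrain_queue: List[Tuple[dict, bool]] = field(default_factory=list)

    def handle(self, current_features):
        is_own_behavior = self.model(current_features)
        if not is_own_behavior:
            # Report the judgment for analyst review and keep the sample
            # so that the next model iteration can use it.
            self.reports.append(current_features)
            self.retrain_queue.append((current_features, is_own_behavior))
        return is_own_behavior

# Toy stand-in for the detection model: many wrong passwords looks non-personal.
loop = DetectionLoop(model=lambda f: f["wrong_password_count"] < 3)
first = loop.handle({"wrong_password_count": 0})
second = loop.handle({"wrong_password_count": 5})
```

Only the flagged operation ends up in `reports` and `retrain_queue`, which matches the description that non-personal judgments are reported and fed into the next iteration.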
  • The user operation detection method provided by the embodiment of the present application first obtains multiple pieces of behavior data, which represent the data generated when historical users perform online operations on accounts. It then determines the original feature list of each piece of behavior data according to the obtained behavior data and each preset time granularity, and adaptively screens out the target features required for generating the detection model according to the original feature list and the preset feature screening strategy. Finally, based on the preset cost matrix, it generates a detection model according to the preset training model, the target features and the result data contained in the behavior data, and uses the detection model to detect whether a user operation is the user's own behavior.
  • The target features required for generating the model are obtained by adaptive screening according to the behavior data rather than by relying entirely on human experience, so the determination of the target features pays more attention to the behavior data itself and the detection model does not fail because of unbalanced feature selection.
  • In addition, given the particularity of unbalanced samples in anti-fraud scenarios, a preset cost matrix is used to generate the detection model, which fully considers the loss cost of different types of samples.
  • The generated detection model aims to minimize the loss cost when detecting user operations, so as to effectively safeguard the relevant rights and interests of users and financial institutions.
  • FIG. 5 is a schematic flowchart of another user operation detection method provided by an embodiment of the present application. As shown in Figure 5, this embodiment includes:
  • S401 For the original feature list of each item of dimension data, specify a preset feature from the feature set of the corresponding original feature list according to the preset business scenario, and determine the features in the feature set other than the preset feature as the remaining features.
  • Each feature represented by the original feature list is an item of dimension data together with its feature value at a preset time granularity. The original feature list of each item of dimension data can therefore represent multiple features, and these features form the feature set of the original feature list corresponding to that dimension data.
  • A preset feature can be specified from the feature set of the corresponding original feature list according to the preset business scenario, and the features in the feature set other than the specified preset feature are determined as remaining features.
  • The preset business scenario is determined by the actual business involved in actual working conditions, which is not limited in this embodiment.
  • The preset feature specified according to the preset business scenario is the feature that best characterizes the data performance of the dimension data at each preset time granularity.
  • The number of preset features can also be set according to actual working conditions; for example, it can be 1 to 3.
  • S402 Determine the correlation coefficient between the preset feature and each remaining feature according to the preset correlation algorithm, and screen out candidate features from the remaining features according to the correlation coefficients and the preset correlation threshold.
  • the preset feature screening strategy includes a preset correlation algorithm.
  • the preset correlation algorithm determines the correlation coefficient between each preset feature and each remaining feature. The correlation coefficient may be, for example, the Pearson correlation coefficient, and the preset correlation algorithm is the corresponding algorithm for computing it, which is not limited in this embodiment.
  • candidate features are then screened from the remaining features according to the correlation coefficients and the preset correlation threshold.
  • specifically, a preset correlation threshold is set and each obtained correlation coefficient is compared with it. Remaining features whose correlation coefficient is greater than the preset correlation threshold are eliminated. The remaining features that are not eliminated are sorted by correlation coefficient, and the one with the largest correlation coefficient is determined as the candidate feature, which completes the screening step for candidate features.
  • S403 Designate the screened candidate feature as a new preset feature, and repeat the step of determining the correlation coefficient between the preset feature and each remaining feature according to the preset correlation algorithm and screening out candidate features from the remaining features according to the correlation coefficients and the preset correlation threshold, until the feature set of each dimension data is empty.
  • that is, the candidate feature screened out in step S402 is designated as a new preset feature and step S402 is performed cyclically, round after round, until the feature set of each dimension data is empty.
  • after steps S401 to S403 have been performed for each dimension data, all preset features specified for each dimension data are determined as the target features corresponding to that dimension data, which completes the adaptive screening of the target features.
  • the preset feature screening strategy includes a preset correlation algorithm and the screening steps in the above screening process.
  • by setting the preset feature screening strategy, the user operation detection method provided by the embodiment of the present application takes into account, as far as possible, the actual business scenario in which the behavior data was generated, and screens features based on the correlation between the features in the original feature list and on the correspondence between each feature and the result data. The preset feature screening strategy thus realizes adaptive screening of features, pays more attention to the behavior data itself and reduces the dependence on manual experience, which avoids imbalanced feature selection and slow iteration of the detection model.
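The screening loop of steps S401 to S403 can be sketched as follows. This is a minimal illustration under stated assumptions, not the implementation of the application: the feature names, the sample values and the 0.9 threshold are hypothetical, the Pearson coefficient is used because the text names it as one option, and comparing the absolute value of the coefficient with the threshold is an assumption.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient of two equally long value lists."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0  # a constant feature is treated as uncorrelated
    return cov / math.sqrt(var_x * var_y)


def screen_features(features, preset, threshold):
    """Greedy screening: `preset` is a non-empty list of feature names (S401)."""
    selected = list(preset)
    remaining = [name for name in features if name not in selected]
    current = list(preset)  # preset feature(s) used in the current round
    while remaining:  # S403: repeat until the feature set is empty
        # S402: correlation between the current preset feature(s) and each remaining feature
        corr = {
            name: max(abs(pearson(features[p], features[name])) for p in current)
            for name in remaining
        }
        # eliminate remaining features whose coefficient exceeds the threshold
        remaining = [name for name in remaining if corr[name] <= threshold]
        if remaining:
            # the surviving feature with the largest coefficient is the candidate
            candidate = max(remaining, key=corr.get)
            remaining.remove(candidate)
            selected.append(candidate)
            current = [candidate]  # the candidate becomes the new preset feature
    return selected


# Hypothetical feature values for one dimension data at several time granularities.
features = {
    "login_count_1h": [1, 2, 3, 4, 5],
    "login_count_24h": [2, 4, 6, 8, 10.5],
    "device_changes_24h": [1, 3, 2, 5, 4],
    "failed_pwd_1h": [5, 1, 4, 2, 3],
}
target_features = screen_features(features, ["login_count_1h"], 0.9)
```

With this hypothetical data, `login_count_24h` is almost perfectly correlated with the preset feature and is eliminated in the first round, while the other two features are kept as target features.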
  • FIG. 6 is a schematic flowchart of another user operation detection method provided by an embodiment of the present application. As shown in Figure 6, this embodiment includes:
  • S501 Obtain multiple pieces of behavior data.
  • the behavior data is used to represent the data generated by historical users' online operations on the account.
  • S502 Determine the original feature list according to the behavior data and each preset time granularity, and determine the target feature according to the original feature list and the preset feature screening strategy.
  • the preset feature screening strategy is used to adaptively screen target features according to the original feature list.
  • the specific implementations, principles and technical effects of steps S501 to S502 are similar to those of steps S101 to S102 in the foregoing embodiments and are not repeated here.
  • S503 Determine a positive sample and a negative sample according to the result data corresponding to the target feature.
  • the behavior data corresponding to the target features are divided into positive samples and negative samples according to the result data corresponding to the target features.
  • target features whose result data indicate that the historical user's online operation was not the historical user's own behavior are determined as positive samples. Correspondingly, target features whose result data indicate that the online operation was the historical user's own behavior are determined as negative samples.
  • the preset cost matrix represented in Table 1 has the following characteristics:
  • the loss cost of a correct prediction is usually less than that of a wrong prediction
  • mispredicting fraudulent behavior as non-fraudulent behavior exposes both users and financial institutions to a potential risk of financial loss, whereas the loss cost of mispredicting non-fraudulent behavior as fraudulent behavior can be compensated by other auxiliary means. That is, the loss cost of actual fraudulent behavior wrongly predicted as non-fraudulent is greater than the loss cost of actual non-fraudulent behavior wrongly predicted as fraudulent.
  • S504 Generate a cost model according to the positive samples, the negative samples and the preset cost matrix.
  • on the basis of the preset cost matrix, a cost model can be generated by combining the positive and negative samples, so that the cost model represents the functional relationship between the loss costs of the different types of results. The cost model can be expressed by the following relation (2):
  • T represents the training set containing positive samples and negative samples
  • i represents any sample in the training set.
  • S505 Determine a cost threshold according to the cost model and the preset cost matrix.
  • let $C(T)$ denote the cost threshold, that is, the maximum loss cost.
  • $f_j(T)$ represents the loss cost of using the detection model to predict every sample in the training set as the j-th class, where j can be 0 or 1; one of these cases is the loss cost of predicting every sample as non-fraudulent behavior.
  • $C(T) = \min\{f_0(T), f_1(T)\}$, which fixes the relationship between the cost threshold and the cost function, so that the cost threshold $C(T)$ can be obtained from the cost function.
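As a minimal sketch of how a cost threshold of the form $C(T) = \min\{f_0(T), f_1(T)\}$ can be evaluated: Table 1 and relation (2) are not reproduced in this text, so the cost values below are hypothetical placeholders that only respect the properties listed for the cost matrix, and summing one matrix entry per sample is an assumed form of the cost model. Label 1 stands for an operation that is not the user's own behavior (a positive sample).

```python
# Hypothetical cost matrix keyed by (actual label, predicted label).
# Correct predictions cost less than wrong ones, and a missed fraud (1, 0)
# costs more than a false alarm (0, 1).
COST = {(0, 0): 0.0, (0, 1): 1.0, (1, 0): 10.0, (1, 1): 0.5}


def total_cost(actual, predicted, cost=COST):
    """Assumed cost model: sum of the matrix entries selected by each sample."""
    return sum(cost[(a, p)] for a, p in zip(actual, predicted))


def cost_threshold(actual, cost=COST):
    """C(T) = min{f_0(T), f_1(T)}: cheapest way of predicting one class for all samples."""
    f0 = total_cost(actual, [0] * len(actual), cost)  # predict every sample as class 0
    f1 = total_cost(actual, [1] * len(actual), cost)  # predict every sample as class 1
    return min(f0, f1)


# Imbalanced toy training set: 8 negative samples and 2 positive samples.
labels = [0, 0, 0, 0, 0, 0, 0, 0, 1, 1]
threshold = cost_threshold(labels)
```

Here predicting everything as class 0 costs 20.0 and predicting everything as class 1 costs 9.0, so the threshold is 9.0; a model is only useful in this sense if its own cost on the same data falls below that value.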
  • S506 Based on the preset training model, generate a detection model according to the cost model, the cost threshold and the target feature.
  • the model training is completed based on the preset training model, thereby generating the detection model.
  • the preset training model can be any of various classification models, such as a preset random forest model, a preset regression algorithm (such as a logistic regression classifier), a preset classification algorithm (such as a Support Vector Machine, SVM), a preset neural network model, and so on.
  • different preset training models can be trained with different strategies and corresponding implementation means to generate the detection model.
  • when the preset training model is a preset random forest model, the adopted strategy is to construct a decision tree based on information gain to generate the detection model.
  • a possible implementation of step S506 is shown in FIG. 7, which is a schematic flowchart of another user operation detection method provided by an embodiment of the present application. As shown in Figure 7, this embodiment includes:
  • S601 Determine the information gain of each target feature according to the cost threshold and a preset information gain algorithm, and determine the target feature with the largest information gain as the split node of the current split to generate a corresponding sub-decision tree.
  • generating the detection model with the preset random forest model is essentially a process of constructing a decision tree. For each split, every target feature in the training set is first traversed according to the cost threshold and the preset information gain algorithm to obtain the information gain of each target feature. The target feature with the largest information gain is then determined as the split node of the current split, and the current split is completed to generate the sub-decision tree corresponding to this split node.
  • the preset information gain algorithm can be represented by the following relation (5):
  • $T_1$ represents the subset of the training set $T$ whose value of the target feature $F_i$ is less than or equal to $k$
  • $T_2$ represents the subset of the training set $T$ whose value of the target feature $F_i$ is greater than $k$
  • $k$ is the split value of the current split node (its value is a natural number).
  • S602 Repeat the above splitting step until the classification of all target features is completed, so as to obtain the detection model from the decision tree formed by the sub-decision trees.
  • step S601 is repeated for each target feature until all target features have been classified and the decision tree is fully constructed. The decision tree formed by the sub-decision trees is then determined as the detection model, which completes the generation of the detection model.
  • when the preset random forest model is used, the optimal split node is searched for based on information gain, and different misjudgments cause different loss costs. Therefore, for each split, all target features are traversed and the target feature with the largest information gain is taken as the split node, instead of choosing the split node by the largest decrease in data impurity. This implements the strategy of pursuing the minimum loss cost and effectively safeguards the interests of users and financial institutions.
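Relation (5) is not reproduced in this text, so the following is only a sketch of one way a cost-based gain could be computed, modeled on common cost-sensitive decision tree formulations. It assumes that the gain of splitting on feature $F_i$ at value $k$ is $C(T) - C(T_1) - C(T_2)$, where $C$ is the cost threshold $\min\{f_0, f_1\}$ of the respective subset. The formula, the cost values and the feature names are all assumptions for illustration.

```python
# Hypothetical cost matrix keyed by (actual label, predicted label); 1 = not the user's own behavior.
COST = {(0, 0): 0.0, (0, 1): 1.0, (1, 0): 10.0, (1, 1): 0.5}


def cost_threshold(labels):
    """min{f_0, f_1}: cheapest cost of predicting a single class for all labels."""
    f0 = sum(COST[(y, 0)] for y in labels)
    f1 = sum(COST[(y, 1)] for y in labels)
    return min(f0, f1)


def cost_gain(rows, labels, feature, k):
    """Assumed gain: cost of the parent set minus the costs of T_1 (<= k) and T_2 (> k)."""
    t1 = [y for row, y in zip(rows, labels) if row[feature] <= k]
    t2 = [y for row, y in zip(rows, labels) if row[feature] > k]
    return cost_threshold(labels) - cost_threshold(t1) - cost_threshold(t2)


def best_split(rows, labels, feature_names):
    """Traverse every target feature and split value; keep the largest gain."""
    best = None
    for feature in feature_names:
        values = sorted({row[feature] for row in rows})
        for k in values[:-1]:  # the largest value would leave T_2 empty
            gain = cost_gain(rows, labels, feature, k)
            if best is None or gain > best[2]:
                best = (feature, k, gain)
    return best


# Toy training set with two hypothetical target features.
rows = [
    {"new_device": 0, "transfers_1h": 1},
    {"new_device": 0, "transfers_1h": 2},
    {"new_device": 0, "transfers_1h": 3},
    {"new_device": 0, "transfers_1h": 1},
    {"new_device": 1, "transfers_1h": 2},
    {"new_device": 1, "transfers_1h": 3},
]
labels = [0, 0, 0, 0, 1, 1]
split = best_split(rows, labels, ["transfers_1h", "new_device"])
```

On this toy data the split on `new_device` at `k = 0` separates the two classes and yields the largest gain, so it would become the split node of the current split.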
  • FIG. 8 is a schematic flowchart of still another user operation detection method provided by an embodiment of the present application. As shown in Figure 8, this embodiment includes:
  • S701 Determine a preset condition to be satisfied by a loss function of a preset training model according to a preset cost matrix, a cost model, and a cost threshold.
  • this embodiment is described using a preset regression algorithm, such as a logistic regression classifier, as an example.
  • $h_\theta(X_i) \in (0,1)$ represents the prediction function. The essence of logistic regression is to find a suitable parameter $\theta$ that minimizes the loss function of the prediction function.
  • the loss function $J(\theta)$ for the prediction function can be expressed by the following relation (7):
  • S702 Generate a target loss function according to the preset condition, the cost function, and the prediction function of the preset training model.
  • the target loss function is obtained according to the preset condition, the cost function in relation (2) and the prediction function in relation (6).
  • S703 Use all the target features and the result data corresponding to the target features as training samples, and obtain the training parameters so that the target loss function is the minimum value through a preset algorithm.
  • the preset algorithm includes a preset gradient descent algorithm or a preset maximum likelihood estimation algorithm.
  • S704 Obtain a classifier according to the training parameters and the prediction function, so as to obtain a detection model according to the classifier.
  • after the training parameters are obtained, the classification training of the preset training model, that is, the logistic regression classifier used as the example in this embodiment, is complete. The classifier obtained from the training parameters and the prediction function is determined as the detection model, which completes the generation of the detection model.
  • in this embodiment, the target loss function for the anti-fraud scenario is minimized to obtain the training parameters, and a classifier is obtained from these training parameters and the prediction function and determined as the detection model. The detection of user operations by the resulting detection model is therefore an identification strategy that pursues the minimum loss cost, which effectively safeguards the rights and interests of users and financial institutions.
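Relations (6) to (9) are not reproduced in this text, so the following is a hedged sketch of steps S702 to S704 rather than the target loss function of the application. It assumes a sigmoid prediction function and takes the target loss to be the average expected cost under a hypothetical cost matrix, minimized by plain gradient descent; the cost values, learning rate, epoch count and toy data are all assumptions.

```python
import math

# Hypothetical costs: true negative, false positive, false negative, true positive.
C_TN, C_FP, C_FN, C_TP = 0.0, 1.0, 10.0, 0.5


def predict_proba(theta, x):
    """Assumed prediction function h_theta(x): sigmoid of a linear score with bias theta[0]."""
    z = theta[0] + sum(t * v for t, v in zip(theta[1:], x))
    z = max(-60.0, min(60.0, z))  # clamp to keep exp() well within float range
    return 1.0 / (1.0 + math.exp(-z))


def expected_cost(theta, X, y):
    """Assumed target loss: average cost when h is read as the probability of class 1."""
    total = 0.0
    for xi, yi in zip(X, y):
        h = predict_proba(theta, xi)
        total += yi * (h * C_TP + (1 - h) * C_FN) + (1 - yi) * (h * C_FP + (1 - h) * C_TN)
    return total / len(X)


def train(X, y, lr=0.5, epochs=2000):
    """Gradient descent on expected_cost; returns the training parameters theta."""
    theta = [0.0] * (len(X[0]) + 1)
    for _ in range(epochs):
        grad = [0.0] * len(theta)
        for xi, yi in zip(X, y):
            h = predict_proba(theta, xi)
            # derivative of the per-sample cost with respect to the linear score
            slope = (yi * (C_TP - C_FN) + (1 - yi) * (C_FP - C_TN)) * h * (1 - h)
            grad[0] += slope
            for j, v in enumerate(xi):
                grad[j + 1] += slope * v
        theta = [t - lr * g / len(X) for t, g in zip(theta, grad)]
    return theta


# Toy training samples: one hypothetical target feature, label 1 = not the user's own behavior.
X = [[0.0], [0.1], [0.2], [0.3], [0.8], [0.9]]
y = [0, 0, 0, 0, 1, 1]
theta = train(X, y)
```

The resulting classifier is `predict_proba(theta, x)` compared with a decision threshold. Because a missed positive sample is far more expensive than a false alarm in the assumed matrix, the fitted parameters lean toward flagging borderline operations.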
  • S507 Test the generated detection model with the test sample to determine the cost saving level of the detection model.
  • after the detection model is generated, it can be further tested and evaluated.
  • the evaluation of the detection model provided by the embodiment of the present application differs from commonly used evaluations of model performance, such as accuracy and recall. Instead, it targets the actual application scenario of the detection model and uses the minimum loss cost as the indicator to assess the cost saving level of the detection model.
  • FIG. 9 is a schematic flowchart of still another user operation detection method provided by an embodiment of the present application. As shown in Figure 9, this embodiment includes:
  • S801 Obtain test samples.
  • a test sample can be understood as the data generated when a corresponding online operation is performed on an account, together with the financial institution's operation result for that data; the data and the operation result are used as the test sample to test the detection model.
  • S802 Generate a first test result according to the test sample and the detection model, and determine a first cost according to the first test result and the cost model.
  • the test sample is used as the data to be identified for the detection model to identify it, and the obtained identification result is the first test result generated according to the test sample and the detection model. Then, according to the generated first test result and the cost model shown in the relational formula (2) in the foregoing embodiment, the first cost of the test sample is generated, which is represented by f(Test), for example.
  • S803 Generate a second test result according to the test sample and the manual prediction strategy, and determine the second cost according to the second test result and the cost model.
  • similarly to step S802, a second test result is first generated according to the test sample and the manual prediction strategy, and a second cost is then determined according to the second test result and the cost model, represented for example by C(Test).
  • S804 Determine a cost saving parameter according to the respective corresponding values of the first cost and the second cost, so that the cost saving level of the detection model is represented by the cost saving parameter.
  • the cost saving parameter can be determined, for example, by the method represented by relation (10), which is as follows:
  • the cost saving level of the detection model is represented by the cost saving parameter (CS): the larger the value of CS, the more cost is saved by using the detection model, and the higher the cost saving level of the detection model.
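Relation (10) is not reproduced in this text. A common way to express cost savings, assumed here purely for illustration, is the relative reduction $CS = (C(Test) - f(Test)) / C(Test)$, where $f(Test)$ is the first cost (detection model) and $C(Test)$ is the second cost (manual prediction strategy). The cost matrix and the predictions below are hypothetical.

```python
# Hypothetical cost matrix keyed by (actual label, predicted label); 1 = not the user's own behavior.
COST = {(0, 0): 0.0, (0, 1): 1.0, (1, 0): 10.0, (1, 1): 0.5}


def total_cost(actual, predicted):
    """Assumed cost model: sum of the matrix entries selected by each test sample."""
    return sum(COST[(a, p)] for a, p in zip(actual, predicted))


def cost_saving(actual, model_predicted, manual_predicted):
    """Assumed CS: share of the manual strategy's cost that the detection model avoids."""
    first_cost = total_cost(actual, model_predicted)    # f(Test)
    second_cost = total_cost(actual, manual_predicted)  # C(Test)
    if second_cost == 0:
        raise ValueError("the manual strategy has zero cost; CS is undefined")
    return (second_cost - first_cost) / second_cost


actual = [0, 0, 0, 1, 1]
model_predicted = [0, 0, 1, 1, 1]   # one false alarm, no missed positive sample
manual_predicted = [0, 0, 0, 0, 1]  # one missed positive sample
cs = cost_saving(actual, model_predicted, manual_predicted)
```

In this toy case the detection model costs 2.0 and the manual strategy costs 10.5, so CS is about 0.81; under this assumed definition a negative CS would mean the model is more costly than the manual strategy.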
  • the user operation detection method provided by the embodiment of the present application first obtains multiple pieces of behavior data, then determines the original feature list of each piece of behavior data according to the obtained behavior data and each preset time granularity, and then adaptively screens out the target features required to generate the detection model according to the original feature list and the preset feature screening strategy.
  • finally, based on the preset cost matrix, different training strategies are adopted for different preset training models, and the preset training model is trained with the target features and the result data contained in the behavior data to generate a detection model. After the detection model is generated, test samples are also obtained to evaluate the cost saving level of the detection model.
  • the preset feature screening strategy achieves adaptive screening based on the behavior data instead of relying entirely on manual experience to extract features. More attention is paid to the behavior data itself, so the model does not fail because of imbalanced feature selection, and the generated detection model can be iterated frequently, which allows it to effectively identify unknown, new types of account operation behavior.
  • in view of the imbalanced samples that characterize the application scenario, a preset cost matrix is set that fully considers the loss costs of different types of samples. The generated detection model aims at the minimum loss cost when detecting user operations, effectively safeguarding the interests of users and financial institutions.
  • when the detection model is evaluated, commonly used performance measures such as accuracy and recall are not adopted; instead, the minimum loss cost is used as the evaluation index to assess the cost saving level of the detection model.
  • the practicability and feasibility of the detection model provided by the embodiments of the present application are thereby further improved.
  • FIG. 10 is a schematic structural diagram of a user operation detection apparatus provided by an embodiment of the present application.
  • the user operation detection device 900 provided in this embodiment includes:
  • the acquisition module 901 is used to acquire multiple pieces of behavior data.
  • the behavior data is used to represent the data generated by historical users' online operations on the account.
  • the feature screening module 902 is configured to determine the original feature list according to the behavior data and each preset time granularity, and determine the target feature according to the original feature list and the preset feature screening strategy.
  • the preset feature screening strategy is used to adaptively screen target features according to the original feature list.
  • the processing module 903 is configured to generate, based on the preset cost matrix, a detection model according to the preset training model, the target features and the result data included in the behavior data, and to detect through the detection model whether a user operation is the user's own behavior.
  • the result data is used to characterize whether the online operation is the historical user's own behavior.
  • the processing module 903 in the user operation detection device 900 provided in this embodiment is specifically used for:
  • obtaining the current behavior data of the current user, where the current behavior data represents the data generated by the current user's online operation on the account, and the operation result of the current behavior data is successful;
  • associating the current behavior data with the original feature list, and generating a judgment result according to the associated current behavior data and the detection model, where the judgment result indicates that the current user's online operation on the account is either the current user's own behavior or not the current user's own behavior.
  • the user operation detection device 900 further includes:
  • a behavior determination module, used when the judgment result is that the operation is not the current user's own behavior;
  • a reporting and iteration module, used to report the judgment result and to iterate the detection model according to the judgment result and the current behavior data.
  • the feature screening module 902 includes the sub-modules shown in FIG. 11, which is a schematic structural diagram of a feature screening module provided by an embodiment of the present application. As shown in FIG. 11, the feature screening module 902 provided in this embodiment includes: a first processing submodule 9021 and a second processing submodule 9022.
  • the first processing sub-module 9021 is used for:
  • obtaining the dimension data in each piece of behavior data that maps to the result data, where the dimension data includes environment dimension data and user operation dimension data;
  • generating the original feature list according to the feature values corresponding to each dimension data.
  • the second processing sub-module 9022 is used for:
  • the preset feature screening strategy includes a preset correlation algorithm and a screening step.
  • FIG. 12 is a schematic structural diagram of a processing module provided by an embodiment of the present application.
  • the processing module 903 provided in this embodiment includes:
  • the third processing sub-module 9031 is used to determine positive samples and negative samples according to the result data corresponding to the target feature
  • the fourth processing sub-module 9032 is configured to generate a cost model according to the positive samples, the negative samples and the preset cost matrix
  • the fifth processing sub-module 9033 is used to determine the cost threshold according to the cost model and the preset cost matrix
  • the sixth processing sub-module 9034 is configured to generate a detection model according to the cost model, the cost threshold and the target feature based on the preset training model.
  • the sixth processing submodule 9034 is specifically used for:
  • the sixth processing submodule 9034 is specifically used for:
  • All target features and the result data corresponding to the target features are used as training samples, and the training parameters that make the target loss function a minimum value are obtained through a preset algorithm, and the preset algorithm includes a preset gradient descent algorithm or a preset maximum likelihood estimation algorithm;
  • a classifier is obtained according to the training parameters and the prediction function, so as to obtain a detection model according to the classifier.
  • the user operation detection device 900 further includes: a test module; the test module is specifically used for:
  • a cost saving parameter is determined according to the respective corresponding values of the first cost and the second cost, so that the cost saving level of the detection model is represented by the cost saving parameter.
  • the user operation detection device provided by the above embodiments can be used to execute the corresponding steps in the user operation detection method provided by any of the above embodiments.
  • the division into modules is only a logical functional division, and other divisions are possible in actual implementation.
  • multiple modules can be combined or can be integrated into another system.
  • the coupling between the various modules may be implemented through some interfaces, which are usually electrical communication interfaces, but may be mechanical interfaces or other forms of interfaces.
  • modules described as separate components may or may not be physically separate, and may be located in one place or distributed in different locations on the same or different devices.
  • FIG. 13 is a schematic structural diagram of an electronic device according to an embodiment of the present application.
  • the electronic device 1000 may include: at least one processor 1001 and a memory 1002 .
  • FIG. 13 shows a processor as an example.
  • the memory 1002 is used to store programs.
  • the program may include program code, and the program code includes computer operation instructions.
  • Memory 1002 may include high-speed RAM memory, and may also include non-volatile memory, such as at least one disk memory.
  • the processor 1001 is configured to execute the computer program stored in the memory 1002, so as to implement the steps in the user operation detection method in the above method embodiments.
  • the processor 1001 may be a central processing unit (CPU), an application specific integrated circuit (ASIC), or one or more integrated circuits configured to implement the embodiments of the present application.
  • the memory 1002 may be independent or integrated with the processor 1001 .
  • the electronic device 1000 may further include:
  • the bus 1003 is used to connect the processor 1001 and the memory 1002 .
  • the bus may be an industry standard architecture (ISA) bus, a peripheral component interconnect (PCI) bus, an extended industry standard architecture (EISA) bus, or the like. Buses can be divided into address buses, data buses, control buses and so on, which does not mean that there is only one bus or one type of bus.
  • the memory 1002 and the processor 1001 can communicate through an internal interface.
  • the present application also provides a computer-readable storage medium. The computer-readable storage medium may include various media that can store program code, such as a USB flash drive, a removable hard disk, a read-only memory (ROM), a random access memory (RAM), a magnetic disk or an optical disc. Specifically, a computer program is stored in the computer-readable storage medium, and when at least one processor of the above electronic device executes the computer program, the electronic device performs the steps of the user operation detection method provided by the above implementations.
  • Embodiments of the present application further provide a computer program product, where the computer program product includes a computer program, and the computer program is stored in a readable storage medium.
  • At least one processor of the electronic device can read the computer program from the readable storage medium, and the at least one processor executes the computer program so that the device implements each step of the user operation detection method provided by the above-mentioned various embodiments.


Abstract

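A user operation detection method and program product. First, multiple pieces of behavior data are obtained (S101); an original feature list is determined according to the behavior data and each preset time granularity, and target features are screened according to the original feature list and a preset feature screening strategy (S102); finally, based on a preset cost matrix, a detection model is generated from a preset training model, the target features and result data, and the detection model is used to detect whether a user operation is the user's own behavior (S103). The target features are obtained by adaptive screening, which does not rely entirely on manual experience and pays more attention to the behavior data itself, so the detection model does not fail because of imbalanced feature selection, and the iteration frequency of the detection model can be increased to effectively identify unknown behavior that is not the user's own. In view of the sample imbalance specific to the application scenario, the preset cost matrix fully considers the loss costs of different types of samples, and the detection model detects user operations with the aim of minimum loss cost, effectively safeguarding the rights and interests of users and the corresponding institutions.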

Description

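User operation detection method and program product
This application claims priority to Chinese patent application No. 202110326785.1, entitled "User operation detection method and program product", filed with the China Patent Office on March 26, 2021, the entire contents of which are incorporated herein by reference.
Technical Field
The present application relates to the field of financial technology (Fintech), and in particular to a user operation detection method and program product.
Background
With the rapid development of computer and Internet technology, financial technology (Fintech), a product of the deep integration of finance and technology, is becoming a hot spot of innovation in the financial industry. At the same time, more and more financial institutions are gradually moving services that used to be handled offline to online channels. Although this is convenient for users, it increases the risk of financial fraud. For example, when a user performs operations on an account online, the financial institution cannot verify face to face, as it would offline, whether the online operation is actually performed by the account holder. It is therefore quite possible that the operation is an attack by another person or is performed against the account holder's will, and such operations carry a risk of financial fraud.
In the prior art, the data generated by users' online operations is usually processed, and features are extracted from it, on the basis of manual experience, and a detection model is then formed from the extracted features to identify the risk of financial fraud. Because feature selection depends on manual experience, it may become imbalanced and cause the model to fail. In addition, detection models built on manually selected features are usually iterated infrequently and cannot identify newly emerging types of fraud. Furthermore, owing to the particular nature of anti-fraud scenarios, fraudulent samples make up a far smaller share than non-fraudulent samples. Existing model-building schemes commonly train on such samples using undersampling of the majority class, oversampling of the minority class, synthetic data generation for the minority class (such as the Synthetic Minority Oversampling Technique, SMOTE), or the application of, for example, cost-sensitive learning algorithms to the training sample data. All of these mechanically change the sample proportions of the original data, which biases the training result and tends to ignore the loss cost of misjudging a fraudulent sample as a normal one, harming the interests of users and financial institutions.
It can be seen that, in view of the existing risk of financial fraud, a user detection method that overcomes the defects of the prior art is urgently needed to detect user operation behavior.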
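Summary of the Invention
The present application provides a user operation detection method and program product, to solve the technical problems of the prior art in fraud detection: manually selected features become imbalanced, so that the model used for detection is iterated infrequently and detection fails; and, because sample proportions are uneven, the loss of misjudging a fraudulent sample as a normal one is easily ignored, harming the interests of users or financial institutions.
In a first aspect, the present application provides a user operation detection method, the detection method including:
obtaining multiple pieces of behavior data, the behavior data being used to represent the data generated when historical users perform online operations on an account;
determining an original feature list according to the behavior data and each preset time granularity, and determining target features according to the original feature list and a preset feature screening strategy, the preset feature screening strategy being used to adaptively screen the target features according to the original feature list;
based on a preset cost matrix, generating a detection model according to a preset training model, the target features and result data contained in the behavior data, so as to detect through the detection model whether a user operation is the user's own behavior, the result data being used to characterize whether the online operation is the historical user's own behavior.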
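In one possible design, detecting through the detection model whether a user operation is the user's own behavior includes:
obtaining current behavior data of a current user, the current behavior data being used to represent the data generated when the current user performs an online operation on the account, where the operation result of the current behavior data is successful;
associating the current behavior data with the original feature list, so as to generate a judgment result according to the associated current behavior data and the detection model, the judgment result indicating that the current user's online operation on the account is either the current user's own behavior or not the current user's own behavior.
In one possible design, after the judgment result is generated, the method further includes:
if the judgment result is that the operation is not the current user's own behavior,
reporting the judgment result, and iterating the detection model according to the judgment result and the current behavior data.
In one possible design, determining the original feature list according to the behavior data and each preset time granularity includes:
obtaining the dimension data in each piece of behavior data that maps to the result data, the dimension data including environment dimension data and user operation dimension data;
determining, for each environment dimension data and each user operation dimension data, the corresponding feature value at each preset time granularity;
generating the original feature list according to the feature values corresponding to each dimension data.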
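The construction of the original feature list can be sketched as follows. This is an illustration under assumptions, not the method of the application: the text does not define how a feature value is computed, so the sketch takes it to be the number of distinct values a dimension shows within each time window, and the window lengths, field names and sample events are hypothetical.

```python
from datetime import datetime, timedelta

# Hypothetical preset time granularities.
GRANULARITIES = {
    "1h": timedelta(hours=1),
    "24h": timedelta(hours=24),
    "7d": timedelta(days=7),
}


def build_feature_list(events, dimensions, now):
    """Return {"<dimension>_<granularity>": feature value} for one piece of behavior data."""
    features = {}
    for dim in dimensions:
        for label, window in GRANULARITIES.items():
            recent = [e for e in events if now - window <= e["time"] <= now]
            # assumed feature value: number of distinct values seen in the window
            features[f"{dim}_{label}"] = len({e[dim] for e in recent})
    return features


now = datetime(2021, 3, 26, 12, 0)
events = [
    {"time": now - timedelta(minutes=30), "device_id": "A", "operation": "login"},
    {"time": now - timedelta(hours=5), "device_id": "B", "operation": "transfer"},
    {"time": now - timedelta(days=3), "device_id": "A", "operation": "login"},
]
# "device_id" stands in for environment dimension data, "operation" for user operation dimension data.
original_features = build_feature_list(events, ["device_id", "operation"], now)
```

Each dimension contributes one feature per time granularity, so two dimensions and three granularities yield six entries in the original feature list.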
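In one possible design, determining the target features according to the original feature list and the preset feature screening strategy includes:
for the original feature list of each dimension data, specifying preset features from the feature set of the corresponding original feature list according to a preset business scenario, and determining the features in the feature set other than the preset features as remaining features;
determining, according to a preset correlation algorithm, the pairwise correlation coefficient between the preset feature and each remaining feature, and screening out candidate features from the remaining features according to the correlation coefficients and a preset correlation threshold;
designating the screened candidate feature as a new preset feature, and repeating the step of determining, according to the preset correlation algorithm, the pairwise correlation coefficient between the preset feature and each remaining feature and screening out candidate features from the remaining features according to the correlation coefficients and the preset correlation threshold, until the feature set of each dimension data is empty;
determining all the preset features specified for each dimension data as the target features corresponding to that dimension data;
where the preset feature screening strategy includes the preset correlation algorithm and the screening step.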
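In one possible design, generating the detection model based on the preset cost matrix and according to the preset training model, the target features and the result data contained in the behavior data includes:
determining positive samples and negative samples according to the result data corresponding to the target features;
generating a cost model according to the positive samples, the negative samples and the preset cost matrix;
determining a cost threshold according to the cost model and the preset cost matrix;
based on the preset training model, generating the detection model according to the cost model, the cost threshold and the target features.
In one possible design, when the preset training model is a preset random forest model, generating the detection model based on the preset training model and according to the cost model, the cost threshold and the target features includes:
determining the information gain of each target feature according to the cost threshold and a preset information gain algorithm, and determining the target feature with the largest information gain as the split node of the current split, so as to generate a corresponding sub-decision tree;
repeating the above splitting step until the classification of all target features is completed, so as to obtain the detection model from the decision tree formed by the sub-decision trees.
In one possible design, when the preset training model is any one of a preset regression algorithm, a preset classification algorithm and a preset neural network model, generating the detection model based on the preset training model and according to the cost model, the cost threshold and the target features includes:
determining, according to the preset cost matrix, the cost model and the cost threshold, a preset condition to be satisfied by the loss function of the preset training model;
generating a target loss function according to the preset condition, the cost function and the prediction function of the preset training model;
using all the target features and the result data corresponding to the target features as training samples, and obtaining, through a preset algorithm, the training parameters that minimize the target loss function, the preset algorithm including a preset gradient descent algorithm or a preset maximum likelihood estimation algorithm;
obtaining a classifier according to the training parameters and the prediction function, so as to obtain the detection model from the classifier.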
在一种可能的设计中,在所述生成所述检测模型之后,还包括:
获取测试样本;
根据测试样本以及所述检测模型生成第一测试结果,并根据所述第一测试结果以及所述代价模型确定第一代价;
根据所述测试样本以及人工预测策略生成第二测试结果,并根据所述第二测试结果以及所述代价模型确定第二代价;
根据所述第一代价以及所述第二代价各自的对应值确定代价节约参数,以通过所述代价节约参数表征所述检测模型的代价节约水平。
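In a second aspect, the present application further provides a computer program product comprising a computer program which, when executed by a processor, implements any possible user operation detection method provided in the first aspect.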
本申请提供一种用户操作检测方法及程序产品。首先获取多份表征历史用户对账户进行在线操作时产生的行为数据,然后根据所获取到的行为数据以及各预设时间粒度确定每份行为数据的原始特征列表,再进一步根据原始特征列表以及预设特征筛选 策略自适应筛选出用于生成检测模型所需的目标特征,最后基于筛选出的目标特征、预设代价矩阵以及行为数据所包含的结果数据生成检测模型,以通过检测模型对用户操作是否为本人行为进行检测,其中,结果数据用于表征历史用户在线操作是否为历史用户本人行为。本申请提供的检测模型生成方法,首先对于生成模型所需的目标特征是根据行为数据自适应筛选得到,不完全依赖于人工经验提取相应特征,在确定目标特征时更注重行为数据本身,从而不存在因特征选取失衡而导致模型失效的情况发生,并且能够使得所生成的检测模型具备高效迭代频率,进而能够对未知的新型操作行为有效识别。另外,针对应用场景样本不均衡的特殊性采用了预设代价矩阵以生成检测模型,充分考虑了不同类型样本的损失代价,生成的检测模型以追求最小损失代价为目的,有效维护用户以及金融机构的权益。
附图说明
图1为本申请实施例提供的一种应用场景示意图;
图2为本申请实施例提供的一种用户操作检测方法的流程示意图;
图3为本申请实施例提供的另一种用户操作检测方法的流程示意图;
图4为本申请实施例提供的再一种用户操作检测方法的流程示意图;
图5为本申请实施例提供的又一种用户操作检测方法的流程示意图;
图6为本申请实施例提供的又一种用户操作检测方法的流程示意图;
图7为本申请实施例提供的又一种用户操作检测方法的流程示意图;
图8为本申请实施例提供的又一种用户操作检测方法的流程示意图;
图9为本申请实施例提供的又一种用户操作检测方法的流程示意图;
图10为本申请实施例提供的一种用户操作检测装置的结构示意图;
图11为本申请实施例提供的一种特征筛选模块的结构示意图;
FIG. 12 is a schematic structural diagram of a processing module provided by an embodiment of the present application;
图13为本申请实施例提供的一种电子设备的结构示意图。
具体实施方式
这里将详细地对示例性实施例进行说明,其示例表示在附图中。下面的描述涉及附图时,除非另有表示,不同附图中的相同数字表示相同或相似的要素。以下示例性实施例中所描述的实施方式并不代表与本申请相一致的所有实施方式。相反,它们仅是与如所附权利要求书中所详述的、本申请的一些方面相一致的方法和装置的例子。
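The terms "first", "second", "third", "fourth" and the like (if any) in the description, claims and drawings of the present application are used to distinguish similar objects and do not necessarily describe a particular order or sequence. It should be understood that data so used are interchangeable where appropriate, so that the embodiments described here can, for example, be implemented in orders other than those illustrated or described. Furthermore, the terms "include" and "have" and any variations thereof are intended to cover non-exclusive inclusion; for example, a process, method, system, product or device that comprises a series of steps or units is not necessarily limited to the steps or units expressly listed, but may include other steps or units not expressly listed or inherent to such a process, method, product or device.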
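With the rapid development of Internet technology and fintech, more and more financial institutions are moving services that once had to be handled offline to online channels. This is convenient for users but raises the risk of financial fraud. Specifically, when a user operates on an account online, the financial institution cannot verify face to face whether the operation is being performed by the user in person, so the operation may well be an attack by someone else or contrary to the user's wishes, and such operations carry financial fraud risk. The prior art usually identifies them with a model, but the way such models are generated has several defects. First, data generated by online operations is usually processed manually to extract features, on which the detection model is then built. Because feature extraction relies on human experience, feature selection becomes unbalanced and the model's recognition can fail. Second, manually selected features are usually treated as fixed, so the model is iterated infrequently and cannot identify newly emerging, unknown fraud patterns. Third, because of the nature of the scenario, fraud samples are usually far fewer than non-fraud samples, and the prior art typically trains with majority-class undersampling, minority-class oversampling, synthetic minority-class data, or cost-sensitive learning algorithms. These methods mechanically alter the sample ratio of the original data, bias the training result, and make it easy to overlook the loss incurred when a fraud sample is misjudged as normal, a loss that seriously harms the interests of users and financial institutions.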
针对现有技术存在的上述技术缺陷,本申请提供了一种用户操作检测方法及程序产品。本申请提供的用户操作检测方法的发明构思在于:针对现有技术人工选取特征所存在的技术缺陷,本申请通过设置预设特征筛选策略,以对生成检测模型所需的目标特征根据行为数据自适应筛选得到,不完全依赖于人工经验而更注重行为数据本身,从而不存在因特征选取失衡而导致模型失效的情况发生,并且能够使得所生成的检测模型具备高效迭代频率,以对未知的操作行为有效识别。另外,针对应用场景样本不均衡的特殊性,本申请设置了预设代价矩阵,充分考虑了不同类型样本的损失代价,在利用检测模型检测用户操作时以追求最小损失代价为目的,进而有效维护用户以及金融机构权益。
以下,对本申请实施例的示例性应用场景进行介绍。
图1为本申请实施例提供的一种应用场景示意图。如图1所示,网络用于为终端设备11和服务器12之间提供通信链路的介质,网络可以包括各种连接类型,例如有线、无线通信链路或者光纤电缆等等。终端设备11和服务器12之间可以通过网络进行交互,以接收或发送消息。其中,终端设备11可以为被配置于用户端的任意终端,以使得用户通过该终端设备11对账户进行在线操作。服务器12即为可以执行本申请实施例提供的用户操作检测方法的用户操作检测装置所对应的电子设备,服务器12可以被配置于金融机构端的任意终端或者服务器中,服务器12与终端设备11之间通过网络可以进行信息交互,以使得服务器12能够执行本申请实施例提供的用户操作检测方法,以对终端设备11上进行的用户操作进行有效识别,有效维护用户以及金融机构的相关利益。
需要说明的是,本申请实施例对于上述描述的终端设备11的类型不作限定,例如终端设备11可以是计算机、智能手机、智能眼镜、智能手环、智能手表、平板电脑等等,图1中的终端设备11以智能手机为例示出。而服务器12也可以为服务器集群,对此,本实施例不作限定。
需要说明的是,上述应用场景仅仅是示意性的,本申请实施例提供的用户操作检测方法包括但不仅限于上述应用场景。
下面以具体地实施例对本申请的技术方案以及本申请的技术方案如何解决上述技术问题进行详细说明。下面这几个具体的实施例可以相互结合,对于相同或相似的概念或过 程可能在某些实施例中不再赘述。
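FIG. 2 is a schematic flowchart of a user operation detection method provided by an embodiment of the present application. As shown in FIG. 2, the user operation detection method provided by this embodiment includes:
S101: Obtain multiple pieces of behavior data.
The behavior data represents data generated when historical users perform online operations on their accounts.
When a user performs an online operation on an account, such as registering an account, changing a password or opening an account, a series of data is generated. This data is defined as behavior data.
The data generated by one user's online operations on an account counts as one piece. This step may therefore obtain the data generated by multiple users' online account operations within a preset past period, that is, multiple pieces of behavior data; the users within that period are the historical users.
In addition, because behavior data involves historical users' sensitive information, the obtained behavior data may be desensitized by hashing, with a corresponding verification tool embedded to ensure the accuracy of the behavior data.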
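The hash desensitization mentioned above can be sketched as follows. This is only an illustration under assumptions: the text does not say which hash or which verification tool is used, so the keyed SHA-256 (HMAC) digest and the names `desensitize`, `verify` and `secret_key` are hypothetical choices.

```python
import hashlib
import hmac


def desensitize(value: str, secret_key: bytes) -> str:
    """Return a keyed SHA-256 digest (hex) of a sensitive field.

    Assumption: HMAC-SHA256 stands in for the unspecified hash in the text.
    """
    return hmac.new(secret_key, value.encode("utf-8"), hashlib.sha256).hexdigest()


def verify(value: str, digest: str, secret_key: bytes) -> bool:
    """Check that a stored digest still matches the given raw value."""
    return hmac.compare_digest(desensitize(value, secret_key), digest)
```

Identical raw values map to identical digests, so per-value counts over time windows can still be computed on the desensitized data.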
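S102: Determine an original feature list according to the behavior data and each preset time granularity, and determine target features according to the original feature list and a preset feature screening strategy.
The preset feature screening strategy is used to adaptively screen the target features from the original feature list.
After the multiple pieces of behavior data are obtained, they are analyzed and processed, and adaptively screened through the preset feature screening strategy, to obtain the target features used to generate the detection model.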
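In one possible design, a possible implementation of determining the original feature list according to the behavior data and each preset time granularity in step S102 is shown in FIG. 3. FIG. 3 is a schematic flowchart of another user operation detection method provided by an embodiment of the present application. As shown in FIG. 3, determining the original feature list according to the behavior data and each preset time granularity includes:
S201: Obtain, from each piece of behavior data, the dimension data mapped to the result data.
The dimension data includes environment dimension data and user operation dimension data.
Behavior data generated by a historical user's online operation contains result data, which indicates whether that online operation on the account was the historical user's own behavior. The result data therefore takes one of two values: the operation was the historical user's own behavior, or it was not. Each piece of behavior data also includes dimension data of several other dimensions mapped to the result data, namely environment dimension data and user operation dimension data; for example, one item of result data corresponds to a set of environment dimension data and a set of user operation dimension data.
Environment dimension data is the data in the behavior data that describes the historical user's online operating environment, including but not limited to the time of the operation, information about the operating device, network information at the time of the operation, and geographic location. User operation dimension data is the data related to the historical user's identity information during the online operation, including but not limited to the recognition result of the user's ID card OCR (Optical Character Recognition), and the name, mobile phone number, bank card number, login password and number of wrong password entries used during the operation. Thus, for each piece of behavior data, the environment dimension data and user operation dimension data mapped to its result data can be obtained.
S202: Determine the feature value of each item of environment dimension data and each item of user operation dimension data at each preset time granularity.
For each piece of behavior data, once its dimension data has been obtained, preset time granularities are set in order to determine the feature value of each item of dimension data at each granularity. For example, for the user operation dimension "name used by the historical user during the operation", several time granularities can be set, such as yesterday, the last 7 days, the last month and the last half year. For each granularity, values such as the number of times the name appeared and the proportion of operations whose final result was success are determined; these values are the feature values of that dimension data at the preset time granularity.
Accordingly, the feature values at each preset time granularity are determined for each item of environment dimension data and each item of user operation dimension data. The feature values of an item of dimension data are thus, in essence, how that data behaves within each preset time granularity.
The preset time granularities can be set according to the behavior data encountered in practice; this embodiment does not limit them.
S203: Generate the original feature list from the feature values of each item of dimension data.
Once the feature values of every item of dimension data have been determined, each item of dimension data and its feature values at the preset time granularities can be represented in a list, which is the original feature list. Each feature in the original feature list is therefore an item of dimension data together with its feature value at a given time granularity. For example, if the name "张三" is an item of dimension data, the last 7 days is a preset time granularity, and the name appeared three times in that window so that the feature value is 3, then "the name 张三 appeared three times in the last 7 days" is a feature.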
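As a minimal sketch of steps S202 and S203, the function below computes, for one value of one dimension, the occurrence count and success ratio in each time window. The record layout (`time`, `success` and one key per dimension), the window lengths in `GRANULARITIES` and the feature naming scheme are assumptions made for illustration; the text only names the granularities and the two example statistics.

```python
from datetime import datetime, timedelta

# Hypothetical window lengths in days for "yesterday", "last 7 days",
# "last month" and "last half year".
GRANULARITIES = {"1d": 1, "7d": 7, "30d": 30, "180d": 180}


def window_features(records, dimension, value, now):
    """Build feature values for one dimension value at every granularity.

    records: list of dicts with 'time' (datetime), 'success' (bool) and
    one key per dimension, e.g. 'name'.
    """
    features = {}
    for label, days in GRANULARITIES.items():
        start = now - timedelta(days=days)
        hits = [r for r in records
                if r.get(dimension) == value and start <= r["time"] < now]
        count = len(hits)
        # Share of matching operations whose final result was success.
        ratio = sum(r["success"] for r in hits) / count if count else 0.0
        features[f"{dimension}_count_{label}"] = count
        features[f"{dimension}_success_ratio_{label}"] = ratio
    return features
```

Each key of the returned dictionary corresponds to one entry of the original feature list, for example "the name appeared three times in the last 7 days".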
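To summarize, when the original feature list is determined from the behavior data and the preset time granularities, the dimension data mapped to the result data (environment dimension data and user operation dimension data) is first obtained from each piece of behavior data, the feature value of each item of dimension data at each preset time granularity is then determined, and the original feature list is finally generated from those feature values. Setting the preset time granularities exposes how each item of dimension data behaves over time, so the original feature list presents each dimension's feature values at every granularity directly, which facilitates the subsequent feature screening.
The embodiment of FIG. 3 describes determining the original feature list from the behavior data and the preset time granularities. Features can then be screened adaptively on the basis of the original feature list and the preset feature screening strategy, and the screened target features are used in the later steps of generating the detection model. The preset feature screening strategy follows the actual business scenario in which the behavior data was generated as closely as possible, and is based on the correlations among the features in the original feature list and on the correspondence between each feature and the result data. It therefore screens features adaptively, focuses on the behavior data itself, and lets the target features change dynamically as the behavior data changes. This reduces reliance on human experience, avoids model failure caused by unbalanced feature selection, and enables the detection model to discover and identify previously unknown types of operations that are not the user's own.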
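S103: Based on a preset cost matrix, generate a detection model according to a preset training model, the target features and the result data contained in the behavior data, so as to detect through the detection model whether a user operation is the user's own behavior.
The result data indicates whether the online operation was the historical user's own behavior.
After the target features are screened out, the detection model is generated from the target features, the preset cost matrix and the result data contained in the behavior data, and is used to detect whether an operation on an account is performed by the account's owner. The result data records whether a historical user's online operation on the account was that user's own behavior, so it takes one of two values: the operation was the historical user's own behavior, or it was not.
The elements of the preset cost matrix represent the loss caused to users and financial institutions by correctly identifying, and by wrongly identifying, own behavior and non-own behavior. When the detection model is used to detect user operations in the present application, every operation identified as not the user's own behavior is treated as fraud.
The preset cost matrix takes full account of the loss associated with each type of sample and aims at minimum loss. This differs from the prior art, which aims at recognition accuracy and, for unbalanced samples, mechanically alters the sample ratio by various methods, making it easy to overlook the loss caused by misjudging fraud as non-fraud. In practice, identifying non-fraud as fraud usually costs less than identifying fraud as non-fraud. The detection model in the method provided by the embodiments of the present application therefore considers the loss of each type of sample when identifying fraud and aims at minimum loss, which effectively protects the interests of users and financial institutions.
Thus, based on the preset cost matrix, the screened target features and their corresponding result data are used as training samples for the preset training model, which is trained to obtain the detection model, completing the generation of the detection model.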
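After the detection model is generated, the steps shown in FIG. 4 may further be performed to detect with the model whether a user operation is the user's own behavior and to iterate the model in a timely manner. FIG. 4 is a schematic flowchart of yet another user operation detection method provided by an embodiment of the present application. As shown in FIG. 4, detecting through the detection model whether a user operation is the user's own behavior includes:
S301: Obtain current behavior data of a current user.
The current behavior data represents data generated when the current user performs an online operation on the account, and the operation result of the current behavior data is success.
For example, the data generated by any current user's online operation on his or her account is obtained as the current behavior data. The operation result of this data is success: if, for instance, the business handled online is an account opening application, the operation result is that the application succeeded. This indicates that the online operation was in fact the user's own behavior, that is, non-fraud.
S302: Associate the current behavior data with the original feature list, and generate a judgment result according to the associated current behavior data and the detection model.
The judgment result is either that the current user's online operation on the account is the current user's own behavior or that it is not.
After the current behavior data is obtained, it is associated with the original feature list, so that it is processed according to the dimensions and preset time granularities of the original feature list. The associated current behavior data is then input to the detection model, which outputs the judgment result. The judgment result is the result of the user operation detection.
Specifically, the detection model and the associated current behavior data are used to judge whether the operation on the current user's account is or is not the current user's own behavior. The judgment result is therefore one of these two outcomes, which completes the detection of whether the user operation is the user's own behavior.
If the detection model's judgment is that the operation is not the current user's own behavior, the model has judged the current user's online operation to be fraudulent. The judgment result is reported, and the detection model is iterated according to the judgment result and the current behavior data. Because the operation result of the current behavior data was success, a fraud judgment suggests that the detection model may have misjudged. The judgment result is therefore reported to analysts for further analysis, and the judgment result and the current behavior data are used for the next iteration of the model, improving its recognition accuracy and speeding up iteration.
In the user operation detection method provided by this embodiment, multiple pieces of behavior data representing historical users' online account operations are first obtained; the original feature list of each piece of behavior data is determined from the behavior data and the preset time granularities; the target features needed to generate the detection model are adaptively screened from the original feature list using the preset feature screening strategy; and, based on the preset cost matrix, the detection model is generated from the preset training model, the target features and the result data contained in the behavior data, and is then used to detect whether a user operation is the user's own behavior. Because the target features are screened adaptively from the behavior data rather than extracted purely from human experience, their selection focuses on the behavior data itself, which avoids model failure caused by unbalanced feature selection, allows the detection model to be iterated frequently, and enables it to identify unknown new types of user operations. In addition, the preset cost matrix addresses the sample imbalance of the application scenario by taking full account of the loss associated with each type of sample, so the generated detection model aims at minimum loss when detecting user operations and effectively protects the interests of users and financial institutions.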
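In one possible design, a possible implementation of determining the target features according to the original feature list and the preset feature screening strategy in step S102 is shown in FIG. 5. FIG. 5 is a schematic flowchart of yet another user operation detection method provided by an embodiment of the present application. As shown in FIG. 5, this embodiment includes:
S401: For the original feature list of each item of dimension data, specify preset features from the feature set of that original feature list according to a preset business scenario, and determine the features in the feature set other than the preset features as remaining features.
As described for step S203 of the FIG. 3 embodiment, each feature in the original feature list is an item of dimension data together with its feature value at a given preset time granularity. The original feature list of each item of dimension data can therefore contain multiple features, which form the feature set of that list.
Accordingly, for the original feature list of each item of dimension data, preset features are specified from its feature set according to the preset business scenario, and all other features in the feature set are determined as remaining features. The preset business scenario is determined by the actual business involved; this embodiment does not limit it. The preset features specified according to the preset business scenario are those that best represent how the dimension data behaves at each preset time granularity.
The number of preset features can also be set according to actual conditions, for example 1 to 3.
S402: Determine, according to a preset correlation algorithm, the pairwise correlation coefficient between the preset feature and each remaining feature, and screen a candidate feature out of the remaining features according to the correlation coefficients and a preset correlation threshold.
The preset feature screening strategy includes the preset correlation algorithm.
After the preset features are specified, the pairwise correlation coefficient between each preset feature and each remaining feature in the feature set is determined using the preset correlation algorithm. The correlation coefficient may be, for example, the Pearson correlation coefficient, in which case the preset correlation algorithm is the algorithm for computing it; this embodiment does not limit the choice.
After the correlation coefficients are obtained, candidate features are screened out of the remaining features according to the coefficients and the preset correlation threshold. Specifically, each correlation coefficient is compared with the preset correlation threshold, and remaining features whose coefficient exceeds the threshold are eliminated. The remaining features that were not eliminated are sorted by correlation coefficient, and the one with the largest coefficient is taken as the candidate feature. This completes the candidate screening step.
S403: Designate the screened candidate feature as the new preset feature, and repeat the step of determining the pairwise correlation coefficients between the preset feature and each remaining feature and screening a candidate feature according to the correlation coefficients and the preset correlation threshold, until the feature set of each item of dimension data is empty.
The screened candidate feature is designated as the new preset feature, and step S402 is executed again. The screening step of S402 is repeated in this way, round after round, until the feature set of each item of dimension data is empty.
S404: Determine all the preset features designated for each item of dimension data as the target features of that item of dimension data.
Steps S401 to S403 are performed for each item of dimension data, and all preset features designated for an item of dimension data are determined as its target features, completing the adaptive screening of target features.
It should be noted that the preset feature screening strategy includes the preset correlation algorithm and the screening steps of the above screening process.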
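The screening loop of steps S401 to S404 can be sketched as below. This is a simplified reading, not the definitive procedure: it assumes a single business-specified seed feature, uses the absolute value of the Pearson coefficient, and the names `pearson`, `screen_features`, `columns`, `seed` and `threshold` are hypothetical.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length numeric lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy) if sx and sy else 0.0


def screen_features(columns, seed, threshold):
    """Return the designated preset features (the target features).

    columns: {feature name: list of values}; seed: preset feature chosen
    from the business scenario; threshold: preset correlation threshold.
    """
    selected = [seed]
    remaining = {name for name in columns if name != seed}
    current = seed
    while remaining:
        corr = {name: abs(pearson(columns[current], columns[name]))
                for name in remaining}
        # Eliminate features whose coefficient exceeds the threshold.
        remaining = {name for name in remaining if corr[name] <= threshold}
        if not remaining:
            break
        # The survivor with the largest coefficient becomes the new preset feature.
        current = max(sorted(remaining), key=lambda name: corr[name])
        selected.append(current)
        remaining.discard(current)
    return selected
```

Each round removes at least one feature from the set, so the loop always terminates with an empty feature set, matching the stopping condition of S403.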
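In the user operation detection method provided by this embodiment, the preset feature screening strategy follows the actual business scenario in which the behavior data was generated as closely as possible and is based on the correlations among the features in the original feature list and on the correspondence between each feature and the result data. It therefore screens features adaptively, focuses on the behavior data itself and reduces reliance on human experience, which avoids unbalanced feature selection and slow iteration of the detection model.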
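FIG. 6 is a schematic flowchart of yet another user operation detection method provided by an embodiment of the present application. As shown in FIG. 6, this embodiment includes:
S501: Obtain multiple pieces of behavior data.
The behavior data represents data generated when historical users perform online operations on their accounts.
S502: Determine an original feature list according to the behavior data and each preset time granularity, and determine target features according to the original feature list and a preset feature screening strategy.
The preset feature screening strategy is used to adaptively screen the target features from the original feature list.
The implementation, principles and technical effects of steps S501 and S502 are similar to those of steps S101 and S102 in the foregoing embodiment and are not repeated here.
S503: Determine positive samples and negative samples according to the result data corresponding to the target features.
After the target features are screened out, the behavior data corresponding to the target features is divided into positive and negative samples according to the corresponding result data. Specifically, target features whose result data indicates that the historical user's online operation was not the user's own behavior are positive samples, and target features whose result data indicates that the operation was the user's own behavior are negative samples. If $y$ denotes the result data, a positive sample can be written $y_i = 1$ and a negative sample $y_i = 0$, where the integer $i$ indexes the pieces of behavior data.
Note also that the result data in the behavior data records what actually happened in the historical user's online operation. A sample that is actually positive is therefore written $y_i = 1$, and a sample that is actually negative is written $y_i = 0$.
Further, let $c$ denote the predicted result data, so that a sample predicted positive is written $c_i = 1$ and a sample predicted negative is written $c_i = 0$. The preset cost matrix can then be represented by Table 1 below:
Table 1: preset cost matrix [table image: Figure PCTCN2021142701-appb-000001]
Here the symbol in [Figure PCTCN2021142701-appb-000002] denotes the cost of correctly predicting a sample as fraud, the symbol in [Figure PCTCN2021142701-appb-000003] denotes the cost of wrongly predicting a sample as fraud, the symbol in [Figure PCTCN2021142701-appb-000004] denotes the cost of wrongly predicting a sample as normal, and the symbol in [Figure PCTCN2021142701-appb-000005] denotes the cost of correctly predicting a sample as normal.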
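In the anti-fraud scenario of financial services, different prediction outcomes cause entirely different losses. Specifically, the preset cost matrix of Table 1 has the following properties:
First, a correct prediction usually costs less than a wrong prediction.
Second, wrongly predicting fraud as non-fraud (that is, as the user's own normal behavior) exposes both the user and the financial institution to potential loss of funds, whereas the loss caused by predicting non-fraud as fraud can be compensated by other auxiliary means. Therefore the relation in [Figure PCTCN2021142701-appb-000006] holds: the loss caused when actual fraud is wrongly predicted as non-fraud is greater than the loss caused when actual non-fraud is wrongly predicted as fraud.
Combining these two points, the losses of the different outcomes in the preset cost matrix satisfy relation (1):
[equation image: Figure PCTCN2021142701-appb-000007]   (1)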
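Because Table 1 and relation (1) survive in this text only as image references, the following is a hedged reconstruction from the surrounding description rather than the original notation. The symbols $C_{TP}$, $C_{FP}$, $C_{FN}$ and $C_{TN}$ are assumed names for the four costs described above, and the ordering shown is just one chain consistent with the two stated properties (correct predictions cost less than wrong ones, and a missed fraud costs more than a false alarm).

```latex
\begin{array}{c|cc}
 & \text{actual positive } (y_i = 1) & \text{actual negative } (y_i = 0) \\ \hline
\text{predicted positive } (c_i = 1) & C_{TP} & C_{FP} \\
\text{predicted negative } (c_i = 0) & C_{FN} & C_{TN}
\end{array}
\qquad
\max(C_{TP},\, C_{TN}) < C_{FP} < C_{FN}
```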
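S504: Generate a cost model according to the positive samples, the negative samples and the preset cost matrix.
After the positive samples, negative samples and preset cost matrix are determined, a cost model can be generated from the positive and negative samples with the preset cost matrix taken into account. The cost model expresses the functional relationship among the losses caused by the different outcomes and can be written as relation (2):
[equation image: Figure PCTCN2021142701-appb-000008]   (2)
where $T$ is the training set containing the positive and negative samples, and $i$ is any sample in the training set.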
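Relation (2) is not reproduced in this text, so the sketch below assumes the usual additive form: each sample contributes the cost-matrix entry selected by its actual label $y_i$ and predicted label $c_i$, summed over the training set $T$. The function name `total_cost` and the dictionary keys `TP`, `FP`, `FN`, `TN` are hypothetical labels for the four entries of the preset cost matrix.

```python
def total_cost(y_true, y_pred, cost):
    """Sum the cost-matrix entry picked by (actual, predicted) for each sample.

    y_true, y_pred: sequences of 0/1 labels (1 = not the user's own behavior).
    cost: dict with keys 'TP', 'FP', 'FN', 'TN'.
    """
    total = 0.0
    for y, c in zip(y_true, y_pred):
        if y == 1:
            total += cost["TP"] if c == 1 else cost["FN"]
        else:
            total += cost["FP"] if c == 1 else cost["TN"]
    return total
```

With illustrative costs where a missed fraud is ten times as expensive as a false alarm, a single false negative dominates the total, which is the behavior the cost model is meant to expose.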
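S505: Determine a cost threshold according to the cost model and the preset cost matrix.
The cost threshold is set according to the cost model and the preset cost matrix. Let $C(T)$ denote the cost threshold, that is, the loss incurred when no detection model is used and the entire training set is simply judged to be fraud or to be non-fraud; it serves as the upper bound on the loss. Let $f_j(T)$ denote the loss when the entire training set is predicted as class $j$, where $j$ is 0 or 1: $j = 0$ means all samples are predicted negative (non-fraud), and $j = 1$ means all samples are predicted positive (fraud). Substituting the corresponding entries of the preset cost matrix into cost model (2) gives relations (3) and (4), the costs when all samples are predicted negative and when all are predicted positive, respectively:
[equation image: Figure PCTCN2021142701-appb-000009]   (3)
[equation image: Figure PCTCN2021142701-appb-000010]   (4)
where $C(T) = \min\{f_0(T), f_1(T)\}$. This fixes the relationship between the cost threshold and the cost model, so that the cost threshold $C(T)$ can be obtained from the cost model.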
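The threshold $C(T) = \min\{f_0(T), f_1(T)\}$ can be illustrated as follows. The closed forms used for $f_0$ and $f_1$ assume an additive cost model with one constant cost per matrix entry, since relations (3) and (4) are only available as images; `cost_threshold` and the `TP`/`FP`/`FN`/`TN` keys are hypothetical names.

```python
def cost_threshold(y_true, cost):
    """Return C(T): the cheaper of labeling every sample negative or positive."""
    n_pos = sum(1 for y in y_true if y == 1)
    n_neg = len(y_true) - n_pos
    # f_0(T): every sample predicted negative (non-fraud).
    f0 = n_pos * cost["FN"] + n_neg * cost["TN"]
    # f_1(T): every sample predicted positive (fraud).
    f1 = n_pos * cost["TP"] + n_neg * cost["FP"]
    return min(f0, f1)
```

Which trivial strategy is cheaper depends on the class balance: with few negatives it is cheaper to flag everything, while with many negatives the accumulated false-alarm cost makes "flag nothing" the baseline.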
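S506: Based on the preset training model, generate the detection model according to the cost model, the cost threshold and the target features.
After the cost model is generated and the cost threshold is determined, the model is trained on the basis of the preset training model to generate the detection model.
The preset training model can be any of various classification models, such as a preset random forest model, a preset regression algorithm (for example a logistic regression classifier), a preset classification algorithm (for example a Support Vector Machine, SVM) or a preset neural network model. Different preset training models are trained with different strategies and corresponding means to generate the detection model.
For example, when the preset training model is a preset random forest model, the strategy is to build decision trees based on information gain. In one possible design for this case, a possible implementation of step S506 is shown in FIG. 7. FIG. 7 is a schematic flowchart of yet another user operation detection method provided by an embodiment of the present application. As shown in FIG. 7, this embodiment includes:
S601: Determine the information gain of each target feature according to the cost threshold and a preset information gain algorithm, and determine the target feature with the largest information gain as the split node of the current split, so as to generate a corresponding sub-decision tree.
Generating the detection model with a preset random forest model is, in essence, building decision trees. For each split, the target features in the training set are traversed according to the cost threshold and the preset information gain algorithm to obtain the information gain of each target feature. The target feature with the largest information gain is then taken as the split node of the current split, completing the split and generating the sub-decision tree corresponding to that split node.
The preset information gain algorithm can be expressed by relation (5):
[equation image: Figure PCTCN2021142701-appb-000011]   (5)
where $T_1$ is the subset of the training set $T$ in which target feature $F_i$ takes a value less than or equal to $k$, $T_2$ is the subset in which $F_i$ takes a value greater than $k$, and $k$ is the split point of the current split node (a natural number).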
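Since relation (5) is only an image reference here, the sketch below assumes one plausible cost-based gain: the reduction in the cost threshold obtained by splitting $T$ into $T_1$ (feature value at most $k$) and $T_2$ (feature value greater than $k$), that is, $C(T) - C(T_1) - C(T_2)$. A size-weighted variant would be another reasonable reading. The helper names and the `TP`/`FP`/`FN`/`TN` keys are hypothetical.

```python
def cost_threshold(y_true, cost):
    """C(T): cost of the cheaper trivial labeling (all negative or all positive)."""
    n_pos = sum(1 for y in y_true if y == 1)
    n_neg = len(y_true) - n_pos
    f0 = n_pos * cost["FN"] + n_neg * cost["TN"]
    f1 = n_pos * cost["TP"] + n_neg * cost["FP"]
    return min(f0, f1)


def best_split(values, y_true, cost):
    """Return (k, gain) for the split point k on one feature with the largest gain."""
    parent = cost_threshold(y_true, cost)
    best = (None, 0.0)
    # The largest value is skipped because it would leave T_2 empty.
    for k in sorted(set(values))[:-1]:
        left = [y for v, y in zip(values, y_true) if v <= k]
        right = [y for v, y in zip(values, y_true) if v > k]
        gain = parent - cost_threshold(left, cost) - cost_threshold(right, cost)
        if gain > best[1]:
            best = (k, gain)
    return best
```

Running `best_split` over every target feature and keeping the feature with the largest gain gives the split node for one step of tree construction under these assumptions.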
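S602: Repeat the above splitting step until all target features have been classified, so as to obtain the detection model from the decision tree formed by the sub-decision trees.
Step S601 is applied to each target feature, that is, the splitting step is repeated until all target features have been classified and the decision tree is complete. The decision tree formed by the sub-decision trees is then taken as the detection model, completing the generation of the detection model.
In the user operation detection method provided by this embodiment, when the preset training model is a preset random forest model, the best split node is sought on the basis of information gain. Because different misjudgments cause different losses, all target features are traversed at each split and the target feature with the largest information gain is used as the split node, instead of choosing the split that most reduces data impurity. This serves the strategy of pursuing minimum loss and effectively protects the interests of users and financial institutions.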
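When the preset training model is any one of a preset regression algorithm, a preset classification algorithm and a preset neural network model, the strategy is to build the training model from the training parameters that minimize a loss function. In one possible design for this case, another possible implementation of step S506 is shown in FIG. 8. FIG. 8 is a schematic flowchart of yet another user operation detection method provided by an embodiment of the present application. As shown in FIG. 8, this embodiment includes:
S701: Determine, according to the preset cost matrix, the cost model and the cost threshold, the preset conditions that the loss function of the preset training model must satisfy.
This embodiment is described using a preset regression algorithm, for example a logistic regression classifier.
For a logistic regression classifier, the prediction function is the sigmoid function, which can be written as relation (6):
$h_\theta(X_i) = \dfrac{1}{1 + e^{-\theta^{T} X_i}}$   (6)
where $h_\theta(X_i) \in (0, 1)$ is the prediction function. In essence, logistic regression looks for a suitable parameter $\theta$ that minimizes the loss function of the prediction function.
The loss function $J(\theta)$ of the prediction function can be written as relation (7):
$J_i(\theta) = -y_i \log(h_\theta(X_i)) - (1 - y_i)\log(1 - h_\theta(X_i))$   (7)
In the anti-fraud recognition scenario, different misjudgments cause different losses, so relation (7) does not reflect the preset cost matrix. The preset conditions that relation (7) must satisfy are therefore determined from the preset cost matrix, the cost model and the cost threshold. Specifically, under relation (7), $J_i(\theta) \approx 0$ when $y_i \approx h_\theta(X_i)$, and $J_i(\theta) \to \infty$ when $y_i \approx 1 - h_\theta(X_i)$, regardless of the type of error. The loss function therefore needs to satisfy the preset conditions shown in relation (8):
[equation image: Figure PCTCN2021142701-appb-000013]   (8)
S702: Generate a target loss function according to the preset conditions, the cost model and the prediction function of the preset training model.
In the application scenario of this embodiment, after the preset conditions for the loss function are determined, the target loss function is obtained from the preset conditions, cost model (2) and the prediction function of relation (6). The resulting target loss function is shown in relation (9):
[equation image: Figure PCTCN2021142701-appb-000014]   (9)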
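Relations (8) and (9) are available here only as image references, so the sketch below assumes a commonly used cost-weighted logistic loss: each sample's loss is the expected cost-matrix entry when the sigmoid output is read as the probability of predicting positive, averaged over the training set. Gradient descent is one of the preset algorithms named in the text; the function names, the learning rate, the step count and the `TP`/`FP`/`FN`/`TN` keys are assumptions.

```python
import math


def sigmoid(z):
    """Logistic function; the argument is clamped to avoid overflow in exp."""
    z = max(-60.0, min(60.0, z))
    return 1.0 / (1.0 + math.exp(-z))


def cost_loss(theta, X, y, cost):
    """Average expected cost when h is the probability of predicting positive."""
    total = 0.0
    for xi, yi in zip(X, y):
        h = sigmoid(sum(t * v for t, v in zip(theta, xi)))
        total += yi * (h * cost["TP"] + (1 - h) * cost["FN"])
        total += (1 - yi) * (h * cost["FP"] + (1 - h) * cost["TN"])
    return total / len(X)


def fit(X, y, cost, lr=0.1, steps=500):
    """Plain gradient descent on cost_loss; returns the trained parameters."""
    theta = [0.0] * len(X[0])
    for _ in range(steps):
        grad = [0.0] * len(theta)
        for xi, yi in zip(X, y):
            h = sigmoid(sum(t * v for t, v in zip(theta, xi)))
            # Derivative of the per-sample loss with respect to h,
            # followed by the chain rule through the sigmoid.
            dh = yi * (cost["TP"] - cost["FN"]) + (1 - yi) * (cost["FP"] - cost["TN"])
            for j, v in enumerate(xi):
                grad[j] += dh * h * (1 - h) * v
        theta = [t - lr * g / len(X) for t, g in zip(theta, grad)]
    return theta
```

Each row of `X` is expected to carry a leading 1.0 for the bias term. Because a missed fraud is weighted far more heavily than a false alarm, the fitted parameters lean toward flagging borderline samples as positive.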
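S703: Use all target features and their corresponding result data as training samples, and obtain through a preset algorithm the training parameters that minimize the target loss function.
The preset algorithm includes a preset gradient descent algorithm or a preset maximum likelihood estimation algorithm.
All target features and their corresponding result data are used as training samples, and a preset algorithm such as gradient descent or maximum likelihood estimation is used to find the training parameter $\theta$ at which the target loss function $J(\theta)$ of relation (9) reaches its minimum, completing the training of the preset training model.
S704: Obtain a classifier from the training parameters and the prediction function, and obtain the detection model from the classifier.
Once the training parameters that minimize the target loss function are obtained, the training of the preset training model, here the logistic regression classifier, is complete. The classifier obtained from the training parameters and the prediction function is taken as the detection model, completing the generation of the detection model.
In the user operation detection method provided by this embodiment, when the preset training model is any one of a preset regression algorithm, a preset classification algorithm and a preset neural network model, the training model is trained by finding the optimal training parameters that minimize the target loss function for the anti-fraud scenario. The classifier obtained from these training parameters and the prediction function is taken as the detection model. The resulting detection model therefore detects user operations with a strategy that pursues minimum loss, effectively protecting the interests of users and financial institutions.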
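S507: Test the generated detection model with test samples to determine the cost saving level of the detection model.
After the detection model is generated, it can further be tested and evaluated. Unlike the usual evaluation of a model's precision, recall and similar metrics, the evaluation provided by this embodiment targets the model's actual application scenario: it uses minimum loss as the criterion and evaluates the model's cost saving level.
In one possible design, a possible implementation of step S507 is shown in FIG. 9. FIG. 9 is a schematic flowchart of yet another user operation detection method provided by an embodiment of the present application. As shown in FIG. 9, this embodiment includes:
S801: Obtain test samples.
Test samples are obtained first. They can be understood as the data generated while performing online operations on accounts, together with the financial institution's operation results for that data, and are used to test the detection model.
S802: Generate a first test result according to the test samples and the detection model, and determine a first cost according to the first test result and the cost model.
The test samples are fed to the detection model as data to be identified, and the identification result is the first test result. The first cost of the test samples, denoted for example $f(Test)$, is then computed from the first test result and the cost model of relation (2).
S803: Generate a second test result according to the test samples and a manual prediction strategy, and determine a second cost according to the second test result and the cost model.
Similarly to step S802, a second test result is generated from the test samples and a manual prediction strategy, and a second cost, denoted for example $C(Test)$, is determined from the second test result and the cost model. The difference from step S802 is that this step predicts whether each test sample is fraud using a manual strategy.
S804: Determine a cost saving parameter from the values of the first cost and the second cost, so as to characterize the cost saving level of the detection model by the cost saving parameter.
Once the values of the first cost and the second cost are obtained, the cost saving parameter can be determined, for example as shown in relation (10):
[equation image: Figure PCTCN2021142701-appb-000015]   (10)
where $CS$ denotes the cost saving parameter.
The cost saving parameter characterizes the cost saving level of the detection model: the larger the value of $CS$, the more cost is saved by using the detection model, in other words, the higher the model's cost saving level.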
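Relation (10) is not reproduced in this text. A minimal sketch, assuming the cost saving parameter is the relative reduction of the model's cost $f(Test)$ with respect to the manual strategy's cost $C(Test)$, could look like this; the function name and this exact form are assumptions.

```python
def cost_saving(model_cost, manual_cost):
    """Relative cost saved by the model compared with the manual strategy.

    Assumed form: CS = (C(Test) - f(Test)) / C(Test).
    """
    if manual_cost == 0:
        raise ValueError("manual_cost must be non-zero")
    return (manual_cost - model_cost) / manual_cost
```

Under this form a value close to 1 means the model removes almost all of the manual strategy's cost, 0 means no improvement, and a negative value means the model is more costly than the manual strategy.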
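In the user operation detection method provided by this embodiment, multiple pieces of behavior data are first obtained; the original feature list of each piece of behavior data is determined from the behavior data and the preset time granularities; the target features needed to generate the detection model are adaptively screened from the original feature list using the preset feature screening strategy; and, based on the preset cost matrix, a training strategy suited to the chosen preset training model is used to train that model on the target features and the result data contained in the behavior data, generating the detection model. After the detection model is generated, test samples are obtained and used to evaluate its cost saving level. Because the target features are screened adaptively from the behavior data through the preset feature screening strategy rather than extracted purely from human experience, the method focuses on the behavior data itself, avoids model failure caused by unbalanced feature selection, allows the detection model to be iterated frequently, and enables it to identify unknown new types of account operations. The preset cost matrix addresses the sample imbalance of the application scenario by taking full account of the loss associated with each type of sample, so the detection model aims at minimum loss when detecting user operations and effectively protects the interests of users and financial institutions. Finally, instead of the usual evaluation of precision, recall and similar metrics, the detection model is evaluated by its cost saving level with minimum loss as the criterion, which further improves the practicality and feasibility of the detection model provided by the embodiments of the present application.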
下述为本申请装置实施例,可以用于执行本申请对应的方法实施例。对于本申请装置实施例中未披露的细节,请参照本申请对应的方法实施例。
图10为本申请实施例提供的一种用户操作检测装置的结构示意图。如图10所示,本实施例提供的用户操作检测装置900,包括:
an obtaining module 901, configured to obtain multiple pieces of behavior data.
其中,行为数据用于表征历史用户对账户进行在线操作产生的数据。
特征筛选模块902,用于根据行为数据以及各预设时间粒度确定原始特征列表,并根据原始特征列表以及预设特征筛选策略确定目标特征。
其中,预设特征筛选策略用于根据原始特征列表自适应筛选目标特征。
处理模块903,用于基于预设代价矩阵,根据预设训练模型、目标特征以及行为数据包含的结果数据生成检测模型,通过检测模型检测用户操作是否为本人行为。
其中,结果数据用于表征在线操作是否为历史用户本人行为。
在一种可能的设计中,本实施例提供的用户操作检测装置900中的处理模块903,具体用于:
获取当前用户的当前行为数据,当前行为数据用于表征当前用户对账户进行在线操作产生的数据,当前行为数据的操作结果为成功;
将当前行为数据与原始特征列表进行关联,以根据关联后的当前行为数据以及检测模型生成判断结果,判断结果包括当前用户对账户的在线操作为当前用户本人行为或者非当前用户本人行为。
在一种可能的设计中,用户操作检测装置900,还包括:
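a behavior determination module, configured to determine whether the judgment result is that the operation is not the current user's own behavior;
a reporting and iteration module, configured to, when the judgment result is that the operation is not the current user's own behavior, report the judgment result and iterate the detection model according to the judgment result and the current behavior data.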
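In one possible design, the feature screening module 902 includes the submodules shown in FIG. 11. FIG. 11 is a schematic structural diagram of a feature screening module provided by an embodiment of the present application. As shown in FIG. 11, the feature screening module 902 provided by this embodiment includes a first processing submodule 9021 and a second processing submodule 9022.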
第一处理子模块9021,用于:
获取每份行为数据中与结果数据映射对应的各维度数据,维度数据包括环境维度数据和用户操作维度数据;
分别确定每个环境维度数据和每个用户操作维度数据于各预设时间粒度中对应的特征值;
根据各维度数据对应的各特征值生成原始特征列表。
In one possible design, the second processing submodule 9022 is configured to:
针对各维度数据的原始特征列表,根据预设业务场景从对应的原始特征列表的特征集合中指定预设特征,并将特征集合中除过预设特征之外的其余特征确定为剩余特征;
根据预设相关性算法确定预设特征与每个剩余特征两两之间的相关系数,并根据相关系数以及预设相关性阈值从剩余特征中筛选出候选特征;
将筛选出的候选特征指定为新的预设特征,并重复执行根据预设相关性算法确定预设特征与每个剩余特征两两之间的相关系数,并根据相关系数以及预设相关性阈值从剩余特征中筛选出候选特征的步骤,直到各维度数据的特征集合为空;
将针对每个维度数据被指定的全部的预设特征确定为每个维度数据对应的目标特征;
其中,预设特征筛选策略包括预设相关性算法以及筛选步骤。
在一种可能的设计中,图12为本申请实施例提供的一种处理模块的结构示意图。如图12所示,本实施例提供的处理模块903,包括:
第三处理子模块9031,用于根据目标特征对应的结果数据确定正样本和负样本;
第四处理子模块9032,用于根据正样本、负样本以及预设代价矩阵生成代价模型;
第五处理子模块9033,用于根据代价模型以及预设代价矩阵确定代价阈值;
第六处理子模块9034,用于基于预设训练模型,根据代价模型、代价阈值以及目标特征生成检测模型。
在一种可能的设计中,当预设训练模型为预设随机森林模型时,第六处理子模块9034,具体用于:
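determine the information gain of each target feature according to the cost threshold and the preset information gain algorithm, and determine the target feature with the largest information gain as the split node of the current split, so as to generate a corresponding sub-decision tree;
repeat the above splitting step until all target features have been classified, so as to obtain the detection model from the decision tree formed by the sub-decision trees.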
在一种可能的设计中,当预设训练模型为预设回归算法、预设分类算法以及预设神经网络模型中的任一种时,第六处理子模块9034,具体用于:
根据预设代价矩阵、代价模型以及代价阈值确定预设训练模型的损失函数需满足的预 设条件;
根据预设条件、代价函数以及预设训练模型的预测函数生成目标损失函数;
将所有目标特征以及目标特征对应的结果数据作为训练样本,通过预设算法获取以使得目标损失函数为最小值的训练参数,预设算法包括预设梯度下降算法或者预设最大似然估计算法;
根据训练参数以及预测函数得到分类器,以根据分类器得到检测模型。
在一种可能的设计中,用户操作检测装置900,还包括:测试模块;该测试模块,具体用于:
获取测试样本;
根据测试样本以及检测模型生成第一测试结果,并根据第一测试结果以及代价模型确定第一代价;
根据测试样本以及人工预测策略生成第二测试结果,并根据第二测试结果以及代价模型确定第二代价;
根据第一代价以及第二代价各自的对应值确定代价节约参数,以通过代价节约参数表征检测模型的代价节约水平。
值得说明的,上述各实施例提供的用户操作检测装置,可用于执行上述任一实施例提供的用户操作检测方法中的相应步骤,具体实现方式和技术效果类似,这里不再赘述。
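The apparatus embodiments provided by the present application are merely illustrative. The division into modules is only a division of logical functions, and other divisions are possible in an actual implementation. For example, multiple modules may be combined or integrated into another system. The coupling between modules may be implemented through interfaces, which are usually electrical communication interfaces, although mechanical or other forms of interface are not excluded. Modules described as separate components may or may not be physically separate, and may be located in one place or distributed to different locations on the same device or on different devices.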
图13为本申请实施例提供的一种电子设备的结构示意图。如图13所示,该电子设备1000可以包括:至少一个处理器1001以及存储器1002。图13以一个处理器为例示出。
存储器1002,用于存放程序。具体地,程序可以包括程序代码,程序代码包括计算机操作指令。
存储器1002可能包含高速RAM存储器,也可能还包括非易失性存储器(non-volatile memory),例如至少一个磁盘存储器。
处理器1001配置为用于执行存储器1002存储的计算机程序,以实现以上各方法实施例中用户操作检测方法中的各步骤。
其中,处理器1001可能是一个中央处理器(central processing unit,简称为CPU),或者是特定集成电路(application specific integrated circuit,简称为ASIC),或者是被配置成实施本申请实施例的一个或多个集成电路。
可选地,存储器1002既可以是独立的,也可以跟处理器1001集成在一起。当存储器1002是独立于处理器1001之外的器件时,该电子设备1000,还可以包括:
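a bus 1003, configured to connect the processor 1001 and the memory 1002. The bus may be an industry standard architecture (ISA) bus, a peripheral component interconnect (PCI) bus, an extended industry standard architecture (EISA) bus, or the like. Buses can be classified into address buses, data buses, control buses and so on, but this does not mean that there is only one bus or only one type of bus.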
可选的,在具体实现上,如果存储器1002和处理器1001集成在一块芯片上实现,则存储器1002和处理器1001可以通过内部接口完成通信。
本申请还提供了一种计算机可读存储介质,该计算机可读存储介质可以包括:U盘、移动硬盘、只读存储器(read-only memory,ROM)、随机存取存储器(random access memory,RAM)、磁盘或者光盘等各种可以存储程序代码的介质,具体的,该计算机可读存储介质中存储有计算机程序,当上述电子设备的至少一个处理器执行该计算机程序时,该电子设备执行上述的各种实施方式提供的用户操作检测方法的各个步骤。
本申请实施例还提供一种计算机程序产品,该计算机程序产品包括计算机程序,该计算机程序存储在可读存储介质中。电子设备的至少一个处理器可以从可读存储介质读取该计算机程序,至少一个处理器执行该计算机程序使得设备实施上述的各种实施方式提供的用户操作检测方法的各个步骤。
本领域技术人员在考虑说明书及实践这里公开的发明后,将容易想到本申请的其它实施方案。本申请旨在涵盖本申请的任何变型、用途或者适应性变化,这些变型、用途或者适应性变化遵循本申请的一般性原理并包括本申请未公开的本技术领域中的公知常识或惯用技术手段。说明书和实施例仅被视为示例性的,本申请的真正范围和精神由权利要求书指出。
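It should be understood that the present application is not limited to the precise structures described above and shown in the drawings, and that various modifications and changes can be made without departing from its scope. The scope of the present application is limited only by the appended claims.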

Claims (10)

  1. 一种用户操作检测方法,其特征在于,所述检测方法包括:
    获取多份行为数据,所述行为数据用于表征历史用户对账户进行在线操作产生的数据;
    根据所述行为数据以及各预设时间粒度确定原始特征列表,并根据原始特征列表以及预设特征筛选策略确定目标特征,所述预设特征筛选策略用于根据所述原始特征列表自适应筛选所述目标特征;
    基于预设代价矩阵,根据预设训练模型、所述目标特征以及所述行为数据包含的结果数据生成检测模型,以通过所述检测模型检测用户操作是否为本人行为,所述结果数据用于表征所述在线操作是否为所述历史用户本人行为。
  2. 根据权利要求1所述的用户操作检测方法,其特征在于,所述通过所述检测模型检测用户操作是否为本人行为,包括:
    获取当前用户的当前行为数据,所述当前行为数据用于表征所述当前用户对账户进行在线操作产生的数据,所述当前行为数据的操作结果为成功;
    将所述当前行为数据与所述原始特征列表进行关联,以根据关联后的当前行为数据以及所述检测模型生成判断结果,所述判断结果包括所述当前用户对账户的在线操作为所述当前用户本人行为或者非所述当前用户本人行为。
  3. 根据权利要求2所述的用户操作检测方法,其特征在于,在所述生成所述判断结果之后,还包括:
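    if the judgment result is that the operation is not the current user's own behavior, reporting the judgment result, and iterating the detection model according to the judgment result and the current behavior data.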
  4. 根据权利要求1-3任一项所述的用户操作检测方法,其特征在于,所述根据所述行为数据以及各预设时间粒度确定原始特征列表,包括:
    获取每份行为数据中与所述结果数据映射对应的各维度数据,所述维度数据包括环境维度数据和用户操作维度数据;
    分别确定每个环境维度数据和每个用户操作维度数据于各预设时间粒度中对应的特征值;
    根据各维度数据对应的各特征值生成所述原始特征列表。
  5. 根据权利要求1-4任一项所述的用户操作检测方法,其特征在于,所述根据原始特征列表以及预设特征筛选策略确定目标特征,包括:
    针对各维度数据的所述原始特征列表,根据预设业务场景从对应的所述原始特征列表的特征集合中指定预设特征,并将所述特征集合中除过所述预设特征之外的其余特征确定为剩余特征;
    根据预设相关性算法确定所述预设特征与每个剩余特征两两之间的相关系数,并根据所述相关系数以及预设相关性阈值从所述剩余特征中筛选出候选特征;
    将筛选出的所述候选特征指定为新的所述预设特征,并重复执行所述根据预设相关性算法确定所述预设特征与每个剩余特征两两之间的相关系数,并根据所述相关系数以及预设相关性阈值从所述剩余特征中筛选出候选特征的步骤,直到各维度数据的 所述特征集合为空;
    将针对每个维度数据被指定的全部的所述预设特征确定为每个维度数据对应的所述目标特征;
    其中,所述预设特征筛选策略包括所述预设相关性算法以及所述筛选步骤。
  6. 根据权利要求1-5任一项所述的用户操作检测方法,其特征在于,所述基于预设代价矩阵,根据预设训练模型、所述目标特征以及所述行为数据包含的结果数据生成检测模型,包括:
    根据所述目标特征对应的所述结果数据确定正样本和负样本;
    根据所述正样本、所述负样本以及所述预设代价矩阵生成代价模型;
    根据所述代价模型以及所述预设代价矩阵确定代价阈值;
    基于所述预设训练模型,根据所述代价模型、所述代价阈值以及所述目标特征生成所述检测模型。
  7. 根据权利要求6所述的用户操作检测方法,其特征在于,当所述预设训练模型为预设随机森林模型时,所述基于所述预设训练模型,根据所述代价模型、所述代价阈值以及所述目标特征生成所述检测模型,包括:
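    determining the information gain of each target feature according to the cost threshold and a preset information gain algorithm, and determining the target feature with the largest information gain as the split node of the current split, so as to generate a corresponding sub-decision tree;
    repeating the above splitting step until all target features have been classified, so as to obtain the detection model from the decision tree formed by the sub-decision trees.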
  8. 根据权利要求6所述的用户操作检测方法,其特征在于,当所述预设训练模型为预设回归算法、预设分类算法以及预设神经网络模型中的任一种时,所述基于所述预设训练模型,根据所述代价模型、所述代价阈值以及所述目标特征生成所述检测模型,包括:
    根据所述预设代价矩阵、所述代价模型以及所述代价阈值确定所述预设训练模型的损失函数需满足的预设条件;
    根据所述预设条件、所述代价函数以及所述预设训练模型的预测函数生成目标损失函数;
    将所有目标特征以及所述目标特征对应的所述结果数据作为训练样本,通过预设算法获取以使得所述目标损失函数为最小值的训练参数,所述预设算法包括预设梯度下降算法或者预设最大似然估计算法;
    根据所述训练参数以及所述预测函数得到分类器,以根据所述分类器得到所述检测模型。
  9. 根据权利要求6-8任一项所述的用户操作检测方法,其特征在于,在所述生成所述检测模型之后,还包括:
    获取测试样本;
    根据测试样本以及所述检测模型生成第一测试结果,并根据所述第一测试结果以及所述代价模型确定第一代价;
    根据所述测试样本以及人工预测策略生成第二测试结果,并根据所述第二测试结果以及所述代价模型确定第二代价;
    根据所述第一代价以及所述第二代价各自的对应值确定代价节约参数,以通过所述代价节约参数表征所述检测模型的代价节约水平。
  10. 一种计算机程序产品,包括计算机程序,其特征在于,该计算机程序被处理器执行时实现权利要求1至9任一项所述的用户操作检测方法。
PCT/CN2021/142701 2021-03-26 2021-12-29 用户操作检测方法及程序产品 WO2022199185A1 (zh)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202110326785.1 2021-03-26
CN202110326785.1A CN112927061B (zh) 2021-03-26 2021-03-26 用户操作检测方法及程序产品

Publications (1)

Publication Number Publication Date
WO2022199185A1 true WO2022199185A1 (zh) 2022-09-29

Family

ID=76176200

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2021/142701 WO2022199185A1 (zh) 2021-03-26 2021-12-29 用户操作检测方法及程序产品

Country Status (2)

Country Link
CN (1) CN112927061B (zh)
WO (1) WO2022199185A1 (zh)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115658860A (zh) * 2022-10-17 2023-01-31 吉林大学 一种教师自主支持性教学行为自动识别方法
CN116109630A (zh) * 2023-04-10 2023-05-12 创域智能(常熟)网联科技有限公司 基于传感器采集和人工智能的图像分析方法及***

Families Citing this family (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112927061B (zh) * 2021-03-26 2024-03-12 深圳前海微众银行股份有限公司 用户操作检测方法及程序产品
CN113506167A (zh) * 2021-07-23 2021-10-15 北京淇瑀信息科技有限公司 基于排序的风险预测方法、装置、设备和介质
CN113468823B (zh) * 2021-07-26 2023-11-14 中兴飞流信息科技有限公司 一种基于机器学习的光模块损坏检测方法及***

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108768772A (zh) * 2018-05-29 2018-11-06 南京航空航天大学 基于代价敏感的自组织网络的故障探测方法
CN109300029A (zh) * 2018-10-25 2019-02-01 北京芯盾时代科技有限公司 借贷欺诈检测模型训练方法、借贷欺诈检测方法及装置
CN109767308A (zh) * 2018-11-30 2019-05-17 连连银通电子支付有限公司 金融欺诈检测中时间与成本特征选择方法、设备、介质
US20200175421A1 (en) * 2018-11-29 2020-06-04 Sap Se Machine learning methods for detection of fraud-related events
CN112365202A (zh) * 2021-01-15 2021-02-12 平安科技(深圳)有限公司 一种多目标对象的评价因子筛选方法及其相关设备
CN112927061A (zh) * 2021-03-26 2021-06-08 深圳前海微众银行股份有限公司 用户操作检测方法及程序产品

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2020250730A1 (ja) * 2019-06-11 2020-12-17 日本電気株式会社 不正検知装置、不正検知方法および不正検知プログラム

Also Published As

Publication number Publication date
CN112927061A (zh) 2021-06-08
CN112927061B (zh) 2024-03-12

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 21932792

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

32PN Ep: public notification in the ep bulletin as address of the adressee cannot be established

Free format text: NOTING OF LOSS OF RIGHTS PURSUANT TO RULE 112(1) EPC (EPO FORM 1205A DATED 29-01-2024)