CN112784888A - User identification method, device, equipment and storage medium - Google Patents

User identification method, device, equipment and storage medium Download PDF

Info

Publication number
CN112784888A
CN112784888A (application number CN202110038063.6A)
Authority
CN
China
Prior art keywords
user
sample set
classifier
samples
target
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202110038063.6A
Other languages
Chinese (zh)
Inventor
沈易栋
万高峰
刘清
刘阳
刘子龙
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
China Unionpay Co Ltd
Original Assignee
China Unionpay Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by China Unionpay Co Ltd filed Critical China Unionpay Co Ltd
Priority to CN202110038063.6A
Publication of CN112784888A
Legal status: Pending

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/24Classification techniques
    • G06F18/241Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/21Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/214Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06QINFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q30/00Commerce
    • G06Q30/02Marketing; Price estimation or determination; Fundraising
    • G06Q30/0207Discounts or incentives, e.g. coupons or rebates

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Business, Economics & Management (AREA)
  • General Physics & Mathematics (AREA)
  • Physics & Mathematics (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Development Economics (AREA)
  • Evolutionary Computation (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • General Engineering & Computer Science (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Artificial Intelligence (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Accounting & Taxation (AREA)
  • Evolutionary Biology (AREA)
  • Strategic Management (AREA)
  • Finance (AREA)
  • Game Theory and Decision Science (AREA)
  • Entrepreneurship & Innovation (AREA)
  • Economics (AREA)
  • Marketing (AREA)
  • General Business, Economics & Management (AREA)
  • Image Analysis (AREA)

Abstract

Embodiments of the present application provide a user identification method, apparatus, device, and storage medium. The method comprises the following steps: acquiring a user feature sample set comprising a plurality of positive samples and a plurality of negative samples, where the number of positive samples is smaller than the number of negative samples; acquiring at least one sub-sample set from the user feature sample set, where each sub-sample set comprises all positive samples and a portion of the negative samples, and the number of positive samples in the sub-sample set is greater than or equal to the number of negative samples; training at least one preset cost-sensitive classifier on the at least one sub-sample set to obtain at least one target classifier; and integrating the at least one target classifier to obtain a user identification model with strong identification capability. Based on this model and the user features of a target user, the target user can be effectively identified as normal or abnormal, improving the user identification effect.

Description

User identification method, device, equipment and storage medium
Technical Field
The present application relates to the field of deep learning technologies, and in particular, to a user identification method, apparatus, device, and storage medium.
Background
To better maintain relationships with users and increase user stickiness, merchants typically launch a series of promotional activities. While running such activities, however, merchants must guard against activity risks, so as to prevent the activity benefits from being harvested by abnormal users such as the "wool party" (bonus hunters).
In a conventional user identification scheme, a preset model is trained on a user feature sample set to obtain a user identification model, which is then used to identify whether a user is normal or abnormal. However, because abnormal users make up only a small proportion of the whole user population, the number of positive samples (abnormal user features) in the sample set is far lower than the number of negative samples (normal user features). This imbalance between positive and negative samples prevents the user identification model from effectively identifying abnormal users, resulting in a poor identification effect.
Disclosure of Invention
The embodiment of the application provides a user identification method, a device, equipment and a storage medium, which can effectively identify the normality or abnormality of a target user and improve the user identification effect.
In a first aspect, an embodiment of the present application provides a method for training a user recognition model, where the method includes:
acquiring a user characteristic sample set, wherein the user characteristic sample set comprises a plurality of positive samples and a plurality of negative samples, and the number of the positive samples in the user characteristic sample set is smaller than that of the negative samples;
acquiring at least one subsample set according to the user characteristic sample set, wherein the subsample set comprises all positive samples and part negative samples in the user characteristic sample set, and the number of the positive samples in the subsample set is greater than or equal to that of the negative samples;
respectively training at least one preset cost-sensitive classifier according to at least one sub-sample set to obtain at least one target classifier;
and integrating at least one target classifier to obtain a user identification model.
In a second aspect, an embodiment of the present application provides a user identification method, where the method includes:
acquiring user characteristics of a target user;
and identifying the user features based on a user identification model to obtain an identification result of the target user, wherein the identification result indicates that the target user is normal or abnormal, and the user identification model is obtained by the training method of the user identification model according to the first aspect or any realizable manner of the first aspect.
In a third aspect, an embodiment of the present application provides an apparatus for training a user recognition model, where the apparatus includes:
the device comprises an acquisition module, a processing module and a processing module, wherein the acquisition module is used for acquiring a user characteristic sample set, the user characteristic sample set comprises a plurality of positive samples and a plurality of negative samples, and the number of the positive samples in the user characteristic sample set is less than that of the negative samples;
the acquisition module is further used for acquiring at least one sub-sample set according to the user characteristic sample set, wherein the sub-sample set comprises all positive samples and part of negative samples in the user characteristic sample set, and the number of the positive samples in the sub-sample set is greater than or equal to that of the negative samples;
the training module is used for respectively training at least one preset cost-sensitive classifier according to at least one sub-sample set to obtain at least one target classifier;
and the integration module is used for integrating at least one target classifier to obtain the user identification model.
In a fourth aspect, an embodiment of the present application provides a user identification apparatus, including:
the acquisition module is used for acquiring the user characteristics of the target user;
the identification module is configured to identify the user characteristics based on a user identification model to obtain an identification result of the target user, where the identification result includes a normal target user or an abnormal target user, and the user identification model is obtained based on the first aspect or the training method of the user identification model described in any one of the realizable manners of the first aspect.
In a fifth aspect, an embodiment of the present application provides a user identification device, where the user identification device includes: a processor and a memory storing computer program instructions; the method for training a user recognition model according to the first aspect is implemented when the processor executes the computer program instructions, or the method for user recognition according to the second aspect is implemented when the processor executes the computer program instructions.
In a sixth aspect, embodiments of the present application provide a computer-readable storage medium, on which computer program instructions are stored, and the computer program instructions, when executed by a processor, implement the training method for the user recognition model according to the first aspect, or the computer program instructions, when executed by the processor, implement the user recognition method according to the second aspect.
According to the user identification method, apparatus, device, and storage medium of the embodiments of the present application, a user feature sample set can be obtained, comprising a plurality of positive samples and a plurality of negative samples, where the number of positive samples is smaller than the number of negative samples. At least one sub-sample set is obtained from the user feature sample set, each comprising all positive samples and a portion of the negative samples, where the number of positive samples in the sub-sample set is greater than or equal to the number of negative samples. At least one preset cost-sensitive classifier is trained on the at least one sub-sample set to obtain at least one target classifier, and the at least one target classifier is integrated to obtain a user identification model with strong identification capability. Based on this model and the user features of a target user, the target user can be effectively identified as normal or abnormal, improving the user identification effect.
Drawings
In order to more clearly illustrate the technical solutions of the embodiments of the present application, the drawings needed to be used in the embodiments of the present application will be briefly described below, and for those skilled in the art, other drawings can be obtained according to the drawings without creative efforts.
FIG. 1 is a schematic diagram of an architecture of a training system for a user recognition model according to an embodiment of the present application;
FIG. 2 is a flowchart illustrating a method for training a user recognition model according to an embodiment of the present disclosure;
FIG. 3 is a schematic flow chart illustrating another method for training a user recognition model according to an embodiment of the present disclosure;
fig. 4 is a schematic flowchart of a user identification method according to an embodiment of the present application;
FIG. 5 is a schematic structural diagram of an apparatus for training a user recognition model according to an embodiment of the present disclosure;
fig. 6 is a schematic structural diagram of a user identification device according to an embodiment of the present application;
fig. 7 is a schematic structural diagram of a user identification device according to an embodiment of the present application.
Detailed Description
Features and exemplary embodiments of various aspects of the present application will be described in detail below, and in order to make objects, technical solutions and advantages of the present application more apparent, the present application will be further described in detail below with reference to the accompanying drawings and the embodiments. It should be understood that the specific embodiments described herein are merely illustrative of the application and do not limit the application. It will be apparent to one skilled in the art that the present application may be practiced without some of these specific details. The following description of the embodiments is merely intended to provide a better understanding of the present application by illustrating examples thereof.
It is noted that, herein, relational terms such as first and second may be used solely to distinguish one entity or action from another, without necessarily requiring or implying any actual such relationship or order between those entities or actions. Moreover, the terms "comprises," "comprising," or any other variation thereof are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements includes not only those elements but may also include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising a …" does not exclude the presence of other identical elements in the process, method, article, or apparatus that comprises the element.
To solve the problems of the prior art, embodiments of the present application provide a user identification method, apparatus, device, and storage medium. The method obtains a user feature sample set comprising a plurality of positive samples and a plurality of negative samples, where the number of positive samples is smaller than the number of negative samples; obtains at least one sub-sample set from the user feature sample set, each comprising all positive samples and a portion of the negative samples, with the number of positive samples greater than or equal to the number of negative samples; trains at least one preset cost-sensitive classifier on the at least one sub-sample set to obtain at least one target classifier; and integrates the at least one target classifier into a user identification model with strong identification capability. Based on this model and the user features of a target user, the target user can be effectively identified as normal or abnormal, improving the user identification effect.
The user identification method, apparatus, device and storage medium provided in the embodiments of the present application are described in detail below with reference to the accompanying drawings through specific embodiments and application scenarios thereof.
As an example, the user identification method may be applied to identification scenarios for abnormal users such as wool-party (bonus-hunting) users, overdue users, or fraudulent users, which is not limited here. Since the user identification method provided by the embodiment of the present application relies on the user identification model, the training method of the user identification model provided by the embodiment of the present application is first introduced below with reference to the drawings.
Fig. 1 is a schematic architecture diagram of a training system for a user recognition model according to an embodiment of the present application. As shown in fig. 1, the training system may include an electronic device 110 and a server 120. The electronic device 110 may be a server, a Network Attached Storage (NAS), a Personal Computer (PC), or the like. The server 120 stores a user feature sample set, where the sample set includes a plurality of positive samples and a plurality of negative samples, and the number of positive samples is smaller than the number of negative samples. The electronic device 110 may communicate with the server 120 through a network, which may be a wired or wireless communication network.
Referring to fig. 1, the electronic device 110 may obtain a user feature sample set from the server 120 and obtain at least one sub-sample set according to the user feature sample set. The sub-sample set comprises all positive samples in the user characteristic sample set and partial negative samples, and the number of the positive samples in the sub-sample set is larger than or equal to the number of the negative samples. And then respectively training at least one preset cost-sensitive classifier according to at least one sub-sample set to obtain at least one target classifier. And then at least one target classifier is integrated to obtain a user identification model with stronger identification capability.
The following describes in detail a training method of a user recognition model provided in an embodiment of the present application. The execution subject of the training method may be the electronic device 110 in the training system shown in fig. 1, or a module in the electronic device 110.
Fig. 2 is a schematic flowchart of a training method for a user recognition model according to an embodiment of the present disclosure. As shown in fig. 2, the training method may include the steps of:
and S210, acquiring a user feature sample set.
The user characteristic sample set comprises a plurality of positive samples and a plurality of negative samples, and the number of the positive samples in the user characteristic sample set is smaller than that of the negative samples. It is understood that the positive samples are the few classes of samples, i.e., abnormal user features, and the negative samples are the most classes of samples, i.e., normal user features. Illustratively, a pre-stored sample set of user characteristics may be obtained from a server.
And S220, acquiring at least one sub-sample set according to the user characteristic sample set.
Each sub-sample set comprises all positive samples in the user feature sample set and a portion of the negative samples, and the number of positive samples in the sub-sample set is greater than or equal to the number of negative samples. Illustratively, the number of negative samples drawn for each sub-sample set may be the same; that is, any two sub-sample sets contain the same number of positive samples and the same number of negative samples.
In one embodiment, the negative samples in the user feature sample set may be randomly sampled to obtain a portion of the negative samples, and this portion may be combined with all positive samples to obtain a sub-sample set with a smaller imbalance ratio (IR). Here IR is the ratio of the number of negative samples to the number of positive samples, and the IR of each sub-sample set lies in (0, 1]. Random sampling ensures that every negative sample is equally likely to be drawn, so the drawn negative samples are well representative.
As an example, suppose the user feature sample set includes 10000 samples, of which 600 are positive and 9400 are negative, so the IR of the sample set is 9400/600 ≈ 15.67. Then 400 negative samples can be randomly drawn from the 9400 negative samples and combined with the 600 positive samples to obtain a sub-sample set whose IR is 400/600 ≈ 0.67.
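The undersampling step in this example can be sketched as follows; the helper name `make_subsample_sets`, the tuple-based samples, and the default target IR are illustrative assumptions, not part of the patent:

```python
import random

def make_subsample_sets(positives, negatives, target_ir=0.67, num_sets=3, seed=0):
    """Build sub-sample sets: each keeps ALL positive samples and a random
    draw of negatives sized so that IR = n_neg / n_pos <= target_ir."""
    rng = random.Random(seed)
    n_neg = int(target_ir * len(positives))  # number of negatives to draw
    return [positives + rng.sample(negatives, n_neg) for _ in range(num_sets)]

# Mirrors the patent's example: 600 positives, 9400 negatives (IR ~= 15.67).
positives = [("pos", i) for i in range(600)]
negatives = [("neg", i) for i in range(9400)]
subsets = make_subsample_sets(positives, negatives)
```

Because `random.sample` draws without replacement and uniformly, every negative sample is equally likely to appear in a given sub-sample set, matching the representativeness argument above.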
And S230, respectively training at least one preset cost-sensitive classifier according to at least one subsample set to obtain at least one target classifier.
Specifically, each sub-sample set corresponds to one cost-sensitive classifier, and each cost-sensitive classifier can be trained on its corresponding sub-sample set to obtain a target classifier, i.e., a trained cost-sensitive classifier. For example, the penalty coefficients of the cost-sensitive classifier may include a positive-sample penalty coefficient and a negative-sample penalty coefficient, where the ratio of the negative-sample penalty coefficient to the positive-sample penalty coefficient equals the ratio of the number of negative samples to the number of positive samples in the classifier's sub-sample set; that is, the ratio equals the IR of the corresponding sub-sample set. In this way, different penalty coefficients can be assigned to positive and negative samples at the algorithm level, and the higher penalty coefficient given to positive samples reduces the bias against the positive (minority) class, further improving the identification capability of the user identification model.
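A minimal sketch of the penalty-coefficient rule just described, assuming a hypothetical base penalty `base_c` (all names here are illustrative):

```python
def penalty_coefficients(n_pos, n_neg, base_c=1.0):
    """Per the rule above: C_neg / C_pos equals the sub-sample set's
    IR = n_neg / n_pos, so the positive class receives the larger
    penalty whenever positives outnumber negatives in the set."""
    c_pos = base_c
    c_neg = base_c * (n_neg / n_pos)
    return c_pos, c_neg

# A sub-sample set with 600 positives and 400 negatives (IR = 2/3):
c_pos, c_neg = penalty_coefficients(600, 400)
```

Since the sub-sample sets have IR in (0, 1], this rule always yields `c_pos >= c_neg`, i.e., misclassifying a positive (abnormal-user) sample costs at least as much as misclassifying a negative one.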
Optionally, the cost-sensitive classifier may be an Adaboost classifier, a cost-sensitive decision tree classifier, or the like. Illustratively, the weak classifiers in the Adaboost classifier may be Cost-Sensitive Support Vector Machines (CS-SVM).
S240, integrating at least one target classifier to obtain a user identification model.
Specifically, the target classifiers may be integrated into an overall classifier, i.e., the user identification model, which is jointly characterized by all the target classifiers.
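The integration step can be sketched as a weighted sign vote over all weak classifiers of all target classifiers; the threshold stumps below are hypothetical stand-ins for trained weak classifiers:

```python
def make_stump(threshold):
    """A toy weak classifier: +1 above the threshold, -1 otherwise."""
    return lambda x: 1 if x > threshold else -1

def predict(model, x):
    """Weighted sign vote over every weak classifier of every target
    classifier: sgn(sum_i sum_j beta_ij * g_ij(x))."""
    score = sum(beta * g(x)
                for stumps, betas in model
                for g, beta in zip(stumps, betas))
    return 1 if score > 0 else -1

# Two hypothetical target classifiers, each a weighted set of stumps.
G1 = ([make_stump(0.0), make_stump(1.0)], [0.7, 0.3])
G2 = ([make_stump(0.5)], [1.0])
model = [G1, G2]
```

Here the "model" is simply the collection of (weak classifier, weight) pairs; no extra parameters are introduced by the integration itself.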
In the embodiment of the application, a user feature sample set may be obtained, comprising a plurality of positive samples and a plurality of negative samples, where the number of positive samples is smaller than the number of negative samples. At least one sub-sample set is obtained from the user feature sample set, each comprising all positive samples and a portion of the negative samples, with the number of positive samples greater than or equal to the number of negative samples. At least one preset cost-sensitive classifier is trained on the at least one sub-sample set to obtain at least one target classifier, and the at least one target classifier is integrated into a user identification model. In this way, sub-sample sets with smaller IR are obtained, and the cost-sensitive classifiers trained on them yield a user identification model with stronger identification capability that trains faster and more stably.
Taking the fusion of the EasyEnsemble algorithm with the CS-SVM as an example, the training method of the user recognition model provided in the embodiment of the present application is described in detail below with reference to fig. 3.
As shown in fig. 3, the user feature sample set includes P positive samples and N negative samples, with P < N, and the imbalance ratio of the user feature sample set is IR₀ = N/P.
Step 1: receive the number T of preset Adaboost classifiers input by the user and the imbalance ratio of the sub-sample sets. The imbalance ratio is the same for every sub-sample set and is denoted IR₁, with IR₁ ∈ (0, 1]. A sampling counter is initialized to i = 0.
Step 2: sample repeatedly, with i = i + 1 on each draw, and stop when i = T. On each draw, N₀ negative samples are randomly drawn from the negative samples of the user feature sample set, where N₀ = IR₁ · P. The P positive samples are combined with the N₀ drawn negative samples, yielding sub-sample set 1, sub-sample set 2, …, sub-sample set T. An Adaboost classifier is then trained on each sub-sample set respectively, giving Adaboost classifiers G1, G2, …, GT. Gi can be expressed as follows:
$$G_i(x) = \operatorname{sgn}\left(\sum_{j=1}^{s_i} \beta_{i,j}\, g_{i,j}(x)\right) \tag{1}$$
where x denotes a sample, Gi(x) denotes the classification result of Gi on sample x, gi,j(x) denotes the classification result of the j-th weak classifier gi,j(·) in Gi on sample x, βi,j denotes the weight of gi,j, si denotes the number of weak classifiers in Gi, and sgn denotes the sign function.
During training, the weights of the weak classifiers in each Adaboost classifier are iteratively updated until convergence, and the converged Adaboost classifier is taken as the target classifier.
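The iterative weight update of step 2 can be sketched with a minimal AdaBoost over one-dimensional threshold stumps (the stumps stand in for the CS-SVM weak learners of the patent; the toy data and names are illustrative):

```python
import math

def train_adaboost(samples, labels, rounds=5):
    """Minimal AdaBoost: each round fits the best threshold stump on the
    weighted data, computes its weight beta from the weighted error, and
    re-weights the samples to emphasize mistakes."""
    n = len(samples)
    w = [1.0 / n] * n
    stumps, betas = [], []
    for _ in range(rounds):
        best = None  # (weighted error, threshold, sign, predictions)
        for t in sorted(set(samples)):
            for sign in (1, -1):
                preds = [sign if x > t else -sign for x in samples]
                err = sum(wi for wi, p, y in zip(w, preds, labels) if p != y)
                if best is None or err < best[0]:
                    best = (err, t, sign, preds)
        err, t, sign, preds = best
        if err >= 0.5:
            break
        err = max(err, 1e-10)  # avoid log(0) on perfectly separable data
        beta = 0.5 * math.log((1 - err) / err)
        stumps.append((t, sign))
        betas.append(beta)
        w = [wi * math.exp(-beta * y * p) for wi, y, p in zip(w, labels, preds)]
        z = sum(w)
        w = [wi / z for wi in w]
    return stumps, betas

def adaboost_predict(stumps, betas, x):
    score = sum(b * (s if x > t else -s) for (t, s), b in zip(stumps, betas))
    return 1 if score > 0 else -1

# Toy 1-D data: negatives below zero, positives above.
stumps, betas = train_adaboost([-2.0, -1.0, 1.0, 2.0], [-1, -1, 1, 1])
```

The returned `(stumps, betas)` pair corresponds to the `(g_{i,j}, beta_{i,j})` terms of formula (1); in the patent the weak learners are CS-SVMs rather than stumps.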
Step 3: integrate all the target classifiers to obtain the user identification model G, where G may be expressed as follows:
$$G(x) = \operatorname{sgn}\left(\sum_{i=1}^{T} \sum_{j=1}^{s_i} \beta_{i,j}\, g_{i,j}(x)\right) \tag{2}$$
where x denotes a sample, G(x) denotes the classification result of G on sample x, gi,j(x) denotes the classification result of the j-th weak classifier gi,j(·) in Gi on sample x, βi,j denotes the weight of gi,j, si denotes the number of weak classifiers in Gi, T denotes the number of Adaboost classifiers, and sgn denotes the sign function.
Illustratively, the weak classifier in the Adaboost classifier of step 2 may be a CS-SVM, which generalizes the soft-margin kernel SVM. The standard soft-margin SVM may be written as follows:
$$\begin{aligned} \min_{w,b,\xi}\ & \frac{1}{2}\|w\|^{2} + C\sum_{i=1}^{m}\xi_{i} \\ \text{s.t.}\ & y_{i}\left(w^{T}\phi(x_{i}) + b\right) \ge 1 - \xi_{i},\quad \xi_{i} \ge 0,\ i = 1,\dots,m \end{aligned} \tag{3}$$
where w denotes the normal vector, C the penalty coefficient, ξi the slack variable, m the number of samples in the sub-sample set, min the minimization, s.t. the constraints, xi the i-th sample in the sub-sample set, yi the label of sample xi, wᵀ the transpose of w, φ the nonlinear mapping function, and b the intercept.
Optionally, on the basis of formula (3), different penalty coefficients C₊ and C₋ can be assigned to the positive and negative classes, reducing the bias against the minority class, alleviating the shift of the separating hyperplane, and mitigating the impact of the positive-negative sample imbalance. Specifically, the CS-SVM can be set as
$$\begin{aligned} \min_{w,b,\xi}\ & \frac{1}{2}\|w\|^{2} + C_{+}\sum_{i\in I_{+}}\xi_{i} + C_{-}\sum_{i\in I_{-}}\xi_{i} \\ \text{s.t.}\ & y_{i}\left(w^{T}\phi(x_{i}) + b\right) \ge 1 - \xi_{i},\quad \xi_{i} \ge 0,\ i = 1,\dots,m \end{aligned} \tag{4}$$
where w denotes the normal vector, C₊ the positive-sample penalty coefficient, C₋ the negative-sample penalty coefficient, I₊ the set of positive samples in the sub-sample set, I₋ the set of negative samples in the sub-sample set, ξi the slack variable, m the number of samples in the sub-sample set, min the minimization, s.t. the constraints, xi the i-th sample in the sub-sample set, yi the label of sample xi, wᵀ the transpose of w, φ the nonlinear mapping function, and b the intercept. The values of C₊ and C₋ can be set as follows:
$$\frac{C_{-}}{C_{+}} = \frac{|I_{-}|}{|I_{+}|} \tag{5}$$
At this time, since the sub-sample set contains at least as many positive samples as negative samples,

$$C_{+} \ge C_{-} \tag{6}$$
higher penalty coefficient can be given to the positive sample, so that the deviation of the positive sample is reduced, and the identification capability of the user identification model is further improved.
The dual problem corresponding to formula (4) can be written as follows:
$$\begin{aligned} \max_{\alpha}\ & \sum_{i=1}^{m}\alpha_{i} - \frac{1}{2}\sum_{i=1}^{m}\sum_{j=1}^{m}\alpha_{i}\alpha_{j}y_{i}y_{j}\,\kappa(x_{i},x_{j}) \\ \text{s.t.}\ & \sum_{i=1}^{m}\alpha_{i}y_{i} = 0,\quad 0 \le \alpha_{i} \le C_{+}\ (i \in I_{+}),\quad 0 \le \alpha_{i} \le C_{-}\ (i \in I_{-}) \end{aligned} \tag{7}$$
wherein, κ (x)i,xj) Represents a sample xiAnd sample xjInner product of, yiAnd yjRespectively represent samples xiAnd sample xjA denotes a Lagrange multiplier, aiAnd alphajRespectively represent samples xiAnd sample xjM represents the number of samples in the subsample set, max represents the maximum value, s.t. represents the constraint condition, C+Represents a positive sample penalty coefficient, C-Representing a negative sample penalty factor, I+Represents the set of positive samples in the subset, I-Representing the set of negative examples in the set of subsamples. Based on the formula (5) and the formula (6), the lagrange multiplier α can be obtained, so that w and b in the formula (4) are calculated, and the dividing hyperplane g (x) of the CS-SVM is obtainedTφ(x)+b。
Based on the training method for the user recognition model provided by the embodiment of the present application, the embodiment of the present application further provides a user recognition method, as shown in fig. 4, the user recognition method may include the following steps:
and S410, acquiring the user characteristics of the target user.
Wherein the target user is the user to be identified. For example, data related to the target user, such as merchant data, user data, and transaction data, may be acquired from a data server, and feature extraction may be performed on the acquired data to obtain the user features of the target user.
Alternatively, the user characteristics of the target user may be acquired periodically, for example, daily or weekly, for user identification.
And S420, identifying the user characteristics based on the user identification model to obtain an identification result of the target user.
The recognition result indicates that the target user is normal or abnormal, and the user recognition model is obtained by the training method of the user recognition model shown in fig. 2. For example, the user features may be input into each target classifier in the user recognition model, each target classifier performs classification to obtain its own classification result, and the classification results are integrated to obtain the recognition result.
In the embodiment of the application, the target user can be effectively identified to be normal or abnormal based on the user identification model and the user characteristics of the target user, and the user identification effect is improved.
In one embodiment, a check result may also be obtained. Specifically, the recognition result may be verified to obtain a check result, which indicates whether the recognition result is consistent with the labeled result corresponding to the target user; that is, the check result indicates whether the recognition result is correct and whether the user recognition model has misjudged. When the recognition result is inconsistent with the labeled result, i.e., when the user recognition model has misjudged, the user features are added to the user feature sample set corresponding to the user recognition model, and the user recognition model is trained on the updated user feature sample set. In this way, the user recognition model can be continuously updated according to feedback on its recognition results, ensuring the long-term effectiveness of the model.
As an example, suppose the target user is actually a normal user but the user identification model mistakenly identifies the target user as abnormal. In this case, the user characteristics of the target user may be obtained and added to the user characteristic sample set as a negative sample, and the user identification model may be retrained on the updated sample set. Optionally, the parameter thresholds in the user identification model may be set according to service requirements and existing experience, or may be iteratively optimized through big data techniques.
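A minimal sketch of this feedback loop (the function and parameter names are hypothetical; the application does not specify an API for the verification step):

```python
def update_on_misjudgment(features_set, labels, user_features, true_label,
                          predicted_label, retrain_fn):
    """If verification shows the prediction disagrees with the labeling
    result, add the user's features to the sample set under the verified
    label and retrain; otherwise leave the model unchanged."""
    if predicted_label == true_label:
        return None  # recognition result verified as correct; no update needed
    # e.g. a normal user misjudged as abnormal joins the set as a negative sample
    features_set.append(user_features)
    labels.append(true_label)
    return retrain_fn(features_set, labels)
```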
Based on the training method for the user recognition model provided in the embodiment of the present application, an embodiment of the present application further provides a training apparatus for the user recognition model, as shown in fig. 5, the training apparatus 500 for the user recognition model may include: an acquisition module 510, a training module 520, and an integration module 530.
The obtaining module 510 is configured to obtain a user feature sample set. The user feature sample set comprises a plurality of positive samples and a plurality of negative samples, and the number of positive samples in the user feature sample set is smaller than the number of negative samples.
The obtaining module 510 is further configured to obtain at least one sub-sample set according to the user feature sample set. Each sub-sample set comprises all the positive samples and some of the negative samples in the user feature sample set, and the number of positive samples in the sub-sample set is greater than or equal to the number of negative samples.
The training module 520 is configured to train at least one preset cost-sensitive classifier according to at least one sub-sample set, so as to obtain at least one target classifier.
The integrating module 530 is configured to integrate the at least one target classifier to obtain the user identification model.
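Taken together, the three modules could be sketched as follows (illustrative assumptions only: equal-sized sub-sample sets with as many negatives as positives, a cost-sensitive SVM as the preset classifier, and a plain Python list as the "integrated" model; none of these choices is mandated by the application):

```python
import numpy as np
from sklearn.svm import SVC

def train_user_recognition_model(X, y, n_classifiers=3, rng=None):
    """Obtain sub-sample sets (all positives + random negatives), train one
    cost-sensitive classifier per set, and integrate them into a model."""
    rng = np.random.default_rng(rng)
    pos = np.flatnonzero(y == 1)   # scarce positive samples
    neg = np.flatnonzero(y == 0)   # abundant negative samples
    n_neg = len(pos)               # per sub-sample set: #negatives == #positives
    model = []
    for _ in range(n_classifiers):
        picked = rng.choice(neg, size=n_neg, replace=False)
        idx = np.concatenate([pos, picked])
        # penalty ratio C_neg / C_pos equals the negative/positive count ratio
        clf = SVC(class_weight={1: 1.0, 0: n_neg / len(pos)})
        model.append(clf.fit(X[idx], y[idx]))
    return model
```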
In one embodiment, the obtaining module 510 includes:
and the extraction unit is used for randomly sampling the negative samples in the user characteristic sample set to obtain partial negative samples.
And the combining unit is used for combining part of negative samples and all positive samples to obtain a subsample set.
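The extraction and combining units amount to random undersampling of the majority class; a sketch under the assumption that labels are 1 for positive and 0 for negative samples (the helper name is hypothetical):

```python
import numpy as np

def make_subsample_set(X, y, n_negatives, rng=None):
    """Extraction unit: randomly sample n_negatives negative samples.
    Combining unit: join them with all positive samples into one sub-sample set."""
    rng = np.random.default_rng(rng)
    pos_idx = np.flatnonzero(y == 1)
    neg_idx = np.flatnonzero(y == 0)
    picked_neg = rng.choice(neg_idx, size=n_negatives, replace=False)
    idx = np.concatenate([pos_idx, picked_neg])
    return X[idx], y[idx]
```

Calling this several times with different random draws yields the several sub-sample sets used to train the individual classifiers.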
In one embodiment, the penalty coefficients of the cost-sensitive classifier include a positive sample penalty coefficient and a negative sample penalty coefficient, and the ratio of the negative sample penalty coefficient to the positive sample penalty coefficient is equal to the ratio of the number of negative samples to the number of positive samples in the sub-sample set corresponding to the cost-sensitive classifier.
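With a support vector machine as the cost-sensitive classifier, this penalty-coefficient relation can be expressed through per-class weights; a sketch using scikit-learn's `SVC` and its `class_weight` parameter (the choice of SVC is an assumption for illustration):

```python
from sklearn.svm import SVC

def cost_sensitive_svm(n_positive, n_negative):
    """Build an SVM whose negative/positive penalty ratio equals the
    negative/positive sample-count ratio of its sub-sample set, i.e.
    C_neg / C_pos = n_negative / n_positive (labels: 1 = positive, 0 = negative)."""
    return SVC(class_weight={1: 1.0, 0: n_negative / n_positive})
```

Since the effective penalty for a class is `C * class_weight[label]`, fixing the positive weight at 1.0 makes the negative-class weight exactly the required ratio.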
In one embodiment, any two sub-sample sets contain the same number of negative samples; that is, all sub-sample sets have an equal number of negative samples.
In one embodiment, the cost-sensitive classifier includes an Adaboost classifier or a cost-sensitive decision tree classifier, and the weak classifier in the Adaboost classifier is a cost-sensitive support vector machine (CS-SVM).
It can be understood that each module/unit in the training apparatus 500 for a user recognition model shown in fig. 5 has a function of implementing each step in the training method for a user recognition model provided in the embodiment of the present application, and can achieve the corresponding technical effect, and for brevity, no further description is provided here.
Based on the user identification method provided in the embodiment of the present application, an embodiment of the present application further provides a user identification apparatus, as shown in fig. 6, the user identification apparatus 600 may include: an acquisition module 610 and an identification module 620.
The obtaining module 610 is configured to obtain a user characteristic of a target user.
And the identifying module 620 is configured to identify the user characteristics based on the user identification model to obtain an identification result of the target user. The recognition result includes that the target user is normal or the target user is abnormal, and the user recognition model is obtained based on the training method of the user recognition model shown in fig. 2.
In one embodiment, the obtaining module 610 is further configured to obtain a verification result, where the verification result indicates whether the recognition result is consistent with the labeling result corresponding to the target user.
The user identification apparatus 600 further includes: an updating module, configured to update the user characteristics into the user characteristic sample set corresponding to the user identification model when the recognition result is inconsistent with the labeling result; and a training module, configured to train the user identification model according to the updated user characteristic sample set.
It can be understood that each module/unit in the user identification apparatus 600 shown in fig. 6 has a function of implementing each step in the user identification method provided in the embodiment of the present application, and can achieve the corresponding technical effect, and for brevity, no further description is provided here.
Fig. 7 is a schematic structural diagram of a user identification device according to an embodiment of the present application.
As shown in fig. 7, the user identification apparatus 700 in the present embodiment includes an input apparatus 701, an input interface 702, a central processing unit 703, a memory 704, an output interface 705, and an output apparatus 706. The input interface 702, the central processing unit 703, the memory 704, and the output interface 705 are connected to each other via a bus 710, and the input device 701 and the output device 706 are connected to the bus 710 via the input interface 702 and the output interface 705, respectively, and further connected to other components of the user identification device 700.
Specifically, the input device 701 receives input information from the outside, and transmits the input information to the central processor 703 through the input interface 702; the central processor 703 processes input information based on computer-executable instructions stored in the memory 704 to generate output information, stores the output information temporarily or permanently in the memory 704, and then transmits the output information to the output device 706 through the output interface 705; the output device 706 outputs the output information to the outside of the user recognition device 700 for use by the user.
In one embodiment, the user identification device 700 shown in fig. 7 includes: a memory 704 for storing a program; and a central processing unit 703 for running the program stored in the memory to implement the user identification method provided in the embodiment of the present application.
Embodiments of the present application further provide a computer-readable storage medium having computer program instructions stored thereon; the computer program instructions, when executed by a processor, implement the user identification method provided by the embodiments of the present application.
It should be clear that each embodiment in this specification is described in a progressive manner, and the same or similar parts among the embodiments may be referred to each other, and for brevity, the description is omitted. The present application is not limited to the specific configurations and processes described above and shown in the figures. A detailed description of known methods is omitted herein for the sake of brevity. In the above embodiments, several specific steps are described and shown as examples. However, the method processes of the present application are not limited to the specific steps described and illustrated, and those skilled in the art can make various changes, modifications, and additions or change the order between the steps after comprehending the spirit of the present application.
The functional blocks shown in the above-described structural block diagrams may be implemented as hardware, software, firmware, or a combination thereof. When implemented in hardware, it may be, for example, an electronic circuit, an application-specific integrated circuit (ASIC), suitable firmware, a plug-in, a function card, or the like. When implemented in software, the elements of the present application are the programs or code segments used to perform the required tasks. The program or code segments may be stored in a machine-readable medium or transmitted by a data signal carried in a carrier wave over a transmission medium or a communication link. A "machine-readable medium" may include any medium that can store or transfer information. Examples of machine-readable media include electronic circuits, semiconductor memory devices, read-only memories (ROMs), flash memories, erasable ROMs (EROMs), floppy disks, CD-ROMs, optical disks, hard disks, fiber optic media, radio frequency (RF) links, and so forth. The code segments may be downloaded via computer networks such as the Internet or an intranet.
It should also be noted that the exemplary embodiments mentioned in this application describe some methods or systems based on a series of steps or devices. However, the present application is not limited to the order of the above-described steps, that is, the steps may be performed in the order mentioned in the embodiments, may be performed in an order different from the order in the embodiments, or may be performed simultaneously.
Aspects of the present disclosure are described above with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems) and computer program products according to embodiments of the disclosure. It will be understood that each block of the flowchart illustrations and/or block diagrams, and combinations of blocks in the flowchart illustrations and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general purpose computer, special purpose computer, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, enable the implementation of the functions/acts specified in the flowchart and/or block diagram block or blocks. Such a processor may be, but is not limited to, a general purpose processor, a special purpose processor, an application specific processor, or a field programmable logic circuit. It will also be understood that each block of the block diagrams and/or flowchart illustration, and combinations of blocks in the block diagrams and/or flowchart illustration, can be implemented by special purpose hardware for performing the specified functions or acts, or combinations of special purpose hardware and computer instructions.
As described above, only the specific embodiments of the present application are provided, and it can be clearly understood by those skilled in the art that, for convenience and brevity of description, the specific working processes of the system, the module and the unit described above may refer to the corresponding processes in the foregoing method embodiments, and are not described herein again. It should be understood that the scope of the present application is not limited thereto, and any person skilled in the art can easily conceive various equivalent modifications or substitutions within the technical scope of the present application, and these modifications or substitutions should be covered within the scope of the present application.

Claims (16)

1. A method for training a user recognition model, the method comprising:
acquiring a user feature sample set, wherein the user feature sample set comprises a plurality of positive samples and a plurality of negative samples, and the number of positive samples in the user feature sample set is smaller than the number of negative samples;
acquiring at least one sub-sample set according to the user feature sample set, wherein the sub-sample set comprises all the positive samples and part of the negative samples in the user feature sample set, and the number of positive samples in the sub-sample set is greater than or equal to the number of negative samples;
respectively training at least one preset cost-sensitive classifier according to the at least one sub-sample set to obtain at least one target classifier;
and integrating the at least one target classifier to obtain a user identification model.
2. The method of claim 1, wherein obtaining at least one subsample set according to the user feature sample set comprises:
randomly sampling the negative samples in the user characteristic sample set to obtain the partial negative samples;
and combining the partial negative samples with all the positive samples to obtain the sub-sample set.
3. The method of claim 1, wherein the penalty coefficients of the cost-sensitive classifier include a positive sample penalty coefficient and a negative sample penalty coefficient, and a ratio of the negative sample penalty coefficient to the positive sample penalty coefficient is equal to a ratio of the number of negative samples to the number of positive samples in the sub-sample set corresponding to the cost-sensitive classifier.
4. The method according to any one of claims 1-3, characterized in that any two sub-sample sets contain the same number of negative samples.
5. The method according to any one of claims 1 to 3, wherein the cost-sensitive classifier comprises an Adaboost classifier or a cost-sensitive decision tree classifier, and a weak classifier in the Adaboost classifier is a cost-sensitive support vector machine (CS-SVM).
6. A method for identifying a user, the method comprising:
acquiring user characteristics of a target user;
and identifying the user characteristics based on a user identification model to obtain a recognition result of the target user, wherein the recognition result comprises that the target user is normal or that the target user is abnormal, and the user identification model is obtained based on the training method of the user recognition model according to any one of claims 1 to 5.
7. The method of claim 6, further comprising:
obtaining a checking result, wherein the checking result represents whether the identification result is consistent with a labeling result corresponding to a target user;
under the condition that the identification result is inconsistent with the labeling result, updating the user characteristics to a user characteristic sample set corresponding to the user identification model;
and training the user identification model according to the updated user characteristic sample set.
8. An apparatus for training a user recognition model, the apparatus comprising:
an obtaining module, configured to obtain a user feature sample set, wherein the user feature sample set comprises a plurality of positive samples and a plurality of negative samples, and the number of positive samples in the user feature sample set is smaller than the number of negative samples;
wherein the obtaining module is further configured to obtain at least one sub-sample set according to the user feature sample set, the sub-sample set comprising all the positive samples and part of the negative samples in the user feature sample set, and the number of positive samples in the sub-sample set being greater than or equal to the number of negative samples;
a training module, configured to respectively train at least one preset cost-sensitive classifier according to the at least one sub-sample set to obtain at least one target classifier; and
an integration module, configured to integrate the at least one target classifier to obtain a user identification model.
9. The apparatus of claim 8, wherein the obtaining module comprises:
an extraction unit, configured to randomly sample the negative samples in the user feature sample set to obtain the partial negative samples; and
a combining unit, configured to combine the partial negative samples with all the positive samples to obtain the sub-sample set.
10. The apparatus of claim 8, wherein the penalty coefficients of the cost-sensitive classifier comprise a positive sample penalty coefficient and a negative sample penalty coefficient, and a ratio of the negative sample penalty coefficient to the positive sample penalty coefficient is equal to a ratio of the number of negative samples to the number of positive samples in the sub-sample set corresponding to the cost-sensitive classifier.
11. The apparatus according to any one of claims 8-10, wherein any two sub-sample sets contain the same number of negative samples.
12. The apparatus according to any one of claims 8-10, wherein the cost-sensitive classifier comprises an Adaboost classifier or a cost-sensitive decision tree classifier, and a weak classifier in the Adaboost classifier is a cost-sensitive support vector machine (CS-SVM).
13. A user identification device, the device comprising:
an obtaining module, configured to obtain user characteristics of a target user; and
an identification module, configured to identify the user characteristics based on a user identification model to obtain a recognition result of the target user, wherein the recognition result comprises that the target user is normal or that the target user is abnormal, and the user identification model is obtained based on the training method of the user recognition model according to any one of claims 1 to 5.
14. The apparatus according to claim 13, wherein the obtaining module is further configured to obtain a verification result, wherein the verification result indicates whether the recognition result is consistent with the labeling result corresponding to the target user;
the device further comprises: an updating module, configured to update the user characteristics into a user characteristic sample set corresponding to the user identification model when the recognition result is inconsistent with the labeling result; and
a training module, configured to train the user identification model according to the updated user characteristic sample set.
15. A user identification device, the device comprising: a processor and a memory storing computer program instructions; the processor when executing the computer program instructions implements a training method for a user recognition model according to any one of claims 1 to 5, or the processor when executing the computer program instructions implements a user recognition method according to any one of claims 6 to 7.
16. A computer-readable storage medium, characterized in that the computer-readable storage medium has stored thereon computer program instructions which, when executed by a processor, implement the training method of the user recognition model according to any one of claims 1 to 5, or which, when executed by a processor, implement the user recognition method according to any one of claims 6 to 7.
CN202110038063.6A 2021-01-12 2021-01-12 User identification method, device, equipment and storage medium Pending CN112784888A (en)

Publications (1)

Publication number CN112784888A, publication date 2021-05-11.




Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination