CN112784888A - User identification method, device, equipment and storage medium - Google Patents

User identification method, device, equipment and storage medium Download PDF

Info

Publication number
CN112784888A
CN112784888A (application number CN202110038063.6A)
Authority
CN
China
Prior art keywords
user
sample set
classifier
samples
target
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202110038063.6A
Other languages
Chinese (zh)
Inventor
沈易栋
万高峰
刘清
刘阳
刘子龙
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
China Unionpay Co Ltd
Original Assignee
China Unionpay Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by China Unionpay Co Ltd filed Critical China Unionpay Co Ltd
Priority to CN202110038063.6A
Publication of CN112784888A
Legal status: Pending

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/24Classification techniques
    • G06F18/241Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/21Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/214Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06QINFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q30/00Commerce
    • G06Q30/02Marketing; Price estimation or determination; Fundraising
    • G06Q30/0207Discounts or incentives, e.g. coupons or rebates

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Business, Economics & Management (AREA)
  • General Physics & Mathematics (AREA)
  • Physics & Mathematics (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Development Economics (AREA)
  • Evolutionary Computation (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • General Engineering & Computer Science (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Artificial Intelligence (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Accounting & Taxation (AREA)
  • Evolutionary Biology (AREA)
  • Strategic Management (AREA)
  • Finance (AREA)
  • Game Theory and Decision Science (AREA)
  • Entrepreneurship & Innovation (AREA)
  • Economics (AREA)
  • Marketing (AREA)
  • General Business, Economics & Management (AREA)
  • Image Analysis (AREA)

Abstract

Embodiments of the present application provide a user identification method, apparatus, device, and storage medium. The method comprises the following steps: acquiring a user feature sample set comprising a plurality of positive samples and a plurality of negative samples, where the number of positive samples is smaller than the number of negative samples; acquiring at least one sub-sample set from the user feature sample set, where each sub-sample set comprises all positive samples and a portion of the negative samples, and the number of positive samples in the sub-sample set is greater than or equal to the number of negative samples; training at least one preset cost-sensitive classifier on the at least one sub-sample set to obtain at least one target classifier; and integrating the at least one target classifier to obtain a user identification model with strong identification capability. Based on this model and the user features of a target user, the target user can be effectively identified as normal or abnormal, improving the user identification effect.

Description

User identification method, device, equipment and storage medium
Technical Field
The present application relates to the field of deep learning technologies, and in particular, to a user identification method, apparatus, device, and storage medium.
Background
To better maintain relationships with users and increase user stickiness, merchants typically launch a series of promotional activities. While running such activities, however, merchants must guard against activity risks, so as to prevent the activity benefits from being harvested by abnormal users such as the "wool party" (bonus hunters).
In a conventional user identification scheme, a preset model is trained on a user feature sample set to obtain a user identification model, which is then used to identify whether a user is normal or abnormal. However, because abnormal users make up only a small proportion of the whole user population, the number of positive samples (abnormal user features) in the sample set is far lower than the number of negative samples (normal user features). This imbalance between positive and negative samples prevents the user identification model from effectively identifying abnormal users, resulting in a poor identification effect.
Disclosure of Invention
The embodiment of the application provides a user identification method, a device, equipment and a storage medium, which can effectively identify the normality or abnormality of a target user and improve the user identification effect.
In a first aspect, an embodiment of the present application provides a method for training a user recognition model, where the method includes:
acquiring a user characteristic sample set, wherein the user characteristic sample set comprises a plurality of positive samples and a plurality of negative samples, and the number of the positive samples in the user characteristic sample set is smaller than that of the negative samples;
acquiring at least one subsample set according to the user characteristic sample set, wherein the subsample set comprises all positive samples and part negative samples in the user characteristic sample set, and the number of the positive samples in the subsample set is greater than or equal to that of the negative samples;
respectively training at least one preset cost-sensitive classifier according to at least one sub-sample set to obtain at least one target classifier;
and integrating at least one target classifier to obtain a user identification model.
In a second aspect, an embodiment of the present application provides a user identification method, where the method includes:
acquiring user characteristics of a target user;
and identifying the user features based on a user identification model to obtain an identification result of the target user, wherein the identification result indicates that the target user is normal or abnormal, and the user identification model is obtained by the training method of the user identification model according to the first aspect or any realizable manner of the first aspect.
In a third aspect, an embodiment of the present application provides an apparatus for training a user recognition model, where the apparatus includes:
the device comprises an acquisition module, a processing module and a processing module, wherein the acquisition module is used for acquiring a user characteristic sample set, the user characteristic sample set comprises a plurality of positive samples and a plurality of negative samples, and the number of the positive samples in the user characteristic sample set is less than that of the negative samples;
the acquisition module is further used for acquiring at least one sub-sample set according to the user characteristic sample set, wherein the sub-sample set comprises all positive samples and part of negative samples in the user characteristic sample set, and the number of the positive samples in the sub-sample set is greater than or equal to that of the negative samples;
the training module is used for respectively training at least one preset cost-sensitive classifier according to at least one sub-sample set to obtain at least one target classifier;
and the integration module is used for integrating at least one target classifier to obtain the user identification model.
In a fourth aspect, an embodiment of the present application provides a user identification apparatus, including:
the acquisition module is used for acquiring the user characteristics of the target user;
the identification module is configured to identify the user characteristics based on a user identification model to obtain an identification result of the target user, where the identification result includes a normal target user or an abnormal target user, and the user identification model is obtained based on the first aspect or the training method of the user identification model described in any one of the realizable manners of the first aspect.
In a fifth aspect, an embodiment of the present application provides a user identification device, where the user identification device includes: a processor and a memory storing computer program instructions; the method for training a user recognition model according to the first aspect is implemented when the processor executes the computer program instructions, or the method for user recognition according to the second aspect is implemented when the processor executes the computer program instructions.
In a sixth aspect, embodiments of the present application provide a computer-readable storage medium, on which computer program instructions are stored, and the computer program instructions, when executed by a processor, implement the training method for the user recognition model according to the first aspect, or the computer program instructions, when executed by the processor, implement the user recognition method according to the second aspect.
According to the user identification method, apparatus, device, and storage medium of the embodiments of the present application, a user feature sample set can be obtained, comprising a plurality of positive samples and a plurality of negative samples, where the number of positive samples is smaller than the number of negative samples. At least one sub-sample set is obtained from the user feature sample set, each comprising all positive samples and a portion of the negative samples, where the number of positive samples in the sub-sample set is greater than or equal to the number of negative samples. At least one preset cost-sensitive classifier is trained on the at least one sub-sample set to obtain at least one target classifier, and the at least one target classifier is integrated to obtain a user identification model with strong identification capability. Based on this model and the user features of a target user, the target user can be effectively identified as normal or abnormal, improving the user identification effect.
Drawings
In order to more clearly illustrate the technical solutions of the embodiments of the present application, the drawings needed to be used in the embodiments of the present application will be briefly described below, and for those skilled in the art, other drawings can be obtained according to the drawings without creative efforts.
FIG. 1 is a schematic diagram of an architecture of a training system for a user recognition model according to an embodiment of the present application;
FIG. 2 is a flowchart illustrating a method for training a user recognition model according to an embodiment of the present disclosure;
FIG. 3 is a schematic flow chart illustrating another method for training a user recognition model according to an embodiment of the present disclosure;
fig. 4 is a schematic flowchart of a user identification method according to an embodiment of the present application;
FIG. 5 is a schematic structural diagram of an apparatus for training a user recognition model according to an embodiment of the present disclosure;
fig. 6 is a schematic structural diagram of a user identification device according to an embodiment of the present application;
fig. 7 is a schematic structural diagram of a user identification device according to an embodiment of the present application.
Detailed Description
Features and exemplary embodiments of various aspects of the present application will be described in detail below, and in order to make objects, technical solutions and advantages of the present application more apparent, the present application will be further described in detail below with reference to the accompanying drawings and the embodiments. It should be understood that the specific embodiments described herein are merely illustrative of the application and do not limit the application. It will be apparent to one skilled in the art that the present application may be practiced without some of these specific details. The following description of the embodiments is merely intended to provide a better understanding of the present application by illustrating examples thereof.
It is noted that, herein, relational terms such as first and second may be used solely to distinguish one entity or action from another, without necessarily requiring or implying any actual such relationship or order between those entities or actions. Moreover, the terms "comprises," "comprising," or any other variation thereof are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements includes not only those elements but may also include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising a …" does not exclude the presence of other identical elements in the process, method, article, or apparatus that comprises the element.
To solve the problems of the prior art, embodiments of the present application provide a user identification method, apparatus, device, and storage medium. The method obtains a user feature sample set comprising a plurality of positive samples and a plurality of negative samples, where the number of positive samples is smaller than the number of negative samples; obtains at least one sub-sample set from the user feature sample set, each comprising all positive samples and a portion of the negative samples, with the number of positive samples greater than or equal to the number of negative samples; trains at least one preset cost-sensitive classifier on the at least one sub-sample set to obtain at least one target classifier; and integrates the at least one target classifier into a user identification model with strong identification capability. Based on this model and the user features of a target user, the target user can be effectively identified as normal or abnormal, improving the user identification effect.
The user identification method, apparatus, device and storage medium provided in the embodiments of the present application are described in detail below with reference to the accompanying drawings through specific embodiments and application scenarios thereof.
As an example, the user identification method may be applied to identification scenarios for abnormal users such as wool-party (bonus-hunting) users, overdue users, or fraudulent users, which is not limited here. Since the user identification method provided by the embodiment of the present application relies on the user identification model, the training method of the user identification model provided by the embodiment of the present application is first introduced below with reference to the drawings.
Fig. 1 is a schematic architecture diagram of a training system for a user recognition model according to an embodiment of the present application. As shown in fig. 1, the training system may include an electronic device 110 and a server 120. The electronic device 110 may be a server, a Network Attached Storage (NAS), a Personal Computer (PC), or the like. The server 120 stores a user feature sample set, where the sample set includes a plurality of positive samples and a plurality of negative samples, and the number of positive samples is smaller than the number of negative samples. The electronic device 110 may communicate with the server 120 through a network, which may be a wired or wireless communication network.
Referring to fig. 1, the electronic device 110 may obtain a user feature sample set from the server 120 and obtain at least one sub-sample set according to the user feature sample set. The sub-sample set comprises all positive samples in the user characteristic sample set and partial negative samples, and the number of the positive samples in the sub-sample set is larger than or equal to the number of the negative samples. And then respectively training at least one preset cost-sensitive classifier according to at least one sub-sample set to obtain at least one target classifier. And then at least one target classifier is integrated to obtain a user identification model with stronger identification capability.
The following describes in detail a training method of a user recognition model provided in an embodiment of the present application. The execution subject of the training method may be the electronic device 110 in the training system shown in fig. 1, or a module in the electronic device 110.
Fig. 2 is a schematic flowchart of a training method for a user recognition model according to an embodiment of the present disclosure. As shown in fig. 2, the training method may include the steps of:
and S210, acquiring a user feature sample set.
The user characteristic sample set comprises a plurality of positive samples and a plurality of negative samples, and the number of the positive samples in the user characteristic sample set is smaller than that of the negative samples. It is understood that the positive samples are the few classes of samples, i.e., abnormal user features, and the negative samples are the most classes of samples, i.e., normal user features. Illustratively, a pre-stored sample set of user characteristics may be obtained from a server.
And S220, acquiring at least one sub-sample set according to the user characteristic sample set.
Each sub-sample set comprises all positive samples in the user feature sample set and a portion of the negative samples, and the number of positive samples in the sub-sample set is greater than or equal to the number of negative samples. Illustratively, the number of negative samples drawn for each sub-sample set may be the same; that is, any two sub-sample sets contain the same number of positive samples and the same number of negative samples.
In one embodiment, the negative samples in the user feature sample set may be randomly sampled to obtain a portion of the negative samples, and this portion may be combined with all positive samples to obtain a sub-sample set with a smaller imbalance ratio (IR). Here IR is the ratio of the number of negative samples to the number of positive samples, and the IR of each sub-sample set lies in (0, 1]. Random sampling ensures that every negative sample is equally likely to be drawn, so the drawn negative samples are well representative.
As an example, suppose the user feature sample set includes 10000 samples, of which 600 are positive and 9400 are negative, so the IR of the sample set is 9400/600 ≈ 15.67. Then 400 negative samples can be randomly drawn from the 9400 negative samples and combined with the 600 positive samples to obtain a sub-sample set whose IR is 400/600 ≈ 0.67.
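The undersampling step in this example can be sketched as follows; the helper name `make_subsample_sets`, the tuple-based samples, and the default target IR are illustrative assumptions, not part of the patent:

```python
import random

def make_subsample_sets(positives, negatives, target_ir=0.67, num_sets=3, seed=0):
    """Build sub-sample sets: each keeps ALL positive samples and a random
    draw of negatives sized so that IR = n_neg / n_pos <= target_ir."""
    rng = random.Random(seed)
    n_neg = int(target_ir * len(positives))  # number of negatives to draw
    return [positives + rng.sample(negatives, n_neg) for _ in range(num_sets)]

# Mirrors the patent's example: 600 positives, 9400 negatives (IR ~= 15.67).
positives = [("pos", i) for i in range(600)]
negatives = [("neg", i) for i in range(9400)]
subsets = make_subsample_sets(positives, negatives)
```

Because `random.sample` draws without replacement and uniformly, every negative sample is equally likely to appear in a given sub-sample set, matching the representativeness argument above.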
And S230, respectively training at least one preset cost-sensitive classifier according to at least one subsample set to obtain at least one target classifier.
Specifically, each sub-sample set corresponds to one cost-sensitive classifier, and each cost-sensitive classifier can be trained on its corresponding sub-sample set to obtain a target classifier, i.e., a trained cost-sensitive classifier. For example, the penalty coefficients of the cost-sensitive classifier may include a positive-sample penalty coefficient and a negative-sample penalty coefficient, where the ratio of the negative-sample penalty coefficient to the positive-sample penalty coefficient equals the ratio of the number of negative samples to the number of positive samples in the classifier's sub-sample set; that is, the ratio equals the IR of the corresponding sub-sample set. In this way, different penalty coefficients can be assigned to positive and negative samples at the algorithm level, and the higher penalty coefficient given to positive samples reduces the bias against the positive (minority) class, further improving the identification capability of the user identification model.
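A minimal sketch of the penalty-coefficient rule just described, assuming a hypothetical base penalty `base_c` (all names here are illustrative):

```python
def penalty_coefficients(n_pos, n_neg, base_c=1.0):
    """Per the rule above: C_neg / C_pos equals the sub-sample set's
    IR = n_neg / n_pos, so the positive class receives the larger
    penalty whenever positives outnumber negatives in the set."""
    c_pos = base_c
    c_neg = base_c * (n_neg / n_pos)
    return c_pos, c_neg

# A sub-sample set with 600 positives and 400 negatives (IR = 2/3):
c_pos, c_neg = penalty_coefficients(600, 400)
```

Since the sub-sample sets have IR in (0, 1], this rule always yields `c_pos >= c_neg`, i.e., misclassifying a positive (abnormal-user) sample costs at least as much as misclassifying a negative one.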
Optionally, the cost-sensitive classifier may be an Adaboost classifier, a cost-sensitive decision tree classifier, or the like. Illustratively, the weak classifiers in the Adaboost classifier may be Cost-Sensitive Support Vector Machines (CS-SVM).
S240, integrating at least one target classifier to obtain a user identification model.
Specifically, the target classifiers may be integrated into an overall classifier, i.e., the user identification model, which is jointly characterized by all the target classifiers.
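The integration step can be sketched as a weighted sign vote over all weak classifiers of all target classifiers; the threshold stumps below are hypothetical stand-ins for trained weak classifiers:

```python
def make_stump(threshold):
    """A toy weak classifier: +1 above the threshold, -1 otherwise."""
    return lambda x: 1 if x > threshold else -1

def predict(model, x):
    """Weighted sign vote over every weak classifier of every target
    classifier: sgn(sum_i sum_j beta_ij * g_ij(x))."""
    score = sum(beta * g(x)
                for stumps, betas in model
                for g, beta in zip(stumps, betas))
    return 1 if score > 0 else -1

# Two hypothetical target classifiers, each a weighted set of stumps.
G1 = ([make_stump(0.0), make_stump(1.0)], [0.7, 0.3])
G2 = ([make_stump(0.5)], [1.0])
model = [G1, G2]
```

Here the "model" is simply the collection of (weak classifier, weight) pairs; no extra parameters are introduced by the integration itself.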
In the embodiment of the application, a user feature sample set may be obtained, comprising a plurality of positive samples and a plurality of negative samples, where the number of positive samples is smaller than the number of negative samples. At least one sub-sample set is obtained from the user feature sample set, each comprising all positive samples and a portion of the negative samples, with the number of positive samples greater than or equal to the number of negative samples. At least one preset cost-sensitive classifier is trained on the at least one sub-sample set to obtain at least one target classifier, and the at least one target classifier is integrated into a user identification model. In this way, sub-sample sets with smaller IR are obtained, and the cost-sensitive classifiers trained on them yield a user identification model with stronger identification capability that trains faster and more stably.
Taking the fusion of the EasyEnsemble algorithm with the CS-SVM as an example, the training method of the user recognition model provided in the embodiment of the present application is described in detail below with reference to fig. 3.
As shown in fig. 3, the user feature sample set includes P positive samples and N negative samples, with P < N, and the imbalance ratio of the user feature sample set is IR₀ = N/P.
Step 1: receive the number T of preset Adaboost classifiers input by the user and the imbalance ratio of the sub-sample sets. The imbalance ratio is the same for every sub-sample set and is denoted IR₁, with IR₁ ∈ (0, 1]. A sampling counter is initialized to i = 0.
Step 2: sample repeatedly, with i = i + 1 on each draw, and stop when i = T. On each draw, N₀ negative samples are randomly drawn from the negative samples of the user feature sample set, where N₀ = IR₁ · P. The P positive samples are combined with the N₀ drawn negative samples, yielding sub-sample set 1, sub-sample set 2, …, sub-sample set T. An Adaboost classifier is then trained on each sub-sample set respectively, giving Adaboost classifiers G1, G2, …, GT. Gi can be expressed as follows:
$$G_i(x) = \operatorname{sgn}\left(\sum_{j=1}^{s_i} \beta_{i,j}\, g_{i,j}(x)\right) \tag{1}$$
where x denotes a sample, Gi(x) denotes the classification result of Gi on sample x, gi,j(x) denotes the classification result of the j-th weak classifier gi,j(·) in Gi on sample x, βi,j denotes the weight of gi,j, si denotes the number of weak classifiers in Gi, and sgn denotes the sign function.
During training, the weights of the weak classifiers in each Adaboost classifier are iteratively updated until convergence, and the converged Adaboost classifier is taken as the target classifier.
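The iterative weight update of step 2 can be sketched with a minimal AdaBoost over one-dimensional threshold stumps (the stumps stand in for the CS-SVM weak learners of the patent; the toy data and names are illustrative):

```python
import math

def train_adaboost(samples, labels, rounds=5):
    """Minimal AdaBoost: each round fits the best threshold stump on the
    weighted data, computes its weight beta from the weighted error, and
    re-weights the samples to emphasize mistakes."""
    n = len(samples)
    w = [1.0 / n] * n
    stumps, betas = [], []
    for _ in range(rounds):
        best = None  # (weighted error, threshold, sign, predictions)
        for t in sorted(set(samples)):
            for sign in (1, -1):
                preds = [sign if x > t else -sign for x in samples]
                err = sum(wi for wi, p, y in zip(w, preds, labels) if p != y)
                if best is None or err < best[0]:
                    best = (err, t, sign, preds)
        err, t, sign, preds = best
        if err >= 0.5:
            break
        err = max(err, 1e-10)  # avoid log(0) on perfectly separable data
        beta = 0.5 * math.log((1 - err) / err)
        stumps.append((t, sign))
        betas.append(beta)
        w = [wi * math.exp(-beta * y * p) for wi, y, p in zip(w, labels, preds)]
        z = sum(w)
        w = [wi / z for wi in w]
    return stumps, betas

def adaboost_predict(stumps, betas, x):
    score = sum(b * (s if x > t else -s) for (t, s), b in zip(stumps, betas))
    return 1 if score > 0 else -1

# Toy 1-D data: negatives below zero, positives above.
stumps, betas = train_adaboost([-2.0, -1.0, 1.0, 2.0], [-1, -1, 1, 1])
```

The returned `(stumps, betas)` pair corresponds to the `(g_{i,j}, beta_{i,j})` terms of formula (1); in the patent the weak learners are CS-SVMs rather than stumps.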
Step 3: integrate all the target classifiers to obtain the user identification model G, where G may be expressed as follows:
$$G(x) = \operatorname{sgn}\left(\sum_{i=1}^{T} \sum_{j=1}^{s_i} \beta_{i,j}\, g_{i,j}(x)\right) \tag{2}$$
where x denotes a sample, G(x) denotes the classification result of G on sample x, gi,j(x) denotes the classification result of the j-th weak classifier gi,j(·) in Gi on sample x, βi,j denotes the weight of gi,j, si denotes the number of weak classifiers in Gi, T denotes the number of Adaboost classifiers, and sgn denotes the sign function.
Illustratively, the weak classifier in the Adaboost classifier of step 2 may be a CS-SVM, which generalizes the soft-margin kernel SVM. The standard soft-margin SVM may be written as follows:
$$\begin{aligned} \min_{w,b,\xi}\ & \frac{1}{2}\|w\|^{2} + C\sum_{i=1}^{m}\xi_{i} \\ \text{s.t.}\ & y_{i}\left(w^{T}\phi(x_{i}) + b\right) \ge 1 - \xi_{i},\quad \xi_{i} \ge 0,\ i = 1,\dots,m \end{aligned} \tag{3}$$
where w denotes the normal vector, C the penalty coefficient, ξi the slack variable, m the number of samples in the sub-sample set, min the minimization, s.t. the constraints, xi the i-th sample in the sub-sample set, yi the label of sample xi, wᵀ the transpose of w, φ the nonlinear mapping function, and b the intercept.
Optionally, on the basis of formula (3), different penalty coefficients C₊ and C₋ can be assigned to the positive and negative classes, reducing the bias against the minority class, alleviating the shift of the separating hyperplane, and mitigating the impact of the positive-negative sample imbalance. Specifically, the CS-SVM can be set as
$$\begin{aligned} \min_{w,b,\xi}\ & \frac{1}{2}\|w\|^{2} + C_{+}\sum_{i\in I_{+}}\xi_{i} + C_{-}\sum_{i\in I_{-}}\xi_{i} \\ \text{s.t.}\ & y_{i}\left(w^{T}\phi(x_{i}) + b\right) \ge 1 - \xi_{i},\quad \xi_{i} \ge 0,\ i = 1,\dots,m \end{aligned} \tag{4}$$
where w denotes the normal vector, C₊ the positive-sample penalty coefficient, C₋ the negative-sample penalty coefficient, I₊ the set of positive samples in the sub-sample set, I₋ the set of negative samples in the sub-sample set, ξi the slack variable, m the number of samples in the sub-sample set, min the minimization, s.t. the constraints, xi the i-th sample in the sub-sample set, yi the label of sample xi, wᵀ the transpose of w, φ the nonlinear mapping function, and b the intercept. The values of C₊ and C₋ can be set as follows:
$$\frac{C_{-}}{C_{+}} = \frac{|I_{-}|}{|I_{+}|} \tag{5}$$
At this time, since the sub-sample set contains at least as many positive samples as negative samples,

$$C_{+} \ge C_{-} \tag{6}$$
higher penalty coefficient can be given to the positive sample, so that the deviation of the positive sample is reduced, and the identification capability of the user identification model is further improved.
The dual problem corresponding to formula (4) can be written as follows:
$$\begin{aligned} \max_{\alpha}\ & \sum_{i=1}^{m}\alpha_{i} - \frac{1}{2}\sum_{i=1}^{m}\sum_{j=1}^{m}\alpha_{i}\alpha_{j}y_{i}y_{j}\,\kappa(x_{i},x_{j}) \\ \text{s.t.}\ & \sum_{i=1}^{m}\alpha_{i}y_{i} = 0,\quad 0 \le \alpha_{i} \le C_{+}\ (i \in I_{+}),\quad 0 \le \alpha_{i} \le C_{-}\ (i \in I_{-}) \end{aligned} \tag{7}$$
wherein, κ (x)i,xj) Represents a sample xiAnd sample xjInner product of, yiAnd yjRespectively represent samples xiAnd sample xjA denotes a Lagrange multiplier, aiAnd alphajRespectively represent samples xiAnd sample xjM represents the number of samples in the subsample set, max represents the maximum value, s.t. represents the constraint condition, C+Represents a positive sample penalty coefficient, C-Representing a negative sample penalty factor, I+Represents the set of positive samples in the subset, I-Representing the set of negative examples in the set of subsamples. Based on the formula (5) and the formula (6), the lagrange multiplier α can be obtained, so that w and b in the formula (4) are calculated, and the dividing hyperplane g (x) of the CS-SVM is obtainedTφ(x)+b。
Based on the training method for the user recognition model provided by the embodiment of the present application, the embodiment of the present application further provides a user recognition method, as shown in fig. 4, the user recognition method may include the following steps:
and S410, acquiring the user characteristics of the target user.
Wherein the target user is the user to be identified. For example, data related to the target user, such as merchant data, user data, and transaction data, may be acquired from a data server, and feature extraction may be performed on the acquired data to obtain the user features of the target user.
Alternatively, the user characteristics of the target user may be acquired periodically, for example, daily or weekly, for user identification.
And S420, identifying the user characteristics based on the user identification model to obtain an identification result of the target user.
The recognition result indicates that the target user is normal or abnormal, and the user recognition model is obtained by the training method of the user recognition model shown in fig. 2. For example, the user features may be input into each target classifier in the user recognition model, each target classifier performs classification to obtain its own classification result, and the classification results are integrated to obtain the recognition result.
In the embodiment of the application, the target user can be effectively identified to be normal or abnormal based on the user identification model and the user characteristics of the target user, and the user identification effect is improved.
In one embodiment, a check result may also be obtained. Specifically, the recognition result may be verified to obtain a check result, which indicates whether the recognition result is consistent with the labeled result corresponding to the target user; that is, the check result indicates whether the recognition result is correct and whether the user recognition model has misjudged. When the recognition result is inconsistent with the labeled result, i.e., when the user recognition model has misjudged, the user features are added to the user feature sample set corresponding to the user recognition model, and the user recognition model is trained on the updated user feature sample set. In this way, the user recognition model can be continuously updated according to feedback on its recognition results, ensuring the long-term effectiveness of the model.
As an example, suppose the target user is actually a normal user but the user identification model mistakenly identifies the target user as abnormal. In this case, the user characteristics of the target user may be obtained and added to the user characteristic sample set as a negative sample, and the user identification model may be retrained on the updated sample set. Optionally, the parameter thresholds in the user identification model may be set according to service requirements and existing experience, or may be iteratively optimized through big data techniques.
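A minimal sketch of this feedback loop (the function and parameter names are hypothetical; the application does not specify an API for the verification step):

```python
def update_on_misjudgment(features_set, labels, user_features, true_label,
                          predicted_label, retrain_fn):
    """If verification shows the prediction disagrees with the labeling
    result, add the user's features to the sample set under the verified
    label and retrain; otherwise leave the model unchanged."""
    if predicted_label == true_label:
        return None  # recognition result verified as correct; no update needed
    # e.g. a normal user misjudged as abnormal joins the set as a negative sample
    features_set.append(user_features)
    labels.append(true_label)
    return retrain_fn(features_set, labels)
```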
Based on the training method for the user recognition model provided in the embodiment of the present application, an embodiment of the present application further provides a training apparatus for the user recognition model, as shown in fig. 5, the training apparatus 500 for the user recognition model may include: an acquisition module 510, a training module 520, and an integration module 530.
The obtaining module 510 is configured to obtain a user feature sample set. The user feature sample set comprises a plurality of positive samples and a plurality of negative samples, and the number of positive samples in the user feature sample set is smaller than the number of negative samples.
The obtaining module 510 is further configured to obtain at least one sub-sample set according to the user feature sample set. Each sub-sample set comprises all the positive samples and some of the negative samples in the user feature sample set, and the number of positive samples in the sub-sample set is greater than or equal to the number of negative samples.
The training module 520 is configured to train at least one preset cost-sensitive classifier according to at least one sub-sample set, so as to obtain at least one target classifier.
The integrating module 530 is configured to integrate the at least one target classifier to obtain the user identification model.
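Taken together, the three modules could be sketched as follows (illustrative assumptions only: equal-sized sub-sample sets with as many negatives as positives, a cost-sensitive SVM as the preset classifier, and a plain Python list as the "integrated" model; none of these choices is mandated by the application):

```python
import numpy as np
from sklearn.svm import SVC

def train_user_recognition_model(X, y, n_classifiers=3, rng=None):
    """Obtain sub-sample sets (all positives + random negatives), train one
    cost-sensitive classifier per set, and integrate them into a model."""
    rng = np.random.default_rng(rng)
    pos = np.flatnonzero(y == 1)   # scarce positive samples
    neg = np.flatnonzero(y == 0)   # abundant negative samples
    n_neg = len(pos)               # per sub-sample set: #negatives == #positives
    model = []
    for _ in range(n_classifiers):
        picked = rng.choice(neg, size=n_neg, replace=False)
        idx = np.concatenate([pos, picked])
        # penalty ratio C_neg / C_pos equals the negative/positive count ratio
        clf = SVC(class_weight={1: 1.0, 0: n_neg / len(pos)})
        model.append(clf.fit(X[idx], y[idx]))
    return model
```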
In one embodiment, the obtaining module 510 includes:
and the extraction unit is used for randomly sampling the negative samples in the user characteristic sample set to obtain partial negative samples.
And the combining unit is used for combining part of negative samples and all positive samples to obtain a subsample set.
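The extraction and combining units amount to random undersampling of the majority class; a sketch under the assumption that labels are 1 for positive and 0 for negative samples (the helper name is hypothetical):

```python
import numpy as np

def make_subsample_set(X, y, n_negatives, rng=None):
    """Extraction unit: randomly sample n_negatives negative samples.
    Combining unit: join them with all positive samples into one sub-sample set."""
    rng = np.random.default_rng(rng)
    pos_idx = np.flatnonzero(y == 1)
    neg_idx = np.flatnonzero(y == 0)
    picked_neg = rng.choice(neg_idx, size=n_negatives, replace=False)
    idx = np.concatenate([pos_idx, picked_neg])
    return X[idx], y[idx]
```

Calling this several times with different random draws yields the several sub-sample sets used to train the individual classifiers.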
In one embodiment, the penalty coefficients of the cost-sensitive classifier include a positive sample penalty coefficient and a negative sample penalty coefficient, and the ratio of the negative sample penalty coefficient to the positive sample penalty coefficient is equal to the ratio of the number of negative samples to the number of positive samples in the sub-sample set corresponding to the cost-sensitive classifier.
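With a support vector machine as the cost-sensitive classifier, this penalty-coefficient relation can be expressed through per-class weights; a sketch using scikit-learn's `SVC` and its `class_weight` parameter (the choice of SVC is an assumption for illustration):

```python
from sklearn.svm import SVC

def cost_sensitive_svm(n_positive, n_negative):
    """Build an SVM whose negative/positive penalty ratio equals the
    negative/positive sample-count ratio of its sub-sample set, i.e.
    C_neg / C_pos = n_negative / n_positive (labels: 1 = positive, 0 = negative)."""
    return SVC(class_weight={1: 1.0, 0: n_negative / n_positive})
```

Since the effective penalty for a class is `C * class_weight[label]`, fixing the positive weight at 1.0 makes the negative-class weight exactly the required ratio.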
In one embodiment, any two sub-sample sets contain the same number of negative samples; that is, all sub-sample sets have an equal number of negative samples.
In one embodiment, the cost-sensitive classifier includes an Adaboost classifier or a cost-sensitive decision tree classifier, and the weak classifier in the Adaboost classifier is a cost-sensitive support vector machine (CS-SVM).
It can be understood that each module/unit in the training apparatus 500 for a user recognition model shown in fig. 5 has a function of implementing each step in the training method for a user recognition model provided in the embodiment of the present application, and can achieve the corresponding technical effect, and for brevity, no further description is provided here.
Based on the user identification method provided in the embodiment of the present application, an embodiment of the present application further provides a user identification apparatus, as shown in fig. 6, the user identification apparatus 600 may include: an acquisition module 610 and an identification module 620.
The obtaining module 610 is configured to obtain a user characteristic of a target user.
And the identifying module 620 is configured to identify the user characteristics based on the user identification model to obtain an identification result of the target user. The recognition result includes that the target user is normal or the target user is abnormal, and the user recognition model is obtained based on the training method of the user recognition model shown in fig. 2.
In one embodiment, the obtaining module 610 is further configured to obtain a verification result, where the verification result indicates whether the recognition result is consistent with the labeling result corresponding to the target user.
The user identification apparatus 600 further includes: an updating module, configured to update the user characteristics into the user characteristic sample set corresponding to the user identification model when the recognition result is inconsistent with the labeling result; and a training module, configured to train the user identification model according to the updated user characteristic sample set.
It can be understood that each module/unit in the user identification apparatus 600 shown in fig. 6 has a function of implementing each step in the user identification method provided in the embodiment of the present application, and can achieve the corresponding technical effect, and for brevity, no further description is provided here.
Fig. 7 is a schematic structural diagram of a user identification device according to an embodiment of the present application.
As shown in fig. 7, the user identification apparatus 700 in the present embodiment includes an input apparatus 701, an input interface 702, a central processing unit 703, a memory 704, an output interface 705, and an output apparatus 706. The input interface 702, the central processing unit 703, the memory 704, and the output interface 705 are connected to each other via a bus 710, and the input device 701 and the output device 706 are connected to the bus 710 via the input interface 702 and the output interface 705, respectively, and further connected to other components of the user identification device 700.
Specifically, the input device 701 receives input information from the outside, and transmits the input information to the central processor 703 through the input interface 702; the central processor 703 processes input information based on computer-executable instructions stored in the memory 704 to generate output information, stores the output information temporarily or permanently in the memory 704, and then transmits the output information to the output device 706 through the output interface 705; the output device 706 outputs the output information to the outside of the user recognition device 700 for use by the user.
In one embodiment, the user identification device 700 shown in fig. 7 includes: a memory 704 for storing a program; and a central processing unit 703 for running the program stored in the memory to implement the user identification method provided in the embodiment of the present application.
Embodiments of the present application further provide a computer-readable storage medium having computer program instructions stored thereon; the computer program instructions, when executed by a processor, implement the user identification method provided by the embodiments of the present application.
It should be clear that each embodiment in this specification is described in a progressive manner, and the same or similar parts among the embodiments may be referred to each other, and for brevity, the description is omitted. The present application is not limited to the specific configurations and processes described above and shown in the figures. A detailed description of known methods is omitted herein for the sake of brevity. In the above embodiments, several specific steps are described and shown as examples. However, the method processes of the present application are not limited to the specific steps described and illustrated, and those skilled in the art can make various changes, modifications, and additions or change the order between the steps after comprehending the spirit of the present application.
The functional blocks shown in the above-described structural block diagrams may be implemented as hardware, software, firmware, or a combination thereof. When implemented in hardware, it may be, for example, an electronic circuit, an application-specific integrated circuit (ASIC), suitable firmware, a plug-in, a function card, or the like. When implemented in software, the elements of the present application are the programs or code segments used to perform the required tasks. The program or code segments may be stored in a machine-readable medium or transmitted by a data signal carried in a carrier wave over a transmission medium or a communication link. A "machine-readable medium" may include any medium that can store or transfer information. Examples of machine-readable media include electronic circuits, semiconductor memory devices, read-only memories (ROMs), flash memories, erasable ROMs (EROMs), floppy disks, CD-ROMs, optical disks, hard disks, fiber optic media, radio frequency (RF) links, and so forth. The code segments may be downloaded via computer networks such as the Internet or an intranet.
It should also be noted that the exemplary embodiments mentioned in this application describe some methods or systems based on a series of steps or devices. However, the present application is not limited to the order of the above-described steps, that is, the steps may be performed in the order mentioned in the embodiments, may be performed in an order different from the order in the embodiments, or may be performed simultaneously.
Aspects of the present disclosure are described above with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems) and computer program products according to embodiments of the disclosure. It will be understood that each block of the flowchart illustrations and/or block diagrams, and combinations of blocks in the flowchart illustrations and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general purpose computer, special purpose computer, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, enable the implementation of the functions/acts specified in the flowchart and/or block diagram block or blocks. Such a processor may be, but is not limited to, a general purpose processor, a special purpose processor, an application specific processor, or a field programmable logic circuit. It will also be understood that each block of the block diagrams and/or flowchart illustration, and combinations of blocks in the block diagrams and/or flowchart illustration, can be implemented by special purpose hardware for performing the specified functions or acts, or combinations of special purpose hardware and computer instructions.
As described above, only the specific embodiments of the present application are provided, and it can be clearly understood by those skilled in the art that, for convenience and brevity of description, the specific working processes of the system, the module and the unit described above may refer to the corresponding processes in the foregoing method embodiments, and are not described herein again. It should be understood that the scope of the present application is not limited thereto, and any person skilled in the art can easily conceive various equivalent modifications or substitutions within the technical scope of the present application, and these modifications or substitutions should be covered within the scope of the present application.

Claims (16)

1. A method for training a user recognition model, the method comprising:
acquiring a user feature sample set, wherein the user feature sample set comprises a plurality of positive samples and a plurality of negative samples, and the number of positive samples in the user feature sample set is smaller than the number of negative samples;
acquiring at least one sub-sample set according to the user feature sample set, wherein the sub-sample set comprises all the positive samples and part of the negative samples in the user feature sample set, and the number of positive samples in the sub-sample set is greater than or equal to the number of negative samples;
respectively training at least one preset cost-sensitive classifier according to the at least one sub-sample set to obtain at least one target classifier;
and integrating the at least one target classifier to obtain a user identification model.
2. The method of claim 1, wherein obtaining at least one subsample set according to the user feature sample set comprises:
randomly sampling the negative samples in the user characteristic sample set to obtain the partial negative samples;
and combining the partial negative samples with all the positive samples to obtain the sub-sample set.
3. The method of claim 1, wherein the penalty coefficients of the cost-sensitive classifier include a positive sample penalty coefficient and a negative sample penalty coefficient, and a ratio of the negative sample penalty coefficient to the positive sample penalty coefficient is equal to a ratio of the number of negative samples to the number of positive samples in the sub-sample set corresponding to the cost-sensitive classifier.
4. The method according to any one of claims 1-3, characterized in that any two sub-sample sets contain the same number of negative samples.
5. The method according to any one of claims 1 to 3, wherein the cost-sensitive classifier comprises an Adaboost classifier or a cost-sensitive decision tree classifier, and a weak classifier in the Adaboost classifier is a cost-sensitive support vector machine (CS-SVM).
6. A method for identifying a user, the method comprising:
acquiring user characteristics of a target user;
and identifying the user characteristics based on a user identification model to obtain a recognition result of the target user, wherein the recognition result comprises that the target user is normal or that the target user is abnormal, and the user identification model is obtained based on the training method of the user recognition model according to any one of claims 1 to 5.
7. The method of claim 6, further comprising:
obtaining a checking result, wherein the checking result represents whether the identification result is consistent with a labeling result corresponding to a target user;
under the condition that the identification result is inconsistent with the labeling result, updating the user characteristics to a user characteristic sample set corresponding to the user identification model;
and training the user identification model according to the updated user characteristic sample set.
8. An apparatus for training a user recognition model, the apparatus comprising:
an obtaining module, configured to obtain a user feature sample set, wherein the user feature sample set comprises a plurality of positive samples and a plurality of negative samples, and the number of positive samples in the user feature sample set is smaller than the number of negative samples;
wherein the obtaining module is further configured to obtain at least one sub-sample set according to the user feature sample set, the sub-sample set comprising all the positive samples and part of the negative samples in the user feature sample set, and the number of positive samples in the sub-sample set being greater than or equal to the number of negative samples;
a training module, configured to respectively train at least one preset cost-sensitive classifier according to the at least one sub-sample set to obtain at least one target classifier; and
an integration module, configured to integrate the at least one target classifier to obtain a user identification model.
9. The apparatus of claim 8, wherein the obtaining module comprises:
an extraction unit, configured to randomly sample the negative samples in the user feature sample set to obtain the partial negative samples; and
a combining unit, configured to combine the partial negative samples with all the positive samples to obtain the sub-sample set.
10. The apparatus of claim 8, wherein the penalty coefficients of the cost-sensitive classifier comprise a positive sample penalty coefficient and a negative sample penalty coefficient, and a ratio of the negative sample penalty coefficient to the positive sample penalty coefficient is equal to a ratio of the number of negative samples to the number of positive samples in the sub-sample set corresponding to the cost-sensitive classifier.
11. The apparatus according to any one of claims 8-10, wherein any two sub-sample sets contain the same number of negative samples.
12. The apparatus according to any one of claims 8-10, wherein the cost-sensitive classifier comprises an Adaboost classifier or a cost-sensitive decision tree classifier, and a weak classifier in the Adaboost classifier is a cost-sensitive support vector machine (CS-SVM).
13. A user identification device, the device comprising:
an obtaining module, configured to obtain user characteristics of a target user; and
an identification module, configured to identify the user characteristics based on a user identification model to obtain a recognition result of the target user, wherein the recognition result comprises that the target user is normal or that the target user is abnormal, and the user identification model is obtained based on the training method of the user recognition model according to any one of claims 1 to 5.
14. The apparatus according to claim 13, wherein the obtaining module is further configured to obtain a verification result, wherein the verification result indicates whether the recognition result is consistent with the labeling result corresponding to the target user;
the device further comprises: an updating module, configured to update the user characteristics into a user characteristic sample set corresponding to the user identification model when the recognition result is inconsistent with the labeling result; and
a training module, configured to train the user identification model according to the updated user characteristic sample set.
15. A user identification device, the device comprising: a processor and a memory storing computer program instructions; the processor when executing the computer program instructions implements a training method for a user recognition model according to any one of claims 1 to 5, or the processor when executing the computer program instructions implements a user recognition method according to any one of claims 6 to 7.
16. A computer-readable storage medium, characterized in that the computer-readable storage medium has stored thereon computer program instructions which, when executed by a processor, implement the training method of the user recognition model according to any one of claims 1 to 5, or which, when executed by a processor, implement the user recognition method according to any one of claims 6 to 7.
CN202110038063.6A 2021-01-12 2021-01-12 User identification method, device, equipment and storage medium Pending CN112784888A (en)

Publications (1)

Publication number CN112784888A, publication date 2021-05-11.




Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination