CN111144899A - Method and device for identifying false transactions and electronic equipment

Info

Publication number: CN111144899A
Application number: CN201911227488.0A
Authority: CN
Inventors: 刘腾飞; 程羽; 杨洋; 晏荣; 李杨
Original assignee: Alipay Hangzhou Information Technology Co Ltd
Current assignee: Alipay Hangzhou Information Technology Co Ltd
Priority date: 2019-12-04
Filing date: 2019-12-04
Publication date: 2020-05-12
Anticipated expiration: 2039-12-04
Also published as: CN111144899B

Abstract

The embodiments of this specification provide a method and a device for identifying false transactions, and an electronic device. The method comprises: for any transaction to be identified, acquiring the identification results output by the risk models corresponding to a plurality of risk dimensions, each based on the characteristic data of the transaction under that risk dimension; calculating the joint probability distribution corresponding to the plurality of identification results by using a preset potential category model; calculating, according to the joint probability distribution, the conditional probability value that the transaction to be identified is a false transaction; and when the conditional probability value is greater than a threshold value, determining that the transaction to be identified is a false transaction.

Description

Method and device for identifying false transactions and electronic equipment

Technical Field

The embodiment of the specification relates to the technical field of internet security, in particular to a method and a device for identifying false transactions and electronic equipment.

Background

With the continuous development of electronic commerce, online shopping has become an everyday way for people to shop.

Since many stores on an e-commerce platform sell the same product, most buyers tend to purchase from stores with large sales volumes. Therefore, in order to increase their exposure, some stores create a large number of false transactions by 'bill-swiping', thereby inflating their sales volume.

Disclosure of Invention

The embodiment of the specification provides a method and a device for identifying false transactions and an electronic device.

According to a first aspect of embodiments herein, there is provided a method of identifying false transactions, the method comprising:

for any transaction to be identified, acquiring the identification results output by the identification models corresponding to a plurality of risk dimensions respectively;

calculating the joint probability distribution corresponding to the plurality of identification results by using a preset potential category model;

calculating, by using Bayes' rule, the conditional probability value that the transaction is a false transaction given the joint probability distribution;

and when the conditional probability value is greater than a threshold value, determining that the transaction to be identified is a false transaction.

Optionally, the method further includes:

acquiring n transaction samples, and performing iterative calculation using the following steps until the accuracy of identifying false transactions meets a preset requirement:

for each transaction sample, acquiring the identification results output by the identification models corresponding to m risk dimensions;

inputting the n transaction samples and the corresponding n x m identification results into the potential category model to obtain the identification results of the n transaction samples output by the potential category model;

and if the accuracy of the n identification results does not meet the preset requirement, adjusting the parameter values of the model parameters in the potential category model by using an optimization algorithm.

Optionally, the optimization algorithm comprises an expectation-maximization (EM) algorithm.

Optionally, the method further includes:

after the iteration is finished, checking the meaning of the identification result of the potential category model.

Optionally, the value of the recognition result is 1 or 0; checking the meaning of the recognition result of the potential category model specifically includes:

acquiring the n recognition results output by the potential category model during the last iteration;

and if the number of recognition results with the value 1 is larger than the number with the value 0, redefining the value 1 to represent a normal transaction and the value 0 to represent a false transaction.

According to a second aspect of embodiments herein, there is provided an apparatus for identifying false transactions, the apparatus comprising:

an acquisition unit, configured to acquire, for any transaction to be identified, the identification results output by the identification models corresponding to a plurality of risk dimensions respectively;

a first calculation unit, configured to calculate the joint probability distribution corresponding to the plurality of identification results by using a preset potential category model;

a second calculation unit, configured to calculate, by using Bayes' rule, the conditional probability value that the transaction is a false transaction given the joint probability distribution;

and an identification unit, configured to determine that the transaction to be identified is a false transaction when the conditional probability value is greater than a threshold value.

Optionally, the apparatus further comprises:

a model training unit, configured to: acquire n transaction samples; and perform iterative calculation using the following steps until the accuracy of identifying false transactions meets a preset requirement: for each transaction sample, acquiring the identification results output by the identification models corresponding to m risk dimensions; inputting the n transaction samples and the corresponding n x m identification results into the potential category model to obtain the identification results of the n transaction samples output by the potential category model; and if the accuracy of the n identification results does not meet the preset requirement, adjusting the parameter values of the model parameters in the potential category model by using an optimization algorithm.

Optionally, the apparatus further comprises:

a checking unit, configured to check the meaning of the identification result of the potential category model after the iteration is finished.

Optionally, the value of the recognition result is 1 or 0; the checking unit is specifically configured to:

acquire the n recognition results output by the potential category model during the last iteration; and if the number of recognition results with the value 1 is larger than the number with the value 0, redefine the value 1 to represent a normal transaction and the value 0 to represent a false transaction.

According to a third aspect of embodiments herein, there is provided an electronic apparatus including:

a processor;

a memory for storing processor-executable instructions;

wherein the processor is configured to perform any of the above methods of identifying false transactions.

Drawings

FIG. 1 is a flow diagram of a method for identifying fraudulent transactions provided by an embodiment of the present specification;

FIG. 2 is a schematic diagram of a potential category model provided by an embodiment of the present description;

FIG. 3 is a system architecture diagram for identifying fraudulent transactions provided by one embodiment of the present specification;

FIG. 4 is a hardware block diagram of an apparatus for identifying fraudulent transactions provided by one embodiment of the present specification;

FIG. 5 is a block diagram of an apparatus for identifying fraudulent transactions according to an embodiment of the present specification.

Detailed Description

Reference will now be made in detail to the exemplary embodiments, examples of which are illustrated in the accompanying drawings. When the following description refers to the accompanying drawings, like numbers in different drawings represent the same or similar elements unless otherwise indicated. The embodiments described in the following exemplary embodiments do not represent all embodiments consistent with the present specification. Rather, they are merely examples of apparatus and methods consistent with certain aspects of the specification, as detailed in the appended claims.

The terminology used in the description herein is for the purpose of describing particular embodiments only and is not intended to be limiting of the description. As used in this specification and the appended claims, the singular forms "a", "an", and "the" are intended to include the plural forms as well, unless the context clearly indicates otherwise. It should also be understood that the term "and/or" as used herein refers to and encompasses any and all possible combinations of one or more of the associated listed items.

It should be understood that although the terms first, second, third, etc. may be used herein to describe various information, such information should not be limited by these terms. These terms are only used to distinguish one type of information from another. For example, the first information may also be referred to as second information, and similarly, the second information may also be referred to as first information, without departing from the scope of the present specification. The word "if" as used herein may be interpreted as "when", "upon", or "in response to a determination", depending on the context.

As previously mentioned, some merchants create a large number of false transactions by "bill-swiping" to increase store exposure and thereby inflate store sales. "Bill-swiping" here refers to a tacit agreement between the merchant and the buyer under which the buyer actually places an order and pays for the goods on the e-commerce platform (hereinafter referred to as the platform), so that a real transaction is successfully created on the platform. However, in the delivery stage, the merchant does not ship the goods purchased by the buyer, but ships an empty express package or some small item of no value, and returns the amount paid to the buyer through a channel that the platform cannot monitor (e.g., internet banking transfers, offline cash, third-party payment transfers, etc.). After receiving the express package, the buyer confirms receipt and leaves a review.

At this point, a complete "bill-swiping" process is finished. Although a transaction actually occurs on the platform, no goods actually change hands. Transactions without any circulation of goods are generally referred to as false transactions.

In a false transaction the user does not actually purchase the goods, yet the sales volume of the goods does increase; this is clearly unfair to other merchants operating normally. Moreover, false transactions not only harm the interests of other normal merchants but can also mislead the purchasing decisions of other potential buyers, and ultimately harm the normal and orderly development of the platform. Therefore, the platform needs to identify in time which transactions are false, so as to penalize the merchants and buyers participating in them and, in serious cases, to refer them to the relevant authorities for handling.

In the related art, whether a transaction is a false transaction can be identified from different risk dimensions. An identification model may be established in advance for each risk dimension. Each recognition model outputs a recognition result indicating, from its own risk dimension, whether the transaction is a false transaction. A final recognition result can then be obtained by combining the recognition results output by the recognition models using a preset integration strategy.

Common integration strategies include a majority voting strategy, a one-vote veto strategy, a weighted voting strategy, and the like.

1. Majority voting strategy: if half or more of the recognition results agree with each other, that result is taken as the final recognition result. In practice, however, different recognition models consider different risk dimensions, and a given false transaction is not necessarily recognized as false by more than half of the recognition models. A majority voting strategy therefore tends to identify too few false transactions and to have low coverage.

2. One-vote veto strategy: among the plurality of recognition results, as long as one recognition result is a false transaction, the final recognition result is also a false transaction. However, to guarantee the accuracy of the final recognition result, this strategy generally requires every single recognition model to be very accurate; otherwise the accuracy of the final result cannot be guaranteed, since it is easily dragged down by the least accurate recognition model. In addition, if the accuracy requirement on a single recognition model is too high, the coverage of the final recognition result may become low.

3. Weighted voting strategy: since different recognition models have different accuracy, weighted voting can be adopted so that a recognition model with high accuracy has a higher weight and a recognition model with low accuracy has a lower weight. However, the weighted voting strategy needs the weights of the recognition models to be calculated first. If transaction samples with real labels are available, the real labels can be used to evaluate the accuracy of each recognition model, and that accuracy is used as the weight of its recognition result. If no transaction samples with real labels are available, the weights of the recognition models are difficult to determine accurately.
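The three integration strategies above can be illustrated with a minimal Python sketch. The function names, the tie handling (strictly more than half), the `cutoff` parameter, and the example weights are assumptions made for illustration only and are not taken from this specification.

```python
from typing import Sequence


def majority_vote(results: Sequence[int]) -> int:
    """Return 1 (false transaction) only if more than half of the models output 1."""
    return 1 if 2 * sum(results) > len(results) else 0


def one_vote_veto(results: Sequence[int]) -> int:
    """Return 1 as soon as any single model outputs 1."""
    return 1 if any(r == 1 for r in results) else 0


def weighted_vote(results: Sequence[int], weights: Sequence[float],
                  cutoff: float = 0.5) -> int:
    """Return 1 if the weight share of the models voting 1 exceeds `cutoff`."""
    flagged = sum(w for r, w in zip(results, weights) if r == 1)
    return 1 if flagged / sum(weights) > cutoff else 0


# Four hypothetical recognition results for one transaction: only model 1 flags it.
results = [1, 0, 0, 0]
print(majority_vote(results))                         # 0: fewer than half agree
print(one_vote_veto(results))                         # 1: a single flag suffices
print(weighted_vote(results, [0.95, 0.2, 0.2, 0.2]))  # 1: the flagging model dominates
print(weighted_vote(results, [0.9, 0.6, 0.6, 0.55]))  # 0: its weight share is too small
```

The last two calls show why the weights matter: the same recognition results lead to opposite conclusions, and without real labels there is no direct way to choose between the two weight vectors.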

The present specification provides a method for identifying false transactions that places no high requirement on the accuracy of any single identification model and needs no real labels to determine the weights of the identification models. The accuracy of each recognition model is estimated automatically while the potential category model is trained; the potential category model then yields the joint probability distribution of the recognition results output by the recognition models, and the probability that a transaction is false is inferred from that joint distribution by Bayes' rule, which makes the identification more accurate.

The method is described below with reference to the example shown in FIG. 1. The method may be applied to a server, which may be a server of a sub-control system, a server cluster, or a cloud platform built on a server cluster. The method may comprise the following steps:

Step 110: for any transaction to be identified, acquiring the identification results output by the identification models corresponding to a plurality of risk dimensions respectively;

Step 120: calculating the joint probability distribution corresponding to the plurality of identification results by using a preset potential category model;

Step 130: calculating, by using Bayes' rule, the conditional probability value that the transaction is a false transaction given the joint probability distribution;

Step 140: when the conditional probability value is greater than a threshold value, determining that the transaction to be identified is a false transaction.

The potential category model is also called the Latent Class Model (LCM). It can analyze unlabeled discrete data and is a type of probabilistic graphical model.

The structure of the potential category model is shown in FIG. 2. It contains two kinds of variables: the variable Y and the variables Li. The variable Y is a discrete variable whose value is 0 or 1: 0 indicates that the transaction is not false, and 1 indicates that it is false. The variable Y is equal to the recognition result output by the potential category model.

The variables Li (i = 1, 2, 3, …, m) correspond to the recognition results output by the m recognition models: L1 corresponds to the 1st recognition model, L2 to the 2nd recognition model, …, and Lm to the m-th recognition model. Each variable Li is also a discrete variable whose value is 0 or 1: 0 indicates that the transaction is not false, and 1 indicates that it is false. The variable Li is equal to the recognition result output by the i-th recognition model.

For the variable Y, there is a model parameter P(Y), which represents the probability of each value of Y (i.e., Y = 0 or Y = 1).

Likewise, for each variable Li, there is a model parameter P(Li | Y), which represents the probability of each value of Li given Y. For example, P(Li = 1 | Y = 1) represents the probability that the i-th recognition model also determines that the current transaction is a false transaction when the potential category model determines that the current transaction is a false transaction.

The m + 1 model parameters in the potential category model comprise P(Y), P(L1 | Y), P(L2 | Y), …, P(Lm-1 | Y) and P(Lm | Y). From these parameters, the joint probability distribution P(Y, L1, L2, …, Lm) can be calculated as shown in formula 1 below:

$$P(Y, L_1, L_2, \dots, L_m) = P(Y) \prod_{i=1}^{m} P(L_i \mid Y) \tag{1}$$

The joint probability distribution may be referred to simply as the joint distribution. It is the probability distribution of a random vector consisting of two or more random variables, and its representation depends on the type of the random variables. For discrete random variables, the joint probability distribution can be represented in list form or function form; for continuous random variables, it is represented by the integral of a non-negative function. In this specification, the joint probability is expressed in function form over discrete random variables.
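As a minimal sketch of how the joint probability could be evaluated from the m + 1 model parameters, the Python function below multiplies P(Y) by each P(Li | Y), which assumes the usual latent class factorization in which the Li are conditionally independent given Y. The parameter layout (`p_flag[i][y]` holding P(Li = 1 | Y = y)) and all numeric values are hypothetical.

```python
from itertools import product
from typing import List, Sequence


def joint_probability(y: int, results: Sequence[int], prior_false: float,
                      p_flag: List[List[float]]) -> float:
    """P(Y = y, L1..Lm) under the assumed factorization P(Y) * prod_i P(Li | Y).

    prior_false  -- P(Y = 1)
    p_flag[i][y] -- P(Li = 1 | Y = y) for the i-th recognition model
    """
    p = prior_false if y == 1 else 1.0 - prior_false
    for i, l in enumerate(results):
        q = p_flag[i][y]
        p *= q if l == 1 else 1.0 - q
    return p


# Hypothetical parameter values for m = 3 recognition models.
PRIOR_FALSE = 0.1
P_FLAG = [[0.05, 0.80], [0.10, 0.60], [0.02, 0.70]]

# Y = 1 with recognition results (1, 0, 1): 0.1 * 0.8 * (1 - 0.6) * 0.7, about 0.0224.
p = joint_probability(1, [1, 0, 1], PRIOR_FALSE, P_FLAG)

# Summing over every value of (Y, L1, L2, L3) gives 1, as a distribution must.
total = sum(joint_probability(y, ls, PRIOR_FALSE, P_FLAG)
            for y in (0, 1) for ls in product((0, 1), repeat=3))
print(p, total)
```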

Once the model parameters are determined, for any transaction data the conditional probability P(Y | L1, L2, …, Lm) can be calculated by Bayes' rule from the joint probability distribution of formula 1, as shown in formula 2 below:

$$P(Y \mid L_1, L_2, \dots, L_m) = \frac{P(Y, L_1, L_2, \dots, L_m)}{\sum_{y \in \{0, 1\}} P(Y = y, L_1, L_2, \dots, L_m)} \tag{2}$$
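The Bayes step can be sketched in Python as follows: the joint probability is evaluated once for Y = 1 and once for Y = 0, and the first is divided by their sum. The function name, argument layout, and numeric values are illustrative assumptions, and the joint probability is again assumed to factor as P(Y) multiplied by each P(Li | Y).

```python
from typing import Sequence


def posterior_false(results: Sequence[int], prior_false: float,
                    p_flag_false: Sequence[float],
                    p_flag_normal: Sequence[float]) -> float:
    """P(Y = 1 | L1..Lm) by Bayes' rule.

    p_flag_false[i]  -- P(Li = 1 | Y = 1)
    p_flag_normal[i] -- P(Li = 1 | Y = 0)
    """
    joint_false = prior_false          # accumulates P(Y = 1, L1..Lm)
    joint_normal = 1.0 - prior_false   # accumulates P(Y = 0, L1..Lm)
    for l, qf, qn in zip(results, p_flag_false, p_flag_normal):
        joint_false *= qf if l == 1 else 1.0 - qf
        joint_normal *= qn if l == 1 else 1.0 - qn
    return joint_false / (joint_false + joint_normal)


# Hypothetical parameters: 10% false transactions, three recognition models.
P_FALSE = [0.80, 0.60, 0.70]
P_NORMAL = [0.05, 0.10, 0.02]

p_hit = posterior_false([1, 0, 1], 0.1, P_FALSE, P_NORMAL)    # roughly 0.965
p_clean = posterior_false([0, 0, 0], 0.1, P_FALSE, P_NORMAL)  # roughly 0.003
print(round(p_hit, 3), round(p_clean, 3))
```

Two of three models flagging the transaction lifts the probability from the 10% prior to about 96.5%, even though a plain majority of models with equal say would be needed under voting.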

The model structure of the LCM model shown in FIG. 2 is generally fixed, but the parameter values of the model parameters need to be determined by learning during model training.

Generally, as a business operates, the business platform accumulates a large amount of historical transaction data. These historical transaction data may be used to train the parameter values of the model parameters in the LCM model. It is worth mentioning that the transaction samples in this specification may be historical transaction data without real labels; that is, unsupervised learning can be used to train the LCM model. Of course, supervised or semi-supervised learning may also be used to train the LCM model in some cases, but unsupervised learning keeps the learning cost lowest.

The training process for the LCM model is introduced as follows:

inputting the n transaction samples and the corresponding n x m identification results into the potential category model to obtain the final identification results of the n transaction samples output by the potential category model;

and if the accuracy of the n final recognition results does not meet the preset requirement, adjusting the parameter values of the model parameters in the potential category model by using an optimization algorithm.

In this embodiment, after n transaction samples are obtained, m recognition results are obtained for each transaction sample through m recognition models. Thus, n × m recognition results are used to learn parameter values of the respective model parameters in the LCM model.

The optimization algorithm is used for learning the locally optimal solution of each model parameter in the model. Specifically, the optimization algorithm may employ, for example, an Expectation-maximization (EM) algorithm, a simulated annealing algorithm, a gradient descent algorithm, or the like.
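A minimal sketch of unsupervised parameter learning with the EM algorithm is given below, under several assumptions that are not part of this specification: the joint probability factors as P(Y) multiplied by each P(Li | Y), a fixed number of iterations stands in for the "preset requirement" stopping rule, and the synthetic data generator `simulate` and all numeric values are hypothetical.

```python
import random
from typing import List, Tuple


def fit_lcm_em(samples: List[List[int]], n_iter: int = 50,
               seed: int = 0) -> Tuple[float, List[List[float]]]:
    """Estimate P(Y = 1) and p_flag[i][y] = P(Li = 1 | Y = y) from unlabeled 0/1 rows."""
    rng = random.Random(seed)
    n, m = len(samples), len(samples[0])
    eps = 1e-6  # keeps every probability strictly inside (0, 1)
    prior_false = 0.5
    # Start with class 1 flagged more often than class 0 to break the symmetry.
    p_flag = [[rng.uniform(0.2, 0.4), rng.uniform(0.6, 0.8)] for _ in range(m)]
    for _ in range(n_iter):
        # E-step: responsibility P(Y = 1 | row) for every sample.
        resp = []
        for row in samples:
            j1, j0 = prior_false, 1.0 - prior_false
            for i, l in enumerate(row):
                j1 *= p_flag[i][1] if l else 1.0 - p_flag[i][1]
                j0 *= p_flag[i][0] if l else 1.0 - p_flag[i][0]
            resp.append(j1 / (j1 + j0))
        # M-step: re-estimate the parameters from the soft class assignments.
        s1 = max(sum(resp), eps)
        s0 = max(n - sum(resp), eps)
        prior_false = min(max(s1 / n, eps), 1.0 - eps)
        for i in range(m):
            f1 = sum(r for r, row in zip(resp, samples) if row[i])
            f0 = sum(1.0 - r for r, row in zip(resp, samples) if row[i])
            p_flag[i][1] = min(max(f1 / s1, eps), 1.0 - eps)
            p_flag[i][0] = min(max(f0 / s0, eps), 1.0 - eps)
    return prior_false, p_flag


def simulate(n: int, prior_false: float, p_flag: List[List[float]],
             seed: int = 1) -> List[List[int]]:
    """Hypothetical generator: draw a hidden Y, then each model's 0/1 output."""
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        y = 1 if rng.random() < prior_false else 0
        rows.append([1 if rng.random() < q[y] else 0 for q in p_flag])
    return rows


TRUE_P_FLAG = [[0.05, 0.80], [0.10, 0.70], [0.03, 0.75], [0.08, 0.85]]
samples = simulate(2000, 0.10, TRUE_P_FLAG)  # n = 2000 samples, m = 4 models
est_prior, est_p_flag = fit_lcm_em(samples)
print(round(est_prior, 3), [[round(q, 2) for q in pair] for pair in est_p_flag])
```

No labels enter the fit: the hidden Y drawn inside `simulate` is discarded, and the estimates of each model's hit rate and false-alarm rate come only from how the m outputs co-occur.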

In an embodiment, after the parameter values of the model parameters in the LCM model are determined, the meaning of the identification result of the potential category model obtained after the iteration is finished needs to be checked. This ensures that the value 1 of the variable Y still represents a false transaction and the value 0 a non-false transaction, and prevents the meaning of the identification result from being reversed.

Specifically, after the model parameter learning is completed, the n recognition results output by the potential category model in the last iteration are obtained;

if the number of recognition results with the value 1 is smaller than the number with the value 0, the verification is passed.

In practice, the proportion of false transactions in historical transaction data is usually much smaller than that of normal transactions, i.e., P(Y = 1) ≪ P(Y = 0). P(Y = 1) and P(Y = 0) from the last iteration are therefore compared. If P(Y = 1) > P(Y = 0), the meaning of the value of Y is considered to be reversed, and the value 1 needs to be redefined to represent a normal transaction and the value 0 to represent a false transaction. Conversely, if P(Y = 1) < P(Y = 0), the meaning of the value of Y is considered not to be reversed, and no processing is required.
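The check can be sketched as follows. Instead of redefining what 1 and 0 mean, this illustration swaps the two classes in the learned parameters and flips the stored labels, which is an equivalent way of keeping 1 as "false transaction"; that choice, the function name, and the parameter layout (`p_flag[i] = [P(Li = 1 | Y = 0), P(Li = 1 | Y = 1)]`) are assumptions.

```python
from typing import List, Tuple


def check_label_meaning(prior_one: float, p_flag: List[List[float]],
                        labels: List[int]
                        ) -> Tuple[float, List[List[float]], List[int], bool]:
    """Swap the two classes if value 1 is the majority among the n recognition results.

    Returns (P(Y = 1), p_flag, labels, swapped) with 1 meaning "false transaction".
    """
    ones = sum(labels)
    zeros = len(labels) - ones
    if ones <= zeros:
        # Value 1 is the minority class, as expected for false transactions.
        return prior_one, p_flag, list(labels), False
    swapped_flags = [[q1, q0] for q0, q1 in p_flag]  # exchange the Y = 0 / Y = 1 columns
    return 1.0 - prior_one, swapped_flags, [1 - y for y in labels], True


# Hypothetical outcome of training in which the class meanings came out reversed.
prior, flags, labels, swapped = check_label_meaning(
    0.9, [[0.80, 0.05]], [1, 1, 1, 0])
print(swapped, labels, flags)  # True [0, 0, 0, 1] [[0.05, 0.8]]
```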

As shown in FIG. 3, after the meaning of the recognition result of the LCM model is verified to be correct, the LCM model can be deployed online and used to identify whether a transaction to be identified is a false transaction. That is, after any transaction to be identified is received, it is first evaluated by the recognition models corresponding to the risk dimensions, and the recognition results output by these models are then input to the potential category model for secondary recognition, which executes the aforementioned steps 120 to 140.

From the output of step 130, the conditional probability P that the transaction is a false transaction is obtained. Step 140 then determines, based on P, whether the transaction is a false transaction: if P is greater than the threshold value, the transaction is judged to be false; otherwise, it is judged not to be false.
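The two-stage flow of steps 110 to 140 can be sketched end to end as below. The three first-stage recognition models, their rules, the transaction field names, and the parameter values are all hypothetical placeholders; real recognition models would be trained per risk dimension, and the joint probability is assumed to factor as P(Y) multiplied by each P(Li | Y).

```python
from typing import Callable, Dict, List, Tuple

Transaction = Dict[str, float]
RecognitionModel = Callable[[Transaction], int]


# Hypothetical first-stage recognition models, one per risk dimension.
def device_model(tx: Transaction) -> int:
    return 1 if tx["buyer_orders_same_device"] >= 5 else 0


def logistics_model(tx: Transaction) -> int:
    return 1 if tx["parcel_weight_kg"] < 0.05 else 0


def account_model(tx: Transaction) -> int:
    return 1 if tx["buyer_account_age_days"] < 7 else 0


def identify(tx: Transaction, models: List[RecognitionModel], prior_false: float,
             p_flag: List[List[float]], threshold: float = 0.5) -> Tuple[bool, float]:
    """Run steps 110 to 140; p_flag[i][y] = P(Li = 1 | Y = y)."""
    results = [model(tx) for model in models]           # step 110
    joint = [1.0 - prior_false, prior_false]            # step 120: joint for Y = 0, 1
    for i, l in enumerate(results):
        for y in (0, 1):
            joint[y] *= p_flag[i][y] if l == 1 else 1.0 - p_flag[i][y]
    posterior = joint[1] / (joint[0] + joint[1])        # step 130: Bayes' rule
    return posterior > threshold, posterior             # step 140


MODELS = [device_model, logistics_model, account_model]
P_FLAG = [[0.05, 0.80], [0.10, 0.60], [0.02, 0.70]]  # hypothetical learned values

suspicious = {"buyer_orders_same_device": 8, "parcel_weight_kg": 0.02,
              "buyer_account_age_days": 400}
ordinary = {"buyer_orders_same_device": 1, "parcel_weight_kg": 1.2,
            "buyer_account_age_days": 400}
is_false, p_false = identify(suspicious, MODELS, 0.1, P_FLAG)
is_false_ord, p_ord = identify(ordinary, MODELS, 0.1, P_FLAG)
print(is_false, round(p_false, 3), is_false_ord, round(p_ord, 3))
```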

The threshold value may be set in advance manually in this specification; for example, based on manual sampling results, or personal business experience.
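One way to derive a threshold from manual sampling results is sketched below: given the conditional probabilities of a small, manually audited sample, pick the candidate threshold with the best F1 score. The use of F1 as the criterion, the candidate grid, and the sample values are assumptions made for illustration; this specification does not prescribe a selection procedure.

```python
from typing import Sequence


def f1_at(threshold: float, scores: Sequence[float], labels: Sequence[int]) -> float:
    """F1 score when a transaction is judged false if its score exceeds `threshold`."""
    tp = sum(1 for s, y in zip(scores, labels) if s > threshold and y == 1)
    fp = sum(1 for s, y in zip(scores, labels) if s > threshold and y == 0)
    fn = sum(1 for s, y in zip(scores, labels) if s <= threshold and y == 1)
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def pick_threshold(scores: Sequence[float], labels: Sequence[int],
                   candidates: Sequence[float]) -> float:
    """Return the candidate threshold with the highest F1 (first one on ties)."""
    return max(candidates, key=lambda t: f1_at(t, scores, labels))


# Hypothetical audited sample: conditional probabilities and manual verdicts.
scores = [0.95, 0.80, 0.62, 0.41, 0.33, 0.12]
labels = [1, 1, 0, 1, 0, 0]
candidates = [0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9]

best = pick_threshold(scores, labels, candidates)
print(best, round(f1_at(best, scores, labels), 3))  # 0.4 0.857
```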

With the continuous development of computer technology, especially the progress of artificial intelligence, the threshold value can also be calculated through machine learning. For example, based on historical transaction data, an optimal threshold may be calculated by a machine learning algorithm.

Still further, the threshold may be calculated based on big data techniques. For example, if analysis of massive data shows that most false transactions are identified when the threshold is set to 0.5, the threshold of this embodiment may also be set to 0.5.

In this embodiment, no high requirement is placed on the accuracy of any single recognition model, and no real labels are needed to determine the weights of the recognition models. The accuracy of each recognition model is estimated automatically while the potential category model is trained; the potential category model then yields the joint probability distribution of the recognition results output by the recognition models, and the probability that a transaction is false is inferred from that joint distribution by Bayes' rule, which makes the identification more accurate.

Corresponding to the foregoing method embodiments, this specification also provides embodiments of an apparatus for identifying false transactions. The apparatus embodiments may be implemented by software, by hardware, or by a combination of hardware and software. Taking a software implementation as an example, the apparatus, as a logical device, is formed by the processor of the device in which it is located reading the corresponding computer program instructions from the nonvolatile memory into the memory and running them. In terms of hardware, FIG. 4 shows a hardware structure diagram of the device in which the apparatus for identifying false transactions is located; in addition to the processor, network interface, memory, and nonvolatile memory shown in FIG. 4, the device may also include other hardware according to its actual function of identifying false transactions, which is not described again here.

FIG. 5 is a block diagram of an apparatus for identifying false transactions provided by one embodiment of the present specification. The apparatus corresponds to the embodiment shown in FIG. 1 and includes:

an obtaining unit 310, configured to obtain, for any transaction to be identified, identification results output by identification models corresponding to a plurality of risk dimensions respectively;

a first calculating unit 320, configured to calculate the joint probability distribution corresponding to the plurality of recognition results by using the preset potential category model;

a second calculating unit 330, configured to calculate, by using Bayes' rule, the conditional probability value that the transaction is a false transaction given the joint probability distribution;

an identifying unit 340, configured to determine that the transaction to be identified is a false transaction when the conditional probability value is greater than a threshold value.

Optionally, the apparatus further comprises:

The systems, devices, modules or units illustrated in the above embodiments may be implemented by a computer chip or an entity, or by a product with certain functions. A typical implementation device is a computer, which may take the form of a personal computer, laptop computer, cellular telephone, camera phone, smart phone, personal digital assistant, media player, navigation device, email messaging device, game console, tablet computer, wearable device, or a combination of any of these devices.

The implementation process of the functions and actions of each unit in the above device is specifically described in the implementation process of the corresponding step in the above method, and is not described herein again.

For the device embodiments, since they substantially correspond to the method embodiments, reference may be made to the partial description of the method embodiments for relevant points. The above-described embodiments of the apparatus are merely illustrative, and the units described as separate parts may or may not be physically separate, and parts displayed as units may or may not be physical units, may be located in one place, or may be distributed on a plurality of network units. Some or all of the modules can be selected according to actual needs to achieve the purpose of the solution in the specification. One of ordinary skill in the art can understand and implement it without inventive effort.

Fig. 5 above describes the internal functional modules and structural schematic of the apparatus for identifying false transactions, and the actual execution subject can be an electronic device, which includes:

a processor;

a memory for storing processor-executable instructions;

wherein the processor is configured to:

Optionally, the training process of the potential category model is as follows:

Optionally, the method further includes:

In the above embodiments of the electronic device, it should be understood that the Processor may be a Central Processing Unit (CPU), other general-purpose processors, a Digital Signal Processor (DSP), an Application Specific Integrated Circuit (ASIC), etc. The general-purpose processor may be a microprocessor, or the processor may be any conventional processor, and the aforementioned memory may be a read-only memory (ROM), a Random Access Memory (RAM), a flash memory, a hard disk, or a solid state disk. The steps of a method disclosed in connection with the embodiments of the present specification may be embodied directly in a hardware processor, or in a combination of the hardware and software modules of the processor.

The embodiments in the present specification are described in a progressive manner, and the same and similar parts among the embodiments are referred to each other, and each embodiment focuses on the differences from the other embodiments. In particular, for the embodiment of the electronic device, since it is substantially similar to the embodiment of the method, the description is simple, and for the relevant points, reference may be made to part of the description of the embodiment of the method.

Other embodiments of the present disclosure will be apparent to those skilled in the art from consideration of the specification and practice of the embodiments disclosed herein. This specification is intended to cover any variations, uses, or adaptations of the specification following, in general, the principles of the specification and including such departures from the present disclosure as come within known or customary practice within the art to which the specification pertains. It is intended that the specification and examples be considered as exemplary only, with a true scope and spirit of the specification being indicated by the following claims.

It will be understood that the present description is not limited to the precise arrangements described above and shown in the drawings, and that various modifications and changes may be made without departing from the scope thereof. The scope of the present description is limited only by the appended claims.

Claims

1. A method of identifying fraudulent transactions, the method comprising:

2. The method of claim 1, further comprising:

3. The method of claim 2, wherein the optimization algorithm comprises an expectation-maximization (EM) algorithm.

4. The method of claim 2, further comprising:

5. The method of claim 4, wherein the value of the recognition result is 1 or 0; the verifying the meaning of the recognition result of the potential category model specifically includes:

6. An apparatus to identify false transactions, the apparatus comprising:

7. The apparatus of claim 6, the apparatus further comprising:

8. The apparatus of claim 7, wherein the optimization algorithm comprises an expectation-maximization (EM) algorithm.

9. The apparatus of claim 7, further comprising:

10. The apparatus of claim 9, wherein the value of the recognition result is 1 or 0; the checking unit is specifically configured to:

11. An electronic device, comprising:

a processor;

a memory for storing processor-executable instructions;

wherein the processor is configured to perform the method of any one of claims 1-5.