CN111144899B

CN111144899B - Method and device for identifying false transaction and electronic equipment

Info

Publication number: CN111144899B
Application number: CN201911227488.0A
Authority: CN
Inventors: 刘腾飞; 程羽; 杨洋; 晏荣; 李杨
Original assignee: Alipay Hangzhou Information Technology Co Ltd
Current assignee: Alipay Hangzhou Information Technology Co Ltd
Priority date: 2019-12-04
Filing date: 2019-12-04
Publication date: 2023-04-25
Anticipated expiration: 2039-12-04
Also published as: CN111144899A

Abstract

The embodiments of this specification provide a method and apparatus for identifying false transactions, and an electronic device. The method comprises: for any transaction to be identified, acquiring the identification results output by the risk model corresponding to each of a plurality of risk dimensions, each computed from the transaction's feature data under that dimension; calculating, with a preset latent class model, the joint probability distribution of the plurality of identification results; calculating, from the joint probability distribution, a conditional probability value that the transaction to be identified is a false transaction; and when the conditional probability value is greater than a threshold, determining that the transaction to be identified is a false transaction.

Description

Method and device for identifying false transaction and electronic equipment

Technical Field

The embodiments of this specification relate to the technical field of Internet security, and in particular to a method and apparatus for identifying false transactions, and an electronic device.

Background

With the continuous development of e-commerce, online shopping has become part of people's daily lives.

Because many shops on an e-commerce platform sell the same product, most buyers tend to purchase from shops with high sales volumes. Therefore, to increase their exposure, some shops use "order brushing" to create large numbers of false transactions and thereby inflate their sales.

Disclosure of Invention

The embodiment of the specification provides a method and device for identifying false transactions and electronic equipment.

According to a first aspect of embodiments of the present specification, there is provided a method of identifying spurious transactions, the method comprising:

for any transaction to be identified, acquiring the identification results respectively output by identification models corresponding to a plurality of risk dimensions;

calculating a joint probability distribution of the plurality of identification results by using a preset latent class model;

calculating, by Bayes' rule, a conditional probability value that the transaction is a false transaction under the joint probability distribution;

and when the conditional probability value is greater than a threshold, determining that the transaction to be identified is a false transaction.

Optionally, the method further comprises:

acquiring n transaction samples, and performing the following iterative computation until the accuracy of identifying false transactions meets a preset requirement:

acquiring, for each transaction sample, the identification results output by the identification models corresponding to m risk dimensions;

inputting the n transaction samples and the corresponding n×m identification results into the latent class model to obtain the identification results of the n transaction samples output by the latent class model;

and if the accuracy of the n identification results does not meet the preset requirement, adjusting the parameter values of the model parameters in the latent class model by using an optimization algorithm.

Optionally, the optimization algorithm includes the expectation-maximization (EM) algorithm.

Optionally, the method further comprises:

after the iteration ends, verifying the meaning of the identification results of the latent class model.

Optionally, each identification result takes the value 1 or 0, and verifying the meaning of the identification results of the latent class model specifically comprises:

acquiring the n identification results output by the latent class model in the last iteration;

and if the number of identification results with the value 1 is greater than the number with the value 0, redefining the value 1 as denoting a normal transaction and the value 0 as denoting a false transaction.

According to a second aspect of embodiments of the present specification, there is provided an apparatus for identifying a spurious transaction, the apparatus comprising:

an acquisition unit, configured to acquire, for any transaction to be identified, the identification results respectively output by identification models corresponding to a plurality of risk dimensions;

a first calculation unit, configured to calculate a joint probability distribution of the plurality of identification results by using a preset latent class model;

a second calculation unit, configured to calculate, by Bayes' rule, a conditional probability value that the transaction is a false transaction under the joint probability distribution;

and an identification unit, configured to determine that the transaction to be identified is a false transaction when the conditional probability value is greater than a threshold.

Optionally, the apparatus further includes:

a model training unit, configured to: acquire n transaction samples, and perform the following iterative computation until the accuracy of identifying false transactions meets a preset requirement: acquire, for each transaction sample, the identification results output by the identification models corresponding to m risk dimensions; input the n transaction samples and the corresponding n×m identification results into the latent class model to obtain the identification results of the n transaction samples output by the latent class model; and if the accuracy of the n identification results does not meet the preset requirement, adjust the parameter values of the model parameters in the latent class model by using an optimization algorithm.

Optionally, the apparatus further includes:

a verification unit, configured to verify the meaning of the identification results of the latent class model after the iteration ends.

Optionally, each identification result takes the value 1 or 0, and the verification unit is specifically configured to:

acquire the n identification results output by the latent class model in the last iteration; and if the number of identification results with the value 1 is greater than the number with the value 0, redefine the value 1 as denoting a normal transaction and the value 0 as denoting a false transaction.

According to a third aspect of embodiments of the present specification, there is provided an electronic device comprising:

a processor;

a memory for storing processor-executable instructions;

wherein the processor is configured to perform the method for identifying a false transaction according to any of the above.

Drawings

FIG. 1 is a flow chart of a method for identifying spurious transactions provided by an embodiment of the present description;

FIG. 2 is a schematic diagram of a latent class model provided by an embodiment of the present disclosure;

FIG. 3 is a system architecture diagram for identifying spurious transactions provided by one embodiment of the present description;

FIG. 4 is a hardware block diagram of an apparatus for identifying spurious transactions according to one embodiment of the present disclosure;

FIG. 5 is a schematic block diagram of an apparatus for identifying a spurious transaction according to an embodiment of the present disclosure.

Detailed Description

Reference will now be made in detail to exemplary embodiments, examples of which are illustrated in the accompanying drawings. When the following description refers to the accompanying drawings, the same numbers in different drawings refer to the same or similar elements, unless otherwise indicated. The implementations described in the following exemplary examples do not represent all implementations consistent with the present specification. Rather, they are merely examples of apparatus and methods consistent with some aspects of the present description as detailed in the accompanying claims.

The terminology used in the description presented herein is for the purpose of describing particular embodiments only and is not intended to be limiting of the description. As used in this specification and the appended claims, the singular forms "a," "an," and "the" are intended to include the plural forms as well, unless the context clearly indicates otherwise. It should also be understood that the term "and/or" as used herein refers to and encompasses any or all possible combinations of one or more of the associated listed items.

It should be understood that although the terms first, second, third, etc. may be used in this specification to describe various information, such information should not be limited by these terms. These terms are only used to distinguish information of the same type from one another. For example, without departing from the scope of this specification, first information may also be referred to as second information, and similarly, second information may also be referred to as first information. Depending on the context, the word "if" as used herein may be interpreted as "when", "upon", or "in response to determining".

As described above, to increase their exposure, some merchants use "order brushing" to create large numbers of false transactions and inflate their sales. "Order brushing" refers to a covert arrangement between a merchant and a buyer in which the buyer actually places an order for a product on an e-commerce platform (hereinafter, the platform) and actually pays for it, so that a genuine transaction is successfully created on the platform. At the delivery stage, however, the merchant does not ship the purchased product but sends an empty parcel or some other low-value item instead, and refunds the buyer's payment through a channel the platform cannot monitor (e.g., online bank transfer, offline cash, or third-party payment transfer). After receiving the parcel, the buyer confirms receipt and leaves a review.

This completes one order-brushing cycle. Although a transaction does occur on the platform, no goods actually change hands. Such transactions, in which no goods circulate, are generally referred to as false transactions.

In a false transaction, the user does not actually purchase the product, yet the product's recorded sales volume genuinely increases, which is clearly unfair to merchants operating normally. Moreover, false transactions not only harm the interests of legitimate merchants but may also mislead the purchasing decisions of other potential buyers, ultimately undermining the healthy and orderly development of the platform. It is therefore necessary for the platform to identify false transactions in a timely manner, so that the merchants and buyers involved can be penalized and, in serious cases, referred to the relevant authorities.

In the related art, whether a transaction is false may be assessed from different risk dimensions. An identification model may be established in advance for each risk dimension, and each model outputs an identification result indicating whether, from the perspective of its own risk dimension, the transaction is false. A preset ensemble strategy then combines the identification results output by all the models into a final identification result.

Common ensemble strategies include majority voting, one-vote veto, and weighted voting.

1. Majority voting: if half or more of the identification results agree, that result is taken as the final identification result. In practice, however, different identification models consider different risk dimensions, so a given false transaction is not necessarily flagged by more than half of the models. Majority voting therefore tends to identify few false transactions, i.e., it yields low coverage.

2. One-vote veto: if any one of the identification results indicates a false transaction, the final result is also a false transaction. To keep the final result accurate, however, this strategy generally requires every individual identification model to be highly accurate; otherwise the accuracy of the final result cannot be guaranteed and is dragged down by the least accurate model. Moreover, if the accuracy bar for individual models is set too high, the coverage of the final result may also drop.

3. Weighted voting: because different identification models differ in accuracy, weighted voting gives more accurate models higher weights and less accurate models lower weights. This requires first determining the weight of each model. If transaction samples with true labels are available, each model's accuracy can be evaluated against those labels and used as the weight of its identification result. Without labeled samples, however, the weights are difficult to determine accurately.
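The three ensemble strategies can be contrasted with a minimal Python sketch. It assumes each identification model outputs 0 (normal) or 1 (false); the function names, the tie rule in majority voting (a tie counts as normal), the use of per-model accuracies as weights, and the 0.5 cutoff for weighted voting are illustrative assumptions rather than anything fixed by this specification.

```python
from typing import Sequence


def majority_vote(results: Sequence[int]) -> int:
    """Flag as false (1) only if strictly more than half of the models output 1.

    Assumption: a tie counts as normal (0).
    """
    return 1 if 2 * sum(results) > len(results) else 0


def one_vote_veto(results: Sequence[int]) -> int:
    """Flag as false (1) if any single model outputs 1."""
    return 1 if any(r == 1 for r in results) else 0


def weighted_vote(results: Sequence[int], weights: Sequence[float],
                  cutoff: float = 0.5) -> int:
    """Flag as false (1) if the weight-normalized share of 1-votes exceeds cutoff.

    Assumption: weights are per-model accuracies measured on labeled samples.
    """
    score = sum(w * r for w, r in zip(weights, results)) / sum(weights)
    return 1 if score > cutoff else 0


results = [1, 0, 0]  # only the first of three models flags the transaction
print(majority_vote(results), one_vote_veto(results))  # 0 1
```

With results [1, 0, 0], majority voting returns 0 and one-vote veto returns 1, while weighted voting depends entirely on the weights supplied, which is exactly the quantity that is hard to obtain without labeled samples.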

This specification provides a method for identifying false transactions that neither places high accuracy requirements on individual identification models nor requires true labels to determine model weights. Because the accuracy of each identification model is estimated automatically while the latent class model is trained, computing the joint probability distribution of the models' identification results with the latent class model and then deriving the probability of a false transaction by Bayes' rule yields a more accurate estimate.

The method is described below with reference to the example shown in FIG. 1. The method may be applied to a server, which may be a server of a risk control system, a server cluster, or a cloud platform built on a server cluster. The method may comprise the following steps:

step 110: for any transaction to be identified, acquiring the identification results respectively output by identification models corresponding to a plurality of risk dimensions;

step 120: calculating a joint probability distribution of the plurality of identification results by using the preset latent class model;

step 130: calculating, by Bayes' rule, a conditional probability value that the transaction is a false transaction under the joint probability distribution;

step 140: when the conditional probability value is greater than a threshold, determining that the transaction to be identified is a false transaction.

The latent class model (LCM), also called a hidden class model, can analyze unlabeled discrete data; it is a type of probabilistic graphical model.

The structure of the latent class model is shown in FIG. 2. It includes two kinds of variables: Y and Li. Y is a discrete variable taking the value 0 or 1, where 0 indicates that the transaction is not false and 1 indicates that it is false. Y equals the identification result output by the latent class model.

The variables Li (i = 1, 2, 3, …, m) correspond to the identification results output by the m identification models: L1 corresponds to the 1st identification model, L2 to the 2nd, …, and Lm to the m-th. Each Li is also a discrete variable taking the value 0 or 1, where 0 indicates that the transaction is not false and 1 indicates that it is false. Li equals the identification result output by the i-th identification model.

For the variable Y, there is a model parameter P(Y), which represents the probability of each value of Y (i.e., Y = 0 or Y = 1).

Likewise, for each variable Li, there is a model parameter P(Li|Y), which represents the probability of each value of Li given Y. For example, P(Li = 1|Y = 1) represents how likely the i-th identification model is to also judge the current transaction as false when the latent class model judges it to be false; in effect, it is that model's estimated detection rate.

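Using the m+1 model parameters of the latent class model, namely P(Y), P(L1|Y), P(L2|Y), …, P(Lm-1|Y), and P(Lm|Y), the joint probability distribution P(Y, L1, L2, …, Lm) can be calculated as shown in Equation 1 below. This factorization follows from the structure in FIG. 2, in which the identification results L1, …, Lm are conditionally independent of one another given Y:

$$P(Y, L_1, L_2, \ldots, L_m) = P(Y)\prod_{i=1}^{m} P(L_i \mid Y) \qquad (1)$$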

The joint probability distribution, also simply called the joint distribution, is the probability distribution of a random vector consisting of two or more random variables. Its representation depends on the random variables: for discrete random variables it can be given as a table or as a function, whereas for continuous random variables it is given by the integral of a non-negative density function. In this specification, the random variables are discrete and the joint probability is expressed as a function.
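Equation 1 can be checked numerically with a short Python sketch. The parameter layout (`prior[y]` = P(Y=y), `cond[i][y]` = P(Li=1|Y=y)) and the numeric values for three identification models are hypothetical; the check only confirms that the factorized joint distribution sums to 1 over all 2 × 2^m outcomes.

```python
from itertools import product
from typing import Sequence


def joint_probability(y: int, labels: Sequence[int], prior: Sequence[float],
                      cond: Sequence[Sequence[float]]) -> float:
    """P(Y=y, L1=l1, ..., Lm=lm) = P(Y=y) * prod_i P(Li=li | Y=y)  (Equation 1)."""
    p = prior[y]
    for i, label in enumerate(labels):
        p_one = cond[i][y]  # P(Li = 1 | Y = y)
        p *= p_one if label == 1 else 1.0 - p_one
    return p


# Hypothetical parameter values for m = 3 identification models.
PRIOR = [0.95, 0.05]                                # P(Y=0), P(Y=1)
COND = [[0.02, 0.80], [0.10, 0.60], [0.05, 0.70]]   # COND[i][y] = P(Li=1 | Y=y)

# Sum the joint distribution over every value of Y and every label combination.
total = sum(joint_probability(y, labels, PRIOR, COND)
            for y in (0, 1) for labels in product((0, 1), repeat=3))
print(round(total, 12))  # 1.0
```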

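After the model parameters are determined, for any transaction the conditional probability P(Y|L1, L2, …, Lm) can be calculated from the joint probability distribution of Equation 1 using Bayes' rule, as shown in Equation 2 below:

$$P(Y \mid L_1, \ldots, L_m) = \frac{P(Y, L_1, \ldots, L_m)}{\sum_{y \in \{0,1\}} P(Y = y, L_1, \ldots, L_m)} = \frac{P(Y)\prod_{i=1}^{m} P(L_i \mid Y)}{\sum_{y \in \{0,1\}} P(Y = y)\prod_{i=1}^{m} P(L_i \mid Y = y)} \qquad (2)$$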
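Because Y takes only the values 0 and 1, Equation 2 reduces to normalizing two products. The sketch below assumes `prior[y]` = P(Y=y) and `cond[i][y]` = P(Li=1|Y=y), with hypothetical values for three identification models; the function name `posterior_false` is likewise illustrative.

```python
from typing import Sequence


def posterior_false(labels: Sequence[int], prior: Sequence[float],
                    cond: Sequence[Sequence[float]]) -> float:
    """P(Y=1 | L1..Lm) by Bayes' rule over the factorized joint (Equation 2)."""
    joint = []
    for y in (0, 1):
        p = prior[y]
        for i, label in enumerate(labels):
            p *= cond[i][y] if label == 1 else 1.0 - cond[i][y]
        joint.append(p)
    # The denominator sums the joint probability over both values of Y.
    return joint[1] / (joint[0] + joint[1])


# Hypothetical parameters: PRIOR[y] = P(Y=y), COND[i][y] = P(Li=1 | Y=y).
PRIOR = [0.95, 0.05]
COND = [[0.02, 0.80], [0.10, 0.60], [0.05, 0.70]]

print(round(posterior_false((1, 0, 1), PRIOR, COND), 3))  # 0.929
```

Here a transaction flagged by models 1 and 3 but not by model 2 still receives a posterior of about 0.93. Under these hypothetical parameters, model 2's 0 vote only weakly favors a normal transaction (likelihood ratio 0.4/0.9), whereas the 1 votes of models 1 and 3 strongly favor a false one (0.8/0.02 and 0.7/0.05), which is how the model's estimated accuracies act as implicit weights.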

The structure of the LCM shown in FIG. 2 is generally fixed, but the values of its model parameters must be learned during model training.

As the business operates, the platform accumulates a large volume of historical transaction data, which can be used to train the parameter values of the LCM. Note that the transaction samples in this specification may be historical transaction data without true labels; that is, the LCM can be trained by unsupervised learning. In some cases supervised or semi-supervised learning may also be used, but unsupervised learning minimizes the labeling cost.

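The training procedure of the LCM is as follows:

acquiring n transaction samples, and performing the following iterative computation until the accuracy of identifying false transactions meets a preset requirement:

acquiring, for each transaction sample, the identification results output by the identification models corresponding to m risk dimensions;

inputting the n transaction samples and the corresponding n×m identification results into the latent class model to obtain the final identification results of the n transaction samples output by the latent class model;

and if the accuracy of the n final identification results does not meet the preset requirement, adjusting the parameter values of the model parameters in the latent class model by using an optimization algorithm.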

In this embodiment, after the n transaction samples are acquired, m identification results are obtained for each sample from the m identification models, and the resulting n×m identification results are used to learn the parameter values of the LCM's model parameters.

The optimization algorithm learns a locally optimal solution for each model parameter. It may be, for example, the expectation-maximization (EM) algorithm, simulated annealing, or gradient descent.
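The EM option can be sketched in plain Python for the binary two-class model of FIG. 2. The E-step computes P(Y=1|L1, …, Lm) for each sample as in Equation 2, and the M-step re-estimates P(Y) and P(Li=1|Y) from the resulting soft counts. Because the samples are assumed to be unlabeled, this sketch stops when the log-likelihood stops improving rather than on an accuracy requirement; that stopping rule, the random initialization, the probability clipping, and the name `fit_lcm_em` are assumptions for illustration, not a prescribed implementation.

```python
import math
import random
from typing import List, Tuple


def fit_lcm_em(X: List[List[int]], n_iter: int = 500, tol: float = 1e-6,
               seed: int = 0) -> Tuple[List[float], List[List[float]], List[float]]:
    """Fit a two-class latent class model to binary identification results with EM.

    X[k][i] is the 0/1 result of identification model i on transaction sample k.
    Returns (prior, cond, resp) where prior[y] = P(Y=y),
    cond[i][y] = P(Li=1 | Y=y) and resp[k] = P(Y=1 | results of sample k).
    """
    rng = random.Random(seed)
    n, m = len(X), len(X[0])
    eps = 1e-6  # keeps probabilities away from exactly 0 and 1
    prior = [0.5, 0.5]
    # Assumed initialization: class 1 starts out as the "more often flagged" class.
    cond = [[rng.uniform(0.1, 0.3), rng.uniform(0.6, 0.8)] for _ in range(m)]
    resp: List[float] = []
    prev_ll = -math.inf
    for _ in range(n_iter):
        # E-step: posterior P(Y=1 | L1..Lm) per sample (Equation 2), plus log-likelihood.
        resp, ll = [], 0.0
        for row in X:
            joint = []
            for y in (0, 1):
                p = prior[y]
                for i, label in enumerate(row):
                    p *= cond[i][y] if label == 1 else 1.0 - cond[i][y]
                joint.append(p)
            total = joint[0] + joint[1]
            ll += math.log(total)
            resp.append(joint[1] / total)
        # M-step: re-estimate P(Y) and P(Li=1 | Y) from the soft class counts.
        n1 = min(max(sum(resp), eps), n - eps)
        n0 = n - n1
        prior = [n0 / n, n1 / n]
        for i in range(m):
            s1 = sum(r for r, row in zip(resp, X) if row[i] == 1)
            s0 = sum(1.0 - r for r, row in zip(resp, X) if row[i] == 1)
            cond[i] = [min(max(s0 / n0, eps), 1 - eps),
                       min(max(s1 / n1, eps), 1 - eps)]
        # Assumed stopping rule: stop once the log-likelihood stops improving.
        if ll - prev_ll < tol:
            break
        prev_ll = ll
    return prior, cond, resp
```

Calling `fit_lcm_em(X)` on an n×m list of 0/1 identification results returns the estimated P(Y), the per-model P(Li=1|Y) (the automatically estimated accuracies of the identification models, obtained without any true labels), and the per-sample posteriors from the last E-step.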

In an embodiment, after the parameter values of the LCM are determined, the meaning of the latent class model's identification results obtained at the end of the iteration needs to be verified, to ensure that the value 1 of Y still denotes a false transaction and the value 0 a non-false transaction, and to guard against the meaning of the results having been flipped. Such a flip is possible because the model is trained without labels: the two latent classes are interchangeable, so the learned class labeled 1 may in fact correspond to normal transactions.

Specifically, after model parameter learning is completed, the n identification results output by the latent class model in the last iteration are acquired.

If the number of identification results with the value 1 is smaller than the number with the value 0, the verification passes.

In this embodiment, false transactions usually account for a much smaller share of historical transaction data than normal transactions, i.e., P(Y=1) ≪ P(Y=0). P(Y=1) and P(Y=0) are therefore compared at the last iteration. If P(Y=1) > P(Y=0), the meaning of the values of Y is considered flipped, and the value 1 must be redefined as denoting a normal transaction and 0 as denoting a false transaction. Conversely, if P(Y=1) < P(Y=0), the meaning of the values of Y is considered not flipped, and no processing is required.
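The label-meaning check amounts to a count comparison followed, if needed, by swapping the two latent classes. In this minimal sketch the function names are hypothetical, a tie is treated as passing (the text does not cover that case), and `swap_classes` assumes a parameter layout where `prior[y]` = P(Y=y) and `cond[i][y]` = P(Li=1|Y=y).

```python
from typing import List, Sequence, Tuple


def verify_label_meaning(y_hat: Sequence[int]) -> Tuple[List[int], bool]:
    """Check the last-iteration 0/1 results; relabel if 1s outnumber 0s.

    Assumes false transactions are the minority class. Returns the (possibly
    relabelled) results and whether a flip was detected.
    """
    ones = sum(y_hat)
    zeros = len(y_hat) - ones
    if ones > zeros:
        # Meaning is flipped: value 1 now denotes normal, so swap 0 and 1.
        return [1 - y for y in y_hat], True
    return list(y_hat), False


def swap_classes(prior: Sequence[float],
                 cond: Sequence[Sequence[float]]) -> Tuple[List[float], List[List[float]]]:
    """Swap the roles of Y=0 and Y=1 in the LCM parameters after a detected flip."""
    return [prior[1], prior[0]], [[c[1], c[0]] for c in cond]


print(verify_label_meaning([1, 1, 1, 0]))  # ([0, 0, 0, 1], True)
```

Relabelling the parameters as well as the results keeps later posterior calculations consistent with the corrected meaning of Y.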

As shown in FIG. 3, after the meaning of the LCM's identification results has been verified as correct, the LCM can be deployed online to identify whether transactions to be identified are false. That is, after any transaction to be identified is received, it is first identified by the identification models corresponding to the multiple risk dimensions, and the identification results output by these models are then input into the latent class model for secondary identification, which follows steps 120 to 140.

Step 130 yields the conditional probability P that the transaction is false. In step 140, whether the transaction is false is then determined based on P: if P is greater than the threshold, the transaction is judged to be false; otherwise, it is judged to be a normal transaction.
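The online flow of FIG. 3 (steps 110 to 140) can be sketched end to end. The three first-stage identification models here are hypothetical one-line rules over made-up feature names (`parcel_weight_kg`, `buyer_account_age_days`, `hours_to_review`), and the LCM parameter values and the 0.5 threshold are likewise illustrative; in practice the parameters would come from training and the threshold from one of the approaches described in this specification.

```python
from typing import Callable, Dict, List, Sequence, Tuple

Transaction = Dict[str, float]
RiskModel = Callable[[Transaction], int]

# Hypothetical first-stage identification models, one per risk dimension (1 = false).
RISK_MODELS: List[RiskModel] = [
    lambda t: int(t["parcel_weight_kg"] < 0.05),     # logistics dimension
    lambda t: int(t["buyer_account_age_days"] < 7),  # account dimension
    lambda t: int(t["hours_to_review"] < 1.0),       # review-behaviour dimension
]

# Hypothetical trained LCM parameters: PRIOR[y] = P(Y=y), COND[i][y] = P(Li=1 | Y=y).
PRIOR = [0.95, 0.05]
COND = [[0.02, 0.80], [0.10, 0.60], [0.05, 0.70]]


def identify(txn: Transaction, risk_models: Sequence[RiskModel],
             prior: Sequence[float], cond: Sequence[Sequence[float]],
             threshold: float = 0.5) -> Tuple[bool, float]:
    labels = [model(txn) for model in risk_models]   # step 110
    joint = []
    for y in (0, 1):                                  # step 120 (Equation 1)
        p = prior[y]
        for i, label in enumerate(labels):
            p *= cond[i][y] if label == 1 else 1.0 - cond[i][y]
        joint.append(p)
    p_false = joint[1] / (joint[0] + joint[1])        # step 130 (Equation 2)
    return p_false > threshold, p_false               # step 140


txn = {"parcel_weight_kg": 0.01, "buyer_account_age_days": 200, "hours_to_review": 0.5}
is_false, p = identify(txn, RISK_MODELS, PRIOR, COND)
print(is_false, round(p, 3))  # True 0.929
```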

In this specification, the threshold may be preset manually, for example based on manual sampling results or business experience.

With the continued development of computer technology, and of artificial intelligence in particular, the threshold may also be calculated by machine learning. For example, an optimal threshold may be computed by a machine learning algorithm from historical transaction data.

Furthermore, the threshold may also be determined using big data techniques. For example, if analysis of massive data shows that most false transactions are identified when the threshold is set to 0.5, the threshold in this embodiment may be set to 0.5.

In this embodiment, individual identification models need not be highly accurate, and no true labels are required to determine their weights. Because the latent class model estimates the accuracy of each identification model automatically during training, the probability of a false transaction obtained by computing the joint probability distribution of the models' identification results with the latent class model and applying Bayes' rule is more accurate.

Corresponding to the foregoing method embodiments for identifying a fraudulent transaction, this specification also provides embodiments of an apparatus for identifying a fraudulent transaction. The apparatus embodiments may be implemented by software, by hardware, or by a combination of hardware and software. Taking a software implementation as an example, the apparatus, as a logical entity, is formed when the processor of the device on which it resides reads the corresponding computer program instructions from non-volatile memory into memory and executes them. In terms of hardware, FIG. 4 is a hardware structure diagram of the device on which the apparatus for identifying false transactions in this specification resides. Besides the processor, network interface, memory, and non-volatile memory shown in FIG. 4, the device may generally include other hardware depending on its actual function of identifying false transactions, which is not described further here.

Referring to FIG. 5, which is a block diagram of an apparatus for identifying a spurious transaction according to an embodiment of the present disclosure and corresponds to the embodiment shown in FIG. 1, the apparatus includes:

an acquiring unit 310, configured to acquire, for any transaction to be identified, the identification results respectively output by identification models corresponding to a plurality of risk dimensions;

a first calculating unit 320, configured to calculate a joint probability distribution of the plurality of identification results by using the preset latent class model;

a second calculating unit 330, configured to calculate, by Bayes' rule, a conditional probability value that the transaction is a false transaction under the joint probability distribution;

and an identifying unit 340, configured to determine that the transaction to be identified is a false transaction when the conditional probability value is greater than a threshold.

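Optionally, the apparatus further includes a model training unit and a verification unit, which function as described for the second aspect above: the model training unit trains the latent class model on n transaction samples, and the verification unit verifies the meaning of the latent class model's identification results after the iteration ends.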

The systems, apparatuses, modules, or units set forth in the above embodiments may be implemented by a computer chip or physical entity, or by a product having a certain function. A typical implementation device is a computer, which may take the form of a personal computer, laptop computer, cellular telephone, camera phone, smartphone, personal digital assistant, media player, navigation device, email device, game console, tablet computer, wearable device, or any combination of these devices.

For details of how the functions and roles of each unit in the above apparatus are implemented, refer to the implementation of the corresponding steps in the above method; they are not repeated here.

Since the apparatus embodiments essentially correspond to the method embodiments, refer to the relevant parts of the method embodiments for related details. The apparatus embodiments described above are merely illustrative: units described as separate components may or may not be physically separate, and components shown as units may or may not be physical units; they may be located in one place or distributed across multiple network units. Some or all of the modules may be selected as needed to achieve the purposes of this specification, which those of ordinary skill in the art can understand and implement without creative effort.

FIG. 5 above describes the internal functional modules and structure of the apparatus for identifying spurious transactions; its actual executing entity may be an electronic device comprising:

a processor;

a memory for storing processor-executable instructions;

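wherein the processor is configured to:

for any transaction to be identified, acquire the identification results respectively output by identification models corresponding to a plurality of risk dimensions; calculate a joint probability distribution of the plurality of identification results by using a preset latent class model; calculate, by Bayes' rule, a conditional probability value that the transaction is a false transaction under the joint probability distribution; and when the conditional probability value is greater than a threshold, determine that the transaction to be identified is a false transaction.

Optionally, the latent class model is trained as described above for the model training unit.

Optionally, the processor is further configured to verify the meaning of the identification results of the latent class model after the iteration ends.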

In the above embodiment of the electronic device, it should be understood that the processor may be a central processing unit (CPU), another general-purpose processor, a digital signal processor (DSP), an application-specific integrated circuit (ASIC), or the like. A general-purpose processor may be a microprocessor or any conventional processor. The aforementioned memory may be a read-only memory (ROM), a random access memory (RAM), a flash memory, a hard disk, or a solid-state drive. The steps of the method disclosed in connection with the embodiments of this specification may be executed directly by a hardware processor, or by a combination of hardware and software modules in a processor.

In this specification, the embodiments are described progressively: identical or similar parts of the embodiments may be understood by reference to one another, and each embodiment focuses on its differences from the others. In particular, the electronic device embodiment is substantially similar to the method embodiments and is therefore described briefly; for relevant details, refer to the description of the method embodiments.

Other embodiments of the present disclosure will readily occur to those skilled in the art from consideration of the specification and practice of the examples disclosed herein. This specification is intended to cover any variations, uses, or adaptations that follow its general principles, including such departures from the present disclosure as come within common knowledge or customary practice in the art to which the specification pertains. The specification and examples are to be considered exemplary only, with the true scope and spirit of the specification being indicated by the following claims.

It is to be understood that the present description is not limited to the precise arrangements and instrumentalities shown in the drawings, which have been described above, and that various modifications and changes may be made without departing from the scope thereof. The scope of the present description is limited only by the appended claims.

Claims

1. A method of identifying a fraudulent transaction, the method comprising:

when the conditional probability value is greater than a threshold value, determining that the transaction to be identified is a false transaction;

wherein the preset latent class model is trained by:

2. The method of claim 1, wherein the optimization algorithm comprises an expectation-maximization algorithm.

3. The method of claim 1, the method further comprising:

4. The method of claim 3, wherein the identification result takes the value 1 or 0, and verifying the meaning of the identification results of the latent class model specifically comprises:

5. An apparatus to identify a fraudulent transaction, the apparatus comprising:

the identification unit is used for determining that the transaction to be identified is a false transaction when the conditional probability value is larger than a threshold value;

wherein the preset latent class model is obtained through training by a model training unit,

6. The apparatus of claim 5, wherein the optimization algorithm comprises an expectation-maximization algorithm.

7. The apparatus of claim 5, the apparatus further comprising:

8. The apparatus of claim 7, wherein the identification result takes the value 1 or 0, and the verification unit is specifically configured to:

9. An electronic device, comprising:

a processor;

a memory for storing processor-executable instructions;

wherein the processor is configured to perform the method of any one of claims 1-4.