CN116957606A - Fraud user identification method and system - Google Patents

Fraud user identification method and system

Info

Publication number
CN116957606A
Authority
CN
China
Prior art keywords
users
user
intermediary
candidate
association
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202310936032.1A
Other languages
Chinese (zh)
Inventor
罗一策
王冠楠
程微宏
刘永超
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Alipay Hangzhou Information Technology Co Ltd
Original Assignee
Alipay Hangzhou Information Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Alipay Hangzhou Information Technology Co Ltd filed Critical Alipay Hangzhou Information Technology Co Ltd
Priority to CN202310936032.1A priority Critical patent/CN116957606A/en
Publication of CN116957606A publication Critical patent/CN116957606A/en
Pending legal-status Critical Current


Classifications

    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06Q - INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q30/00 - Commerce
    • G06Q30/018 - Certifying business or products
    • G06Q30/0185 - Product, service or business identity fraud
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06Q - INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q30/00 - Commerce
    • G06Q30/06 - Buying, selling or leasing transactions
    • G06Q30/0645 - Rental transactions; Leasing transactions

Landscapes

  • Business, Economics & Management (AREA)
  • Accounting & Taxation (AREA)
  • Finance (AREA)
  • Marketing (AREA)
  • Economics (AREA)
  • Development Economics (AREA)
  • Strategic Management (AREA)
  • Physics & Mathematics (AREA)
  • General Business, Economics & Management (AREA)
  • General Physics & Mathematics (AREA)
  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Entrepreneurship & Innovation (AREA)
  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)

Abstract

The present specification provides a fraud user identification method and system. Based on history requesting users, a candidate intermediary user set associated with the history requesting users is determined, and based on the behavioral features of the candidate intermediary users, an expanded intermediary user set is determined from the candidate intermediary user set, so that potential fraud users instigated by the expanded intermediary user set are identified. The behavioral features are derived from the request results of the history requesting users. The method and system provided in this specification improve the accuracy of locating intermediary users and, at the same time, can locate new potential intermediary users, thereby improving the accuracy of identifying potential fraud users based on the potential intermediary users and reducing the occurrence of fraud.

Description

Fraud user identification method and system
Technical Field
The present disclosure relates to the field of fraud identification, and in particular, to a fraud user identification method and system.
Background
In recent years, with quantitative easing and economic recovery, financial lending services and online rental services have developed rapidly, and against the background of the rapid development of technologies such as mobile payment, big data, and artificial intelligence, related user fraud inevitably arises.
Therefore, there is a need to provide a fraud user identification method and system that can quickly and efficiently identify potential fraud users, so as to reduce the probability of corresponding fraud events. The statements in this background section merely provide background information related to the present disclosure and may not constitute prior art to the present disclosure, nor prior art as of the filing date of the present disclosure.
Disclosure of Invention
The fraud user identification method and the fraud user identification system can improve the accuracy of identifying potential fraud users based on potential intermediary users so as to reduce the occurrence of fraud events.
In a first aspect, the present specification provides a fraud user identification method, applied to a third party service platform, including: determining, based on history requesting users, a candidate intermediary user set associated with the history requesting users, the candidate intermediary user set including candidate intermediary users who may instigate users to commit fraud; determining an expanded intermediary user set from the candidate intermediary user set based on target features of the candidate intermediary users, the target features comprising behavioral features derived from the request results of the history requesting users having the association with the candidate intermediary users; and identifying, from the users of the third party service platform, potential fraud users instigated by the expanded intermediary user set based on the expanded intermediary user set.
In some embodiments, the third party service platform includes an online lease module, the history requesting users include users who initiate online lease requests using the online lease module, and the potential fraud users include users who may commit fraud using the online lease module.
In some embodiments, the determining, based on the history requesting users, a candidate intermediary user set associated with the history requesting users includes: determining, based on a user association graph, a first user set having the association with the history requesting users; and determining, from the first user set, users satisfying a first preset condition as candidate intermediary users, thereby determining the candidate intermediary user set, wherein the first preset condition includes: for a candidate intermediary user, the statistic of history requesting users with which the association exists is greater than a preset first threshold.
In some embodiments, the statistics of the history requesting users with which the association exists include: the number of history requesting users with which the association exists; and/or the proportion of history requesting users with whom the association exists among all users with whom the association exists.
In some embodiments, the determining the expanded intermediary user set from the candidate intermediary user set based on the target features of the candidate intermediary users comprises: determining the target features of the candidate intermediary users, which comprises: determining the behavioral features of the candidate intermediary users based on the number and/or proportion of abnormal states among the request results, the request results comprising at least one of pass, reject, normal transaction, and breach, the abnormal states comprising at least one of the reject and the breach; and determining the expanded intermediary user set based on the target features of the candidate intermediary users.
In some embodiments, the determining the expanded intermediary user set based on the target features of the candidate intermediary users comprises: determining candidate intermediary users whose behavioral features are greater than a preset first classification threshold as expanded intermediary users, thereby determining the expanded intermediary user set.
In some embodiments, the determining the expanded intermediary user set from the candidate intermediary user set based on the target features of the candidate intermediary users comprises: determining a core intermediary user set of known fraud users based on known fraud users pre-labeled in a user association graph, wherein the history requesting users include the known fraud users, and the expanded intermediary user set and the candidate intermediary user set include the core intermediary user set.
In some embodiments, the determining a core intermediary user set of the known fraud users based on the known fraud users pre-labeled in the user association graph includes: determining, based on the user association graph, users satisfying a second preset condition from among the plurality of users with which the known fraud users are associated as core intermediary users, thereby determining the core intermediary user set, wherein the second preset condition includes: the statistic of known fraud users with which a core intermediary user is associated is greater than a preset second threshold.
In some embodiments, the statistics of the known fraudulent users with whom the association exists include: the number of known fraudulent users with whom said association exists; and/or the proportion of known fraudulent users with whom said association exists among all users with whom said association exists.
In some embodiments, the target features further comprise relationship features, and the determining the target features of the candidate intermediary users further comprises: determining the relationship features of the candidate intermediary users based on the association strength of the candidate intermediary users with the core intermediary user set.
In some embodiments, the association strength of the candidate intermediary users with the core intermediary user set comprises at least one of: the number of core intermediary users in the core intermediary user set with which the candidate intermediary user has the association, the number of times the association occurs, the time at which the association occurs, the duration of the association, and the amount of money involved in the association.
In some embodiments, the determining the expanded intermediary user set based on the target features of the candidate intermediary users comprises: determining candidate intermediary users whose behavioral features are greater than a preset first classification threshold and whose relationship features are greater than a preset second classification threshold as expanded intermediary users, thereby determining the expanded intermediary user set.
In some embodiments, the target features further comprise application features, and the determining the target features of the candidate intermediary users further comprises: determining the application features of a candidate intermediary user based on the terminal used by the candidate intermediary user, the application features comprising: whether a specific application is installed on the terminal used by the candidate intermediary user, wherein the specific application comprises a financial application and a client provided by the third party service platform is installed on the terminal; and/or whether the candidate intermediary user has used a specific module on the client, the specific module comprising a financial module.
In some embodiments, the determining the expanded intermediary user set based on the target features of the candidate intermediary users comprises: determining candidate intermediary users whose target features satisfy a preset judgment condition as expanded intermediary users, thereby determining the expanded intermediary user set, wherein the preset judgment condition comprises: the behavioral feature is greater than a preset first classification threshold; and at least one of: the relationship feature is greater than a preset second classification threshold, the specific application is installed on the terminal used by the candidate intermediary user, and the candidate intermediary user has used the specific module through the client.
In some embodiments, the determining the expanded intermediary user set based on the target features of the candidate intermediary users comprises: taking the core intermediary user set as positive samples and, based on the target features of the candidate intermediary users, determining the expanded intermediary user set using a positive-unlabeled (PU) learning algorithm.
In some embodiments, the PU learning algorithm is configured to determine reliable negative samples from the candidate intermediary user set using a spy algorithm.
In some embodiments, the identifying, from the users of the third party service platform, potential fraud users instigated by the expanded intermediary user set based on the expanded intermediary user set includes: determining candidate fraud users associated with the expanded intermediary user set based on the user association graph and the expanded intermediary user set, thereby determining a candidate fraud user set; and identifying, from the users of the third party service platform, potential fraud users instigated by the expanded intermediary user set based on the association strength of the candidate fraud users with the expanded intermediary user set.
In some embodiments, the association strength of a candidate fraud user with the expanded intermediary user set comprises at least one of: the number of expanded intermediary users in the expanded intermediary user set with which the candidate fraud user has the association, the number of times the association occurs, the time at which the association occurs, the time difference between the time at which the association occurs and the time of the service request initiated by the candidate fraud user, the duration of the association, and the amount of money involved in the association.
In some embodiments, the user association graph is constructed based on associations between users using the third party service platform, the nodes of the user association graph including the users, the edges of the user association graph characterizing associations between nodes to which they are connected, the associations comprising: at least one of a funds transfer relationship, a shared device relationship, and a shared network relationship.
In a second aspect, the present specification provides a fraud user identification system, comprising at least one storage medium storing at least one instruction set for fraud user identification and at least one processor communicatively connected to the at least one storage medium, wherein when the fraud user identification system runs, the at least one processor reads the at least one instruction set and, according to the at least one instruction set, performs the fraud user identification method as claimed in any one of claims 1-19.
According to the above technical solution, the fraud user identification method and system provided in this specification determine, based on history requesting users, a candidate intermediary user set associated with the history requesting users, and determine an expanded intermediary user set from the candidate intermediary user set based on the behavioral features of the candidate intermediary users, so as to identify potential fraud users instigated by the expanded intermediary user set. The behavioral features are derived from the request results of the history requesting users. The method and system provided in this specification improve the accuracy of locating intermediary users and, at the same time, can locate new potential intermediary users, thereby improving the accuracy of identifying potential fraud users based on the potential intermediary users and reducing the occurrence of fraud. The fraud user identification method and system can further combine the behavioral features with the relationship features between the candidate intermediary users and the core intermediary users, further improving the accuracy of identifying potential intermediary users.
Other functions of the fraud user identification method and system provided in this specification will be set forth in part in the following description. The description and examples presented below will make these apparent to those of ordinary skill in the art. The inventive aspects of the fraud user identification method and system of this specification may be fully explained by practicing or using the methods, devices, and combinations described in the detailed examples below.
Drawings
In order to more clearly illustrate the technical solutions in the embodiments of the present description, the drawings that are needed in the description of the embodiments will be briefly described below, it being obvious that the drawings in the following description are only some embodiments of the present description, and that other drawings may be obtained according to these drawings without inventive effort for a person skilled in the art.
FIG. 1 shows a schematic view of an application scenario of a fraud user identification system provided according to an embodiment of the present specification;
FIG. 2 illustrates a hardware architecture diagram of a computing device provided in accordance with an embodiment of the present description;
FIG. 3 shows a flow chart of a fraud user identification method provided according to an embodiment of the present specification;
FIG. 4 shows a flow diagram of another fraud user identification method provided in accordance with an embodiment of the present specification;
FIG. 5 illustrates a schematic diagram of a user association graph provided in accordance with an embodiment of the present description; and
FIG. 6 shows a schematic diagram of a user association graph provided according to an embodiment of the present description.
Detailed Description
The following description is presented to enable one of ordinary skill in the art to make and use the invention, and is provided in the context of a particular application and its requirements. Various modifications to the disclosed embodiments will be readily apparent to those skilled in the art, and the generic principles defined herein may be applied to other embodiments and applications without departing from the spirit and scope of the disclosure. Thus, the present description is not limited to the embodiments shown, but is to be accorded the widest scope consistent with the claims.
The terminology used herein is for the purpose of describing particular example embodiments only and is not intended to be limiting. For example, as used herein, the singular forms "a", "an" and "the" are intended to include the plural forms as well, unless the context clearly indicates otherwise. The terms "comprises," "comprising," "includes," and/or "including," when used in this specification, are taken to specify the presence of stated integers, steps, operations, elements, and/or components, but do not preclude the presence or addition of one or more other features, integers, steps, operations, elements, components, and/or groups thereof.
These features and application features of the present description, as well as the operation and function of the related elements of structure, as well as the combination of parts and economies of manufacture, may be significantly improved in view of the following description. All of which form a part of this specification, reference is made to the accompanying drawings. It is to be expressly understood, however, that the drawings are for the purpose of illustration and description only and are not intended as a definition of the limits of the description. It should also be understood that the drawings are not drawn to scale.
The flowcharts used in this specification illustrate operations implemented by systems according to some embodiments in this specification. It should be clearly understood that the operations of the flow diagrams may be implemented out of order. Rather, operations may be performed in reverse order or concurrently. Further, one or more other operations may be added to the flowchart. One or more operations may be removed from the flowchart.
For convenience of description, this specification first explains terms that may appear later:
teaching and culprizing: the instigators cause other users to conduct fraud through a service platform (e.g., an online rental service platform) to thereby fraught with merchandise. Accordingly, an intermediary (drive) refers to a person who causes other users to perform fraudulent actions. The other users (taught drive users) refer to users who are taught by the intermediary to perform the fraud. The intermediary typically does not directly conduct the fraud by itself, but rather establishes contact with the guided drive user based on other services provided by the service platform, thereby enabling the guided drive user to conduct the fraud directly by way of the teaching drive.
Positive-unlabeled learning algorithm: the PU Learning (Positive-Unlabeled Learning) algorithm, i.e., a machine-learning algorithm that solves the problem in which only part of the positive samples have labels. The sample set includes a positive sample set P and an unlabeled sample set U; U may contain both positive and negative samples, but the samples in U carry no labels.
Spy algorithm: the S-EM (Spy technique + EM) algorithm, i.e., an algorithm that implements PU Learning in a spy manner. It can be understood as follows: some positive samples S are randomly selected from the positive sample set P and put into the unlabeled sample set U as spy samples, so that the new positive sample set becomes P-S and the new unlabeled sample set becomes U+S. A classifier is then trained using the iterative Expectation-Maximization (EM) algorithm to classify the samples.
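As an illustration of the spy step described above, the following Python sketch selects reliable negative samples from an unlabeled set; it assumes scikit-learn and NumPy are available, and the choice of logistic regression as the first-stage classifier, the 15% spy ratio, and the use of the weakest spy score as the threshold are assumptions for illustration rather than requirements of this specification.

```python
# Minimal sketch of the spy step of PU learning (S-EM). The classifier choice
# and the 15% spy ratio are illustrative assumptions.
import numpy as np
from sklearn.linear_model import LogisticRegression

def spy_reliable_negatives(X_pos, X_unlabeled, spy_ratio=0.15, seed=0):
    """X_pos, X_unlabeled: 2-D NumPy feature arrays for labeled positives
    and unlabeled samples. Returns a boolean mask over X_unlabeled marking
    reliable negatives, plus the first-stage classifier."""
    rng = np.random.default_rng(seed)
    n_spy = max(1, int(len(X_pos) * spy_ratio))
    spy_mask = np.zeros(len(X_pos), dtype=bool)
    spy_mask[rng.choice(len(X_pos), size=n_spy, replace=False)] = True

    # New positive set P-S and new "unlabeled" set U+S.
    X_p = X_pos[~spy_mask]
    X_u = np.vstack([X_unlabeled, X_pos[spy_mask]])

    # Train a first-stage classifier treating U+S as tentative negatives.
    X_train = np.vstack([X_p, X_u])
    y_train = np.concatenate([np.ones(len(X_p)), np.zeros(len(X_u))])
    clf = LogisticRegression(max_iter=1000).fit(X_train, y_train)

    # Spies show how low a true positive can score; unlabeled samples that
    # score below the weakest spy are taken as reliable negatives.
    threshold = clf.predict_proba(X_pos[spy_mask])[:, 1].min()
    reliable_neg_mask = clf.predict_proba(X_unlabeled)[:, 1] < threshold
    return reliable_neg_mask, clf
```

In the full S-EM procedure, the reliable negatives obtained this way would then seed the iterative EM training of the final classifier, as described above.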
Graph neural network: GNN (Graph Neural Network). When a graph A (a set of associated data) is input into a GNN, the GNN outputs another graph B (another set of data). Compared with graph A, graph B typically has no change in the connections between its nodes, but its node, edge, or global information changes. Depending on its final purpose, a GNN can generally be divided into three types: identifying the graph (i.e., identifying global information), identifying node information, and identifying edge information. Identifying node information means classifying the nodes.
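The following minimal NumPy sketch illustrates one message-passing layer of the kind of GNN described above for node-level tasks; the mean aggregation, the single linear transformation, and the ReLU non-linearity are illustrative assumptions and do not represent a specific model prescribed by this specification.

```python
# One illustrative message-passing layer: each node mixes its own features
# with the mean of its neighbours' features.
import numpy as np

def gnn_layer(node_feats, adjacency, weight_self, weight_neigh):
    deg = adjacency.sum(axis=1, keepdims=True)
    deg[deg == 0] = 1.0                       # avoid division by zero
    neigh_mean = adjacency @ node_feats / deg  # mean-aggregate neighbours
    return np.maximum(0.0, node_feats @ weight_self + neigh_mean @ weight_neigh)

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    A = np.array([[0, 1, 1], [1, 0, 0], [1, 0, 0]], dtype=float)  # 3-node graph
    X = rng.normal(size=(3, 4))                                   # node features
    W_s, W_n = rng.normal(size=(4, 2)), rng.normal(size=(4, 2))
    print(gnn_layer(X, A, W_s, W_n).shape)  # (3, 2): updated node features
```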
Decision tree model: the DT model for short can be used to solve the classification problem. The decision tree model is generated iteratively in accordance with features of the statistics set based on training statistics. In the case of a trained decision tree model, a target statistics set is input into its root node, passed down and each statistic in the target statistics set is judged by means of internal nodes and then distributed to different leaf nodes, thereby completing the corresponding classification of the statistics set. Thus, the different leaf nodes represent the classification result of the final model.
Currently, fraud user identification may be applied to a variety of scenarios, such as financial lending scenarios, online rental scenarios, and the like. In these scenarios, a user may use at least one of a credit value, a relatively small amount of funds, or an item as collateral to obtain a relatively large amount of funds or the right to use an item for a certain period of time, and pay a fee for this. In the above scenarios, if fraud occurs, for example, the user does not return the corresponding funds or goods when due, the financial institution providing the corresponding financial lending service or the merchant providing the rented goods suffers a large economic loss, and the service platform providing the online rental service also suffers a large reputation loss, which is not conducive to the development of the corresponding services or to the stability of society. This is most serious in the case of instigated fraud. Specifically, an instigator (intermediary) contacts other users (i.e., instigated users) through the service platform and induces them, by way of instigation and deception, to commit fraud through the online rental service provided by the service platform, so as to achieve illegal profit. For example, the intermediary induces a user to apply to rent goods from a merchant and to ship the rented goods to the intermediary's address; after the lease of the goods expires, the intermediary does not return the goods and defaults. While the user and the merchant suffer economic losses, the service platform suffers an even greater reputation loss.
Specifically, fig. 1 shows a schematic diagram of an application scenario of a fraud user identification system 001 provided according to an embodiment of the present disclosure. The fraud user identification system 001 (hereinafter referred to as system 001) may be applied to any scenario in which a user can use at least one of a credit value, a relatively small amount of funds, or an item as collateral to obtain a relatively large amount of funds or the right to use an item for a certain period of time, for example, a financial lending scenario, an online rental scenario, and the like. For ease of description and understanding, an online rental scenario will be described below as an example.
As shown in fig. 1, the system 001 may include a plurality of terminals 200 and a server 300. Accordingly, the application scenario of the system 001 may include: a plurality of users 100, a system 001, and a communication network 400.
Wherein the user 100 may be an operator of the terminal 200. One user 100 may correspond to one or more terminals 200. Terminal 200 may interact with user 100. User 100 may trigger an online rental service provided by a third party service platform through operation of terminal 200. For example, the user 100 installs a client (application) of the third party service platform on the terminal 200 and registers with the third party service platform. User 100 may initiate an online rental request with a client to a merchant. After receiving the online lease request, the merchant sends a corresponding authentication request to the server 300. The authentication request may be used to instruct the server 300 to authenticate the corresponding online rental service, thereby determining a corresponding authentication result. I.e., whether the online lease request of the user 100 can pass. The merchant receives the verification result. If the verification result is that the online lease request is agreed (passed), the user 100 and the merchant perform a subsequent related operation to implement the online lease service. The third party service platform may be a comprehensive platform capable of providing a plurality of types of services to the user. The third party service platform may include a variety of service modules to provide a variety of types of services. Wherein the plurality of service modules may include an online rental module. The online rental service is included in the plurality of types of services. The online rental module can provide the online rental service. Specifically, the online leasing module may be an applet or a gadget, and the online leasing module may provide a corresponding online leasing interaction page, so that the user 100, the merchant, and the server 300 may interact based on the online leasing interaction page. As an example, the third party service platform may include, but is not limited to, a digital living open platform, a social open platform, a shopping open platform, a gaming open platform, and so forth.
In some embodiments, terminal 200 may be an intelligent electronic device. For example, the terminal 200 may include a mobile device, a tablet, a notebook, a built-in device of a motor vehicle, or the like, or any combination thereof. For example, the mobile device may include a smart home device, a smart mobile device, a virtual reality device, an augmented reality device, or the like, or any combination thereof. For example, the smart home device may include a smart television, a desktop computer, or the like, or any combination. For example, the smart mobile device may include a smart phone, personal digital assistant, gaming device, navigation device, etc., or any combination thereof.
In some embodiments, at least one application (APP) may be installed on the terminal 200. The APP can provide the user 100 with the ability and an interface to interact with the outside world via the network 400. Each application may comprise computer program code, which may include, but is not limited to, programs, routines, objects, components, data structures, procedures, modules, and the like. The at least one application may include a target application. As an example, the target application is a client of the third party service platform. In some embodiments, the target application may have a certain openness, i.e., an external program, such as a financial lending applet or gadget, may add a function of the target application or use a resource of the target terminal through an application programming interface (API) or function disclosed by the target application without modifying the source code of the target application. The external program refers to a program other than the target application. The target application may cover various services of daily life. The at least one application may also include a specific application, which may include a financial lending application. Whether the specific application is also installed on the terminal 200 may be determined based on the target application.
The server 300 may be a background server of the third party service platform. The server 300 may implement the various functions described above for the target application. It is also understood that the target application is a client application corresponding to the server 300 to provide a local service for the user 100. For example, the user 100 may perform an operation related to the online rental service on the target application mounted on the terminal 200. The target application may then communicate with the server 300 through the terminal 200, so that the server 300 may provide corresponding services (i.e., implement the above-described various functions of the target application), such as an online rental service, to the user 100 through the terminal 200. In some embodiments, the server 300 may be communicatively coupled to a plurality of terminals 200. In some embodiments, terminal 200 may interact with the server 300 via the communication network 400 to send a message to the server 300 to trigger the server 300 to perform operations related to the online rental service.
Among the many users 100 of the third party service platform, there may be some intermediaries 120 who contact other users (instigated users 140) through the client of the third party service platform, and the instigated users 140 commit fraud through the online rental module. For convenience of description, we define the terminals of the intermediaries 120 as intermediary terminals 220 and the terminals of the instigated users 140 as instigated-user terminals 240. The intermediary 120 and the instigated user 140 can establish contact with each other through the server 300 via the clients of the third party service platform installed on the intermediary terminal 220 and the instigated-user terminal 240.
The communication network 400 is a medium for providing a communication connection between the terminal 200 and the server 300. The communication network 400 may facilitate the exchange of information or data. As shown in fig. 1, the terminal 200 and the server 300 may be connected to the communication network 400 and mutually transmit information through the communication network 400. In some embodiments, the communication network 400 may be any type of wired or wireless network, or a combination thereof. For example, the communication network 400 may include a cable network, a wired network, a fiber optic network, a telecommunications network, an intranet, the Internet, a local area network (LAN), a wide area network (WAN), a wireless local area network (WLAN), a metropolitan area network (MAN), a public switched telephone network (PSTN), a Bluetooth™ network, a ZigBee™ network, a near field communication (NFC) network, or the like. In some embodiments, the communication network 400 may include one or more network access points. For example, the communication network 400 may include a wired or wireless network access point, such as a base station or an internet switching point, through which one or more components of the terminal 200 and the server 300 may connect to the communication network 400 to exchange data or information.
It should be understood that the number of terminals 200, servers 300, and communication networks 400 in fig. 1 are merely illustrative. Any number of terminals 200, servers 300, and communication networks 400 may be present in the scene 001, as desired for implementation.
Fig. 2 illustrates a hardware architecture diagram of a computing device 600 provided in accordance with an embodiment of the present description. The computing device 600 may perform the fraud user identification methods described in this specification, which will be described in detail elsewhere in this specification. The fraud user identification method may be performed on the server 300. At this point, computing device 600 may be server 300.
As shown in fig. 2, computing device 600 may include at least one storage medium 630 and at least one processor 620. In some embodiments, computing device 600 may also include a communication port 650 and an internal communication bus 610. Meanwhile, computing device 600 may also include I/O component 660.
Internal communication bus 610 may connect the various system components including storage medium 630, processor 620, and communication ports 650.
I/O component 660 supports input/output between computing device 600 and other components.
The communication port 650 is used for data communication between the computing device 600 and the outside world, for example, the communication port 650 may be used for data communication between the computing device 600 and the communication network 400. The communication port 650 may be a wired communication port or a wireless communication port.
The storage medium 630 may include a data storage device. The data storage device may be a non-transitory storage medium or a transitory storage medium. For example, the data storage device may include one or more of a magnetic disk 632, a read-only memory (ROM) 634, or a random access memory (RAM) 636. The storage medium 630 also includes at least one instruction set stored in the data storage device. The instructions are computer program code, which may include programs, routines, objects, components, data structures, procedures, modules, and the like that perform the fraud user identification method provided in this specification.
The at least one processor 620 may be communicatively coupled with at least one storage medium 630 and a communication port 650 via an internal communication bus 610. The at least one processor 620 is configured to execute the at least one instruction set. When the computing device 600 is running, the at least one processor 620 reads the at least one instruction set and, according to the indication of the at least one instruction set, performs the fraud user identification method provided in the present specification. Processor 620 may perform all the steps involved in the fraud user identification method. The processor 620 may be in the form of one or more processors, and in some embodiments, the processor 620 may include one or more hardware processors, such as microcontrollers, microprocessors, reduced Instruction Set Computers (RISC), application Specific Integrated Circuits (ASICs), application specific instruction set processors (ASIPs), central Processing Units (CPUs), graphics Processing Units (GPUs), physical Processing Units (PPUs), microcontroller units, digital Signal Processors (DSPs), field Programmable Gate Arrays (FPGAs), advanced RISC Machines (ARM), programmable Logic Devices (PLDs), any circuit or processor capable of executing one or more functions, or the like, or any combination thereof. For illustrative purposes only, only one processor 620 is depicted in the computing device 600 in this specification. It should be noted, however, that computing device 600 may also include multiple processors, and thus, operations and/or method steps disclosed in this specification may be performed by one processor as described herein, or may be performed jointly by multiple processors. For example, if the processor 620 of the computing device 600 performs steps a and B in this specification, it should be understood that steps a and B may also be performed by two different processors 620 in combination or separately (e.g., a first processor performs step a, a second processor performs step B, or the first and second processors perform steps a and B together).
Fig. 3 shows a flowchart of a fraud user identification method P100 provided according to an embodiment of the present specification. Fig. 4 shows a flow diagram of another fraud user identification method P100 provided according to an embodiment of the present specification. As previously described, computing device 600 may perform fraud user identification method P100 of the present specification. It should be noted that, in the method P100, the related data of the user 100 acquired is authorized by the user 100.
In general, the computing device 600 may periodically employ P100 to identify fraudulent users at a certain time, for example, so that when receiving the verification request of the merchant, the corresponding target verification result may be quickly fed back to the merchant based on the last identification result of the fraudulent user. Wherein the period of the identification of the fraudulent user by the computing device 600 through the P100 may be one hour, one day, three days, or other longer or shorter time, which is not limited in this specification.
As shown in fig. 3, the method P100 may include:
s110: based on the history request user, a set of candidate intermediary users associated with the history request user presence is determined.
The candidate intermediary user set may include candidate intermediary users who may instigate users to commit fraud. The candidate intermediary users may be potential intermediary users.
In some embodiments, the history requesting user may include a user who initiated the online lease request using the online lease module within a first historical time window in the past. In general, the history requesting user may be a user of the plurality of users 100. Wherein, the first historical time window may be a month, a half year, a year or other shorter or longer time, which is not limited in this specification.
In step S110, the computing device 600 may determine the candidate intermediary user set based on a user association graph. The server 300 may obtain various historical data of the users 100 who use the third party service platform. The historical data may include at least one of: the device used each time the user 100 logs in to the third party service platform, the WIFI address of the login device, the MAC address, the type of service used, other users with whom an association (contact) exists, the time at which associations with other users occur, the number of associations with other users, the duration of each association with other users, the matters involved in the associations with other users, and the amount of money involved in the associations with other users. From the above historical data, the computing device 600 may obtain a user association graph as shown in FIG. 5. The user association graph may characterize the associations between users in terms of nodes and edges and their corresponding features, based on the various historical data described above. The nodes of the user association graph may include the users, and the edges of the user association graph may characterize the associations between the nodes they connect. In some embodiments, the associations may include at least one of a funds transfer relationship (T relationship), a common device relationship (U relationship), and a common network relationship (W relationship). The funds transfer relationship may be a relationship of funds flow generated directly through the third party service platform, such as a transfer relationship. The common network relationship and the common device relationship may be indirect associations between users: the common network relationship may include a relationship generated by using the same WIFI address or IP address, for example, the common network relationship exists between users who log in to different accounts of the third party service platform using the same wireless router; the common device relationship may be a relationship generated by using the same device (which may be understood as using the same MAC address), for example, the common device relationship exists between users who log in to different accounts of the third party service platform using the same MAC address. Alternatively, the associations may further include relationships generated based on other services of the third party service platform, for example, chat relationships generated by the friend function of the third party service platform. Based on the user association graph, the features of any user node and the edge features of all edges connected to that node can be obtained. By way of example, the node features may include whether the user is a normal user or a fraudulent (abnormal) user, the credit value of the user, the device used by the user each time the user logs in to the third party service platform, the WIFI address used each time the user logs in to the third party service platform, the MAC address of the device used for each login, the type of service used for each login, and so on.
The edge features may include: the time at which the corresponding association occurs, the number of associations between the same two users, the duration of the association, the type of the association, the amount of money involved in the association, and the like. For example, user node A and user node B are friends on the third party service platform and can chat on the third party service platform, and two transfers occur between user node A and user node B within a week, i.e., the type of the generated association is a funds transfer association. The features of the association (edge features) may include: the transfer times (i.e., the times of the corresponding associations) are ten a.m. on Monday and four p.m. on Friday, the number of associations is 2, the amounts of the two transfers are 200 yuan and 300 yuan, and the duration from the start to the end of the first chat is 10 minutes, i.e., the duration of the first association is 10 minutes.
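The user association graph described above could, for example, be represented with a general-purpose graph library; the following sketch uses networkx, and the attribute names (label, credit, relation, amount, and so on) are illustrative assumptions rather than fields mandated by this specification.

```python
# Minimal sketch of a user association graph with node and edge features.
import networkx as nx

g = nx.MultiGraph()  # multiple associations may exist between the same two users

# Node features: user label, credit value, devices used, etc.
g.add_node("user_A", label="normal", credit=650, mac_addrs={"aa:bb:cc:01"})
g.add_node("user_B", label="fraud", credit=480, mac_addrs={"aa:bb:cc:02"})

# Edge features: association type (T = funds transfer, U = shared device,
# W = shared network), time, and amount involved.
g.add_edge("user_A", "user_B", relation="T", time="Mon 10:00", amount=200)
g.add_edge("user_A", "user_B", relation="T", time="Fri 16:00", amount=300)
g.add_edge("user_A", "user_B", relation="W", wifi="router_123")

# All edges (associations) connected to one user node.
print(list(g.edges("user_A", data=True)))
```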
As shown in fig. 4, in some embodiments, step S110 may include: determining, based on the user association graph, a first user set having the association with the history requesting users; and determining, from the first user set, users meeting a first preset condition as candidate intermediary users, thereby determining the candidate intermediary user set. The first preset condition includes: for a candidate intermediary user, the statistic of history requesting users with which the association exists is greater than a preset first threshold.
The user association graph can be regarded as describing the associations among global users. To screen out as many potential intermediary users as possible, the computing device 600 may first delineate the first user set. Further, the computing device 600 may exclude some normal users from the first user set and screen out potential intermediary users to obtain the candidate intermediary user set, so that the probability that a user in the candidate intermediary user set is an intermediary user is higher than the probability that a user in the first user set is an intermediary user.
Based on this, the computing device 600 may delineate the first user set based on the history requesting users in the user association graph. Specifically, the computing device 600 may label the history requesting users in the user association graph and treat the users of the user association graph that have the aforementioned association with the history requesting users as the first user set. To delineate potential intermediary users more accurately, the computing device 600 may further rely on characteristics that distinguish intermediary users from normal users to delineate the candidate intermediary user set from the first user set. In order to carry out instigation successfully and thus obtain as many rented goods as possible, an intermediary user usually contacts a large number of different history requesting users within a relatively short period of time (e.g., a week), possibly multiple times with the same history requesting user. That is, if the statistic of history requesting users with which one user in the first user set has the association exceeds a certain constraint, there is a greater likelihood that this user is an intermediary user, i.e., the user is a candidate intermediary user.
In some embodiments, the statistic of history requesting users with which the association exists may include: the number of history requesting users with which the association exists; and/or the proportion of history requesting users with which the association exists among all users with which the association exists. Specifically, for each user in the first user set, the current user may be a candidate intermediary user when the statistic of history requesting users with which the association exists is greater than the first threshold. These candidate intermediary users constitute the candidate intermediary user set. The first threshold may be a number threshold and/or a proportion threshold. The number threshold corresponds to a judgment threshold for the number of history requesting users with which the association exists, and the proportion threshold corresponds to a judgment threshold for the proportion of history requesting users with which the association exists. The first threshold may be obtained in a statistical manner or a machine learning manner based on the historical data, which is not limited in this specification.
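A minimal sketch of the first preset condition follows, assuming the user association graph is held as a networkx graph; the helper name, the threshold values, the exclusion of the history requesting users themselves, and the use of either the count or the proportion exceeding its threshold are assumptions for illustration.

```python
# Sketch of candidate intermediary selection by the first preset condition.
import networkx as nx

def candidate_intermediaries(graph: nx.Graph, history_requesters: set,
                             count_threshold: int = 10,
                             ratio_threshold: float = 0.5) -> set:
    # First user set: users associated with any history requesting user
    # (assumed here to exclude the requesters themselves).
    first_user_set = {
        n for r in history_requesters if r in graph for n in graph.neighbors(r)
    } - history_requesters
    candidates = set()
    for user in first_user_set:
        neighbors = set(graph.neighbors(user))
        hits = neighbors & history_requesters
        ratio = len(hits) / len(neighbors) if neighbors else 0.0
        # Either statistic exceeding its threshold marks a candidate.
        if len(hits) > count_threshold or ratio > ratio_threshold:
            candidates.add(user)
    return candidates
```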
Fig. 6 shows a schematic diagram of a user association graph provided according to an embodiment of the present description. Node 0 represents a normal user, node 1 represents a history requesting user, node 3 represents a user in the first user set associated with the history requesting users, and node 4 represents a candidate intermediary user in the first user set satisfying the first preset condition.
Further, the computing device 600 may also obtain other features based on other data of the candidate intermediary users in the user association graph, for example, the features of the candidate intermediary user nodes or the features of the edges connected to the candidate intermediary user nodes, so as to delineate the expanded intermediary user set on the basis of the candidate intermediary user set.
As shown in fig. 3, the method P100 may further include:
s130: an extended set of intermediary users is determined from the set of candidate intermediary users based on the target characteristics of the candidate intermediary users.
As previously described, each node and each edge in the user association graph has corresponding features. Therefore, the target features exhibited by the candidate intermediary users in the online rental service scenario can be further analyzed, so that expanded intermediary users are further screened out from the candidate intermediary user set to obtain the expanded intermediary user set. Specifically, as shown in fig. 4, step S130 may include: determining the target features of the candidate intermediary users, and determining the expanded intermediary user set based on the target features of the candidate intermediary users.
In some embodiments, the target features may comprise behavioral features. The behavioral features are derived from the request results of the history requesting users with which the candidate intermediary users have the association. As previously described, a history requesting user may be a user who initiates an online lease request using the online lease module. The request result may be the request result of the online lease request of the history requesting user. The request result may include at least one of pass, reject, normal transaction, and breach. A normal transaction may be any state in which the online lease proceeds normally, for example, the states of paying a deposit, shipment, receipt, return, and rent payment in the online lease process, or the state in which the online lease process is completed normally. Pass or reject may be the result of the server 300 verifying the online rental service. A breach may be a violation of the contractual agreement that occurs during the online rental service, such as failing to return the merchandise on schedule or failing to pay the rent on schedule. The request results may characterize the verification result of the online lease request by the server 300 or the state of the online lease request over historical time, where the state may include the state of an uncompleted online rental service or the state of a completed online rental service. Among the request results, reject and breach indicate abnormality. We therefore define reject and breach as abnormal states and the other states as normal states. A history requesting user whose request results are abnormal states has a certain possibility of fraud. It can also be understood that, for a normal user among the history requesting users, the request result of each link of the online rental service is normal, whereas for an abnormal user among the history requesting users, at least one abnormal request result exists among the request results of the links of the online rental service. Thus, if an abnormal state exists in the request results of the online rental service of a user, the user may have a certain risk of fraud. A candidate intermediary user may be an intermediary user if the number and/or proportion of abnormal states among the request results of the plurality of history requesting users associated with the candidate intermediary user exceeds a preset threshold. Moreover, the higher the number and/or proportion of abnormal states, the higher the probability that the candidate intermediary user is an intermediary user.
Based on this, step S130 may select possible intermediary users from the candidate intermediary user set as expanded intermediary users based at least on the behavioral features. In particular, the behavioral features may include the number and/or proportion of abnormal states among the request results. Determining the target features of the candidate intermediary users in step S130 may include: the computing device 600 may determine the behavioral features of the candidate intermediary users based on the number and/or proportion of abnormal states among the request results. In step S130, the computing device 600 may obtain the behavioral features of each candidate intermediary user in the candidate intermediary user set. Specifically, for each candidate intermediary user, the computing device 600 may obtain the request results of the plurality of history requesting users that have the association with the current candidate intermediary user, and determine the behavioral features of the current candidate intermediary user based on the number and/or proportion of request results that are abnormal states. A history requesting user may have one or more request results. The proportion of abnormal states among the request results may be the proportion of the number of abnormal states among all request results, or the proportion of the number of abnormal states relative to the number of normal states.
In some embodiments, the computing device 600 may determine the expanded intermediary user set from the candidate intermediary user set based solely on the behavioral features. For example, determining the expanded intermediary user set based on the target features of the candidate intermediary users in step S130 may include: determining candidate intermediary users whose behavioral features are greater than a preset first classification threshold as expanded intermediary users, thereby determining the expanded intermediary user set. The first classification threshold may be a number and/or a proportion. The first classification threshold may be obtained in a statistical manner, an experimental manner, an empirical manner, or a machine learning manner, which is not limited in this specification.
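The following sketch illustrates one way the behavioral feature and the first classification threshold could be computed; the function names, the proportion-based feature, and the 0.3 threshold are illustrative assumptions, not values fixed by this specification.

```python
# Sketch of the behavioral feature and the first classification threshold.
ABNORMAL = {"reject", "breach"}

def behavior_feature(request_results):
    """request_results: list of result strings for all history requesting
    users associated with one candidate intermediary user."""
    if not request_results:
        return 0.0
    abnormal = sum(1 for r in request_results if r in ABNORMAL)
    return abnormal / len(request_results)  # proportion of abnormal states

def expanded_intermediaries(candidate_results: dict, first_threshold=0.3):
    """candidate_results maps candidate user id -> list of request results."""
    return {
        user for user, results in candidate_results.items()
        if behavior_feature(results) > first_threshold
    }
```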
In some embodiments, the target feature may also include other features.
Often, the intermediary users are frequently associated with each other within a certain period of time. This association is represented by the fact that there are more edges between different intermediary users on the user association graph. Accordingly, computing device 600 may determine an extended intermediary user based on the core intermediary users that have been determined and from the association between the candidate intermediary users and the core intermediary users. It should be noted that the certain time may be a second historical time window, and the second historical time window may be one month, two months, or other shorter or longer time, which is not limited in this specification.
Based on this, in some embodiments, the target features may also include relationship features, as shown in fig. 4. The relationship features may include the association strength of the candidate intermediary users with the core intermediary user set. The core intermediary user set may be a set of users made up of core intermediary users. As shown in fig. 4, in some embodiments, step S130 may further include: determining a core intermediary user set of the known fraud users based on the known fraud users pre-labeled in the user association graph. The computing device 600 may obtain known fraud users within a third historical time window based on the third party platform and pre-label the features of the user nodes of the known fraud users as known fraud users in the user association graph. A known fraud user may be a user confirmed to have breached among the history requests, such as a user whose lease has expired without returning the merchandise. The history requesting users include the known fraud users. It should be noted that the third historical time window may be one year, two years, or other shorter or longer time, which is not limited in this specification.
In some embodiments, determining a set of core intermediary users of the known fraud users based on the pre-marked known fraud users in the user association graph may include: based on the user association graph, determining users satisfying a second preset condition, from among the plurality of users associated with the known fraud users, as core intermediary users, thereby determining the core intermediary user set. The second preset condition includes: the statistics of the known fraud users with which the core intermediary user is associated are greater than a preset second threshold. That is, for a user node X in the user association graph, if the statistics of known fraud users associated with node X are greater than the preset second threshold, then user X is a core intermediary user.
In some embodiments, the statistics of the known fraud users with whom the association exists may include: the number of known fraud users with whom the association exists, and/or the proportion of known fraud users with whom the association exists among all users with whom the association exists. The second threshold may be a number threshold and/or a proportion threshold, where the number threshold is the judgment threshold for the number of associated known fraud users, and the proportion threshold is the judgment threshold for the proportion of associated known fraud users. The second threshold may be obtained statistically or through machine learning, which is not limited in the embodiments of this specification.
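As an illustrative sketch only, and assuming a hypothetical neighbor mapping over the user association graph and hypothetical threshold values, the second preset condition above might be checked as follows.

```python
def core_intermediary_users(graph_neighbors, known_fraud_users,
                            count_threshold=3, ratio_threshold=0.5):
    """Return the set of core intermediary users: users whose associated known
    fraud users exceed the second threshold (a count and/or a proportion)."""
    core = set()
    for user, neighbors in graph_neighbors.items():
        if not neighbors:
            continue
        fraud_neighbors = sum(n in known_fraud_users for n in neighbors)
        ratio = fraud_neighbors / len(neighbors)
        if fraud_neighbors > count_threshold or ratio > ratio_threshold:
            core.add(user)
    return core
```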
It should be noted that the candidate intermediary user set may include the core intermediary user set, that is, the core intermediary user set is a subset of the candidate intermediary user set. The expanding intermediary user set may also include the core intermediary user set.
After determining the core intermediary user set, computing device 600 may screen for potential intermediary users based on the association between the candidate intermediary users and the core intermediary users in the core intermediary user set. That association may be represented by a relationship feature. In this case, as shown in fig. 4, determining the target feature of the candidate intermediary user in step S130 may further include: determining the relationship feature of the candidate intermediary user based on the strength of association between the candidate intermediary user and the core intermediary users in the core intermediary user set.
In particular, computing device 600 may obtain a relationship feature for each candidate intermediary user in the candidate intermediary user set. For each candidate intermediary user, the strength of association of that candidate intermediary user with the core intermediary user set may include at least one of: the number of core intermediary users in the set that have the association with the current candidate intermediary user, the number of times the association has occurred, the times at which the association occurred, the duration over which the association occurred, and the amount of money involved in the association. In general, the greater the number of associated core intermediary users, the more frequently the association occurs, the more closely spaced the occurrences, the longer the association persists, and the larger the amounts involved, the more likely the candidate intermediary user is an intermediary user. Accordingly, computing device 600 may select any one or more of the above data to determine the strength of association of the candidate intermediary user with the core intermediary user set and determine the relationship feature of the candidate intermediary user accordingly. In some embodiments, the association strength may be a weighted sum or a normalized weighted sum of the above data. This specification is not limited in this respect.
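Purely as an illustrative sketch, the relationship feature could be computed as a weighted combination of these quantities; the field names, weights, and log scaling used below are assumptions rather than part of this description.

```python
import math

def relationship_feature(candidate_stats, weights=None):
    """Combine association statistics between a candidate intermediary user and the
    core intermediary user set into a single association-strength score.
    `candidate_stats` is a hypothetical dict, e.g.:
        {"core_users": 4, "contact_count": 17, "duration_days": 20.0, "amount": 3500.0}
    """
    weights = weights or {"core_users": 0.4, "contact_count": 0.3,
                          "duration_days": 0.1, "amount": 0.2}
    score = 0.0
    for key, w in weights.items():
        # log1p keeps heavy-tailed counts and amounts on a comparable scale
        # (a modeling choice, not something mandated by the specification)
        score += w * math.log1p(candidate_stats.get(key, 0.0))
    return score
```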
In some embodiments, computing device 600 may determine the expanding intermediary users based on both the behavior characteristics and the relationship characteristics of the candidate intermediary users, to improve the accuracy of identifying the expanding intermediary user set. In some embodiments, determining the set of expanded intermediary users based on the target characteristics of the candidate intermediary users in step S130 may include: determining that candidate intermediary users whose behavior characteristics are greater than a preset first classification threshold and whose relationship characteristics are greater than a preset second classification threshold are expanding intermediary users, thereby determining the expanding intermediary user set.
Compared with normal users, intermediary users install particular applications, such as finance-related or loan-related applications, with a markedly different probability. Thus, whether a candidate intermediary user uses and/or installs such a specific application may serve as an auxiliary feature in determining whether the candidate intermediary user is an expanding intermediary user. Based on this, in some embodiments, the target features further comprise application features. The application features include: whether a specific application is installed on the terminal used by the candidate intermediary user; and/or whether the candidate intermediary user has used a specific module on the client, the specific module comprising a financial class module. The terminal carries a client provided by the third party service platform. As shown in fig. 4, in some embodiments, determining the target feature of the candidate intermediary user in step S130 may further include: determining the application features of the candidate intermediary user based on the terminal used by the candidate intermediary user. The specific application is an APP distinct from the client of the third party service platform; that is, the candidate intermediary user must install the specific APP separately on the terminal, and the third party service platform may learn whether the specific application is also installed through its own client installed on the same terminal. Alternatively, the third party service platform may provide the corresponding specific service through a specific module, for example a financial module. Such a module may be an applet or gadget built into the third party service platform, so that the user can access it directly through the client without installing anything additional, allowing server 300 to learn whether the candidate intermediary user has used the specific module on the client.
In some embodiments, computing device 600 may determine the set of expanded intermediary users based on the behavior characteristics, the relationship characteristics, and the application characteristics of the candidate intermediary users. In some embodiments, determining the set of expanded intermediary users based on the target characteristics of the candidate intermediary users in step S130 may include: determining as expanding intermediary users those candidates whose behavior characteristics are greater than a preset first classification threshold and that satisfy at least one of the following conditions: the relationship characteristic is greater than a preset second classification threshold, the specific application is installed on the terminal used by the candidate intermediary user, or the candidate intermediary user has used the specific module through the client.
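The following sketch illustrates one way this combined rule could be expressed; the feature names and threshold values are hypothetical.

```python
def is_expanding_intermediary(behavior, relationship, app_installed, module_used,
                              t1=0.3, t2=1.5):
    """Combined rule: behavior feature above the first classification threshold, plus
    at least one auxiliary condition (relationship feature above the second threshold,
    specific application installed, or specific module used on the client)."""
    return behavior > t1 and (relationship > t2 or app_installed or module_used)
```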
In some embodiments, determining the set of extended intermediary users based on the target features of the candidate intermediary users in step S130 may be implemented by a machine learning model. For example, computing device 600 may determine the set of expanded intermediary users using a positive-unlabeled learning algorithm, with the core intermediary user set as positive samples, based on the target features of the candidate intermediary users. The target features may be the behavior features alone; the behavior features and the relationship features; the behavior features and the application features; or the behavior features, the relationship features, and the application features together. The positive-unlabeled learning algorithm is a PU Learning algorithm. PU Learning addresses classification when only a small set of positively labeled samples (P set) is available and most samples are unlabeled (U set). In some embodiments, within the positive-unlabeled learning algorithm, reliable negative samples may be determined from the candidate intermediary user set with the spy technique (The Spy Technique, S-EM for short). In some embodiments, reliable negative samples may instead be determined from the candidate intermediary user set with other algorithms, such as DNF techniques.
The training process of the PU Learning algorithm can be divided into the following steps:
step 1, identifying reliable negative samples (Reliable Negative, RN) from the U-set in combination with the target features;
step 2, forming a training sample set from the P set and the RN set, and training to obtain a set of classifiers;
step 3, selecting the optimal classifier from the iteratively generated set of classifiers according to a specific selection strategy.
Based on the above, a set of sample known fraud users can first be obtained from the data of the third party service platform over a past period of time; the core intermediary user samples determined from those known fraud users then serve as the positive sample set (P set) in training the PU Learning algorithm. A set of unlabeled samples (U set) is obtained from the platform data over the same period, and reliable negative samples (Reliable Negative, RN) are then identified from the U set. Specifically, the spy technique (S-EM) proceeds as follows:
step 11: some positive samples S are selected from the P-set and put into the U-set as Spy-samples (Spy). The sample set becomes P-S (P ') and U+S (U') at this time. Wherein the proportion of the number of sub-sets S divided from P may generally be 15%.
step 12: and taking P-S as a new positive sample set, taking U+S as a new unlabeled sample set, and classifying by using an iterative EM algorithm based on the target characteristics. When initializing, taking all unlabeled samples as negative samples, training a classifier, and predicting probability for all samples.
step 13: taking the minimum value of the Spy sample distribution as an RN classification threshold, and considering all samples below the RN classification threshold in U as RNs.
Second, after determining the RN sample set with the spy technique, a classifier may be trained based on the P set and the RN set. The specific process may be as follows: the P set and the RN set form a training set X_train, with label 1 for samples from the original P set and label 0 for samples from the original RN set forming the training label y_train; a classification model is trained with X_train and y_train; the model is then used to predict (classify) the Q set, where Q = U - RN, and the samples in Q predicted as negative form a set W; W is merged into RN as the new RN set while W is removed from Q; the new RN and the P set form a new X_train with a correspondingly new y_train, and a classification model is trained again, expanding RN until W is empty, at which point the loop ends. Each cycle yields one classification model. The training device may then select the optimal classification model from the models produced across the cycles, according to a specific selection strategy, as the trained classifier. The specific selection strategy may be based on the change in prediction error between successive cycles: since training a binary classification model usually aims to minimize its prediction error, when the prediction error of the current cycle's model becomes higher than that of the previous cycle's model, the previous cycle's model may be selected as the optimal model.
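The iterative expansion of RN and the selection of the final classifier could be sketched as follows; the choice of logistic regression, the use of training error as the error measure, and the stopping rule are illustrative assumptions.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

def iterative_pu_classifier(X_pos, X_unlabeled, rn_mask):
    """Iteratively grow the reliable-negative set and keep the classifier from the
    cycle just before the prediction error starts to rise."""
    X_rn = X_unlabeled[rn_mask]
    X_q = X_unlabeled[~rn_mask]                 # Q = U - RN
    models, errors = [], []

    while True:
        X_train = np.vstack([X_pos, X_rn])
        y_train = np.concatenate([np.ones(len(X_pos)), np.zeros(len(X_rn))])
        clf = LogisticRegression(max_iter=1000).fit(X_train, y_train)

        models.append(clf)
        errors.append(1.0 - clf.score(X_train, y_train))   # training error as a proxy

        if len(X_q) == 0:
            break
        pred = clf.predict(X_q)
        W = X_q[pred == 0]                      # samples in Q predicted negative
        if len(W) == 0:                         # W empty: stop expanding RN
            break
        X_rn = np.vstack([X_rn, W])             # RN <- RN + W
        X_q = X_q[pred == 1]                    # Q <- Q - W

    # select the model from the cycle before the error first increases, if any
    for i in range(1, len(errors)):
        if errors[i] - errors[i - 1] > 0:
            return models[i - 1]
    return models[-1]
```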
Training in this manner yields a classification model. Computing device 600 may input the candidate intermediary user set and the target features of each candidate intermediary user into the trained classification model, which outputs the expanding intermediary user set. It should be noted that the expanding intermediary user set also includes the core intermediary user set.
After identifying the set of extension intermediary users, computing device 600 may mark the extension intermediary users on server 300, or may mark the extension intermediary users in a user association graph.
In summary, once the expanding intermediary user set has been determined, computing device 600 may determine the potential fraud users based on it, enabling early warning and avoiding losses. As shown in fig. 3 and 4, the method P100 may further include:
S150: based on the expanding intermediary user set, identifying, from users of the third party service platform, potential fraud users taught by the expanding intermediary user set.
A potential fraud user is a user who is relatively likely to be committing fraud using the online rental service. Analysis of fraud occurring in the online rental scenario, combined with historical data, shows that a fraud user usually makes frequent contact with the instigating intermediary user within a certain time before committing the fraud. That certain time may be a fourth historical time window, which may be one day, one week, or other shorter or longer time; this specification is not limited in this respect. On the user association graph, the contact between the potential fraud user and the expanding intermediary user is reflected as multiple edges between the potential fraud user and the expanding intermediary user within the fourth historical time window.
Specifically, in some embodiments, step S150 may include: determining candidate fraud users associated with the extended mediating user set based on the user association graph and the extended mediating user set, thereby determining a candidate fraud user set; and identifying potential fraud users taught by the expansion intermediary user set from users of the third party service platform based on the association strength of the candidate fraud users and the expansion intermediary user set.
Determining, based on the user association graph and the expanding intermediary user set, the candidate fraud users associated with the expanding intermediary user set may include: determining, among all users, that users satisfying a third preset condition are candidate fraud users, thereby determining the candidate fraud user set. The third preset condition may include: for the candidate fraud user, the statistics of the expanding intermediary users with which the association exists are greater than a preset third threshold. The statistics of the expanding intermediary users with which the association exists may include the number of such expanding intermediary users, and/or their proportion among all users with which the association exists.
The association strength of a candidate fraud user with the expanding intermediary user set may include at least one of: the number of expanding intermediary users in the set with which the candidate fraud user is associated, the number of times the association occurred, the times at which the association occurred, the time difference between the association and the service request initiated by the candidate fraud user, the duration over which the association occurred, and the amount of money involved. The service request initiated by the candidate fraud user may be an online lease request initiated through the online lease module of the third party service platform. In general, the greater the number of associated expanding intermediary users, the more frequently the association occurs, the smaller the time difference between the association and the service request, the longer the association persists, and the larger the amounts involved, the stronger the association between the candidate fraud user and the associated expanding intermediary users, and correspondingly the more likely the candidate fraud user is a potential fraud user. The association strength may be a weighted sum or a normalized weighted sum of the above data. In some embodiments, when the association strength between the candidate fraud user and the associated expanding intermediary users exceeds a preset strength threshold, computing device 600 may determine that the current candidate fraud user is a potential fraud user.
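As a sketch under assumed data structures (a neighbor mapping from the user association graph, hypothetical per-edge statistics, and arbitrary weights and thresholds), step S150 might look like this.

```python
def potential_fraud_users(graph_neighbors, expanding_intermediaries, edge_stats,
                          count_threshold=2, strength_threshold=1.0,
                          weights=(0.4, 0.3, 0.2, 0.1)):
    """Filter candidate fraud users (third preset condition) and then score their
    association strength with the expanding intermediary user set."""
    w_users, w_times, w_recency, w_amount = weights
    potential = set()
    for user, neighbors in graph_neighbors.items():
        linked = [n for n in neighbors if n in expanding_intermediaries]
        if len(linked) <= count_threshold:        # third preset condition not met
            continue
        # aggregate hypothetical per-edge statistics into an association strength
        times = sum(edge_stats[(user, n)]["contact_count"] for n in linked)
        recency = min(edge_stats[(user, n)]["days_before_request"] for n in linked)
        amount = sum(edge_stats[(user, n)]["amount"] for n in linked)
        strength = (w_users * len(linked) + w_times * times
                    + w_recency * (1.0 / (1.0 + recency)) + w_amount * amount)
        if strength > strength_threshold:
            potential.add(user)
    return potential
```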
In some embodiments, computing device 600 may instead use a machine learning approach, taking the user association graph and the expanding intermediary user set as inputs to a trained machine learning model, which outputs (identifies) the potential fraud users of the third party service platform taught by the expanding intermediary users in the expanding intermediary user set. By way of example, the machine learning model may be a GNN model, a decision tree model, or the like. The decision tree model may include LightGBM (a light gradient boosting machine), the extreme gradient boosting model (eXtreme Gradient Boosting, XGBoost), random forest, or the like.
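The following illustrative sketch trains a LightGBM classifier on tabular features derived from the user association graph; the feature files, label source, and decision threshold are placeholders assumed for illustration only.

```python
import lightgbm as lgb
import numpy as np

# X: hypothetical feature matrix per candidate fraud user (e.g., number of associated
# expanding intermediary users, contact counts, time gaps, amounts), derived from the
# user association graph; y: 1 for historically confirmed fraud users, 0 otherwise.
X = np.load("fraud_features.npy")        # placeholder paths, for illustration only
y = np.load("fraud_labels.npy")

model = lgb.LGBMClassifier(n_estimators=200, learning_rate=0.05)
model.fit(X, y)

# Score current candidate fraud users and flag high-risk ones as potential fraud users.
X_candidates = np.load("candidate_features.npy")
risk = model.predict_proba(X_candidates)[:, 1]
potential_fraud_idx = np.where(risk > 0.8)[0]
```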
After identifying the potential fraud users, computing device 600 may mark them on server 300. Server 300 may then refuse online lease applications from the potential fraud users on the online lease module.
The present specification provides a fraud user identification method P100 and a system 001, which can determine, based on a history request user, a candidate mediating user set associated with the history request user, and determine, based on behavior characteristics of the candidate mediating user, an extended mediating user set from the candidate mediating user set, so as to identify potential fraud users taught by the extended mediating user set. Wherein the behavior characteristics are derived based on the request results of the history requesting user. The method P100 and the system 001 provided by the specification can locate new potential intermediary users while improving the positioning accuracy of the intermediary users, thereby improving the accuracy of identifying potential fraud users based on the potential intermediary users and reducing the occurrence of fraud behaviors. The fraud user identification method P100 and the system 001 provided by the specification can also combine the behavior characteristics with the relationship characteristics between the candidate intermediary users and the core intermediary users, so that the identification accuracy of the potential intermediary users is further improved.
Another aspect of the present description provides a non-transitory storage medium storing at least one set of executable instructions for fraud user identification. When executed by a processor, the executable instructions direct the processor to perform the steps of the fraud user identification method P100 described herein. In some possible implementations, aspects of this specification may also be implemented in the form of a program product including program code. When the program product is run on computing device 600, the program code causes computing device 600 to perform the steps of the fraud user identification method P100 described in this specification. The program product implementing the above methods may employ a portable compact disc read-only memory (CD-ROM) including program code and may run on computing device 600. However, the program product of this specification is not limited thereto; in this specification, the readable storage medium may be any tangible medium that can contain or store a program for use by or in connection with an instruction execution system. The program product may employ any combination of one or more readable media. The readable medium may be a readable signal medium or a readable storage medium. The readable storage medium may be, for example, but is not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any combination of the foregoing. More specific examples of readable storage media include: an electrical connection having one or more wires, a portable disk, a hard disk, random access memory (RAM), read-only memory (ROM), erasable programmable read-only memory (EPROM or flash memory), optical fiber, portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing. A readable signal medium may include a propagated data signal with readable program code embodied therein, either in baseband or as part of a carrier wave. Such a propagated data signal may take many forms, including, but not limited to, an electromagnetic signal, an optical signal, or any suitable combination of the foregoing. A readable signal medium may also be any readable medium other than a readable storage medium that can communicate, propagate, or transport a program for use by or in connection with an instruction execution system, apparatus, or device. Program code embodied on a readable medium may be transmitted using any appropriate medium, including but not limited to wireless, wireline, optical fiber cable, RF, etc., or any suitable combination of the foregoing. Program code for carrying out operations of this specification may be written in any combination of one or more programming languages, including object-oriented programming languages such as Java and C++ and conventional procedural programming languages such as the "C" programming language or similar programming languages. The program code may execute entirely on computing device 600, partly on computing device 600, as a stand-alone software package, partly on computing device 600 and partly on a remote computing device, or entirely on a remote computing device.
The foregoing describes specific embodiments of the present disclosure. Other embodiments are within the scope of the following claims. In some cases, the actions or steps recited in the claims can be performed in a different order than in the embodiments and still achieve desirable results. In addition, the processes depicted in the accompanying figures do not necessarily require the particular order shown, or sequential order, to achieve desirable results. In some embodiments, multitasking and parallel processing are also possible or may be advantageous.
In view of the foregoing, it will be evident to a person skilled in the art that the foregoing detailed disclosure may be presented by way of example only and may not be limiting. Although not explicitly described herein, those skilled in the art will appreciate that the present description is intended to encompass various adaptations, improvements, and modifications of the embodiments. Such alterations, improvements, and modifications are intended to be proposed by this specification, and are intended to be within the spirit and scope of the exemplary embodiments of this specification.
Furthermore, certain terms in the present description have been used to describe embodiments of the present description. For example, "one embodiment," "an embodiment," and/or "some embodiments" means that a particular feature, structure, or characteristic described in connection with the embodiment may be included in at least one embodiment of the present description. Thus, it is emphasized and should be appreciated that two or more references to "an embodiment" or "one embodiment" or "an alternative embodiment" in various portions of this specification are not necessarily all referring to the same embodiment. Furthermore, the particular features, structures, or characteristics may be combined as suitable in one or more embodiments of the invention.
It should be appreciated that in the foregoing description of embodiments of the present specification, various features are sometimes combined in a single embodiment, figure, or description thereof, to simplify the specification and aid understanding of one feature. However, this does not mean that the combination of these features is necessary; it is entirely possible for a person skilled in the art, upon reading this description, to extract some of those features as separate embodiments. That is, an embodiment in this specification may also be understood as an integration of multiple secondary embodiments, where each secondary embodiment contains fewer than all features of the single disclosed embodiment.
Each patent, patent application, publication of a patent application, and other material, such as articles, books, specifications, publications, documents, and the like, cited herein is hereby incorporated by reference in its entirety. Excluded is any prosecution file history associated with such material, any such material that is inconsistent with or in conflict with this document, and any such material that may have a limiting effect on the broadest scope of the claims now or later associated with this document. For example, if there is any inconsistency or conflict between the description, definition, and/or use of a term associated with any of the incorporated material and that associated with this document, the description, definition, and/or use of the term in this document shall prevail.
Finally, it is to be understood that the embodiments of the application disclosed herein are illustrative of the principles of the embodiments of the present specification. Other modified embodiments are also within the scope of this specification. Accordingly, the embodiments disclosed herein are by way of example only and not limitation. Those skilled in the art can adopt alternative arrangements to implement the application in the specification based on the embodiments in the specification. Therefore, the embodiments of the present specification are not limited to the embodiments precisely described in the application.

Claims (20)

1. A fraud user identification method is applied to a third party service platform and comprises the following steps:
determining, based on a history requesting user, a set of candidate intermediary users associated with the history requesting user, the set of candidate intermediary users including candidate intermediary users who teach users to commit fraud;
determining a set of expanded intermediary users from the set of candidate intermediary users based on target features of the candidate intermediary users, the target features comprising behavioral features derived based on request results of historical requesting users having the association with the candidate intermediary users; and
identifying, from users of the third party service platform, potential fraud users taught by the set of expanded intermediary users, based on the set of expanded intermediary users.
2. The method of claim 1, wherein the third party service platform comprises an online lease module, the history requesting user comprises a user who initiates an online lease request using the online lease module, and the potential fraud user comprises a potential user who is fraud using the online lease module.
3. The method of claim 1, wherein the determining, based on the history request user, a set of candidate intermediary users with which the history request user is associated comprises:
determining, based on a user association graph, a first set of users having the association with the history of requesting users; and
determining from the first set of users that the user satisfying a first preset condition is the candidate intermediary user, thereby determining the candidate intermediary user set,
wherein the first preset condition includes: for the candidate intermediary users, the statistics of the historical request users with which the association exists are greater than a preset first threshold.
4. The method of claim 3, wherein the statistics of the history of requesting users with which the association exists comprise:
the number of history requesting users with which the association exists; and/or
the proportion of history requesting users with whom the association exists among all users with whom the association exists.
5. The method of claim 1, wherein the determining an extended set of intermediary users from the set of candidate intermediary users based on the target characteristics of the candidate intermediary users comprises:
determining the target feature of the candidate mediating user comprises:
determining the behavior feature of the candidate intermediary user based on the number and/or proportion of abnormal states as a result of the request, the request result comprising: at least one of pass, reject, normal transaction, and breach, the abnormal state including at least one of the reject and the breach; and
the set of expanded intermediary users is determined based on the target features of the candidate intermediary users.
6. The method of claim 5, wherein the determining the set of expanded intermediary users based on the target characteristics of the candidate intermediary users comprises:
and determining that the candidate intermediary users with the behavior characteristics larger than a preset first classification threshold are expanding intermediary users, thereby determining the expanding intermediary user set.
7. The method of claim 5, wherein the determining an extended set of intermediary users from the set of candidate intermediary users based on the target characteristics of the candidate intermediary users comprises:
Based on the known fraud users pre-marked in the user association graph, determining a set of core intermediary users of said known fraud users,
wherein the history requesting user comprises the known fraud user, the expanding set of intermediary users and the candidate set of intermediary users comprise the core set of intermediary users.
8. The method of claim 7, wherein said determining a set of core intermediary users of a known fraud user based on pre-tagged known fraud users in a user association graph comprises:
determining, based on the user association graph, from among a plurality of users associated with the known fraud users, a user satisfying a second preset condition as a core intermediary user, thereby determining the core intermediary user set,
wherein the second preset condition includes: the statistics of the known fraudulent users with which the core mediating users are associated are greater than a preset second threshold.
9. The method of claim 8, wherein said statistics of known fraudulent users with whom said association exists comprise:
the number of known fraudulent users with whom said association exists; and/or
The proportion of known fraudulent users with whom the association exists among all users with whom the association exists.
10. The method of claim 7, wherein the target feature further comprises a relationship feature, the determining the target feature of the candidate intermediary user further comprising:
the relationship features of the candidate intermediary users are determined based on the strength of association of the candidate intermediary users with the set of core intermediary users.
11. The method of claim 10, wherein the strength of association of the candidate intermediary users with the set of core intermediary users comprises:
at least one of: the number of core intermediary users in the core intermediary user set with which the candidate intermediary user has the association, the number of times the association has occurred, the time when the association occurred, the duration over which the association occurred, and the amount of money involved in the association.
12. The method of claim 10, wherein the determining the set of expanded intermediary users based on the target characteristics of the candidate intermediary users comprises:
and determining that the candidate intermediary users with the behavior characteristics larger than a preset first classification threshold and the relationship characteristics larger than a preset second classification threshold are expanding intermediary users, thereby determining the expanding intermediary user set.
13. The method of claim 10, wherein the target feature further comprises an application feature, the determining the target feature of the candidate intermediary user further comprising:
determining the application features of a candidate intermediary user based on terminals used by the candidate intermediary user, the application features comprising:
whether a specific application is carried on a terminal used by the candidate intermediary user or not, wherein the specific application comprises a financial application, and a client provided by the third party service platform is carried on the terminal; and/or
Whether the candidate intermediary user has used a particular module on the client, the particular module comprising a financial class module.
14. The method of claim 13, wherein the determining the set of expanded intermediary users based on the target characteristics of the candidate intermediary users comprises:
determining that the candidate intermediary users with the target characteristics meeting preset judging conditions are expanding intermediary users, thereby determining the expanding intermediary user set, wherein the preset judging conditions comprise:
the behavior characteristic is greater than a preset first classification threshold; and at least one of the following:
The relational characteristic is greater than a preset second classification threshold,
The terminal used by the candidate intermediary user is provided with the specific application, and
the candidate intermediary user has used the particular module through the client.
15. The method of claim 5, wherein the determining an extended set of intermediary users based on the target characteristics of the candidate intermediary users comprises:
and based on the target characteristics of the candidate intermediary users, taking the core intermediary user set as a positive sample, and adopting a positive sample label-free learning algorithm to determine the extended intermediary user set.
16. The method of claim 15, wherein in the positive sample unlabeled learning algorithm, reliable negative samples are determined from the candidate intermediary user set with a spy technique algorithm.
17. The method of claim 1, wherein the identifying potential fraud users taught by the extended set of intermediary users from users of the third party service platform based on the extended set of intermediary users comprises:
determining candidate fraud users associated with the extended mediating user set based on the user association graph and the extended mediating user set, thereby determining a candidate fraud user set; and
identifying, from users of the third party service platform, potential fraud users taught by the expansion intermediary user set, based on the association strength between the candidate fraud users and the expansion intermediary user set.
18. The method of claim 17, wherein the association strength of the candidate fraud user with the extended set of intermediary users comprises:
the expanding intermediary users centralize at least one of the number of expanding intermediary users with which the candidate fraud users are associated, the number of times the association occurs, the time when the association occurs, the time difference between the time when the association occurs and the time of the service request initiated by the candidate fraud users, the duration of the association occurring and the amount involved in the association occurring.
19. The method of any of claims 3, 7 and 17, wherein the user association graph is constructed based on associations between users using the third party service platform, nodes of the user association graph comprising the users, and edges of the user association graph characterizing the associations between the nodes they connect, the associations comprising:
at least one of a funds transfer relationship, a shared device relationship, and a shared network relationship.
20. A fraud user identification system, comprising:
at least one storage medium storing at least one instruction set for fraud user identification; and
At least one processor communicatively coupled to the at least one storage medium,
wherein the at least one processor reads the at least one instruction set and performs the fraud user identification method of any of claims 1-19 according to the at least one instruction set when the fraud user identification system is running.