WO2020199621A1 - Fraud detection based on a knowledge graph - Google Patents

Fraud detection based on a knowledge graph

Info

Publication number
WO2020199621A1
WO2020199621A1 PCT/CN2019/121458 CN2019121458W WO2020199621A1 WO 2020199621 A1 WO2020199621 A1 WO 2020199621A1 CN 2019121458 W CN2019121458 W CN 2019121458W WO 2020199621 A1 WO2020199621 A1 WO 2020199621A1
Authority
WO
WIPO (PCT)
Prior art keywords
node
nodes
knowledge graph
similarity
fraudulent
Prior art date
Application number
PCT/CN2019/121458
Other languages
English (en)
French (fr)
Inventor
陈振
Original Assignee
北京三快在线科技有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 北京三快在线科技有限公司
Publication of WO2020199621A1

Classifications

    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06Q - INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q20/00 - Payment architectures, schemes or protocols
    • G06Q20/38 - Payment protocols; Details thereof
    • G06Q20/40 - Authorisation, e.g. identification of payer or payee, verification of customer or shop credentials; Review and approval of payers, e.g. check credit lines or negative lists
    • G06Q20/401 - Transaction verification
    • G06Q20/4016 - Transaction verification involving fraud or risk level assessment in transaction processing
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06Q - INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q40/00 - Finance; Insurance; Tax strategies; Processing of corporate or income taxes
    • G06Q40/04 - Trading; Exchange, e.g. stocks, commodities, derivatives or currency exchange

Definitions

  • The present disclosure relates to the field of network technology, and in particular to a method, a device, and a storage medium for detecting fraud based on a knowledge graph.
  • The financial sector places high demands on transaction risk control, and the safety of capital transactions must be ensured.
  • In practice, some fraudulent behaviors may occur. For example, in the e-commerce or O2O (Online To Offline) field, there may be behaviors such as batch registration of fake users, fake orders (order brushing), cheating, and transaction fraud.
  • The present disclosure provides a method, a device, and a storage medium for detecting fraud based on a knowledge graph, so as to solve the technical problem in the related art that fraud carried out through batch registration is difficult to identify.
  • A first aspect of the embodiments of the present disclosure provides a method for detecting fraud based on a knowledge graph, the method including:
  • collecting user metadata, behavior data, and a blacklist of fraudulent users; selecting entities in the metadata as nodes, and establishing edges based on the business binding relationships and co-occurrence relationships between entities in the behavior data, to construct a knowledge graph; marking fraudulent nodes in the knowledge graph according to the blacklist of fraudulent users; calculating the similarity between the unmarked nodes in the knowledge graph and the fraudulent nodes according to the similarity of adjacent nodes in the knowledge graph; and outputting the fraud risk assessment result of the unmarked nodes according to the calculation result.
  • Optionally, calculating the similarity between the unmarked nodes in the knowledge graph and the fraudulent nodes includes calculating the similarity of two nodes in the knowledge graph according to the following formula:
    $$ s(a,b) = \begin{cases} 1, & a = b \\ \dfrac{C}{|I(a)|\,|I(b)|} \sum_{i=1}^{|I(a)|} \sum_{j=1}^{|I(b)|} s\big(I_i(a), I_j(b)\big), & a \neq b,\ I(a) \neq \varnothing,\ I(b) \neq \varnothing \\ 0, & \text{otherwise} \end{cases} $$
  • $s(a,b)$ is the similarity between nodes a and b;
  • $I(a)$ represents the set of incident adjacent nodes of node a, and $I_i(a)$ represents the i-th incident adjacent node of node a; $\varnothing$ means empty, and $I(a) \neq \varnothing$ indicates that node a has incident adjacent nodes;
  • $I(b)$ represents the set of incident adjacent nodes of node b, and $I_j(b)$ represents the j-th incident adjacent node of node b; $I(b) \neq \varnothing$ indicates that node b has incident adjacent nodes;
  • $s(I_i(a), I_j(b))$ is the similarity between the i-th incident adjacent node of node a and the j-th incident adjacent node of node b;
  • C is the damping coefficient, $C \in (0,1)$.
  • Optionally, outputting the fraud risk assessment result of the unmarked nodes according to the calculation result includes: for each unmarked node, calculating the mean of the similarities between the unmarked node and all the fraudulent nodes; and if the mean similarity is greater than a threshold, outputting a fraud risk assessment result that characterizes the unmarked node as a suspected fraud node.
  • Optionally, collecting the metadata and the behavior data of the user includes: extracting the metadata from a user request log, where the metadata includes at least one of device information, account information, card information, and context information; and obtaining the behavior data of the user according to the business process, where the behavior data includes operation data for at least one of the user placing an order, paying, commenting, binding an email address or mobile phone number, and retrieving a password.
  • A second aspect of the embodiments of the present disclosure provides a device for detecting fraud based on a knowledge graph, the device including:
  • a collection module, used to collect user metadata, behavior data, and a blacklist of fraudulent users;
  • a construction module, used to select entities in the metadata as nodes, and to establish edges based on the business binding relationships and co-occurrence relationships between the entities in the behavior data, to construct a knowledge graph;
  • a marking module, used to mark fraudulent nodes in the knowledge graph according to the blacklist of fraudulent users;
  • a calculation module, used to calculate the similarity between the unmarked nodes in the knowledge graph and the fraudulent nodes according to the similarity of adjacent nodes in the knowledge graph;
  • an output module, used to output the fraud risk assessment result of the unmarked nodes according to the calculation result.
  • Optionally, the calculation module is further configured to calculate the similarity of two nodes in the knowledge graph according to the same formula as in the method of the first aspect, where s(a,b), I(a), I(b), the i-th and j-th incident adjacent nodes, and the damping coefficient C ∈ (0,1) have the meanings defined there.
  • Optionally, for each unmarked node, the output module includes: a calculation sub-module for calculating the mean of the similarities between the unmarked node and all the fraudulent nodes; and an output sub-module for outputting, if the mean similarity is greater than a threshold, a fraud risk assessment result that characterizes the unmarked node as a suspected fraud node.
  • Optionally, the collection module includes: an extraction sub-module for extracting the metadata from the user request log, where the metadata includes at least one of device information, account information, card information, and context information; and an obtaining sub-module for obtaining the behavior data of the user according to the business process, where the behavior data includes operation data for at least one of the user placing an order, paying, commenting, binding an email address or mobile phone number, and retrieving a password.
  • A third aspect of the embodiments of the present disclosure provides a non-volatile computer-readable storage medium on which a computer program is stored. When the program is executed by a processor, it causes the processor to implement the steps of the method of any one of the above-mentioned first aspects.
  • the fourth aspect of the embodiments of the present disclosure provides a device for detecting fraud based on a knowledge graph, including:
  • a memory on which a computer program is stored; and a processor for executing the computer program in the memory to implement the steps of the method in any one of the above-mentioned first aspects.
  • The present disclosure constructs a knowledge graph, marks the nodes in the knowledge graph that appear in the blacklist of fraudulent users, calculates the similarity between the unmarked nodes in the knowledge graph and the fraudulent nodes, and then performs fraud risk assessment according to that similarity: nodes whose similarity is greater than the threshold are regarded as high-risk fraud nodes.
  • Because fraud risk is assessed on the basis of similarity, the present disclosure suits scenarios in which batch-registered fake users must be identified. It can effectively detect fraudulent users, avoid losses caused by fake orders, cheating, and fraudulent transactions, and improve the accuracy of identifying fraud carried out through batch registration.
  • The present disclosure can make full use of the user behavior information accumulated in the e-commerce or O2O field, such as registration, login, ordering, payment, and comments. The way the knowledge graph is constructed is simple and easy to implement, and has strong performance advantages.
  • the SimRank algorithm adopted in the present disclosure essentially calculates the similarity between nodes in the network, and is suitable for solving the problem of fraud in batch registration.
  • Fig. 1 is a flow chart of a method for detecting fraud based on a knowledge graph according to an exemplary embodiment of the present disclosure.
  • Fig. 2 is a specific flowchart of step S11 in Fig. 1 according to an exemplary embodiment.
  • Fig. 3 is a specific flowchart of step S15 in Fig. 1 according to an exemplary embodiment.
  • Fig. 4 is a block diagram of an apparatus for detecting fraud based on a knowledge graph according to an exemplary embodiment of the present disclosure.
  • Fig. 5 is a block diagram of an output module of an apparatus for detecting fraud based on a knowledge graph according to an exemplary embodiment of the present disclosure.
  • Fig. 6 is a block diagram of a collection module of an apparatus for detecting fraud based on a knowledge graph according to an exemplary embodiment of the present disclosure.
  • Fig. 7 is a hardware structure diagram of a device for detecting fraud based on a knowledge graph according to an exemplary embodiment of the present disclosure.
  • In one example, a fraud propagation relationship graph can be constructed from fund flow information, related information of the transaction parties, and transaction data.
  • Each node in the fraud propagation relationship graph is constructed from the related information of the transaction parties, and the directed edges between nodes are constructed from the fund flow information and the related information of the transaction parties. A directed edge can represent a fraud propagation relationship between nodes. The fraud propagation weight of each node is then calculated with the PageRank iterative update algorithm.
  • This example targets transaction scenarios only and identifies transaction fraud; it cannot be generalized to other fraud scenarios, such as identifying batch registration of fake users, fake orders, and cheating in the e-commerce or O2O field. In addition, the PageRank iterative update algorithm essentially ranks nodes by importance: it yields only a weight for each node, and does not readily reflect the correlation between nodes involved in a case and nodes not involved in it.
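  • Fig. 1 is a flow chart showing a method for detecting fraud based on a knowledge graph according to an exemplary embodiment. As shown in Fig. 1, the method may include the following steps.
  • S11: Collect user metadata, behavior data, and a blacklist of fraudulent users.
  • S12: Select entities in the metadata as nodes, and establish edges according to the business binding relationships and co-occurrence relationships between entities in the behavior data, to construct a knowledge graph.
  • S13: Mark the fraudulent nodes in the knowledge graph according to the blacklist of fraudulent users.
  • S14: Calculate the similarity between the unmarked nodes in the knowledge graph and the fraudulent nodes according to the similarity of adjacent nodes in the knowledge graph.
  • S15: Output the fraud risk assessment result of the unmarked nodes according to the calculation result.
  • The fraudulent user blacklist is built from fraudulent users identified through business accumulation, manual judgment, and historical records, by adding the account IDs, mobile phone numbers, and unique device identification numbers involved in each case.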
  • This application does not limit the number of users whose metadata is collected. Generally speaking, the greater the number of users, the more helpful it is to obtain the final fraud risk assessment result.
  • Fig. 2 is a specific flowchart of step S11 in Fig. 1 according to an exemplary embodiment. As shown in Fig. 2, collecting user metadata and behavior data may include the following steps.
  • S111 Extract metadata from the user request log; the metadata includes at least one of device information, account information, card information, and context information.
  • S112: Obtain user behavior data according to the business process; the behavior data includes operation data for at least one of the user placing an order, paying, commenting, binding an email address or mobile phone number, and retrieving a password.
  • the user request log includes user registration request, order request, payment request, etc., and metadata can be extracted from these request data.
  • The device information included in the metadata may be information such as a unique device identification number, a MAC (Media Access Control) address, and an IMEI (International Mobile Equipment Identity).
  • the account information included in the metadata may be account ID, mobile phone number, email address and other information.
  • the card information included in the metadata may be a bank card number.
  • the context information included in the metadata may be IP (Internet Protocol) address, merchant ID, latitude and longitude, Wi-Fi information, request time and other information.
  • In step S112, the user's behavior data is obtained, such as placing orders, paying, commenting, binding an email address or mobile phone number, and retrieving a password.
  • The entities in the metadata can be used as nodes, and the relationships between the entities in the behavior data as edges, to construct a knowledge graph.
  • Items in the metadata such as mobile phone numbers, email addresses, Wi-Fi information, device numbers, and IP addresses can serve as entities as long as they actually exist and can be identified by an ID.
  • Time, actions, relationships, and the like cannot be regarded as entities.
  • A knowledge graph (knowledge map) combines the theories and methods of applied mathematics, graphics, information visualization technology, information science, and other disciplines with methods such as bibliometric citation analysis and co-occurrence analysis, and presents the result visually as a map.
  • The knowledge graph aims to describe entities or concepts in the real world and the relationships between them. It forms a huge semantic network in which nodes represent entities or concepts and edges represent attributes or relationships.
  • Constructing a knowledge graph with the entities in the metadata as nodes and the relationships between the entities in the behavior data as edges may include the following steps: selecting the entities in the metadata as nodes; then establishing edges according to the business binding relationships and co-occurrence relationships between entities in the behavior data, to build the knowledge graph.
  • The entities appearing in step S111 can be used as the nodes of the financial knowledge graph.
  • The nodes of the financial knowledge graph can include device numbers, MAC addresses, account IDs, mobile phone numbers, email addresses, card numbers, merchant IDs, IP addresses, and so on.
  • Bidirectional edges are established between the nodes to form a large heterogeneous network, which is generally called a financial knowledge graph.
  • The business binding relationship may be as follows: a mobile phone number, a card number, and an email address are bound to an account, so a binding relationship is formed between these entities.
  • The co-occurrence relationship may be as follows: an account and a merchant appear in the same order log, so a co-occurrence relationship is formed between the account ID and the merchant ID.
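The graph construction described above can be sketched in a few lines of Python. This is a minimal illustration, not the applicant's implementation: the function name `build_graph`, the `(type, value)` tuple used to identify a node, and the input shapes (`bindings` as a mapping from account ID to bound entities, `order_logs` as lists of entities seen in one log record) are all assumptions made for the example. Linking each bound entity to its account is one possible reading of "a binding relationship is formed between these entities".

```python
from collections import defaultdict
from itertools import combinations


def build_graph(bindings, order_logs):
    """Return an adjacency map {node: set(neighbors)} with bidirectional edges.

    Nodes are hypothetical (type, value) tuples, e.g. ("phone", "13800000001").
    """
    graph = defaultdict(set)

    def add_edge(u, v):
        # A bidirectional edge is stored as two directed entries.
        graph[u].add(v)
        graph[v].add(u)

    # Business binding: each entity bound to an account is linked to that account
    # (assumed interpretation of the binding relationship).
    for account_id, entities in bindings.items():
        for entity in entities:
            add_edge(("account", account_id), entity)

    # Co-occurrence: entities that appear in the same log record are linked pairwise.
    for record in order_logs:
        for u, v in combinations(sorted(set(record)), 2):
            add_edge(u, v)

    return dict(graph)


bindings = {"u1": [("phone", "13800000001"), ("card", "6222-01")]}
order_logs = [[("account", "u1"), ("merchant", "m9")]]
graph = build_graph(bindings, order_logs)
```

Storing the graph as a plain adjacency map keeps the sketch dependency-free; a graph library could be substituted without changing the idea.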
  • Step S13 is executed to mark the fraud nodes in the knowledge graph according to the blacklist of fraudulent users.
  • In step S13, each node in the knowledge graph is checked against the fraudulent user blacklist. If a node in the knowledge graph, such as an account ID, mobile phone number, or unique device identification number, appears in the blacklist as involved in a case, that node is identified as a fraudulent node.
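Marking in step S13 amounts to a set-membership check. The sketch below assumes the same hypothetical `(type, value)` node identifiers for both the graph and the blacklist; the function name `mark_fraud_nodes` is illustrative only.

```python
def mark_fraud_nodes(graph_nodes, blacklist):
    """Split graph nodes into (fraud, unmarked) by blacklist membership."""
    fraud = {node for node in graph_nodes if node in blacklist}
    unmarked = set(graph_nodes) - fraud
    return fraud, unmarked


nodes = {("account", "u1"), ("account", "u2"), ("phone", "13800000001")}
# Blacklist entries that do not occur in the graph are simply ignored.
blacklist = {("account", "u2"), ("device", "d-77")}
fraud, unmarked = mark_fraud_nodes(nodes, blacklist)
```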
  • Step S14 is executed to calculate the similarity between the unmarked nodes in the knowledge graph and the fraud nodes according to the similarity of adjacent nodes in the knowledge graph.
  • The similarity between the unmarked nodes and the fraudulent nodes in the knowledge graph may be calculated with the SimRank algorithm.
  • The SimRank algorithm is run to calculate the similarity between every pair of nodes. After several iterations, the similarities of all node pairs have been updated and tend to converge.
  • the core idea of the SimRank algorithm is that if two points are relatively similar in the neighborhood of the graph (there are many similar neighbors), then the two points should also be relatively similar. That is, whether two points are similar is determined by whether their neighbors are similar.
  • The similarity of two nodes in the knowledge graph can be calculated by the following formula:
    $$ s(a,b) = \begin{cases} 1, & a = b \\ \dfrac{C}{|I(a)|\,|I(b)|} \sum_{i=1}^{|I(a)|} \sum_{j=1}^{|I(b)|} s\big(I_i(a), I_j(b)\big), & a \neq b,\ I(a) \neq \varnothing,\ I(b) \neq \varnothing \\ 0, & \text{otherwise} \end{cases} $$
  • $s(a,b)$ is the similarity between nodes a and b;
  • $I(a)$ represents the set of incident adjacent nodes of node a (that is, the nodes pointing to node a), and $I_i(a)$ represents the i-th incident adjacent node of a; $\varnothing$ means empty, and $I(a) \neq \varnothing$ means that a has incident adjacent nodes;
  • $I(b)$ represents the set of incident adjacent nodes of node b, and $I_j(b)$ represents the j-th incident adjacent node of b; $I(b) \neq \varnothing$ means that b has incident adjacent nodes;
  • $s(I_i(a), I_j(b))$ is the similarity between the i-th incident adjacent node of a and the j-th incident adjacent node of b;
  • C is the damping coefficient, $C \in (0,1)$.
  • the similarity between any two points can be expressed by the above formula.
  • The above formula is applied iteratively. Put simply, after multiple iterations of the SimRank algorithm a similarity matrix over the knowledge graph is obtained, each element of which is the similarity between two nodes. After this step, the similarity between each unmarked node and each identified fraud node is available.
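The iteration described above can be illustrated with a naive SimRank sketch. It assumes the graph is an adjacency map from each node to its set of incident neighbors (for bidirectional edges, simply its neighbors), that every neighbor also appears as a key, and it uses C = 0.8 and 10 iterations purely as example values; neither value comes from the source. The function name `simrank` is illustrative.

```python
def simrank(graph, C=0.8, iterations=10):
    """Naive SimRank: returns sim[a][b] for every pair of nodes in `graph`.

    `graph` maps each node to the set of its incident (in-)neighbors.
    """
    nodes = list(graph)
    # Initial similarity: 1 for identical nodes, 0 otherwise.
    sim = {a: {b: 1.0 if a == b else 0.0 for b in nodes} for a in nodes}

    for _ in range(iterations):
        new_sim = {a: {} for a in nodes}
        for a in nodes:
            for b in nodes:
                if a == b:
                    new_sim[a][b] = 1.0
                elif not graph[a] or not graph[b]:
                    # A node without incident neighbors is similar only to itself.
                    new_sim[a][b] = 0.0
                else:
                    total = sum(sim[i][j] for i in graph[a] for j in graph[b])
                    new_sim[a][b] = C * total / (len(graph[a]) * len(graph[b]))
        sim = new_sim
    return sim


# Two accounts that share one IP address node.
graph = {"u1": {"ip"}, "u2": {"ip"}, "ip": {"u1", "u2"}}
sim = simrank(graph)
```

In this toy graph the two accounts share their only neighbor, so their similarity equals C, while an account and the IP node stay at zero because the graph is bipartite. The all-pairs loop is quadratic in the number of nodes, so a production-scale graph would need a more economical formulation.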
  • step S15 is executed, and the fraud risk assessment result of the unmarked node is output according to the calculation result.
  • Fig. 3 is a detailed flowchart of step S15 in Fig. 1 according to an exemplary embodiment. As shown in Fig. 3, outputting the fraud risk assessment result of the unmarked node according to the calculation result may include the following steps.
  • For example, suppose the mean similarity between unmarked node A and all the fraudulent nodes is 0.7 and the threshold is 0.5. Since 0.7 is greater than 0.5, unmarked node A is judged to be a suspected fraudulent node, and a fraud risk assessment result characterizing it as such is output.
  • This method can be used to perform the above calculation and judgment steps one by one for other unmarked nodes in the constructed knowledge graph, such as B, C, D, etc., and output the fraud risk assessment results of other unmarked nodes.
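The mean-and-threshold rule of step S15 can be sketched as follows. The similarity values are made up so that node A reproduces the 0.7 versus 0.5 example above; the function name `assess_unmarked` and the shape of its result are assumptions for illustration.

```python
def assess_unmarked(sim, fraud_nodes, unmarked_nodes, threshold=0.5):
    """For each unmarked node, average its similarity to all fraud nodes
    and flag it as suspected when the mean exceeds the threshold."""
    if not fraud_nodes:
        raise ValueError("at least one fraudulent node is required")
    results = {}
    for node in unmarked_nodes:
        mean_sim = sum(sim[node][f] for f in fraud_nodes) / len(fraud_nodes)
        results[node] = {"mean_similarity": mean_sim, "suspected": mean_sim > threshold}
    return results


# Hypothetical similarities between unmarked nodes A, B and fraud nodes F1, F2.
sim = {"A": {"F1": 0.8, "F2": 0.6}, "B": {"F1": 0.1, "F2": 0.3}}
report = assess_unmarked(sim, ["F1", "F2"], ["A", "B"])
```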
  • Optionally, each attribute (type) of unmarked node corresponds to its own threshold. After the mean of the similarities between each unmarked node and all the fraudulent nodes is obtained, the mean is taken as the similarity of that unmarked node; it is then judged whether the similarity of each unmarked node is greater than the threshold corresponding to its attribute.
  • The thresholds for unmarked nodes with different attributes may differ.
  • For nodes with different attributes, such as account ID, mobile phone number, card number, email address, and merchant ID, a threshold can be set per attribute according to the business scenario.
  • For example, all mobile phone numbers with a similarity greater than the corresponding threshold are identified as suspected fraudulent mobile phone numbers, and all card numbers with a similarity greater than the corresponding threshold are identified as suspected fraud card numbers.
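Per-attribute thresholds can be sketched as a lookup keyed by node type. The threshold values, the fallback `default`, the function name `flag_by_type`, and the `(type, value)` node identifiers are assumptions chosen for the example, not values from the source.

```python
def flag_by_type(mean_similarity, thresholds, default=0.5):
    """Group suspected nodes by type, using one threshold per node type.

    `mean_similarity` maps (type, value) nodes to their mean similarity
    to the fraud nodes; `thresholds` maps a node type to its threshold.
    """
    suspected = {}
    for (node_type, value), score in mean_similarity.items():
        if score > thresholds.get(node_type, default):
            suspected.setdefault(node_type, []).append(value)
    return suspected


mean_similarity = {
    ("phone", "13800000001"): 0.62,
    ("phone", "13900000002"): 0.40,
    ("card", "6222-01"): 0.62,
}
thresholds = {"phone": 0.6, "card": 0.7}  # hypothetical business settings
suspected = flag_by_type(mean_similarity, thresholds)
```

The same score of 0.62 flags the phone number but not the card number, which shows why the threshold is chosen per attribute rather than globally.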
  • Fig. 4 shows a device for detecting fraud based on a knowledge graph according to an exemplary embodiment of the present disclosure.
  • the apparatus 300 for detecting fraud based on the knowledge graph includes the following modules.
  • the collection module 310 is used to collect user metadata, behavior data, and fraudulent user blacklist.
  • the construction module 320 is configured to select entities in the metadata as nodes, and establish edges according to business binding relationships and co-occurrence relationships between entities in the behavior data to construct a knowledge graph.
  • the marking module 330 is configured to mark fraudulent nodes in the knowledge graph according to the blacklist of fraudulent users.
  • the calculation module 340 is configured to calculate the similarity between the unmarked node in the knowledge graph and the fraudulent node according to the similarity of adjacent nodes in the knowledge graph.
  • the output module 350 is configured to output the fraud risk assessment result of the unmarked node according to the calculation result.
  • the construction module 320 is also used to:
  • the calculation module 340 is also used to calculate the similarity of two nodes in the knowledge graph according to the formula given for step S14, where s(a,b), I(a), I(b), the i-th and j-th incident adjacent nodes, and the damping coefficient C ∈ (0,1) have the meanings defined there.
  • the output module 350 includes the following sub-modules.
  • the calculation sub-module 351 is used to calculate the average value of the similarity between each of the unmarked nodes and all the fraudulent nodes.
  • the output sub-module 352 is configured to output a fraud risk assessment result indicating that the unmarked node is a suspected fraud node if the average value of the similarity between the unmarked node and all the fraud nodes is greater than a threshold.
  • the collection module 310 includes the following sub-modules.
  • the extraction sub-module 311 is configured to extract the metadata from the user request log, where the metadata includes at least one of device information, account information, card information, and context information.
  • the obtaining sub-module 312 is configured to obtain the behavior data of the user according to the business process, where the behavior data includes operation data for at least one of the user placing an order, paying, commenting, binding an email address or mobile phone number, and retrieving a password.
  • The present disclosure also provides a non-volatile computer-readable storage medium on which a computer program is stored. When the program is executed by a processor, it causes the processor to implement the steps of the method for detecting fraud based on the knowledge graph in any one of the above optional embodiments.
  • The present disclosure also provides a device for detecting fraud based on the knowledge graph, including a computer-readable storage medium and one or more processors used to execute the programs in the computer-readable storage medium.
  • Fig. 7 is a block diagram showing a device 400 for detecting fraud based on a knowledge graph according to an exemplary embodiment.
  • the device 400 may include a processor 401, a memory 402, a multimedia component 403, an input/output (I/O) interface 404, and a communication component 405.
  • the processor 401 is used to control the overall operation of the device 400 to complete all or part of the steps in the method for detecting fraud based on the knowledge graph.
  • the memory 402 is used to store various types of data to support operations on the device 400, and these data may include, for example, instructions for any application or method operating on the device 400, and application-related data.
  • the memory 402 can be implemented by any type of volatile or non-volatile storage device or a combination thereof, such as static random access memory (SRAM), electrically erasable programmable read-only memory (EEPROM), erasable programmable read-only memory (EPROM), programmable read-only memory (PROM), read-only memory (ROM), magnetic memory, flash memory, magnetic disk, or optical disk.
  • the multimedia component 403 may include a screen and an audio component.
  • the screen may be a touch screen, for example, and the audio component is used to output and/or input audio signals.
  • the audio component may include a microphone, which is used to receive external audio signals.
  • the received audio signal can be further stored in the memory 402 or sent through the communication component 405.
  • the audio component also includes at least one speaker for outputting audio signals.
  • the I/O interface 404 provides an interface between the processor 401 and other interface modules.
  • the above-mentioned other interface modules may be keyboards, mice, buttons, etc. These buttons can be virtual buttons or physical buttons.
  • the communication component 405 is used for wired or wireless communication between the device 400 and other devices. The wireless communication may be, for example, Wi-Fi, Bluetooth, Near Field Communication (NFC), 2G, 3G, 4G, or a combination of one or more of them; accordingly, the communication component 405 may include a Wi-Fi module, a Bluetooth module, and an NFC module.
  • the device 400 may be implemented by one or more application-specific integrated circuits (ASIC), digital signal processors (DSP), digital signal processing devices (DSPD), programmable logic devices (PLD), field-programmable gate arrays (FPGA), controllers, microcontrollers, microprocessors, or other electronic components, used to perform the above-mentioned method for detecting fraud based on the knowledge graph.
  • a computer-readable storage medium including program instructions is also provided, such as the memory 402 including program instructions. The program instructions can be executed by the processor 401 of the device 400 to complete the above-mentioned method for detecting fraud based on the knowledge graph.

Landscapes

  • Business, Economics & Management (AREA)
  • Accounting & Taxation (AREA)
  • Engineering & Computer Science (AREA)
  • Finance (AREA)
  • Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Strategic Management (AREA)
  • General Business, Economics & Management (AREA)
  • Development Economics (AREA)
  • Technology Law (AREA)
  • Marketing (AREA)
  • Economics (AREA)
  • Computer Security & Cryptography (AREA)
  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

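The present disclosure relates to a method and a device for detecting fraud based on a knowledge graph, and a storage medium. The method includes: collecting user metadata, behavior data, and a blacklist of fraudulent users; selecting entities in the metadata as nodes, and establishing edges according to the business binding relationships and co-occurrence relationships between entities in the behavior data, to construct a knowledge graph; marking fraudulent nodes in the knowledge graph according to the blacklist of fraudulent users; calculating the similarity between the unmarked nodes in the knowledge graph and the fraudulent nodes according to the similarity of adjacent nodes in the knowledge graph; and outputting the fraud risk assessment result of the unmarked nodes according to the calculation result.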

Description

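Fraud detection based on a knowledge graph
Technical Field
The present disclosure relates to the field of network technology, and in particular to a method and a device for detecting fraud based on a knowledge graph, and a storage medium.
Background
The financial sector places high demands on transaction risk control, and the safety of capital transactions must be ensured. In practice, some fraudulent behaviors may occur. For example, in the e-commerce or O2O (Online To Offline) field, there may be behaviors such as batch registration of fake users, fake orders (order brushing), cheating, and transaction fraud.
Summary
The present disclosure provides a method and a device for detecting fraud based on a knowledge graph, and a storage medium, so as to solve the technical problem in the related art that fraud carried out through batch registration is difficult to identify.
To achieve the above objective, a first aspect of the embodiments of the present disclosure provides a method for detecting fraud based on a knowledge graph, the method including:
collecting user metadata, behavior data, and a blacklist of fraudulent users; selecting entities in the metadata as nodes, and establishing edges according to the business binding relationships and co-occurrence relationships between entities in the behavior data, to construct a knowledge graph; marking fraudulent nodes in the knowledge graph according to the blacklist of fraudulent users; calculating the similarity between the unmarked nodes in the knowledge graph and the fraudulent nodes according to the similarity of adjacent nodes in the knowledge graph; and outputting the fraud risk assessment result of the unmarked nodes according to the calculation result.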
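Optionally, calculating the similarity between the unmarked nodes in the knowledge graph and the fraudulent nodes includes calculating the similarity of two nodes in the knowledge graph according to the following formula:
$$ s(a,b) = \begin{cases} 1, & a = b \\ \dfrac{C}{|I(a)|\,|I(b)|} \sum_{i=1}^{|I(a)|} \sum_{j=1}^{|I(b)|} s\big(I_i(a), I_j(b)\big), & a \neq b,\ I(a) \neq \varnothing,\ I(b) \neq \varnothing \\ 0, & \text{otherwise} \end{cases} $$
where $s(a,b)$ is the similarity between nodes a and b; $I(a)$ denotes the set of incident adjacent nodes of node a, $I_i(a)$ denotes the i-th incident adjacent node of node a, $\varnothing$ denotes empty, and $I(a) \neq \varnothing$ indicates that node a has incident adjacent nodes; $I(b)$ denotes the set of incident adjacent nodes of node b, $I_j(b)$ denotes the j-th incident adjacent node of node b, and $I(b) \neq \varnothing$ indicates that node b has incident adjacent nodes; $s(I_i(a), I_j(b))$ is the similarity between the i-th incident adjacent node of node a and the j-th incident adjacent node of node b; and C is the damping coefficient, $C \in (0,1)$.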
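Optionally, outputting the fraud risk assessment result of the unmarked nodes according to the calculation result includes: for each unmarked node, calculating the mean of the similarities between the unmarked node and all the fraudulent nodes; and if the mean similarity is greater than a threshold, outputting a fraud risk assessment result that characterizes the unmarked node as a suspected fraud node.
Optionally, collecting the metadata and the behavior data of the user includes: extracting the metadata from a user request log, where the metadata includes at least one of device information, account information, card information, and context information; and obtaining the behavior data of the user according to the business process, where the behavior data includes operation data for at least one of the user placing an order, paying, commenting, binding an email address or mobile phone number, and retrieving a password.
A second aspect of the embodiments of the present disclosure provides a device for detecting fraud based on a knowledge graph, the device including:
a collection module, used to collect user metadata, behavior data, and a blacklist of fraudulent users; a construction module, used to select entities in the metadata as nodes, and to establish edges according to the business binding relationships and co-occurrence relationships between entities in the behavior data, to construct a knowledge graph; a marking module, used to mark fraudulent nodes in the knowledge graph according to the blacklist of fraudulent users; a calculation module, used to calculate the similarity between the unmarked nodes in the knowledge graph and the fraudulent nodes according to the similarity of adjacent nodes in the knowledge graph; and an output module, used to output the fraud risk assessment result of the unmarked nodes according to the calculation result.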
Optionally, the calculation module is further used to calculate the similarity of two nodes in the knowledge graph according to the same formula as in the method of the first aspect, where s(a,b) is the similarity between nodes a and b; I(a) and I(b) denote the sets of incident adjacent nodes of nodes a and b; the summed term is the similarity between the i-th incident adjacent node of node a and the j-th incident adjacent node of node b; and C is the damping coefficient, $C \in (0,1)$.
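Optionally, for each unlabeled node, the output module includes: a calculation submodule, configured to calculate the mean of the similarities between the unlabeled node and all the fraud nodes; and an output submodule, configured to output, if the mean similarity is greater than a threshold, a fraud risk assessment result indicating that the unlabeled node is a suspected fraud node.
Optionally, the collection module includes: an extraction submodule, configured to extract the metadata from user request logs, where the metadata includes at least one of device information, account information, card information, and context information; and an acquisition submodule, configured to obtain the behavior data of the user according to the business process, where the behavior data includes operation data for at least one of placing an order, paying, commenting, binding an email address or phone number, and recovering a password.
A third aspect of the embodiments of the present disclosure provides a non-volatile computer-readable storage medium on which a computer program is stored; when executed by a processor, the program causes the processor to implement the steps of the method of any implementation of the first aspect.
A fourth aspect of the embodiments of the present disclosure provides an apparatus for detecting fraud based on a knowledge graph, including:
a memory on which a computer program is stored; and a processor, configured to execute the computer program in the memory to implement the steps of the method of any implementation of the first aspect.
The above technical solutions achieve at least the following technical effects.
The present disclosure constructs a knowledge graph, marks the nodes in the knowledge graph that appear in the blacklist of fraudulent users, calculates the similarity between the unlabeled nodes and the fraud nodes in the knowledge graph, and then performs fraud risk assessment according to that similarity, treating nodes whose similarity exceeds a threshold as high-risk fraud nodes. Because the fraud risk assessment is based on similarity, the present disclosure is suited to identifying batch registration of fake users. It can effectively detect fraudulent users and avoid losses caused by order brushing, cheating, fraudulent transactions, and similar behavior, thereby improving the accuracy of identifying fraud carried out through batch registration.
The present disclosure can make full use of the user-behavior information accumulated in e-commerce or O2O scenarios, such as registration, login, ordering, payment, and comments. The way the knowledge graph is constructed is simple and easy to implement, and offers a strong performance advantage.
The SimRank algorithm adopted in the present disclosure essentially computes the similarity between nodes in a network, which makes it suitable for the problem of fraud carried out through batch registration.
Other features and advantages of the present disclosure are described in detail in the Detailed Description section below.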
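Brief Description of the Drawings
The accompanying drawings are provided for a further understanding of the present disclosure and constitute a part of the specification. Together with the detailed description below, they serve to explain the present disclosure but do not limit it.
FIG. 1 is a flowchart of a method for detecting fraud based on a knowledge graph according to an exemplary embodiment of the present disclosure.
FIG. 2 is a detailed flowchart of step S11 in FIG. 1 according to an exemplary embodiment.
FIG. 3 is a detailed flowchart of step S15 in FIG. 1 according to an exemplary embodiment.
FIG. 4 is a block diagram of an apparatus for detecting fraud based on a knowledge graph according to an exemplary embodiment of the present disclosure.
FIG. 5 is a block diagram of the output module of an apparatus for detecting fraud based on a knowledge graph according to an exemplary embodiment of the present disclosure.
FIG. 6 is a block diagram of the collection module of an apparatus for detecting fraud based on a knowledge graph according to an exemplary embodiment of the present disclosure.
FIG. 7 is a hardware structure diagram of an apparatus for detecting fraud based on a knowledge graph according to an exemplary embodiment of the present disclosure.
Detailed Description
Specific embodiments of the present disclosure are described in detail below with reference to the accompanying drawings. It should be understood that the specific embodiments described here serve only to illustrate and explain the present disclosure and do not limit it.
Some fraud cases in the financial field go undetected because the victims do not report them or for other reasons. These undetected cases can pose a considerable risk to financial security controls. For example, the fraudster's account can continue to defraud other victims, causing financial losses. Locating fraudsters and identifying hidden fraud cases is therefore of great significance for financial security controls.
In one example, a fraud propagation graph can be constructed from fund flow information, counterparty association information, and transaction data. The nodes of the graph are built from the counterparty association information, and directed edges between nodes are built from the fund flow information and the counterparty association information; a directed edge can represent a fraud propagation relationship between nodes. The fraud propagation weight of each node is then computed with the iterative PageRank update algorithm.
The above example targets only transaction scenarios and identifies transaction fraud; it cannot be generalized to other fraud scenarios, such as identifying batch registration of fake users, order brushing, cheating, and transaction fraud in e-commerce or O2O. In addition, the iterative PageRank update algorithm essentially ranks nodes by importance and yields only a weight for each node, which does not readily reveal the association between nodes involved in a case and nodes not involved.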
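FIG. 1 is a flowchart of a method for detecting fraud based on a knowledge graph according to an exemplary embodiment. As shown in FIG. 1, the method may include the following steps.
S11: collect user metadata, behavior data, and a blacklist of fraudulent users.
S12: select entities in the metadata as nodes, and establish edges according to the business binding relationships and co-occurrence relationships between entities in the behavior data, so as to construct a knowledge graph.
S13: mark fraud nodes in the knowledge graph according to the blacklist of fraudulent users.
S14: calculate, according to the similarity of adjacent nodes in the knowledge graph, the similarity between unlabeled nodes and the fraud nodes in the knowledge graph.
S15: output, according to the calculation result, a fraud risk assessment result for the unlabeled nodes.
In step S11, the blacklist of fraudulent users is obtained from accumulated business experience, manual judgment, and historically recorded cases: fraudulent users that have already been identified are collected, and the account IDs, phone numbers, and unique device identifiers involved are all added to the fraud blacklist. The present application does not limit the number of users whose metadata is collected; generally, the more users there are, the more helpful this is for the final fraud risk assessment result.
FIG. 2 is a detailed flowchart of step S11 in FIG. 1 according to an exemplary embodiment. As shown in FIG. 2, collecting the user metadata and behavior data may include the following steps.
S111: extract metadata from user request logs; the metadata includes at least one of device information, account information, card information, and context information.
S112: obtain the user's behavior data according to the business process; the behavior data includes operation data for at least one of placing an order, paying, commenting, binding an email address or phone number, and recovering a password.
In step S111, the user request logs include user registration requests, order requests, payment requests, and so on, from which metadata can be extracted. The device information in the metadata may be a unique device identifier, a MAC (Media Access Control) address, an IMEI (International Mobile Equipment Identity), and the like. The account information may be an account ID, phone number, email address, and the like. The card information may be bank ***. The context information may be an IP (Internet Protocol) address, merchant ID, latitude and longitude, Wi-Fi information, request time, and the like.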
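In step S112, the user's behavior data is obtained according to the specific business process, for example behaviors such as placing orders, paying, and commenting, and operations such as binding an email address or phone number and recovering a password.
After the user metadata, behavior data, and blacklist of fraudulent users have been collected, a knowledge graph can be constructed using the entities in the metadata as nodes together with the relationships between entities in the behavior data. Entities in the metadata are, specifically, things such as phone numbers, email addresses, Wi-Fi information, device numbers, and IP addresses; anything that actually exists and can be identified by an ID can serve as an entity. Things such as times, actions, and relationships cannot serve as entities. A knowledge graph is a modern theory that combines the theories and methods of disciplines such as applied mathematics, graphics, information visualization, and information science with methods such as bibliometric citation analysis and co-occurrence analysis, and that can be presented vividly through a visual graph. A knowledge graph aims to describe the various entities or concepts that exist in the real world and their relationships, forming a huge semantic network in which nodes represent entities or concepts and edges consist of attributes or relationships.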
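The entity rule above can be sketched as a small filter over one parsed request-log record. This is only an illustration: the field names (`account_id`, `phone`, `device_id`, and so on) and the `extract_entities` helper are hypothetical, since the disclosure does not define a log schema.

```python
# Hypothetical field names; the disclosure does not specify a log schema.
# Only fields that identify something by an ID are treated as entities.
ENTITY_FIELDS = ("account_id", "phone", "email", "device_id", "mac", "ip", "merchant_id")


def extract_entities(log_entry):
    """Return (entity_type, value) pairs for the ID-like fields of one log record.

    Fields such as the request time are context, not entities, so they are skipped.
    """
    return [(field, log_entry[field]) for field in ENTITY_FIELDS if log_entry.get(field)]


entry = {
    "account_id": "u1",
    "phone": "p1",
    "ip": "10.0.0.1",
    "request_time": "2019-04-01T12:00:00",  # not an entity
}
entities = extract_entities(entry)
```

Each returned pair would become one node of the graph; keeping the entity type alongside the value lets later steps treat phone numbers, accounts, and devices differently.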
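Constructing the knowledge graph from the entities in the metadata and the relationships between entities in the behavior data may include the following steps: selecting the entities in the metadata as nodes; then establishing edges according to the business binding relationships and co-occurrence relationships between entities in the behavior data, so as to construct the knowledge graph.
During construction, the entities appearing in step S111 can be used as nodes of the financial knowledge graph, which may include device numbers, MAC addresses, account IDs, phone numbers, email addresses, ***, merchant IDs, IP addresses, and so on.
Once the entities serving as nodes have been obtained, bidirectional edges between nodes are established according to the business binding relationships and co-occurrence relationships between entities, forming a large heterogeneous network, generally called a financial knowledge graph. A business binding relationship may be, for example: an account is bound to a phone number, ***, and an email address, so a binding relationship is formed among these entities. A co-occurrence relationship may be, for example: an account and a merchant appear in the same order log, so a co-occurrence relationship is formed between the account ID and the merchant ID.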
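A minimal sketch of this construction step, assuming each binding or co-occurrence record has already been reduced to a group of `(type, id)` entities. The `build_graph` helper and the choice to connect every pair inside a group are assumptions made for illustration; the text only states that a relationship is formed among the entities.

```python
from collections import defaultdict
from itertools import combinations


def build_graph(bindings, cooccurrences):
    """Build an undirected graph as an adjacency dict {node: set(neighbors)}.

    bindings: groups of entities bound to one another (e.g. an account and its phone).
    cooccurrences: groups of entities that appear in the same log record.
    Assumption: every pair of entities within a group is linked.
    """
    graph = defaultdict(set)
    for group in list(bindings) + list(cooccurrences):
        for a, b in combinations(group, 2):
            graph[a].add(b)  # a bidirectional edge is stored in both directions
            graph[b].add(a)
    return dict(graph)


bindings = [[("account", "u1"), ("phone", "p1"), ("email", "e1")]]
cooccurrences = [[("account", "u1"), ("merchant", "m1")]]
graph = build_graph(bindings, cooccurrences)
```

Because nodes of several types share one adjacency structure, the result is a heterogeneous network in the sense described above.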
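After the knowledge graph is constructed, step S13 is performed: the fraud nodes in the knowledge graph are marked according to the blacklist of fraudulent users. The nodes in the knowledge graph are checked against the blacklist; if the graph contains nodes such as account IDs, phone numbers, or unique device identifiers that are involved in a case, those nodes are marked as fraud nodes in the knowledge graph.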
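Step S13 amounts to a set-membership check. The sketch below assumes nodes and blacklist entries share the same `(type, id)` representation; `mark_fraud_nodes` is a hypothetical helper name, not part of the disclosure.

```python
def mark_fraud_nodes(nodes, blacklist):
    """Split graph nodes into (fraud, unlabeled) sets using the blacklist."""
    blacklist = set(blacklist)
    fraud = {n for n in nodes if n in blacklist}
    unlabeled = set(nodes) - fraud
    return fraud, unlabeled


nodes = [("account", "u1"), ("phone", "p1"), ("device", "d1")]
blacklist = [("phone", "p1"), ("account", "u9")]  # u9 is blacklisted but not in the graph
fraud, unlabeled = mark_fraud_nodes(nodes, blacklist)
```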
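After the fraud nodes in the knowledge graph are marked, step S14 is performed: the similarity between the unlabeled nodes and the fraud nodes in the knowledge graph is calculated according to the similarity of adjacent nodes in the knowledge graph. This calculation may be based on the SimRank algorithm, which is run on the knowledge graph to compute the similarity between every pair of nodes. After a number of iterations, the similarities of all nodes are updated and tend to converge.
The core idea of SimRank is that if the neighborhoods of two nodes in a graph are similar (they have many similar neighbors), then the two nodes should also be similar. That is, whether two nodes are similar is determined by whether their neighbors are similar.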
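The similarity of two nodes in the knowledge graph can be calculated with the following formula:
$$
s(a,b)=\begin{cases}
1, & a=b\\
\dfrac{C}{|I(a)|\,|I(b)|}\displaystyle\sum_{i=1}^{|I(a)|}\sum_{j=1}^{|I(b)|} s\big(I_i(a),I_j(b)\big), & a\neq b,\ I(a)\neq\varnothing,\ I(b)\neq\varnothing\\
0, & \text{otherwise}
\end{cases}
$$
where $s(a,b)$ is the similarity of nodes a and b; $I(a)$ denotes the set of in-neighbors of node a (that is, the nodes pointing to node a), and $I_i(a)$ denotes the i-th in-neighbor of a; $\varnothing$ denotes the empty set, and $I(a)\neq\varnothing$ indicates that a has in-neighbors; $I(b)$ denotes the set of in-neighbors of node b, and $I_j(b)$ denotes the j-th in-neighbor of b; $I(b)\neq\varnothing$ indicates that b has in-neighbors; $s(I_i(a),I_j(b))$ is the similarity between the i-th in-neighbor of a and the j-th in-neighbor of b; and C is a damping coefficient, $C\in(0,1)$.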
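The meaning of C can be understood as follows. Suppose I(a) = I(b) = {A}, where A is an adjacent node of a. The above formula then gives s(a,b) = C·s(A,A) = C. Clearly C should be greater than 0 and less than 1, so C ∈ (0,1); optionally, C may be set to 0.8. The formula can be read simply as: the similarity of nodes a and b equals the mean similarity between the neighbors of a and the neighbors of b, multiplied by the coefficient C.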
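As a worked check of this reading, the single-neighbor case and a two-neighbor case can be evaluated by hand with C = 0.8. The two-neighbor case assumes the usual SimRank starting point in which s(x,x) = 1 and s(x,y) = 0 for x ≠ y; the text does not state the initialization, so this is an assumption.

```latex
% Single shared in-neighbor: I(a) = I(b) = \{A\}
s(a,b) = \frac{C}{1 \cdot 1}\, s(A,A) = C = 0.8

% Two shared in-neighbors: I(a) = I(b) = \{A, B\}, first iteration with s(A,B) = s(B,A) = 0
s(a,b) = \frac{C}{2 \cdot 2}\,\bigl[s(A,A) + s(A,B) + s(B,A) + s(B,B)\bigr]
       = \frac{C}{4} \cdot 2 = 0.4
```

So two nodes with identical neighbor sets can still score below C: the score depends on how similar the neighbors are to one another, not only on how many are shared.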
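On the above knowledge graph, the similarity between any two nodes can be expressed by the above formula. Executing SimRank is the process of iteratively updating this formula. Put simply, after multiple iterations of the SimRank algorithm, a similarity matrix over the knowledge graph is obtained, whose elements represent the similarity between two nodes. This step yields the similarity between each unlabeled node and the fraud nodes already identified as involved in cases.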
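The iteration can be sketched as a naive, unoptimized SimRank over an adjacency dict. This is an illustration under stated assumptions rather than the implementation used in the disclosure: the `simrank` function name, the fixed iteration count, and the identity initialization are all choices made here, and for the bidirectional graph described earlier a node's in-neighbors are taken to be simply its neighbors.

```python
def simrank(graph, C=0.8, iterations=10):
    """Naive SimRank on {node: set(in_neighbors)}; every neighbor must also be a key.

    Returns {(a, b): similarity} for all ordered node pairs.
    Cost is high (all pairs times all neighbor pairs), so this only suits small graphs.
    """
    nodes = list(graph)
    # Assumed initialization: a node is fully similar to itself, and to nothing else.
    sim = {(a, b): 1.0 if a == b else 0.0 for a in nodes for b in nodes}
    for _ in range(iterations):
        new_sim = {}
        for a in nodes:
            for b in nodes:
                if a == b:
                    new_sim[(a, b)] = 1.0
                elif not graph[a] or not graph[b]:
                    new_sim[(a, b)] = 0.0  # a node without in-neighbors has similarity 0
                else:
                    total = sum(sim[(i, j)] for i in graph[a] for j in graph[b])
                    new_sim[(a, b)] = C * total / (len(graph[a]) * len(graph[b]))
        sim = new_sim
    return sim


# Two nodes a and b that share the single neighbor A.
graph = {"a": {"A"}, "b": {"A"}, "A": {"a", "b"}}
sim = simrank(graph)
```

In this toy graph `sim[("a", "b")]` comes out as C, matching the single-neighbor case discussed for the damping coefficient. A production system would need a scalable variant, since the all-pairs loop grows quadratically with the number of nodes.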
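After the similarity between the unlabeled nodes and the fraud nodes is obtained, step S15 is performed: a fraud risk assessment result for the unlabeled nodes is output according to the calculation result. FIG. 3 is a detailed flowchart of step S15 in FIG. 1 according to an exemplary embodiment. As shown in FIG. 3, outputting the fraud risk assessment result for the unlabeled nodes according to the calculation result may include the following steps.
S151: for each unlabeled node, calculate the mean of the similarities between that node and all the fraud nodes.
S152: if the mean of the similarities between the unlabeled node and all the fraud nodes is greater than a threshold, output a fraud risk assessment result indicating that the unlabeled node is a suspected fraud node.
For example, suppose the constructed knowledge graph contains ten fraud nodes. After iteration, the similarity between an unlabeled node A and each of the ten fraud nodes is obtained. The mean of these ten similarities is then calculated and used as the risk score of unlabeled node A.
Suppose the threshold is 0.5 and the mean of the ten similarities is 0.7; the similarity of unlabeled node A is then 0.7. Because 0.7 is greater than the threshold 0.5, unlabeled node A can also be regarded as a fraud node; that is, a fraud risk assessment result indicating that unlabeled node A is a suspected fraud node can be output.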
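Steps S151 and S152 can be sketched as follows. The `fraud_risk` helper and the toy similarity values are hypothetical; the similarity dictionary is assumed to be keyed by `(node, fraud_node)` pairs, and missing pairs are treated as similarity 0.

```python
def fraud_risk(node, fraud_nodes, sim, threshold=0.5):
    """Return (score, is_suspected); score is the mean similarity to all fraud nodes."""
    fraud_nodes = list(fraud_nodes)
    if not fraud_nodes:
        return 0.0, False  # nothing to compare against
    score = sum(sim.get((node, f), 0.0) for f in fraud_nodes) / len(fraud_nodes)
    return score, score > threshold


# Made-up similarities between unlabeled node "A" and two fraud nodes.
sim = {("A", "f1"): 0.8, ("A", "f2"): 0.6}
score, suspected = fraud_risk("A", ["f1", "f2"], sim)
```

Here the mean is about 0.7, which exceeds the 0.5 threshold, so node A would be reported as a suspected fraud node, mirroring the numeric example above.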
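The same method can be applied to the other unlabeled nodes in the constructed knowledge graph, such as B, C, and D: the above calculation and judgment steps are carried out for each in turn, and fraud risk assessment results for those nodes are output.
Optionally, each attribute type of unlabeled node corresponds to its own threshold. After the mean of the similarities between each unlabeled node and all the fraud nodes is obtained, the mean is taken as the similarity of that unlabeled node; it is then determined whether the similarity of each unlabeled node exceeds the threshold for its attribute type. The thresholds for unlabeled nodes of different attribute types may differ.
For example, nodes of different attribute types, such as account IDs, phone numbers, ***, email addresses, and merchant IDs, can be separated, and a threshold is set for each attribute type according to the business scenario. Taking phone numbers and *** as an example, all phone numbers whose similarity exceeds threshold h are regarded as suspected fraudulent phone numbers, and all *** whose similarity exceeds threshold s are regarded as suspected fraudulent ***.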
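Per-type thresholds can be sketched as a lookup keyed by entity type. The threshold values, the `card` type label, and the `flag_by_type` helper are hypothetical; the text only says that thresholds are set per attribute type according to the business scenario.

```python
# Hypothetical thresholds; real values would be tuned per business scenario.
THRESHOLDS = {"phone": 0.5, "card": 0.6}


def flag_by_type(scores, thresholds, default=0.5):
    """scores: {(entity_type, entity_id): mean similarity to the fraud nodes}.

    Returns the nodes whose score exceeds the threshold for their own type.
    """
    return {
        node
        for node, score in scores.items()
        if score > thresholds.get(node[0], default)
    }


scores = {("phone", "p1"): 0.55, ("card", "c1"): 0.55, ("card", "c2"): 0.7}
flagged = flag_by_type(scores, THRESHOLDS)
```

The same score of 0.55 flags the phone number but not the card, which is the point of keeping a separate threshold per attribute type.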
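The above technical solutions achieve at least the following technical effects.
The present disclosure constructs a knowledge graph, marks the nodes in the knowledge graph that appear in the blacklist of fraudulent users, calculates the similarity between the unlabeled nodes and the fraud nodes in the knowledge graph, and then performs fraud risk assessment according to that similarity, treating nodes whose similarity exceeds a threshold as high-risk fraud nodes. Because the fraud risk assessment is based on similarity, the present disclosure is suited to identifying batch registration of fake users. It can effectively detect fraudulent users and avoid losses caused by order brushing, cheating, fraudulent transactions, and similar behavior, thereby improving the accuracy of identifying fraud carried out through batch registration.
The present disclosure can make full use of the user-behavior information accumulated in e-commerce or O2O scenarios, such as registration, login, ordering, payment, and comments. The way the knowledge graph is constructed is simple and easy to implement, and offers a strong performance advantage.
The SimRank algorithm adopted in the present disclosure essentially computes the similarity between nodes in a network, which makes it suitable for the problem of fraud carried out through batch registration.
For simplicity of description, the method embodiment shown in FIG. 1 is expressed as a series of combined actions, but those skilled in the art should understand that the present disclosure is not limited by the described order of actions. Those skilled in the art should also understand that the embodiments described in the specification are all preferred embodiments, and the actions involved are not necessarily required by the present disclosure.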
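FIG. 4 shows an apparatus for detecting fraud based on a knowledge graph according to an exemplary embodiment of the present disclosure. As shown in FIG. 4, the apparatus 300 for detecting fraud based on a knowledge graph includes the following modules.
A collection module 310, configured to collect user metadata, behavior data, and a blacklist of fraudulent users.
A construction module 320, configured to select entities in the metadata as nodes, and to establish edges according to the business binding relationships and co-occurrence relationships between entities in the behavior data, so as to construct a knowledge graph.
A marking module 330, configured to mark fraud nodes in the knowledge graph according to the blacklist of fraudulent users.
A calculation module 340, configured to calculate, according to the similarity of adjacent nodes in the knowledge graph, the similarity between unlabeled nodes and the fraud nodes in the knowledge graph.
An output module 350, configured to output, according to the calculation result, a fraud risk assessment result for the unlabeled nodes.
Optionally, the calculation module 340 is further configured to: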
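calculate the similarity of two nodes in the knowledge graph according to the following formula:
$$
s(a,b)=\begin{cases}
1, & a=b\\
\dfrac{C}{|I(a)|\,|I(b)|}\displaystyle\sum_{i=1}^{|I(a)|}\sum_{j=1}^{|I(b)|} s\big(I_i(a),I_j(b)\big), & a\neq b,\ I(a)\neq\varnothing,\ I(b)\neq\varnothing\\
0, & \text{otherwise}
\end{cases}
$$
where $s(a,b)$ is the similarity of nodes a and b; $I(a)$ denotes the set of in-neighbors of node a, and $I_i(a)$ denotes the i-th in-neighbor of node a; $\varnothing$ denotes the empty set, and $I(a)\neq\varnothing$ indicates that node a has in-neighbors; $I(b)$ denotes the set of in-neighbors of node b, and $I_j(b)$ denotes the j-th in-neighbor of node b; $I(b)\neq\varnothing$ indicates that node b has in-neighbors; $s(I_i(a),I_j(b))$ is the similarity between the i-th in-neighbor of node a and the j-th in-neighbor of node b; and C is a damping coefficient, $C\in(0,1)$.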
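Optionally, as shown in FIG. 5, the output module 350 includes the following submodules.
A calculation submodule 351, configured to calculate, for each unlabeled node, the mean of the similarities between that node and all the fraud nodes.
An output submodule 352, configured to output, if the mean of the similarities between the unlabeled node and all the fraud nodes is greater than a threshold, a fraud risk assessment result indicating that the unlabeled node is a suspected fraud node.
Optionally, as shown in FIG. 6, the collection module 310 includes the following submodules.
An extraction submodule 311, configured to extract the metadata from user request logs, where the metadata includes at least one of device information, account information, card information, and context information.
An acquisition submodule 312, configured to obtain the behavior data of the user according to the business process, where the behavior data includes operation data for at least one of placing an order, paying, commenting, binding an email address or phone number, and recovering a password.
For the apparatus in the above embodiments, the specific manner in which each module operates has been described in detail in the method embodiments and is not elaborated here.
The present disclosure further provides a non-volatile computer-readable storage medium on which a computer program is stored; when executed by a processor, the program causes the processor to implement the steps of the method for detecting fraud based on a knowledge graph according to any of the above optional embodiments.
The present disclosure further provides an apparatus for detecting fraud based on a knowledge graph, including:
the above computer-readable storage medium; and
one or more processors, configured to execute the program in the computer-readable storage medium.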
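FIG. 7 is a block diagram of an apparatus 400 for detecting fraud based on a knowledge graph according to an exemplary embodiment. As shown in FIG. 7, the apparatus 400 may include a processor 401, a memory 402, a multimedia component 403, an input/output (I/O) interface 404, and a communication component 405.
The processor 401 controls the overall operation of the apparatus 400 to complete all or some of the steps of the above method for detecting fraud based on a knowledge graph. The memory 402 stores various types of data to support operation of the apparatus 400; such data may include, for example, instructions for any application or method operating on the apparatus 400, as well as application-related data. The memory 402 may be implemented by any type of volatile or non-volatile storage device or a combination thereof, such as static random access memory (SRAM), electrically erasable programmable read-only memory (EEPROM), erasable programmable read-only memory (EPROM), programmable read-only memory (PROM), read-only memory (ROM), magnetic memory, flash memory, a magnetic disk, or an optical disc. The multimedia component 403 may include a screen and an audio component. The screen may be, for example, a touch screen, and the audio component is used to output and/or input audio signals. For example, the audio component may include a microphone for receiving external audio signals. The received audio signals may be further stored in the memory 402 or sent through the communication component 405. The audio component also includes at least one speaker for outputting audio signals. The I/O interface 404 provides an interface between the processor 401 and other interface modules, which may be a keyboard, a mouse, buttons, and the like. The buttons may be virtual or physical. The communication component 405 is used for wired or wireless communication between the apparatus 400 and other devices. Wireless communication includes, for example, Wi-Fi, Bluetooth, Near Field Communication (NFC), 2G, 3G, or 4G, or a combination of one or more of them; accordingly, the communication component 405 may include a Wi-Fi module, a Bluetooth module, and an NFC module.
In an exemplary embodiment, the apparatus 400 may be implemented by one or more application-specific integrated circuits (ASICs), digital signal processors (DSPs), digital signal processing devices (DSPDs), programmable logic devices (PLDs), field-programmable gate arrays (FPGAs), controllers, microcontrollers, microprocessors, or other electronic components, for executing the above method for detecting fraud based on a knowledge graph.
In another exemplary embodiment, a computer-readable storage medium including program instructions is further provided, for example the memory 402 including program instructions; the program instructions can be executed by the processor 401 of the apparatus 400 to complete the above method for detecting fraud based on a knowledge graph.
The preferred embodiments of the present disclosure have been described in detail above with reference to the accompanying drawings. However, the present disclosure is not limited to the specific details of the above embodiments. Within the scope of the technical concept of the present disclosure, many simple variations can be made to its technical solutions, and these simple variations all fall within the scope of protection of the present disclosure.
It should also be noted that the specific technical features described in the above embodiments can be combined in any suitable manner provided there is no contradiction. To avoid unnecessary repetition, the present disclosure does not separately describe the various possible combinations.
In addition, the various embodiments of the present disclosure can also be combined arbitrarily; as long as a combination does not depart from the idea of the present disclosure, it should likewise be regarded as content disclosed by the present disclosure.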

Claims (10)

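  1. A method for detecting fraud based on a knowledge graph, characterized in that the method comprises:
    collecting user metadata, behavior data, and a blacklist of fraudulent users;
    selecting entities in the metadata as nodes, and establishing edges according to the business binding relationships and co-occurrence relationships between entities in the behavior data, so as to construct a knowledge graph;
    marking fraud nodes in the knowledge graph according to the blacklist of fraudulent users;
    calculating, according to the similarity of adjacent nodes in the knowledge graph, the similarity between unlabeled nodes and the fraud nodes in the knowledge graph;
    outputting, according to the calculation result, a fraud risk assessment result for the unlabeled nodes.
  2. The method according to claim 1, characterized in that calculating the similarity between the unlabeled nodes and the fraud nodes in the knowledge graph comprises: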
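    calculating the similarity of two nodes in the knowledge graph according to the following formula:
    $$
    s(a,b)=\begin{cases}
    1, & a=b\\
    \dfrac{C}{|I(a)|\,|I(b)|}\displaystyle\sum_{i=1}^{|I(a)|}\sum_{j=1}^{|I(b)|} s\big(I_i(a),I_j(b)\big), & a\neq b,\ I(a)\neq\varnothing,\ I(b)\neq\varnothing\\
    0, & \text{otherwise}
    \end{cases}
    $$
    where $s(a,b)$ is the similarity of nodes a and b; $I(a)$ denotes the set of in-neighbors of said node a, and $I_i(a)$ denotes the i-th in-neighbor of said node a; $\varnothing$ denotes the empty set, and $I(a)\neq\varnothing$ indicates that said node a has in-neighbors; $I(b)$ denotes the set of in-neighbors of said node b, and $I_j(b)$ denotes the j-th in-neighbor of said node b; $I(b)\neq\varnothing$ indicates that said node b has in-neighbors; $s(I_i(a),I_j(b))$ is the similarity between the i-th in-neighbor of said node a and the j-th in-neighbor of said node b; and C is a damping coefficient, $C\in(0,1)$.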
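  3. The method according to claim 1, characterized in that outputting the fraud risk assessment result for the unlabeled nodes according to the calculation result comprises:
    for each unlabeled node,
    calculating the mean of the similarities between the unlabeled node and all the fraud nodes;
    if the mean similarity is greater than a threshold, outputting a fraud risk assessment result indicating that the unlabeled node is a suspected fraud node.
  4. The method according to any one of claims 1 to 3, characterized in that collecting the metadata and the behavior data of the user comprises:
    extracting the metadata from user request logs, wherein the metadata includes at least one of device information, account information, card information, and context information;
    obtaining the behavior data of the user according to the business process, wherein the behavior data includes operation data for at least one of placing an order, paying, commenting, binding an email address or phone number, and recovering a password.
  5. An apparatus for detecting fraud based on a knowledge graph, characterized in that the apparatus comprises:
    a collection module, configured to collect user metadata, behavior data, and a blacklist of fraudulent users;
    a construction module, configured to select entities in the metadata as nodes, and to establish edges according to the business binding relationships and co-occurrence relationships between entities in the behavior data, so as to construct a knowledge graph;
    a marking module, configured to mark fraud nodes in the knowledge graph according to the blacklist of fraudulent users;
    a calculation module, configured to calculate, according to the similarity of adjacent nodes in the knowledge graph, the similarity between unlabeled nodes and the fraud nodes in the knowledge graph;
    an output module, configured to output, according to the calculation result, a fraud risk assessment result for the unlabeled nodes.
  6. The apparatus according to claim 5, characterized in that the calculation module is further configured to: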
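    calculate the similarity of two nodes in the knowledge graph according to the following formula:
    $$
    s(a,b)=\begin{cases}
    1, & a=b\\
    \dfrac{C}{|I(a)|\,|I(b)|}\displaystyle\sum_{i=1}^{|I(a)|}\sum_{j=1}^{|I(b)|} s\big(I_i(a),I_j(b)\big), & a\neq b,\ I(a)\neq\varnothing,\ I(b)\neq\varnothing\\
    0, & \text{otherwise}
    \end{cases}
    $$
    where $s(a,b)$ is the similarity of nodes a and b; $I(a)$ denotes the set of in-neighbors of said node a, and $I_i(a)$ denotes the i-th in-neighbor of said node a; $\varnothing$ denotes the empty set, and $I(a)\neq\varnothing$ indicates that said node a has in-neighbors; $I(b)$ denotes the set of in-neighbors of said node b, and $I_j(b)$ denotes the j-th in-neighbor of said node b; $I(b)\neq\varnothing$ indicates that said node b has in-neighbors; $s(I_i(a),I_j(b))$ is the similarity between the i-th in-neighbor of said node a and the j-th in-neighbor of said node b; and C is a damping coefficient, $C\in(0,1)$.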
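  7. The apparatus according to claim 5, characterized in that, for each unlabeled node, the output module comprises:
    a calculation submodule, configured to calculate the mean of the similarities between the unlabeled node and all the fraud nodes;
    an output submodule, configured to output, if the mean similarity is greater than a threshold, a fraud risk assessment result indicating that the unlabeled node is a suspected fraud node.
  8. The apparatus according to any one of claims 5 to 7, characterized in that the collection module comprises:
    an extraction submodule, configured to extract the metadata from user request logs, wherein the metadata includes at least one of device information, account information, card information, and context information;
    an acquisition submodule, configured to obtain the behavior data of the user according to the business process, wherein the behavior data includes operation data for at least one of placing an order, paying, commenting, binding an email address or phone number, and recovering a password.
  9. A non-volatile computer-readable storage medium on which a computer program is stored, characterized in that, when executed by a processor, the program causes the processor to implement the steps of the method according to any one of claims 1 to 4.
  10. An apparatus for detecting fraud based on a knowledge graph, characterized by comprising:
    a memory on which a computer program is stored; and
    a processor, configured to execute the computer program in the memory to implement the steps of the method according to any one of claims 1 to 4.
PCT/CN2019/121458 2019-04-01 2019-11-28 Detecting fraud based on a knowledge graph WO2020199621A1 (zh)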

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN201910258370.8A CN110111110A (zh) 2019-04-01 2019-04-01 Method and apparatus for detecting fraud based on a knowledge graph, and storage medium
CN201910258370.8 2019-04-01

Publications (1)

Publication Number Publication Date
WO2020199621A1 true WO2020199621A1 (zh) 2020-10-08

Family

ID=67484924

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2019/121458 WO2020199621A1 (zh) 2019-04-01 2019-11-28 Detecting fraud based on a knowledge graph

Country Status (2)

Country Link
CN (1) CN110111110A (zh)
WO (1) WO2020199621A1 (zh)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112200382A (zh) * 2020-10-27 2021-01-08 支付宝(杭州)信息技术有限公司 Training method and apparatus for a risk prediction model

Families Citing this family (13)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
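CN110111110A (zh) * 2019-04-01 2019-08-09 北京三快在线科技有限公司 Method and apparatus for detecting fraud based on a knowledge graph, and storage medium
CN110490730B (zh) * 2019-08-21 2022-07-26 北京顶象技术有限公司 Method, apparatus, device, and storage medium for detecting abnormal fund aggregation behavior
CN110855614B (zh) * 2019-10-14 2021-12-21 微梦创科网络科技(中国)有限公司 Method and apparatus for processing illicit-industry information shared within the industry
CN113111132B (zh) * 2020-01-13 2024-06-21 北京沃东天骏信息技术有限公司 Method and apparatus for identifying target users
CN111415241A (zh) * 2020-02-29 2020-07-14 深圳壹账通智能科技有限公司 Method, apparatus, device, and storage medium for identifying fraudsters
CN112053221A (zh) * 2020-08-14 2020-12-08 百维金科(上海)信息科技有限公司 Knowledge-graph-based method for detecting gang fraud in Internet finance
CN112035677B (zh) * 2020-09-03 2023-09-22 中国银行股份有限公司 Method and apparatus for discovering fraudsters based on a knowledge graph
CN112200583B (zh) * 2020-10-28 2023-12-19 交通银行股份有限公司 Knowledge-graph-based method for identifying fraudulent customers
CN112200644B (zh) * 2020-12-09 2021-05-14 北京顺达同行科技有限公司 Fraudulent user identification method and apparatus, computer device, and storage medium
CN112581271B (zh) * 2020-12-21 2022-11-15 上海浦东发展银行股份有限公司 Merchant transaction risk monitoring method, apparatus, device, and storage medium
CN113364764B (zh) * 2021-06-02 2022-07-12 ***通信集团广东有限公司 Big-data-based information security protection method and apparatus
CN113592517A (zh) * 2021-08-09 2021-11-02 深圳前海微众银行股份有限公司 Fraudulent customer group identification method, apparatus, terminal device, and computer storage medium
CN116308748B (zh) * 2023-03-19 2023-10-20 二十六度数字科技(广州)有限公司 Knowledge-graph-based user fraud behavior judgment ***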

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20110040631A1 (en) * 2005-07-09 2011-02-17 Jeffrey Scott Eder Personalized commerce system
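CN109064313A (zh) * 2018-07-20 2018-12-21 重庆富民银行股份有限公司 Post-loan early-warning monitoring *** based on knowledge graph technology
CN109191281A (zh) * 2018-08-21 2019-01-11 重庆富民银行股份有限公司 Knowledge-graph-based group fraud identification ***
CN109460664A (zh) * 2018-10-23 2019-03-12 北京三快在线科技有限公司 Risk analysis method, apparatus, electronic device, and computer-readable medium
CN109523153A (zh) * 2018-11-12 2019-03-26 平安科技(深圳)有限公司 Method and apparatus for obtaining illegal fund-raising enterprises, computer device, and storage medium
CN110111110A (zh) * 2019-04-01 2019-08-09 北京三快在线科技有限公司 Method and apparatus for detecting fraud based on a knowledge graph, and storage medium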

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112200382A (zh) * 2020-10-27 2021-01-08 支付宝(杭州)信息技术有限公司 Training method and apparatus for a risk prediction model
CN112200382B (zh) * 2020-10-27 2022-11-22 支付宝(杭州)信息技术有限公司 Training method and apparatus for a risk prediction model

Also Published As

Publication number Publication date
CN110111110A (zh) 2019-08-09

Similar Documents

Publication Publication Date Title
WO2020199621A1 (zh) Detecting fraud based on a knowledge graph
US10897482B2 (en) Method, device, and system of back-coloring, forward-coloring, and fraud detection
WO2020192184A1 (zh) Detecting gang fraud based on a graph model
US10135788B1 (en) Using hypergraphs to determine suspicious user activities
JP6697584B2 (ja) Method and apparatus for identifying data risk
US20180033010A1 (en) System and method of identifying suspicious user behavior in a user's interaction with various banking services
US20160071108A1 (en) Enhanced automated anti-fraud and anti-money-laundering payment system
WO2018072580A1 (zh) Illegal transaction detection method and apparatus
WO2022121145A1 (zh) Graph-classification-based method and apparatus for detecting Ethereum phishing scams
US20140122311A1 (en) System and method for determining a risk root cause
EP3797396A1 (en) Blockchain transaction safety
US12001800B2 (en) Semantic-aware feature engineering
US11336673B2 (en) Systems and methods for third party risk assessment
US20140351109A1 (en) Method and apparatus for automatically identifying a fraudulent order
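WO2021254027A1 (zh) Method and apparatus for identifying suspicious communities, storage medium, and computer device
CN113360580B (zh) Knowledge-graph-based abnormal event detection method, apparatus, device, and medium
CN111199474A (zh) Risk prediction method and apparatus based on network graph data of two parties, and electronic device
CN114187112A (zh) Training method for an account risk model and method for determining risky user groups
KR102259838B1 (ko) Apparatus and method for building a cryptocurrency blacklist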
US11968184B2 (en) Digital identity network alerts
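CN113240505A (zh) Graph data processing method, apparatus, device, storage medium, and program product
CN112819611A (zh) Fraud identification method and apparatus, electronic device, and computer-readable storage medium
CN111951008A (zh) Risk prediction method and apparatus, electronic device, and readable storage medium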
US9438626B1 (en) Risk scoring for internet protocol networks
CN112750038A (zh) Method, apparatus, and server for determining transaction risk

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 19923228

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 19923228

Country of ref document: EP

Kind code of ref document: A1