CN112215500B - Account relation identification method and device - Google Patents

Publication number
CN112215500B
Authority
CN
China
Prior art keywords
node
nodes
graph
edge
attribute
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202011105858.6A
Other languages
Chinese (zh)
Other versions
CN112215500A (en)
Inventor
陆健
蒋博赟
柳燕
冯力国
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Alipay Hangzhou Information Technology Co Ltd
Original Assignee
Alipay Hangzhou Information Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Alipay Hangzhou Information Technology Co Ltd
Priority to CN202011105858.6A
Publication of CN112215500A
Application granted
Publication of CN112215500B

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q10/00 Administration; Management
    • G06Q10/06 Resources, workflows, human or project management; Enterprise or organisation planning; Enterprise or organisation modelling
    • G06Q10/063 Operations research, analysis or management
    • G06Q10/0635 Risk analysis of enterprise or organisation activities
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/28 Databases characterised by their database models, e.g. relational or object models
    • G06F16/284 Relational databases
    • G06F16/288 Entity relationship models
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/36 Creation of semantic tools, e.g. ontology or thesauri
    • G06F16/367 Ontology
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q30/00 Commerce
    • G06Q30/02 Marketing; Price estimation or determination; Fundraising
    • G06Q30/0201 Market modelling; Market analysis; Collecting market data
    • G06Q30/0202 Market predictions or forecasting for commercial activities

Landscapes

  • Engineering & Computer Science (AREA)
  • Business, Economics & Management (AREA)
  • Theoretical Computer Science (AREA)
  • Strategic Management (AREA)
  • Entrepreneurship & Innovation (AREA)
  • General Physics & Mathematics (AREA)
  • Physics & Mathematics (AREA)
  • Human Resources & Organizations (AREA)
  • Databases & Information Systems (AREA)
  • Development Economics (AREA)
  • Economics (AREA)
  • Finance (AREA)
  • Accounting & Taxation (AREA)
  • Data Mining & Analysis (AREA)
  • General Engineering & Computer Science (AREA)
  • Marketing (AREA)
  • General Business, Economics & Management (AREA)
  • Game Theory and Decision Science (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Animal Behavior & Ethology (AREA)
  • Computational Linguistics (AREA)
  • Educational Administration (AREA)
  • Operations Research (AREA)
  • Quality & Reliability (AREA)
  • Tourism & Hospitality (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

An embodiment of the specification provides an account relation identification method and device. The method comprises: acquiring a knowledge graph constructed according to attribute information of accounts, wherein the knowledge graph comprises a plurality of nodes corresponding to a plurality of accounts, the attribute information comprises information of a plurality of attribute items, and, for each attribute item, the two nodes corresponding to two accounts with the same attribute value are connected by a connecting edge; performing graph embedding processing on the knowledge graph by using a pre-trained graph embedding model, based on the node features of each node and the edge features of each connecting edge in the knowledge graph, to obtain an embedding vector corresponding to each node; and inputting the embedding vectors corresponding to at least one pair of nodes into a pre-trained prediction model to obtain, for each pair of nodes, the probability value that the pair corresponds to the same entity, wherein a pair of nodes are two nodes joined by a connecting edge in the knowledge graph. The method can effectively identify different accounts corresponding to the same entity, so as to reduce repeated business operations directed at the same entity.

Description

Account relation identification method and device
Technical Field
One or more embodiments of the present disclosure relate to the technical field of artificial intelligence, and in particular, to an account relation identification method and apparatus.
Background
In practical application scenarios such as business development and risk control management, an entity such as a store is one of the target objects of business development or risk control management.
The same entity may have a plurality of different accounts in different systems, or two or more different accounts in the same system. For example, the same store may have two accounts in a third-party payment platform system; in a bank system, as a merchant of the bank, the store may also have a virtual account generated by the bank for settlement with the bank through the third-party payment platform; and the store may have further accounts in other systems such as takeout platforms or B2B platforms.
Currently, in business development or risk control management schemes that operate within one system or jointly across platforms and systems, an account is generally used as the identity of an entity, so the corresponding business operations may be performed repeatedly for different accounts of the same entity. For example, when a service, a red envelope, or a coupon is pushed to users in a business development scenario, the push may be repeated for multiple accounts of the same store, so that the same store repeatedly receives the push page of the same service or repeatedly receives the red envelope or coupon. In a risk control scenario, multiple accounts of the same store are treated as different entities and subjected to risk control separately, so that the corresponding risk control measures are redundant or ineffective.
Accordingly, an improved solution is desired for efficiently identifying multiple accounts that correspond to the same entity, whether in the same system or in different systems.
Disclosure of Invention
One or more embodiments of the present specification describe an account relation identification method and apparatus, in which a knowledge graph represents the association relations between a plurality of accounts and entities, graph embedding processing is performed on the knowledge graph to extract node features, and a prediction model predicts the probability value that nodes correspond to the same entity, so as to identify the correspondence between nodes and entities.
According to a first aspect, an account relation identification method is provided, and the method includes:
acquiring a knowledge graph constructed according to attribute information of the account, wherein the knowledge graph comprises a plurality of nodes corresponding to a plurality of accounts, the attribute information comprises information of a plurality of attribute items, and for each attribute item, two nodes corresponding to two accounts with the same attribute value are connected through a connecting edge;
carrying out graph embedding processing on the knowledge graph by using a pre-trained graph embedding model based on the node characteristics of each node and the edge characteristics of each connecting edge in the knowledge graph to obtain an embedding vector corresponding to each node;
and inputting the embedding vectors corresponding to at least one pair of nodes into a pre-trained prediction model to obtain, for each pair of nodes, the probability value that the pair corresponds to the same entity, wherein a pair of nodes are two nodes joined by a connecting edge in the knowledge graph.
In one embodiment, the attribute information includes any one or more of an address class attribute item, a static identity class attribute item, a merchant class attribute item, and an operation characteristic class attribute item.
In one embodiment, the entity is a merchant, and the address class attribute items comprise any one or more of shipping address information, LBS aggregated address information of payers, cash register device address information, and geographic location information;
the static identity class attribute items comprise any one or more of a merchant name, legal-person information, a settlement bank card, and a receiving address; the merchant class attribute items comprise any one or more of chain stores, franchise stores, and other specified entity types; and the operation characteristic class attribute items comprise any one or more of the amount of a single transaction, the transaction frequency, the transaction period, and the transaction object.
In one embodiment, the plurality of attribute items are divided into different levels;
each connecting edge has an edge type, and the edge type of any connecting edge corresponds to the level of the attribute item to which that connecting edge corresponds; connecting edges of different types have different edge features.
In one embodiment, the different levels include strong, medium, and weak attribute items, divided by the strength of the relationship represented by the attribute items;
a primary connecting edge, a secondary connecting edge, and a tertiary connecting edge are established between the nodes corresponding to two accounts with the same strong attribute value, the same medium attribute value, and the same weak attribute value, respectively, wherein the initial values of the edge features of the primary, secondary, and tertiary connecting edges are IV1, IV2, and IV3, respectively, and IV1 > IV2 > IV3.
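The level-to-edge-type mapping described above can be sketched as follows. This is a minimal illustration only: the assignment of particular attribute items to levels, the key names, and the numeric initial values are assumptions chosen for the example; the specification only requires IV1 > IV2 > IV3.

```python
# Hypothetical assignment of attribute items to relationship-strength levels.
ATTRIBUTE_LEVEL = {
    "lbs_address": "strong",
    "settlement_card": "strong",
    "merchant_name": "medium",
    "transaction_period": "weak",
}

# Edge type for each level, and assumed initial edge-feature values
# satisfying IV1 > IV2 > IV3 (the concrete numbers are illustrative).
EDGE_TYPE = {"strong": "primary", "medium": "secondary", "weak": "tertiary"}
INITIAL_VALUE = {"primary": 1.0, "secondary": 0.5, "tertiary": 0.1}


def typed_edge(u, v, attribute_item):
    """Return (u, v, edge_type, initial_edge_feature) for a shared attribute item."""
    edge_type = EDGE_TYPE[ATTRIBUTE_LEVEL[attribute_item]]
    return (u, v, edge_type, INITIAL_VALUE[edge_type])


edge = typed_edge("A", "B", "lbs_address")
```

Here two accounts sharing an LBS aggregated address would be joined by a primary connecting edge carrying the largest initial edge feature.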
In one embodiment, the graph embedding processing is performed on the knowledge-graph based on the node characteristics of each node and the edge characteristics of each connecting edge in the knowledge-graph, and comprises the following steps:
extracting graph structure information based on the knowledge graph; determining node characteristics of each node and edge characteristics of each edge in the knowledge graph; and inputting the graph structure information, the node characteristics and the edge characteristics into a graph embedding model obtained by pre-training, and carrying out graph embedding processing on the node characteristics corresponding to each node through the graph embedding model.
In one embodiment, the graph embedding processing is performed on the node features corresponding to each node through a graph embedding model, and the graph embedding processing includes:
determining a neighbor node set of a first node, the first node being any node in the knowledge graph, and taking the connecting edge between each node in the neighbor node set and the first node as a target edge; and performing at least one level of vector embedding according to the node features of the first node and of each node in the neighbor node set and the edge features of each target edge, to obtain the embedding vector corresponding to the first node.
In one embodiment, performing at least one level of vector embedding according to the node features of the first node and of each node in the neighbor node set and the edge features of each target edge includes:
determining a primary embedding vector of a first node according to the original node characteristics of the first node; and performing one-level or multi-level vector aggregation based on the primary embedding vector and a neighbor node set of the first node, wherein each level of vector aggregation comprises neighbor aggregation of the previous-level embedding vector of each neighbor node in the neighbor node set, and determining the current-level embedding vector of the first node according to the neighbor aggregation result and the previous-level embedding vector of the first node.
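The multi-level aggregation described above can be sketched as follows. This is a simplified illustration under stated assumptions, not the trained graph embedding model: the primary embedding is taken directly from the original node features, neighbor aggregation is an edge-feature-weighted mean, and the current-level vector is the plain average of the node's previous-level vector and the neighbor aggregate. A trained model would use learned weights (for example attention coefficients) in place of these fixed rules. The function and variable names are hypothetical.

```python
def aggregate_embeddings(features, neighbors, levels=1):
    """Compute per-node embedding vectors by repeated neighbor aggregation.

    features:  node -> list of floats (original node features)
    neighbors: node -> list of (neighbor_node, edge_feature) pairs
    """
    # Level 0: the primary embedding vector is the original node feature vector.
    current = {node: list(vec) for node, vec in features.items()}
    for _ in range(levels):
        updated = {}
        for node, own in current.items():
            weighted = neighbors.get(node, [])
            total = sum(w for _, w in weighted)
            if total > 0:
                # Neighbor aggregation: mean of the neighbors' previous-level
                # vectors, weighted by the edge feature of each target edge.
                agg = [
                    sum(w * current[n][i] for n, w in weighted) / total
                    for i in range(len(own))
                ]
            else:
                # A node without neighbors keeps its own vector.
                agg = own
            # Combine the aggregate with the node's previous-level vector.
            updated[node] = [(o + a) / 2.0 for o, a in zip(own, agg)]
        current = updated
    return current


features = {"A": [1.0, 0.0], "B": [0.0, 1.0], "C": [2.0, 2.0]}
neighbors = {"A": [("B", 1.0)], "B": [("A", 1.0)]}
embeddings = aggregate_embeddings(features, neighbors, levels=2)
```

In this toy graph, nodes A and B end up with the same vector after aggregation, while the isolated node C keeps its original features.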
In one embodiment, the graph embedding model includes an encoder and a decoder;
before graph embedding processing is performed on the node features corresponding to each node based on the knowledge graph, the graph embedding model is pre-trained, which specifically comprises:
inputting the graph structure information, the node features, and the edge features into the encoder, and performing a graph embedding operation through the encoder to obtain an embedding vector corresponding to each node;
extracting at least one triple from the knowledge graph and inputting the triple into the decoder, wherein the triple comprises a first node, a second node, and a first connecting edge connecting the first node and the second node;
calculating an evaluation score of the triple through the decoder, and determining a loss value corresponding to the current evaluation score according to a preset loss function, wherein the evaluation score of the triple is obtained based on the difference between (i) the sum of the embedding vector of the first node and the edge vector of the first connecting edge and (ii) the embedding vector of the second node;
and updating the encoder with the goal of minimizing the loss value.
In one embodiment, the encoder is implemented based on any one of GeniePath, a graph neural network with adaptive receptive paths; GraphSAGE, a graph neural network for inductive learning; and a Gaussian mixture model (GMM).
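The decoder's evaluation score can be illustrated with a small sketch. The text only states that the score is based on the difference between (first-node vector + edge vector) and the second-node vector; taking the L2 norm of that difference, and using a margin-based hinge loss against a corrupted (negative-sampled) triple, are assumptions made here for illustration, as are the function names and the margin value.

```python
import math


def triple_score(head, edge, tail):
    """L2 norm of (head + edge - tail); a lower score means a more plausible triple."""
    return math.sqrt(sum((h + r - t) ** 2 for h, r, t in zip(head, edge, tail)))


def margin_loss(positive, negative, margin=1.0):
    """Hinge loss: the positive triple should score lower than the negative by `margin`."""
    return max(0.0, margin + triple_score(*positive) - triple_score(*negative))


head, edge, tail = [1.0, 0.0], [0.0, 1.0], [1.0, 1.0]
positive = (head, edge, tail)            # a triple taken from the knowledge graph
far_negative = (head, edge, [4.0, 5.0])  # corrupted tail, far from head + edge
near_negative = (head, edge, [1.0, 1.5]) # corrupted tail, close to head + edge

loss_far = margin_loss(positive, far_negative)
loss_near = margin_loss(positive, near_negative)
```

A negative triple that is already well separated contributes no loss, while one that scores close to the positive triple produces a positive loss that training would push down.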
In one embodiment, before inputting the embedded vectors corresponding to at least one pair of nodes into the pre-trained prediction model, the method further includes:
extracting a plurality of triples based on the knowledge graph, wherein the two nodes in each triple are taken as a pair of nodes; and inputting the embedding vectors corresponding to the at least one pair of nodes into the pre-trained prediction model to obtain the probability value that the at least one pair of nodes corresponds to the same entity includes: sequentially inputting each pair of nodes into the pre-trained prediction model, and obtaining, for each pair, the probability value that its two nodes correspond to the same entity.
In one embodiment, the prediction model is any one of an XGBoost model, a logistic regression model, and a neural network model.
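As a rough illustration of the logistic regression option, the sketch below scores one pair of embedding vectors. The way the two vectors are combined into pair features (element-wise absolute difference concatenated with element-wise product) and the weight and bias values are assumptions for the example; in practice the parameters would come from training on labeled pairs.

```python
import math


def pair_features(vec_u, vec_v):
    """Symmetric pair features: element-wise |u - v| followed by element-wise u * v."""
    diff = [abs(a - b) for a, b in zip(vec_u, vec_v)]
    prod = [a * b for a, b in zip(vec_u, vec_v)]
    return diff + prod


def predict_same_entity(vec_u, vec_v, weights, bias):
    """Logistic regression: probability that the two accounts share one entity."""
    z = bias + sum(w * x for w, x in zip(weights, pair_features(vec_u, vec_v)))
    return 1.0 / (1.0 + math.exp(-z))


# Hypothetical trained parameters: penalize differences, reward agreement.
weights, bias = [-2.0, -2.0, 1.0, 1.0], 0.0

p_similar = predict_same_entity([1.0, 1.0], [1.0, 1.0], weights, bias)
p_different = predict_same_entity([1.0, 1.0], [-1.0, -1.0], weights, bias)
```

Because the pair features are symmetric, the predicted probability does not depend on the order in which the two nodes of a pair are supplied.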
In one embodiment, after obtaining probability values that at least one pair of nodes correspond to the same entity, the method further comprises:
when the probability value reaches a preset threshold, taking the pair of nodes as a pair of candidate nodes corresponding to the same entity; constructing a maximum connected subgraph corresponding to the entity based on a plurality of pairs of candidate nodes corresponding to the same entity; and determining that the plurality of accounts corresponding to the plurality of nodes contained in the maximum connected subgraph correspond to the same entity.
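The thresholding and grouping step above can be sketched with a union-find structure, where each resulting group is one maximum connected subgraph of candidate pairs. The threshold value of 0.9, the function name, and the sample probabilities are assumptions for illustration.

```python
def group_accounts(pair_probabilities, threshold=0.9):
    """Group accounts whose pairwise same-entity probability reaches the threshold.

    pair_probabilities: {(account_u, account_v): probability}
    Returns a sorted list of account groups (each a sorted list), one group
    per connected component of the candidate pairs.
    """
    parent = {}

    def find(x):
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for (u, v), probability in pair_probabilities.items():
        if probability >= threshold:  # candidate pair for the same entity
            parent[find(u)] = find(v)

    groups = {}
    for node in list(parent):
        groups.setdefault(find(node), set()).add(node)
    return sorted((sorted(g) for g in groups.values()), key=lambda g: g[0])


probabilities = {
    ("A", "B"): 0.95,
    ("B", "C"): 0.92,
    ("C", "D"): 0.40,  # below the threshold, so D is not merged
    ("E", "F"): 0.97,
}
entities = group_accounts(probabilities)
```

Accounts A, B, and C are attributed to one entity even though A and C were never scored as a pair directly, because they are linked through B.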
According to a second aspect, an embodiment of the present specification further provides an account relation identification apparatus, including:
an acquisition unit configured to acquire a knowledge graph constructed according to attribute information of accounts, wherein the knowledge graph comprises a plurality of nodes corresponding to a plurality of accounts, the attribute information comprises information of a plurality of attribute items, and, for each attribute item, the two nodes corresponding to two accounts with the same attribute value are connected by a connecting edge;
a graph embedding unit configured to perform graph embedding processing on the knowledge graph by using a pre-trained graph embedding model, based on the node features of each node and the edge features of each connecting edge in the knowledge graph, to obtain an embedding vector corresponding to each node;
and a prediction unit configured to input the embedding vectors corresponding to at least one pair of nodes into a pre-trained prediction model to obtain, for each pair of nodes, the probability value that the pair corresponds to the same entity, wherein a pair of nodes are two nodes joined by a connecting edge in the knowledge graph.
According to a third aspect, embodiments of the present specification provide a computer-readable storage medium, on which a computer program is stored, which, when executed in a computer, causes the computer to perform the method of the first aspect.
According to a fourth aspect, embodiments of the present specification provide a computing device comprising a memory and a processor, the memory having stored therein executable code, the processor when executing the executable code implementing the method of the first aspect.
According to the method and device provided by the embodiments of this specification, a connecting edge is established in the knowledge graph between two nodes with the same attribute value, so that the knowledge graph expresses the complex association relations among the attribute features of a plurality of accounts. Graph embedding processing is then performed on the knowledge graph to obtain the embedding vector of each node; the resulting embedding vectors contain node feature information as well as edge feature information representing the attribute association relations between nodes. A prediction model then predicts, from the embedding vectors of different accounts, whether those accounts correspond to the same entity, so that different accounts corresponding to the same entity can be effectively identified. In application scenarios such as risk control or business development, invalid or repeated business promotion operations and risk management measures can thus be effectively avoided: repeated pushes to the same entity are reduced, user experience is improved, and unnecessary cost or redundant risk control measures are reduced.
Drawings
In order to illustrate the technical solutions of the embodiments of the present invention more clearly, the drawings used in the description of the embodiments are briefly introduced below. The drawings in the following description show only some embodiments of the present invention, and those skilled in the art can obtain other drawings based on these drawings without creative effort.
FIG. 1 is a system framework diagram of an account relation identification method in one embodiment of the present specification;
FIG. 2 is a flowchart of an account relation identification method according to one embodiment of the present specification;
FIG. 3 shows an example of a knowledge graph in one embodiment of the present specification;
FIG. 4 shows an example of a knowledge graph in another embodiment of the present specification;
FIG. 5 shows an example of a knowledge graph in another embodiment of the present specification;
FIG. 6 shows a schematic diagram of the KARI algorithm in one embodiment of the present specification;
FIG. 7 shows an example of a maximum connected subgraph in one embodiment of the present specification;
FIG. 8 is a schematic structural diagram of an account relation identification apparatus in one embodiment of the present specification.
Detailed Description
The scheme provided by the specification is described below with reference to the accompanying drawings.
The embodiments of this specification provide an account relation identification method and device that can be used to identify whether different accounts belong to the same entity. The scheme discloses a novel knowledge graph structure and a novel representation learning technique, the KARI algorithm (Knowledge-aware basis on Novel prediction Learning + Knowledge indication orientation + Logic Rule + adaptive Sampling), a representation learning algorithm based on an attention mechanism, logic rules, and negative sampling, which maps the knowledge graph into embedding vectors. In the KARI algorithm, an encoder performs feature aggregation (for example, attention-based feature aggregation) on the node features, the edge features, and the graph structure information to obtain a primary embedding vector for each node; a loss function is constructed, and the loss value computed through a decoder is propagated backwards so that the encoder is continuously optimized and outputs better embedding vectors. Based on the embedding vectors output by KARI, it can then be predicted whether accounts correspond to the same entity, for example, whether they correspond to the same store with an actual business address.
In the solution disclosed in the embodiments of this specification, representation learning is performed first to obtain the embedding vector corresponding to each account, and whether different accounts correspond to the same entity is then predicted based on these embedding vectors.
Exemplarily, referring to FIG. 1, a knowledge graph is first constructed, or a pre-constructed knowledge graph is obtained, and the node features (initial features) of each node and the edge features (initial features) of each edge in the knowledge graph are obtained. The graph structure information, edge features, and node features extracted from the knowledge graph are input into a graph embedding model, which performs multi-level vector embedding to obtain the embedding vector of each node; that is, the embedding vector corresponding to each account is first obtained by using the KARI algorithm. The embedding vectors of the pair of nodes in a triple are then input into a prediction model, and the prediction value output by the prediction model is the probability value that the two accounts represented by the pair of nodes correspond to the same entity.
It is to be appreciated that the method can be performed by any apparatus, device, platform, or device cluster with computing and processing capabilities.
In the embodiments of this specification, the entity may be a natural-person user (e.g., a consumer), a merchant, a mobile vendor, etc. Here, a merchant refers to a merchant with a physical place of business, for example any store with a physical address, such as a physical electronics store, a clothing store, a hotel, a restaurant, a bar, an educational institution, a medical clinic, a fitness club, a dessert shop, a cafeteria, a yard, an intermediary, a building floor, a post house, etc.
Referring to FIG. 2, an account relation identification method disclosed in the embodiments of this specification may include:
S201, acquiring a knowledge graph constructed according to attribute information of accounts;
S202, performing graph embedding processing on the knowledge graph by using a pre-trained graph embedding model, based on the node features of each node and the edge features of each connecting edge in the knowledge graph, to obtain an embedding vector corresponding to each node;
S203, inputting the embedding vectors corresponding to at least one pair of nodes into a pre-trained prediction model to obtain, for each pair of nodes, the probability value that the pair corresponds to the same entity, wherein a pair of nodes are two nodes joined by a connecting edge in the knowledge graph.
In the knowledge graph disclosed in the embodiments of this specification, a node represents an account, and a connecting edge between two nodes indicates that the two nodes have the same attribute value. The account represented by a node may be a user ID with actual operation authority, for example a user ID used for operations such as login and query, or a virtual account that serves only as an identifier. One knowledge graph may include only the accounts within a single platform system, with different platforms corresponding to different subgraphs; alternatively, one knowledge graph may include accounts of different platforms, that is, a large-scale knowledge graph is established across platforms.
Constructing the knowledge graph first requires determining the accounts that the knowledge graph contains, and then establishing connecting edges between nodes based on the attribute information of the accounts. The attribute information, i.e., the various items of information relevant to identifying which entity an account corresponds to, may include a plurality of attribute items; for example, an operating address, a merchant name, legal-person information, and transaction frequency may be used as attribute items. If two accounts have the same attribute value for a certain attribute item, the two accounts are considered to be associated, and a connecting edge between the two corresponding nodes is established in the knowledge graph.
Illustratively, referring to FIG. 3, nodes A to I respectively represent a plurality of different accounts, and a connecting edge between two nodes indicates that the two corresponding accounts share some attribute value.
In one embodiment, attribute items correspond one-to-one to connecting edges, so the number of connecting edges between two nodes equals the number of attribute items on which the two nodes have the same value. For example, if there are 3 connecting edges between node A and node B, the two accounts represented by node A and node B have the same attribute value on 3 attribute items; if there is only one connecting edge between node B and node C, the two accounts represented by node B and node C have the same attribute value on only one attribute item.
In one embodiment, the connecting edges corresponding to different attribute items may be represented by the same type of connecting edge, but have different edge features.
In another embodiment, different attribute items are represented by different types of connecting edges, with one attribute item corresponding to one connecting edge. As shown in FIG. 3, connecting edges drawn in different line styles represent different attribute items, and FIG. 3 shows 4 types of connecting edges in total. The 3 types of connecting edges between node A and node B indicate that the two corresponding accounts have the same attribute values on 3 attribute items: the merchant name, the LBS aggregated address, and the transaction period. For example, the merchant name corresponding to the accounts represented by node A and node B is "× milk tea shop" in both cases, the aggregated address is "× city street 32 × 103" in both cases, and the transaction period is 9:00 to 21:00 in both cases. Node E and node C have the same attribute values on 3 attribute items, namely the cash register device address, the LBS aggregated address, and the merchant name.
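The edge construction rule described above (one connecting edge per attribute item on which two accounts share a value) can be sketched as follows. The account IDs, attribute keys, and attribute values are hypothetical and only loosely follow the node A, B, and C example; the sketch uses exact equality of attribute values, which is an assumption about how "the same attribute value" is tested.

```python
from collections import defaultdict
from itertools import combinations


def build_edges(accounts):
    """Return a sorted list of (node_u, node_v, attribute_item) connecting edges.

    accounts maps an account ID to {attribute_item: attribute_value}. One edge
    is emitted per attribute item on which two accounts share a value, so a
    pair of nodes may be joined by several parallel edges of different types.
    """
    # Group account IDs by (attribute item, attribute value).
    groups = defaultdict(list)
    for account_id, attrs in accounts.items():
        for item, value in attrs.items():
            if value is not None:  # skip missing attribute values
                groups[(item, value)].append(account_id)

    edges = []
    for (item, _value), members in groups.items():
        for u, v in combinations(sorted(members), 2):
            edges.append((u, v, item))
    return sorted(edges)


# Hypothetical attribute data for three accounts.
accounts = {
    "A": {"merchant_name": "X milk tea shop", "lbs_address": "addr-1",
          "transaction_period": "9:00-21:00"},
    "B": {"merchant_name": "X milk tea shop", "lbs_address": "addr-1",
          "transaction_period": "9:00-21:00", "settlement_card": "card-7"},
    "C": {"merchant_name": "Y bakery", "lbs_address": "addr-2",
          "transaction_period": "8:00-18:00", "settlement_card": "card-7"},
}
edges = build_edges(accounts)
```

With this data, nodes A and B are joined by three edges (merchant name, LBS aggregated address, transaction period), while nodes B and C are joined by a single edge for the shared settlement card.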
In practical applications, the attribute information related to an account must be acquired in multiple ways and is highly diverse. To facilitate data processing, the attribute information can first be divided into several broad categories, for example address class attribute items, static identity class attribute items, merchant class attribute items, and operation characteristic class attribute items, each of which comprises a plurality of specific attribute items.
A feasible way of classifying and processing attribute information is described below, taking a merchant as an example of the entity:
when the entity is a merchant, the attribute information is first divided into the following four broad categories: the address class, the static identity class, the merchant class, and the operation characteristic class, each of which contains at least one attribute item.
For example, the address class attribute items may include any one or more of shipping address information, Location Based Services (LBS) aggregated address information, cash register device address information, and geographic location information.
The shipping address information may come from a B2B (business-to-business) platform, i.e., the address recorded for the merchant by the B2B platform service provider when the merchant makes a purchase. Shipping address information sometimes lacks verification at the front end, so its accuracy depends on the service provider; in one embodiment, the shipping address information may therefore be verified first.
The LBS aggregated address information is obtained by aggregating the geographic location information shared at transaction time by a plurality of buyer users who have transacted with the same account. In most cases, the LBS aggregated address is the actual operating address of the merchant, or is very closely associated with it. For example, according to transaction records, the address obtained by aggregating the location information of a plurality of users who shared their location while transacting (non-takeout transactions) with the account of the same milk tea shop is usually the actual operating address of that milk tea shop. Therefore, in one embodiment, the LBS aggregated address information may be used as a strong attribute item. It should be noted that the merchant cannot be accurately located when buyers do not share their location or when positioning inside the merchant's premises is inaccurate; in addition, the LBS positioning address is affected by factors such as positioning accuracy and technical level, and positioning drift may occur; and for mobile vendors, more transactions are required to fully locate the merchant's moving address.
The cash register device address information includes the physical address information of a cash register device such as a code scanning device or a face recognition device. For example, if a merchant uses a code scanning device or a face recognition device to collect payment, the ID of the corresponding cash register device may be recorded in a transaction record or transaction detail associated with at least one account corresponding to the merchant, and the address information of the device may be obtained according to that ID.
The geographic location information is the geographic location information corresponding to each account, acquired from a map platform or a navigation platform, for example according to the merchant name corresponding to each account. The geographic location information can also assist in checking and screening other address class data.
Still taking the merchant as an example, the static identity class attribute item of the merchant may include at least one of a merchant name, legal-person information, a settlement bank card, and a receiving address.
The receiving address is the receiving address in the transaction records corresponding to each account and may be obtained through, for example, a B2C platform. It is closely associated with the location of the merchant or the residence of the legal person, and generally serves as strong or medium attribute item information.
The settlement bank card, i.e. the bank account used for settlement when operating the account, is generally the bank card of the merchant's owner or the corporate account of a company.
For an enterprise merchant, the information contained in the merchant name is relatively accurate and generally corresponds to a definite enterprise entity. The merchant name can clearly indicate the brand or name of the merchant corresponding to a given account, but for a plurality of stores with the same merchant name, such as chain stores or franchise stores, the same merchant name does not indicate that two accounts necessarily correspond to the same physical store.
The legal-person information generally includes the name of the legal person, the mobile phone number of the account contact person, and the like. This information may be mixed with non-real mobile phone numbers and needs to undergo data cleaning.
The merchant category attribute item may include at least one of a chain store, an affiliate store, and other specified entity types, and is used to distinguish multiple accounts having the same static identity attribute item information. For example, when multiple accounts correspond to the same merchant name and the merchant category is chain store, it is necessary to further identify whether the address information corresponding to each account differs.
The operation characteristic class attribute item includes at least one of a single transaction amount, a transaction frequency, a transaction time period, and a transaction object. These can be obtained from the transaction records related to each account. Regarding the single transaction amount, different accounts of the same holder sometimes show similar per-customer or unit prices. The same store may sometimes operate different categories, so a difference in per-customer or unit price does not necessarily indicate that the corresponding accounts belong to different stores; nevertheless, two accounts whose single transaction amounts differ greatly generally correspond to two different physical stores. For example, a single transaction at a milk tea shop is generally small and does not exceed a hundred yuan, whereas a single transaction in mobile phone sales is mostly more than a thousand yuan. The single transaction amount can thus reflect, to a certain extent, the characteristics of the physical store corresponding to the account. The commodity name can also be used as an operation characteristic attribute item, but its missing rate is generally high, so it is only suitable for individual scenarios with sufficient commodity name information.
Regarding the transaction time period, merchants engaged in the same business category generally have similar transaction time periods, and different accounts corresponding to the same store generally have the same or similar periods of frequent transaction flow. The transaction object may include attributes of the buyer user, such as the age and gender of the buyer user, and whether the buyer user has gambling records or frequent credit/loan records.
Based on the multi-source attribute information, the specific data content of each attribute item corresponding to each account can be obtained. The attribute value of an attribute item is either the specific content of that item or an attribute vector derived from it. For example, if the merchant name corresponding to account a is "× milk tea shop" and the merchant name corresponding to account b is also "× milk tea shop", their attribute values are considered the same; likewise, if the LBS aggregated addresses of account c and account d are consistent, their attribute values are the same. In one embodiment, before the attribute values are compared, the original content of each attribute item may be one-hot encoded and the encoded vector used as the attribute value. For the merchant category attribute item, for example, the merchant categories may be numbered and each number converted into one-hot form to obtain the attribute value; for other attribute items such as transaction frequency, transaction time period, and gender of the buyer user, the same processing can be applied to obtain vector representations as attribute values, which facilitates quick matching between the attribute items of different accounts.
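The one-hot encoding of an attribute item described above can be illustrated with a minimal sketch. The category list, its ordering, and the helper name `one_hot` are assumptions made for illustration only; the specification does not prescribe them.

```python
def one_hot(value, vocabulary):
    """Return a one-hot list for `value` over an ordered `vocabulary`."""
    vec = [0] * len(vocabulary)
    vec[vocabulary.index(value)] = 1
    return vec

# Hypothetical merchant categories; the numbering order is arbitrary.
MERCHANT_CATEGORIES = ["chain_store", "affiliate_store", "other"]

account_a = one_hot("chain_store", MERCHANT_CATEGORIES)
account_b = one_hot("chain_store", MERCHANT_CATEGORIES)
account_c = one_hot("other", MERCHANT_CATEGORIES)

# Two accounts match on this attribute item when their encoded values are equal.
same_ab = account_a == account_b
same_ac = account_a == account_c
```

Comparing encoded vectors rather than raw strings keeps the matching step uniform across attribute items of different original types.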
As can be known from the above listed attribute information, the attribute items included in the attribute information may include several items or even more than ten items, and in some embodiments, if different attribute items are represented by different types of connecting edges, more than ten different types of connecting edges may appear in the knowledge graph, which increases the complexity of constructing the knowledge graph.
In view of this, in one embodiment, the attribute information may be classified into classes, that is, each attribute item has a corresponding class, each connection edge has a respective edge type, the edge type of any connection edge corresponds to the class of the attribute item corresponding to the connection edge, and different types of connection edges have different edge characteristics. For example, according to the strength of the relationship represented by the attribute items, the attribute items may be divided into strong attribute items, medium attribute items, and weak attribute items, and the attribute items in the three levels correspond to the first-level connection edge, the second-level connection edge, and the third-level connection edge, respectively. Illustratively, the LBS aggregated address information and the settlement bank card may be used as strong attribute item information, the shipping address information and the transaction period as medium attribute item information, the merchant name and legal information as weak attribute item information, and the like.
For example, there are four types of the connecting edges in fig. 3, and the types of the connecting edges are reduced to 3 in fig. 4. In this way, the edge types in the knowledge graph are reduced, and the complexity of constructing the knowledge graph is reduced.
In the solution disclosed in the embodiments of the present specification, the attribute items may be classified in various ways and are not limited to the above 3-level classification; for example, they may be classified into two levels, strong and weak, or into four levels, strong, medium, medium-weak and weak, and so on.
Illustratively, referring to fig. 4, three levels of connecting edges are shown, where different line types represent different levels: connecting edges of all 3 levels connect node A and node B, only one secondary connecting edge connects node B and node F, and one secondary connecting edge and one tertiary connecting edge connect node G and node H. In the knowledge graph, connecting edges of different levels have different initial edge features: the initial edge feature value of the primary connecting edge corresponding to a strong attribute item is IV1, that of the secondary connecting edge corresponding to a medium attribute item is IV2, and that of the tertiary connecting edge corresponding to a weak attribute item is IV3, with IV1 > IV2 > IV3.
It should be noted that, the connection edges at different levels may have different edge characteristics, and there may also be different edge characteristics between a plurality of connection edges belonging to the same level, and the three levels of edge characteristic initial values may respectively represent ranges of the characteristic initial values at corresponding levels, for example, IV1 represents a range of the edge characteristic initial values of the connection edges at one level. In most embodiments, the edge characteristics of each connected edge depend on the corresponding attribute item. In some other embodiments, the connected edges at the same level may have the same edge characteristics, so that the graph embedding process is performed with lower computational complexity, but the expression accuracy of the attribute information is reduced. In most embodiments, different attribute items have different edge characteristics.
In one embodiment, the initial value of the edge feature of each connecting edge in the knowledge-graph is positively correlated with the strength of the relationship represented by the attribute item corresponding to the connecting edge, for example, the initial value of the edge feature of the primary connecting edge corresponding to the strong attribute item is greater than the initial value of the edge feature of the secondary connecting edge corresponding to the medium attribute item, and the initial value of the edge feature of the secondary connecting edge is greater than the initial value of the edge feature of the tertiary connecting edge. The initial value of the edge feature of each edge may be initialized randomly on the premise that the ordering of the relationship strength indicated by each attribute item is satisfied.
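One way to initialize the edge feature values randomly while preserving the required ordering is to draw random numbers and assign them in sorted order. The sketch below assumes scalar initial values and a hypothetical value range of [0, 1]; the function name and parameters are illustrative and not taken from the specification.

```python
import random

def init_edge_values(levels, low=0.0, high=1.0, seed=None):
    """Draw one random initial value per level.

    `levels` must be ordered from the strongest relationship to the weakest;
    the largest draw is assigned to the first level, and so on.
    """
    rng = random.Random(seed)
    draws = sorted((rng.uniform(low, high) for _ in levels), reverse=True)
    return dict(zip(levels, draws))

# IV1 (strong), IV2 (medium), IV3 (weak), as named in the text.
initial_values = init_edge_values(["IV1", "IV2", "IV3"], seed=0)
```

Sorting the draws keeps the initialization random while guaranteeing that a stronger attribute level never starts below a weaker one.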
Specifically, in one embodiment, a one-hot encoding manner may be adopted to map each type of connected edge into one vector. When there are N possible connected edge types, then an N-dimensional vector is obtained.
In an embodiment, the initial feature of each node may be determined based on the attribute items and attribute values thereof corresponding to each node, for example, the initial feature of a node is obtained by splicing the attribute values of a plurality of attribute items corresponding to the node.
Therefore, based on the attribute information, various relationship edges (namely connection edges) can be established among the nodes for representing the account, the various relationship edges and the nodes form an undirected connected graph, and the property and the strength of the connection relationship among the nodes can be visually shown. The more the number of connecting edges between two nodes is, the higher the association degree between the two nodes is.
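The edge construction rule (two accounts are connected once for every attribute item on which their values match) can be sketched as follows. The account records, attribute item names, and the function `build_edges` are hypothetical examples, not data or interfaces from the specification.

```python
from collections import defaultdict
from itertools import combinations

def build_edges(accounts):
    """accounts: {account_id: {attribute_item: value}}.

    Returns a sorted list of (node_u, node_v, attribute_item) connecting edges;
    two nodes may be joined by several edges, one per matching attribute item.
    """
    by_value = defaultdict(list)
    for account_id, attrs in accounts.items():
        for item, value in attrs.items():
            if value is not None:  # skip missing attribute values
                by_value[(item, value)].append(account_id)
    edges = []
    for (item, _value), members in by_value.items():
        for u, v in combinations(sorted(members), 2):
            edges.append((u, v, item))
    return sorted(edges)

# Hypothetical accounts for illustration.
accounts = {
    "a": {"merchant_name": "X milk tea shop", "lbs_address": "addr-1"},
    "b": {"merchant_name": "X milk tea shop", "lbs_address": "addr-1"},
    "c": {"merchant_name": "Y phone store", "lbs_address": "addr-1"},
}
edges = build_edges(accounts)
```

Here accounts a and b end up joined by two edges and are therefore more strongly associated than a and c, which share only one attribute value.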
Furthermore, based on the attribute information, the initial features of each node and each edge can be obtained and converted into node initial vectors and edge initial vectors, which can then be input into the graph embedding model to perform the graph embedding process. In one embodiment, the node features and the edge features are kept at the same vector dimension when they are input into the graph embedding model.
In one embodiment, the constructed original knowledge-graph structure may be further processed before the graph embedding process is performed, and multiple connecting edges between two nodes are fused into one connecting edge, for example, after the original knowledge-graph structure shown in fig. 4 is subjected to the fusion process, a knowledge-graph as shown in fig. 5 may be obtained.
In one embodiment, the original connecting edges between two nodes may be fused by taking a weighted sum of their initial vectors, where different attribute items correspond to different weights. For example, in the original knowledge graph shown in fig. 4, the 3 connecting edges between node A and node B, which have initial feature values IV1, IV2 and IV3, are converted into corresponding edge initial vectors $\vec{e}_1$, $\vec{e}_2$ and $\vec{e}_3$.
The weights corresponding to the 3 connecting edges are α, β and γ, respectively, so the initial vector of the fused connecting edge is

$$\vec{e}_{AB} = \alpha\,\vec{e}_1 + \beta\,\vec{e}_2 + \gamma\,\vec{e}_3$$
The fusion operation is not limited to weighted summation, but may be splicing edge initial vectors of a plurality of connecting edges between two nodes, or vector dot multiplication.
The fused knowledge graph has only one connecting edge between two nodes, so that the complexity and the operand of graph embedding processing are greatly reduced.
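The weighted-sum fusion of parallel connecting edges can be sketched as below. The one-hot edge vectors and the weight values are assumptions chosen for illustration; the specification does not give concrete values for α, β and γ.

```python
def fuse_edges(edge_vectors, weights):
    """Weighted sum of parallel edge vectors, giving one fused edge vector."""
    dim = len(edge_vectors[0])
    fused = [0.0] * dim
    for vec, w in zip(edge_vectors, weights):
        for i in range(dim):
            fused[i] += w * vec[i]
    return fused

# Assumed one-hot initial vectors for the primary, secondary and tertiary edges.
e1, e2, e3 = [1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]
alpha, beta, gamma = 0.6, 0.3, 0.1  # hypothetical weights

fused_ab = fuse_edges([e1, e2, e3], [alpha, beta, gamma])
```

With one-hot inputs the fused vector simply records the weight of each edge level present between the two nodes, so the result still has the same dimension as a single edge vector.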
Next, S202 is executed: the node features of each node and the edge features of each connecting edge in the knowledge graph are input into a pre-trained graph embedding model, graph embedding processing is performed by the model, and an embedding vector corresponding to each node is output.
The graph embedding model may be various models capable of performing graph embedding processes, and in one embodiment, the graph embedding model may be a model that operates based on the KARI algorithm.
Referring to fig. 6, the graph embedding model based on the KARI algorithm includes an encoder and a decoder. The input to the encoder is the graph structure information together with the node features and edge features extracted from the knowledge graph. The graph structure information is structured data carrying the connection relationships between nodes; it may be, for example, an adjacency matrix, or it may record at least the first-order neighbor nodes connected to each node in various forms such as a table, an array, or a matrix. The node features of each node and the edge features of each edge are then converted into corresponding node initial vectors and edge initial vectors and input, together with the graph structure information, into the encoder of the graph embedding model, which performs graph embedding processing on the node features corresponding to each node.
Specifically, in one embodiment, the encoder performs a graph embedding process on the node features corresponding to each node as follows:
Any node in the knowledge graph is taken as a first node, a neighbor node set of the first node is determined, and the connecting edge between each node in the neighbor node set and the first node is taken as a target edge; at least one level of vector embedding is then performed according to the node features of the first node and of each node in the neighbor node set, and the edge features of each target edge, to obtain the embedding vector corresponding to the first node.
The at least one level of vector embedding may include:
determining a primary embedding vector of the first node according to the initial node features of the first node, and performing one or more levels of vector aggregation based on the primary embedding vector and the neighbor node set of the first node, where each level of vector aggregation includes aggregating the previous-level embedding vectors of the neighbor nodes in the neighbor node set, and determining the current-level embedding vector of the first node according to the neighbor aggregation result and the previous-level embedding vector of the first node.
The feature aggregation performed in the graph embedding process is described in detail below, taking any node in the knowledge graph as a target node. The embedding processing for the target node includes performing multi-level vector embedding according to the node attribute features of the target node and the neighbor node set of the target node, so as to obtain a corresponding high-order vector. Specifically, to perform multi-level vector embedding on a target node v, a primary embedding vector $h_v^{0}$ of the target node is first determined based on the initial node features of the target node. Then, based on the primary embedding vector and the neighbor node set of the target node, multi-level vector aggregation is performed until a preset number of levels K is reached, and the aggregation vector $h_v^{K}$ at the preset level is taken as the high-order vector of the target node v.
As mentioned above, the initial characteristic of the node is determined based on the attribute values of the attribute items corresponding to the account represented by the node. In one embodiment, the attribute contents of the attribute items corresponding to the account represented by the target node are encoded and then spliced or otherwise combined, and the obtained vector is used as an initial vector of the node, that is, a primary embedded vector. In another embodiment, the obtained vector may be further subjected to linear or non-linear transformation, and the transformed vector is used as the primary embedding vector.
On the basis of obtaining the primary embedding vector, multi-level vector aggregation can be performed based on the primary embedding vector $h_v^{0}$ and the neighbor node set of the target node, where each level of vector aggregation includes aggregating the previous-level embedding vectors of the neighbor nodes in the neighbor node set, and determining the current-level embedding vector of the target node according to the neighbor aggregation result and the previous-level embedding vector of the target node.
In one embodiment, the set of neighbor nodes includes all nodes connected to the target node, that is, includes neighbor nodes having a one-degree connection relationship with the target node.
In one embodiment, the neighbor node aggregation operation may be represented by an aggregation function $\mathrm{AGG}_k$. Thus, for a target node v, its k-th level vector aggregation may include, first, using the aggregation function $\mathrm{AGG}_k$ to obtain a neighbor aggregation vector $h_{N(v)}^{k}$ from the previous-level (i.e., level k-1) embedding vectors $h_u^{k-1}$ of the neighbor nodes u of the target node v, where N(v) represents the set of neighbor nodes of node v, namely:

$$h_{N(v)}^{k} = \mathrm{AGG}_k\left(\left\{h_u^{k-1},\ \forall u \in N(v)\right\}\right) \qquad (1)$$

Then, the current-level (level k) embedding vector $h_v^{k}$ of the target node v is determined according to the neighbor aggregation vector $h_{N(v)}^{k}$ and the previous-level (i.e., level k-1) embedding vector $h_v^{k-1}$ of the target node v, namely:

$$h_v^{k} = f\left(W_k,\ h_{N(v)}^{k},\ h_v^{k-1}\right) \qquad (2)$$

where f represents the synthesis function applied to the neighbor aggregation vector $h_{N(v)}^{k}$ and the previous-level vector $h_v^{k-1}$ of node v, and $W_k$ is a parameter of the k-th level aggregation. In various embodiments, the synthesis operation in function f may include concatenating, summing, or averaging $h_{N(v)}^{k}$ and $h_v^{k-1}$, etc.
In different embodiments, the above aggregation function $\mathrm{AGG}_k$ for performing the neighbor aggregation operation may take different forms and algorithms.
As one implementable manner, an attention mechanism may be introduced when performing the above neighbor aggregation, giving different attention and weights to different neighbor nodes. In this embodiment, the aggregation function $\mathrm{AGG}_k$ may be embodied as a weighted sum operation. Accordingly, formula (1) becomes:

$$h_{N(v)}^{k} = \sum_{u \in N(v)} \alpha_{uv}\, h_u^{k-1}$$

That is, the previous-level embedding vectors $h_u^{k-1}$ of the neighbor nodes u of the target node v are weighted and summed to obtain the neighbor aggregation vector $h_{N(v)}^{k}$, where $\alpha_{uv}$ is the weight factor corresponding to the neighbor node u. The weight factor may depend on the connecting edge from the neighbor node u to the target node v.
In one example, the weight factor $\alpha_{uv}$ is determined according to the edge vector of the connecting edge between the neighbor node u and the target node v; that is, the edge vector serves as the weight factor $\alpha_{uv}$ corresponding to the neighbor node u.
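The multi-level weighted neighbor aggregation described above can be illustrated with a simplified sketch. It assumes scalar edge weights in place of edge vectors, uses averaging as the synthesis function f, and omits the learned parameter of each level; the function name `embed` and the toy graph are hypothetical.

```python
def embed(features, neighbors, weights, num_levels):
    """Simplified K-level embedding with weighted-sum neighbor aggregation.

    features:  {node: [float, ...]} initial node vectors (level 0)
    neighbors: {node: [neighbor, ...]}
    weights:   {(u, v): float} weight of the edge from neighbor u to target v
    """
    h = {v: list(x) for v, x in features.items()}
    for _ in range(num_levels):
        nxt = {}
        for v, h_v in h.items():
            # Weighted sum of the previous-level vectors of v's neighbors.
            agg = [0.0] * len(h_v)
            for u in neighbors.get(v, []):
                a = weights[(u, v)]
                for i, x in enumerate(h[u]):
                    agg[i] += a * x
            # Synthesis f: average of the neighbor aggregate and v's own
            # previous-level vector (an assumed choice, no learned parameter).
            nxt[v] = [(p + q) / 2.0 for p, q in zip(agg, h_v)]
        h = nxt
    return h

# Toy graph with two connected nodes and a hypothetical edge weight of 0.5.
features = {"A": [1.0, 0.0], "B": [0.0, 1.0]}
neighbors = {"A": ["B"], "B": ["A"]}
weights = {("B", "A"): 0.5, ("A", "B"): 0.5}
level1 = embed(features, neighbors, weights, num_levels=1)
```

All nodes are updated from the previous level's vectors before any new vector is used, which matches the level-by-level formulation in the text.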
In an embodiment in which a plurality of connection edges between two nodes are not subjected to fusion processing before graph embedding processing, in the graph embedding processing process, a plurality of connection edges may exist between the two nodes, and one possible implementation manner is that each connection edge independently participates in feature aggregation, the plurality of connection edges are aggregated respectively with reference to an aggregation manner in which only one connection edge exists between the two nodes, then aggregation features corresponding to each connection edge are summed, the obtained comprehensive features are taken as features of aggregated target nodes, and vector embedding at each level is sequentially performed; or, another possible implementation manner is that a plurality of connection edges between two nodes are first subjected to fusion processing to obtain a comprehensive connection edge, and then multi-level vector embedding can be realized according to the above manner.
The encoder is implemented based on either GeniePath, a graph neural network with adaptive receptive paths, or GraphSAGE, an inductive-learning graph neural network.
As already mentioned, the KARI algorithm includes an encoder and a decoder. The graph embedding process described above is performed by the encoder, while the decoder is used to continuously optimize the parameters of the encoder by back-propagating the loss value, so that the encoder outputs more accurate and more comprehensive embedding vectors. The edge vectors of the various types of connecting edges serve as weights in the graph embedding process, that is, they are part of the network parameters of the encoder and can be iteratively updated during encoder training until stable edge embedding vectors, i.e. stable weight factors, are obtained.
Specifically, the decoder performs the following:
before the graph embedding process is performed on the node features corresponding to each node, the graph embedding model is trained in advance, and an encoder is generally trained through feedback of a decoder. Specifically, graph structure information, each node feature and each edge feature are input into an encoder, graph embedding operation is executed through the encoder to obtain an embedding vector corresponding to each node, then at least one triple is extracted from a knowledge graph and input into a decoder, the evaluation score of the triple is calculated through the decoder, a loss value corresponding to the current evaluation score is determined according to a preset loss function, the gradient corresponding to the loss value is reversely propagated to the encoder, and the encoder is updated with the minimized loss value as a target.
The triple comprises a first node, a second node and a first connecting edge connecting the first node and the second node. Based on the characteristics of the knowledge graph itself, the knowledge graph is often recorded in the field in a (head node h, relation r, tail node t) triple manner. It can be understood that a triple may record an entity relationship corresponding to a connection edge in the knowledge-graph, where a head node h (head) is a node at one end of the connection edge, a tail node t (tail) is a node at the other end of the connection edge, and a relationship r (relationship) is the connection edge. For example, in fig. 3, (a, r1, B), (a, r2, B), (a, r3, B) are three different triplets. That is, in the knowledge-graph, any two nodes connected by a connecting edge form a triple with the connecting edge. In fig. 3, 13 triples can be extracted.
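As a small illustration of how triples relate to connecting edges, the sketch below turns each edge of a multigraph into one (head, relation, tail) triple. The edge list is a hypothetical sample and does not reproduce the full graph of fig. 3.

```python
def extract_triples(edges):
    """edges: iterable of (node_u, node_v, edge_type); parallel edges allowed.

    Each connecting edge yields exactly one (head, relation, tail) triple.
    """
    return [(u, edge_type, v) for u, v, edge_type in edges]

# Hypothetical sample: three parallel edges between A and B, one between B and F.
edges = [("A", "B", "r1"), ("A", "B", "r2"), ("A", "B", "r3"), ("B", "F", "r2")]
triples = extract_triples(edges)
```

Parallel edges between the same pair of nodes produce distinct triples, which is why the number of triples equals the number of connecting edges.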
In the embodiments of the present specification, the evaluation score of a triple is obtained based on the sum of the embedding vector of the first node and the edge vector of the first connecting edge, minus the embedding vector of the second node, i.e. based on $\vec{h} + \vec{r} - \vec{t}$, where $\vec{h}$, $\vec{r}$ and $\vec{t}$ denote the embedding vector of the first node, the edge vector of the first connecting edge, and the embedding vector of the second node, respectively.
To characterize a knowledge graph, representation learning may be performed with the entity relationships shown by the above triples as the target. That is, when the knowledge graph is characterized, each entity and each relationship in the graph is expressed in the form of a vector, and the vector representations should conform as closely as possible to the entity relationships shown by the triples. A vector representation conforming to the entity relationship of a triple may be embodied as: the vector $\vec{h}$ of the head node plus the relation vector $\vec{r}$ equals the vector $\vec{t}$ of the tail node.
In this embodiment, the input of the decoder is a triple, and decoding is then performed based on a DTransE algorithm, which can be regarded as the inverse process of the TransE algorithm: it decodes the current edge vectors of the encoder and the node embedding vectors it outputs. The decoding process calculates the evaluation score of the triple $(\vec{h}, \vec{r}, \vec{t})$ according to a preset evaluation function. In one embodiment, the higher the strength of the relationship, the higher the evaluation score; e.g., the evaluation score of a correct triple $(\vec{h}, \vec{r}, \vec{t})$ for which the relationship r exists may be greater than the evaluation score of an erroneous triple for which the relationship r does not exist.
A loss value is calculated from the evaluation scores of the triples based on a loss function preset for the decoder, the gradient is back-propagated to the encoder, and the network parameters of the encoder are adjusted in the direction of reducing the loss value until a preset convergence condition is reached.
In this way, the embedding vector of each node output by the trained encoder is obtained and then input into the prediction model. The prediction model may be any one of an XGBoost model, a logistic regression model, and a neural network model.
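A TransE-style evaluation score and one possible loss can be sketched as follows. The specification only states that the evaluation function and loss function are preset; the negative L2 norm and the margin ranking loss used here are assumptions chosen because they are common with TransE, and the vectors are made-up examples.

```python
import math

def score(h, r, t):
    """Negative L2 norm of h + r - t, so a better-fitting triple scores higher."""
    return -math.sqrt(sum((hi + ri - ti) ** 2 for hi, ri, ti in zip(h, r, t)))

def margin_loss(pos_score, neg_score, margin=1.0):
    """Assumed margin ranking loss: zero once the correct triple outscores
    the erroneous one by at least `margin`."""
    return max(0.0, margin - pos_score + neg_score)

# Made-up vectors: t_true satisfies h + r = t exactly, t_false does not.
h, r = [1.0, 0.0], [0.0, 1.0]
t_true, t_false = [1.0, 1.0], [4.0, 5.0]

pos = score(h, r, t_true)
neg = score(h, r, t_false)
loss = margin_loss(pos, neg)
```

In training, the gradient of such a loss with respect to the node embeddings and edge vectors would be propagated back to the encoder; that step is omitted here because it depends on the chosen framework.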
In one embodiment, two nodes in the triple are used as a pair of nodes, and the embedded vectors corresponding to at least one pair of nodes are sequentially input into a pre-trained prediction model to respectively obtain probability values of two nodes in each pair of nodes corresponding to the same entity.
When the probability value reaches a predetermined threshold value, the pair of nodes is taken as a pair of candidate nodes corresponding to the same entity, so that at least one pair of nodes corresponding to the same entity can be preliminarily determined. The output of the predictive model can only be used to predict whether two nodes of a one-degree relationship correspond to the same entity.
Then, a maximum connected subgraph algorithm is used to construct, from the plurality of pairs of candidate nodes, the maximum connected subgraph corresponding to each entity, and the accounts corresponding to the nodes contained in a maximum connected subgraph are determined to correspond to the same entity.
For example, as shown in fig. 7, if it is predicted that node a and node B correspond to the same entity, node B and node C correspond to the same entity, and node C and node E correspond to the same entity, the corresponding maximum connected subgraph is shown by the dashed oval frame in fig. 7. The four accounts corresponding to the node A, the node B, the node C and the node E are determined to correspond to the same entity. When service development or risk joint control is carried out, the four account numbers can be operated in a unified mode.
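The thresholding and grouping steps can be sketched with a union-find structure, which is one common way to compute connected components. The probability values and the threshold of 0.8 are hypothetical; the specification does not state a particular threshold or algorithm implementation.

```python
def group_accounts(pair_probs, threshold):
    """Group nodes into connected components over pairs whose predicted
    same-entity probability reaches `threshold`."""
    parent = {}

    def find(x):
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for (u, v), prob in pair_probs.items():
        root_u, root_v = find(u), find(v)
        if prob >= threshold and root_u != root_v:
            parent[root_u] = root_v  # merge the two components

    groups = {}
    for node in list(parent):
        groups.setdefault(find(node), []).append(node)
    return sorted(sorted(members) for members in groups.values())

# Hypothetical predictions mirroring the fig. 7 example (A-B, B-C, C-E linked).
pair_probs = {("A", "B"): 0.95, ("B", "C"): 0.90, ("C", "E"): 0.92, ("A", "D"): 0.20}
groups = group_accounts(pair_probs, threshold=0.8)
```

Although the prediction model only scores node pairs with a direct connecting edge, the grouping step links A and E transitively through B and C.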
In summary, the embodiments of the present specification disclose an account relationship identification method. Related account identification schemes in the prior art mainly determine whether accounts that have undergone real-name authentication belong to the same holder. For example, determining that electronic accounts within the same group's systems belong to the same person can greatly improve the user login and usage experience. However, such schemes are not suitable for identifying the same store: the same holder may have different stores in several places, and the same store may be controlled by different users, so an identification scheme aimed at users cannot solve the problem of identifying the relationship between accounts and stores.
With the account relationship identification method disclosed in the embodiments of the present specification, it is possible to determine whether a plurality of accounts, within one system or across a plurality of systems, belong to the same entity (for example, the same store), and the method is superior to the above existing approach when processing a large amount of data.
On the one hand, in the scheme disclosed by this specification, multi-source information is cross-verified, so the information is richer and the relationships are depicted more clearly. On the other hand, the node-edge relationships constructed in the knowledge graph reflect both the strength of a connection, such as a stronger operating LBS aggregated address association or only a weaker merchant name or legal-person association, and the nature of a connection, such as a static address information association, a transaction LBS information association, a buyer group association, a legal-person association, or only a merchant name association. Preferably, the data can be preprocessed at an early stage to remove non-real information in advance, which greatly improves the judgment accuracy. According to estimation, the judgment accuracy of the scheme disclosed by this specification reaches 98%.
The scheme disclosed by this specification can be applied to many practical scenarios. For example, cross-account business incentives or risk linkage can be carried out for the store entities of the same merchant, or the accounts can even be linked to credit and loan records, so that stores on the market can be intelligently incentivized and profiled, business decisions can be assisted, and, when necessary, key data can be provided to the relevant departments to assist them in market economic management.
Referring to FIG. 8, an embodiment of the present specification further provides an account relation identification apparatus 800, including:
an obtaining unit 801 configured to obtain a knowledge graph constructed according to attribute information of an account, where the knowledge graph includes a plurality of nodes corresponding to a plurality of accounts, the attribute information includes information of a plurality of attribute items, and for each attribute item, two nodes corresponding to two accounts having the same attribute value are connected by a connecting edge;
a graph embedding unit 802 configured to perform graph embedding processing on the knowledge graph based on node features of each node and edge features of each connecting edge in the knowledge graph by using a pre-trained graph embedding model to obtain an embedding vector corresponding to each node;
a prediction unit 803 configured to input the embedding vectors corresponding to at least one pair of nodes into a pre-trained prediction model and obtain a probability value that each pair of nodes corresponds to the same entity, where a pair of nodes is two nodes connected to each other by a connecting edge in the knowledge graph.
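The graph construction rule used by the obtaining unit (two accounts that share a value for an attribute item are connected by an edge) can be sketched as follows. This is a minimal illustration only: the function name `build_edges`, the attribute item names, and the sample accounts are hypothetical and do not come from the specification.

```python
from collections import defaultdict
from itertools import combinations


def build_edges(accounts):
    """Connect every pair of accounts that share a value for some attribute item.

    `accounts` maps an account id to a dict {attribute_item: value}.
    Returns a sorted list of (account_a, account_b, attribute_item) edges.
    """
    # Group account ids by (attribute item, attribute value).
    groups = defaultdict(list)
    for account_id, attributes in accounts.items():
        for item, value in attributes.items():
            if value is not None:  # skip missing values
                groups[(item, value)].append(account_id)

    edges = []
    for (item, _value), members in groups.items():
        # Every pair inside one group shares this attribute value.
        for a, b in combinations(sorted(members), 2):
            edges.append((a, b, item))
    return sorted(edges)


# Hypothetical sample data, for illustration only.
accounts = {
    "acct_1": {"merchant_name": "Noodle House", "settlement_card": "C-001"},
    "acct_2": {"merchant_name": "Noodle House", "settlement_card": "C-002"},
    "acct_3": {"merchant_name": "Tea Corner", "settlement_card": "C-001"},
}
edges = build_edges(accounts)
```

Grouping by (attribute item, value) first avoids comparing every pair of accounts directly; only accounts that land in the same group are paired.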
In one embodiment, the attribute information includes any one or more of an address class attribute item, a static identity class attribute item, a merchant category attribute item, and a business feature class attribute item.
In a more specific embodiment, the entity is a merchant. The address class attribute items include any one or more of incoming address information, payer LBS convergence address information, cashier equipment address information, and geographic location information; the static identity class attribute items include any one or more of a merchant name, legal person information, a settlement bank card, and an addressee; the merchant category attribute items include any one or more of chain stores, affiliate stores, and other specified entity types; and the business feature class attribute items include any one or more of the amount of a single transaction, the transaction frequency, the transaction time interval, and the transaction object.
In a more specific embodiment, the plurality of attribute items are divided into different levels; each connecting edge has a respective edge type, and the edge type of any connecting edge corresponds to the level of the attribute item to which that connecting edge corresponds; connecting edges of different types have different edge features.
In a more specific embodiment, the different levels include strong attribute items, medium attribute items, and weak attribute items, divided according to the strength of the relationship that the attribute items represent. A primary connecting edge, a secondary connecting edge, and a tertiary connecting edge are established, respectively, between nodes corresponding to two accounts that have the same strong attribute value, medium attribute value, or weak attribute value. The initial values of the edge features of the primary, secondary, and tertiary connecting edges are IV1, IV2, and IV3, respectively, where IV1 > IV2 > IV3.
In a more specific embodiment, the number of connecting edges between two nodes in the knowledge graph corresponds to the number of attribute items for which the two nodes have the same attribute value. The obtaining unit 801 is further configured to fuse two or more connecting edges between two nodes into one connecting edge, where the edge feature of the fused connecting edge is obtained based on the initial feature and the weight corresponding to each of the two or more connecting edges.
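The levelled edges and the edge fusion described above can be sketched as follows. The concrete initial values, the mapping from attribute items to levels, and the use of a weighted sum are all assumptions made for illustration; the specification only requires that IV1 > IV2 > IV3 and that the fused feature be derived from the initial features and weights of the parallel edges.

```python
# Hypothetical initial edge-feature values satisfying IV1 > IV2 > IV3.
INITIAL_VALUE = {"strong": 1.0, "medium": 0.5, "weak": 0.25}

# Hypothetical assignment of attribute items to levels.
ITEM_LEVEL = {
    "settlement_card": "strong",
    "cashier_device_address": "medium",
    "merchant_name": "weak",
}


def fuse_edges(edges, weights=None):
    """Fuse parallel edges between the same two nodes into one edge feature.

    `edges` is an iterable of (node_a, node_b, attribute_item).
    The fused feature here is a weighted sum of the initial values
    (an assumed fusion rule); items without an explicit weight use 1.0.
    """
    weights = weights or {}
    fused = {}
    for a, b, item in edges:
        key = tuple(sorted((a, b)))  # treat the pair as undirected
        initial = INITIAL_VALUE[ITEM_LEVEL[item]]
        fused[key] = fused.get(key, 0.0) + weights.get(item, 1.0) * initial
    return fused


fused = fuse_edges([
    ("acct_1", "acct_2", "merchant_name"),
    ("acct_2", "acct_1", "settlement_card"),
    ("acct_1", "acct_3", "cashier_device_address"),
])
```

With these assumed values, a pair that shares both a weak and a strong attribute ends up with a larger fused feature than a pair that shares only a medium one.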
In one embodiment, the graph embedding unit 802 is configured to: extract graph structure information based on the knowledge graph; determine the node features of each node and the edge features of each edge in the knowledge graph; and input the graph structure information, the node features, and the edge features into the pre-trained graph embedding model, which performs graph embedding processing on the node features corresponding to the nodes.
In one embodiment, the graph embedding unit 802 is specifically configured to: take any node in the knowledge graph as a first node, determine a neighbor node set of the first node, and take the connecting edge between each node in the neighbor node set and the first node as a target edge; and perform at least one level of vector embedding according to the node features of the first node and of each node in the neighbor node set and the edge features of each target edge, to obtain the embedding vector corresponding to the first node.
In a more specific embodiment, the graph embedding unit 802 is specifically configured to: determine a primary embedding vector of the first node according to the original node features of the first node; and perform one or more levels of vector aggregation based on the primary embedding vector and the neighbor node set of the first node, where each level of vector aggregation includes performing neighbor aggregation on the previous-level embedding vectors of the neighbor nodes in the neighbor node set, and determining the current-level embedding vector of the first node according to the neighbor aggregation result and the previous-level embedding vector of the first node.
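The multi-level aggregation can be illustrated with a parameter-free sketch. It assumes scalar edge features used as aggregation weights and a simple average to combine the neighbor result with the node's previous-level vector; a trained model such as those named in this specification would use learned transformations instead, so this is not the disclosed implementation.

```python
def aggregate(features, neighbors, edge_feature, levels=2):
    """Multi-level neighbor aggregation without learned parameters.

    features:     {node: list of floats}, the original node features
                  (used as the primary embedding vectors).
    neighbors:    {node: list of neighbor nodes}.
    edge_feature: {(u, v): float}, keyed by the sorted node pair.
    """
    emb = {node: list(vec) for node, vec in features.items()}
    dim = len(next(iter(emb.values())))
    for _ in range(levels):
        nxt = {}
        for node, prev in emb.items():
            total = [0.0] * dim
            weight_sum = 0.0
            # Edge-weighted mean of the neighbors' previous-level vectors.
            for nb in neighbors.get(node, []):
                w = edge_feature[tuple(sorted((node, nb)))]
                weight_sum += w
                for i in range(dim):
                    total[i] += w * emb[nb][i]
            if weight_sum > 0:
                agg = [t / weight_sum for t in total]
            else:
                agg = prev  # isolated node: keep its own vector
            # Combine the neighbor result with the node's previous-level vector.
            nxt[node] = [(p + a) / 2 for p, a in zip(prev, agg)]
        emb = nxt
    return emb


# Hypothetical two-node graph.
features = {"a": [1.0, 0.0], "b": [0.0, 1.0]}
neighbors = {"a": ["b"], "b": ["a"]}
edge_feature = {("a", "b"): 1.0}
level_1 = aggregate(features, neighbors, edge_feature, levels=1)
```

Each additional level lets information from nodes one hop further away reach the first node, which is why two levels already reflect neighbors of neighbors.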
In one embodiment, the graph embedding model includes an encoder and a decoder. The account relation identification apparatus disclosed in this specification further includes a training unit 804 configured to:
input the graph structure information, the node features, and the edge features into the encoder, and perform a graph embedding operation through the encoder to obtain an embedding vector corresponding to each node; extract at least one triple from the knowledge graph and input the triple into the decoder, where the triple includes a first node, a second node, and a first connecting edge connecting the first node and the second node; calculate an evaluation score of the triple through the decoder, determine a loss value corresponding to the current evaluation score according to a preset loss function, and back-propagate the loss value to the encoder, where the evaluation score of the triple is obtained based on the difference between the sum of the embedding vector of the first node and the edge feature of the first connecting edge, on the one hand, and the embedding vector of the second node, on the other; and update the encoder with the goal of minimizing the loss value.
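The evaluation score described above (first-node vector plus edge feature, minus second-node vector) can be sketched as a translation-style distance. The L2 norm and the margin ranking loss against a corrupted triple are assumptions chosen for illustration; the specification only refers to "a preset loss function".

```python
import math


def triple_score(h, r, t):
    """L2 norm of (h + r - t); a lower score means a more plausible triple."""
    return math.sqrt(sum((hi + ri - ti) ** 2 for hi, ri, ti in zip(h, r, t)))


def margin_loss(positive, negative, margin=1.0):
    """Margin ranking loss for one true triple and one corrupted triple.

    This particular loss is an assumed example of a preset loss function.
    """
    return max(0.0, margin + triple_score(*positive) - triple_score(*negative))


# Hypothetical vectors: h + r equals t exactly, while t_bad is far away.
h, r, t = [1.0, 0.0], [0.0, 1.0], [1.0, 1.0]
t_bad = [4.0, 5.0]

good_score = triple_score(h, r, t)
bad_score = triple_score(h, r, t_bad)
loss = margin_loss((h, r, t), (h, r, t_bad), margin=6.0)
```

In a real training loop the loss would be differentiated with respect to the encoder parameters by an automatic differentiation framework; the sketch only shows how the score and loss values are formed.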
In one embodiment, the encoder is implemented based on any one of GeniePath (a graph neural network with adaptive receptive paths), GraphSAGE (a graph neural network for inductive learning), and a Gaussian mixture model (GMM).
In one embodiment, the prediction unit 803 is further configured to: extract a plurality of triples based on the knowledge graph, where the two nodes in each triple are taken as a pair of nodes; and input each pair of nodes in turn into the pre-trained prediction model to obtain, for each pair, the probability value that its two nodes correspond to the same entity.
In one embodiment, the prediction model is any one of an XGBoost model, a logistic regression model, and a neural network model.
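As a rough illustration of the prediction step, the sketch below scores a pair of embedding vectors with a logistic regression model. The pair features (element-wise absolute difference), the weights, and the bias are hypothetical; in practice the parameters would be learned from labelled pairs, and the specification does not state how the two vectors are combined.

```python
import math


def pair_features(u, v):
    """Symmetric pair features: element-wise absolute difference (an assumption)."""
    return [abs(a - b) for a, b in zip(u, v)]


def same_entity_probability(u, v, weights, bias):
    """Logistic regression over the pair features."""
    z = bias + sum(w * x for w, x in zip(weights, pair_features(u, v)))
    return 1.0 / (1.0 + math.exp(-z))


# Hypothetical parameters: larger differences lower the probability.
weights, bias = [-3.0, -3.0], 2.0

p_close = same_entity_probability([0.5, 0.5], [0.5, 0.5], weights, bias)
p_far = same_entity_probability([0.0, 0.0], [1.0, 1.0], weights, bias)
```

Using a symmetric feature makes the probability independent of the order in which the two nodes of a pair are supplied.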
In a more specific embodiment, the prediction unit 803 is further configured to: when the probability value reaches a preset threshold, take the pair of nodes as a pair of candidate nodes corresponding to the same entity; construct a maximum connected graph corresponding to the entity based on a plurality of pairs of candidate nodes corresponding to the same entity; and determine that the accounts corresponding to the nodes contained in the maximum connected graph correspond to the same entity.
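The thresholding and grouping step can be sketched with a union-find structure: pairs whose probability reaches the threshold are merged, and each resulting connected component is treated as one entity. The threshold value, the function name, and the sample probabilities are hypothetical.

```python
def group_accounts(pair_probabilities, threshold=0.9):
    """Group accounts into connected components of candidate pairs.

    pair_probabilities: {(account_a, account_b): probability}.
    Returns a sorted list of sorted account lists, one per component.
    """
    parent = {}

    def find(x):
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for (a, b), p in pair_probabilities.items():
        if p >= threshold:  # the pair "reaches" the preset threshold
            root_a, root_b = find(a), find(b)
            if root_a != root_b:
                parent[root_b] = root_a

    groups = {}
    for node in list(parent):
        groups.setdefault(find(node), set()).add(node)
    return sorted(sorted(group) for group in groups.values())


# Hypothetical probabilities output by the prediction model.
groups = group_accounts({
    ("a", "b"): 0.95,
    ("b", "c"): 0.92,
    ("c", "d"): 0.40,
    ("e", "f"): 0.99,
})
```

Here accounts a, b, and c end up in one group even though a and c were never scored as a pair, because the grouping is transitive over candidate pairs.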
According to an embodiment of another aspect, there is also provided a computer-readable storage medium having stored thereon a computer program which, when executed in a computer, causes the computer to perform the method described in connection with FIG. 2.
According to an embodiment of another aspect, there is also provided a computing device, including a memory and a processor, where the memory stores executable code, and the processor executes the executable code to implement the method described in connection with FIG. 2.
Those skilled in the art will recognize that, in one or more of the examples described above, the functions described in this specification may be implemented in hardware, software, firmware, or any combination thereof. When implemented in software, the functions may be stored on a computer-readable medium, or transmitted over it, as one or more instructions or code.
The specific embodiments described above further explain the objects, technical solutions, and advantages of the present invention in detail. It should be understood that they are only specific embodiments of the present invention and are not intended to limit its scope of protection; any modification, equivalent substitution, or improvement made on the basis of the technical solutions of the present invention shall fall within the scope of protection of the present invention.

Claims (28)

1. An account relation identification method, comprising:
acquiring a knowledge graph constructed according to attribute information of accounts, wherein the knowledge graph comprises a plurality of nodes corresponding to a plurality of accounts, the attribute information comprises information of a plurality of attribute items, and for each attribute item, two nodes corresponding to two accounts with the same attribute value are connected through a connecting edge;
performing graph embedding processing on the knowledge graph by using a pre-trained graph embedding model, based on the node characteristics of each node and the edge characteristics of each connecting edge in the knowledge graph, to obtain an embedded vector corresponding to each node;
and inputting the embedded vectors corresponding to at least one pair of nodes into a pre-trained prediction model to obtain a probability value that each pair of nodes corresponds to the same entity, wherein a pair of nodes is two nodes connected to each other by a connecting edge in the knowledge graph.
2. The method of claim 1, wherein the attribute information comprises any one or more of an address class attribute item, a static identity class attribute item, a merchant category attribute item, and a business feature class attribute item.
3. The method according to claim 2, wherein the entity is a merchant, and the address class attribute items comprise any one or more of incoming address information, payer LBS convergence address information, cashier equipment address information and geographic location information;
the static identity class attribute items comprise any one or more of merchant names, legal person information, settlement bank cards and addressees;
the merchant category attribute items comprise any one or more of chain stores, affiliate stores and other specified entity types;
the business feature class attribute items comprise any one or more of the amount of a single transaction, the transaction frequency, the transaction time interval and the transaction object.
4. The method of claim 1, wherein the plurality of attribute items comprise attribute items divided into different levels;
each connecting edge has a respective edge type, and the edge type of any connecting edge corresponds to the level of the attribute item corresponding to the connecting edge; different types of connecting edges have different edge characteristics.
5. The method of claim 4, wherein the different levels include strong, medium, and weak attribute items divided according to the strength of relationship represented by the attribute items;
a primary connecting edge, a secondary connecting edge and a tertiary connecting edge are respectively established between nodes corresponding to two accounts with the same strong attribute value, medium attribute value and weak attribute value, wherein the initial values of the edge characteristics of the primary, secondary and tertiary connecting edges are IV1, IV2 and IV3 respectively, and IV1 > IV2 > IV3.
6. The method of claim 1, wherein performing graph embedding processing on the knowledge-graph based on node features of each node and edge features of each connecting edge in the knowledge-graph comprises:
extracting graph structure information based on the knowledge graph;
determining node characteristics of each node and edge characteristics of each edge in the knowledge graph;
and inputting the graph structure information, the node characteristics and the edge characteristics into a graph embedding model obtained by pre-training, and carrying out graph embedding processing on the node characteristics corresponding to the nodes through the graph embedding model.
7. The method of claim 6, wherein performing graph embedding on the node features corresponding to the nodes through the graph embedding model comprises:
determining a neighbor node set of the first node by taking any node in the knowledge graph as the first node, and taking a connecting edge between each node in the neighbor node set and the first node as a target edge;
and performing at least one level of vector embedding according to the node characteristics of the first node and of each node in the neighbor node set and the edge characteristics of each target edge, to obtain an embedded vector corresponding to the first node.
8. The method of claim 7, wherein performing at least one level of vector embedding according to the node characteristics of the first node and of each node in the neighbor node set and the edge characteristics of each target edge comprises:
determining a primary embedding vector of the first node according to the original node characteristics of the first node;
and performing one-level or multi-level vector aggregation based on the primary embedded vector and the neighbor node set of the first node, wherein each level of vector aggregation comprises performing neighbor aggregation on the previous-level embedded vector of each neighbor node in the neighbor node set, and determining the current-level embedded vector of the first node according to the neighbor aggregation result and the previous-level embedded vector of the first node.
9. The method of claim 6, wherein the graph embedding model comprises an encoder and a decoder;
before graph embedding processing is performed, based on the knowledge graph, on the node characteristics corresponding to each node, the method further comprises pre-training the graph embedding model, which specifically comprises:
inputting the graph structure information, the node characteristics and the edge characteristics into the encoder, and executing a graph embedding operation through the encoder to obtain an embedded vector corresponding to each node;
extracting at least one triple from the knowledge graph, inputting the triple into the decoder, wherein the triple comprises a first node, a second node and a first connecting edge connecting the first node and the second node; calculating the evaluation score of the triple through the decoder, and determining a loss value corresponding to the current evaluation score according to a preset loss function; the evaluation score of the triple is obtained based on the difference between the sum of the embedded vector of the first node and the edge vector of the first connecting edge and the embedded vector of the second node;
updating the encoder with a goal of minimizing the loss value.
10. The method of claim 9, wherein the encoder is implemented based on any one of GeniePath, a graph neural network with adaptive receptive paths, GraphSAGE, a graph neural network for inductive learning, and a Gaussian mixture model GMM.
11. The method of claim 1, wherein before inputting the embedded vectors corresponding to the at least one pair of nodes into the pre-trained predictive model, further comprising:
extracting a plurality of triples based on the knowledge graph, wherein two nodes in the triples are used as a pair of nodes;
inputting the embedded vectors corresponding to at least one pair of nodes into a pre-trained prediction model to obtain probability values of the nodes corresponding to the same entity, including:
and sequentially inputting each pair of nodes into a pre-trained prediction model, and respectively obtaining probability values of two nodes in each pair of nodes corresponding to the same entity.
12. The method according to any one of claims 1-11, wherein the prediction model is any one of an XGBoost model, a logistic regression model, and a neural network model.
13. The method of claim 1, wherein obtaining the probability values that the at least one pair of nodes correspond to the same entity further comprises:
when the probability value reaches a preset threshold value, taking the pair of nodes as a pair of candidate nodes corresponding to the same entity;
constructing a maximum connected graph corresponding to the entity based on a plurality of pairs of candidate nodes corresponding to the same entity;
and determining a plurality of accounts corresponding to a plurality of nodes contained in the maximum connected graph as corresponding to the same entity.
14. An account relation recognition apparatus, comprising:
the acquisition unit is configured to acquire a knowledge graph constructed according to attribute information of the account, wherein the knowledge graph comprises a plurality of nodes corresponding to a plurality of accounts, the attribute information comprises information of a plurality of attribute items, and for each attribute item, two nodes corresponding to two accounts with the same attribute value are connected through a connecting edge;
the graph embedding unit is configured to utilize a pre-trained graph embedding model, and based on node features of all nodes in the knowledge graph and edge features of all connecting edges, carry out graph embedding processing on the knowledge graph to obtain embedding vectors corresponding to all nodes;
and the prediction unit is configured to input the embedded vectors corresponding to at least one pair of nodes into a pre-trained prediction model to obtain the probability value of each pair of nodes corresponding to the same entity, wherein the pair of nodes are two nodes with connecting edges between each other in the knowledge graph.
15. The apparatus of claim 14, wherein the attribute information comprises any one or more of an address class attribute item, a static identity class attribute item, a merchant category attribute item, and a business feature class attribute item.
16. The apparatus according to claim 15, wherein the entity is a merchant, and the address class attribute items comprise any one or more of incoming address information, payer LBS convergence address information, cashier equipment address information, and geographic location information;
the static identity class attribute items comprise any one or more of merchant names, legal person information, settlement bank cards and addressees;
the merchant category attribute items comprise any one or more of chain stores, affiliate stores and other specified entity types;
the business feature class attribute items comprise any one or more of the amount of a single transaction, the transaction frequency, the transaction time interval and the transaction object.
17. The apparatus of claim 14, wherein the plurality of attribute items comprise attribute items divided into different levels;
each connecting edge has a respective edge type, and the edge type of any connecting edge corresponds to the level of the attribute item corresponding to the connecting edge; different types of connecting edges have different edge characteristics.
18. The apparatus of claim 17, wherein the different levels include strong, medium, and weak attribute items divided according to a strength of relationship represented by the attribute items;
a primary connecting edge, a secondary connecting edge and a tertiary connecting edge are respectively established between nodes corresponding to two accounts with the same strong attribute value, medium attribute value and weak attribute value, wherein the initial values of the edge characteristics of the primary, secondary and tertiary connecting edges are IV1, IV2 and IV3 respectively, and IV1 > IV2 > IV3.
19. The apparatus of claim 14, wherein the graph embedding unit is configured to:
extracting graph structure information based on the knowledge-graph;
determining node characteristics of each node and edge characteristics of each edge in the knowledge graph;
and inputting the graph structure information, the node characteristics and the edge characteristics into a graph embedding model obtained by pre-training, and carrying out graph embedding processing on the node characteristics corresponding to the nodes through the graph embedding model.
20. The apparatus of claim 19, wherein the graph embedding unit is specifically configured to:
determining a neighbor node set of the first node by taking any node in the knowledge graph as the first node, and taking a connecting edge between each node in the neighbor node set and the first node as a target edge;
and performing at least one level of vector embedding according to the node characteristics of the first node and of each node in the neighbor node set and the edge characteristics of each target edge, to obtain an embedded vector corresponding to the first node.
21. The apparatus of claim 20, wherein the graph embedding unit is specifically configured to:
determining a primary embedding vector of the first node according to the original node characteristics of the first node;
and executing one-level or multi-level vector aggregation based on the primary embedded vector and the neighbor node set of the first node, wherein each level of vector aggregation comprises neighbor aggregation of the previous-level embedded vector of each neighbor node in the neighbor node set, and determining the current-level embedded vector of the first node according to the neighbor aggregation result and the previous-level embedded vector of the first node.
22. The apparatus of claim 19, wherein the graph embedding model comprises an encoder and a decoder;
the apparatus further comprises a training unit configured to:
inputting the graph structure information, the node characteristics and the edge characteristics into the encoder, and executing graph embedding operation through the encoder to obtain an embedded vector corresponding to each node;
extracting at least one triple from the knowledge graph, inputting the triple into the decoder, wherein the triple comprises a first node, a second node and a first connecting edge connecting the first node and the second node; calculating the evaluation score of the triple through the decoder, and determining a loss value corresponding to the current evaluation score according to a preset loss function; the evaluation score of the triple is obtained based on the difference between the sum of the embedded vector of the first node and the edge vector of the first connecting edge and the embedded vector of the second node;
updating the encoder with a goal of minimizing the loss value.
23. The apparatus of claim 22, wherein the encoder is implemented based on any one of GeniePath, a graph neural network with adaptive receptive paths, GraphSAGE, a graph neural network for inductive learning, and a Gaussian mixture model GMM.
24. The apparatus of claim 14, wherein the prediction unit is further configured to:
extracting a plurality of triples based on the knowledge graph, wherein two nodes in the triples are used as a pair of nodes;
and sequentially inputting each pair of nodes into a pre-trained prediction model, and respectively obtaining probability values of two nodes in each pair of nodes corresponding to the same entity.
25. The apparatus of any one of claims 14-24, wherein the prediction model is any one of an XGBoost model, a logistic regression model, and a neural network model.
26. The apparatus of claim 14, wherein the prediction unit is further configured to:
when the probability value reaches a preset threshold value, the pair of nodes are used as a pair of candidate nodes corresponding to the same entity; constructing a maximum connected graph corresponding to the entity based on a plurality of pairs of candidate nodes corresponding to the same entity; and determining a plurality of accounts corresponding to a plurality of nodes contained in the maximum connected graph as corresponding to the same entity.
27. A computer-readable storage medium, on which a computer program is stored which, when executed in a computer, causes the computer to carry out the method of any one of claims 1-13.
28. A computing device comprising a memory and a processor, wherein the memory has stored therein executable code that, when executed by the processor, performs the method of any of claims 1-13.
CN202011105858.6A 2020-10-15 2020-10-15 Account relation identification method and device Active CN112215500B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202011105858.6A CN112215500B (en) 2020-10-15 2020-10-15 Account relation identification method and device

Publications (2)

Publication Number Publication Date
CN112215500A CN112215500A (en) 2021-01-12
CN112215500B true CN112215500B (en) 2022-06-28

Family

ID=74054833

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202011105858.6A Active CN112215500B (en) 2020-10-15 2020-10-15 Account relation identification method and device

Country Status (1)

Country Link
CN (1) CN112215500B (en)

Also Published As

Publication number Publication date
CN112215500A (en) 2021-01-12

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
REG Reference to a national code (Ref country code: HK; Ref legal event code: DE; Ref document number: 40044730; Country of ref document: HK)
GR01 Patent grant