CN111581450A

CN111581450A - Method and device for determining service attribute of user

Info

Publication number: CN111581450A
Application number: CN202010588745.XA
Authority: CN
Inventors: 胡斌斌; 周俊; 贾全慧; 方彦明; 张志强; 杨双红; 余泉; 方精丽
Original assignee: Alipay Hangzhou Information Technology Co Ltd
Current assignee: Alipay Hangzhou Information Technology Co Ltd
Priority date: 2020-06-24
Filing date: 2020-06-24
Publication date: 2020-08-25
Anticipated expiration: 2040-06-24
Also published as: CN111581450B

Abstract

Embodiments of this specification provide a method for determining a service attribute of a user. On the one hand, the method determines a prediction vector of the user based on a heterogeneous graph by fusing the expression vectors of the user obtained under each association relationship. This integrates the available information, enriches the user information along multiple dimensions, and uses information complementarity to explore the rich semantics under multiple relationships, thereby avoiding the situation in which the service attribute of a user cannot be predicted because a single kind of missing information prevents the user from being described accurately. On the other hand, when determining the expression vector of the user under a single association relationship, the method considers not only the influence of the associations between the user and other users, but also the influence of the attributes corresponding to the connecting edges on the association relationship, and makes full use of the local structure information of the user to enhance the user's representation, so that the accuracy of predicting the service attribute of the user is improved.

Description

Method and device for determining service attribute of user

Technical Field

One or more embodiments of the present disclosure relate to the field of computer technologies, and in particular, to a method and an apparatus for determining a service attribute of a user by using a computer.

Background

With the development of computer technology, daily business processing increasingly relies on the assistance of computers. For example, on a shopping platform, a computer records the items browsed, clicked and purchased by a user in order to recommend items of greater interest to the user; in a news APP on a terminal, the terminal records the pages searched and browsed by the user in order to recommend more suitable news; and so on. In particular, the development of artificial intelligence technology enables computers to handle daily business more intelligently. For example, a shopping platform may predict some service attribute of a user (such as gender), so that goods matching that service attribute can be pushed to the user.

Disclosure of Invention

One or more embodiments of the present specification describe a method and an apparatus for determining a service attribute of a user, so as to determine a service attribute missing from a certain user in a service scenario.

According to a first aspect, a method for determining a service attribute of a user is provided, which is used for determining a predetermined service attribute of the user based on a heterogeneous graph, wherein the heterogeneous graph includes a plurality of nodes, each node corresponds to a user, the plurality of nodes have a plurality of mutually independent association relationships, and every two nodes having an association relationship are connected by a connecting edge; the plurality of association relationships include a first association relationship, under which a single node corresponds to a single initial node vector determined based on a plurality of user attributes of the corresponding user, and a single connecting edge corresponds to a single initial edge vector determined based on the association attributes between the two corresponding users; for a first user whose predetermined service attribute is to be determined and who corresponds to a first node in the heterogeneous graph, the method comprises: performing, based on the first association relationship, iterative fusion of the node vectors of the neighbor nodes of the first node and the edge vectors of the corresponding connecting edges for a predetermined number of times, to obtain a first expression vector of the first node under the first association relationship, wherein the initial vectors of the iterative fusion are the initial node vectors corresponding to the nodes and the initial edge vectors corresponding to the connecting edges, a single vector fusion for the first node is performed according to the contribution degrees corresponding to the respective neighbor nodes/connecting edges, and the contribution degree of a single neighbor node/connecting edge is determined based on the current node vectors corresponding to the neighbor nodes and the current edge vector of the corresponding connecting edge; fusing the first expression vector with the other expression vectors respectively corresponding to the other association relationships to obtain a prediction vector of the first user; and determining the predetermined service attribute of the first user according to the prediction vector.

According to an embodiment, under the first association relationship, the neighbor nodes of the first node include a second node, the first node and the second node are connected by a first connecting edge, and a single fusion process performed on the node vector of the first node includes: splicing the current node vectors respectively corresponding to the first node and the second node and the current edge vector corresponding to the first connecting edge to obtain a second splicing vector; determining a second contribution degree corresponding to the second node according to a second processing result obtained by processing the second splicing vector with a first auxiliary matrix; and fusing the current node vector corresponding to the second node and the current edge vector corresponding to the first connecting edge into the current node vector of the first node according to the second contribution degree.

According to an embodiment, the determining a second contribution degree corresponding to the second node according to the second processing result of the first auxiliary matrix on the second splicing vector includes: acquiring the other splicing vectors respectively corresponding to the other neighbor nodes of the first node, and processing each of them with the first auxiliary matrix to obtain the other processing results; and determining the second contribution degree corresponding to the second node according to the result of normalizing the second processing result against the other processing results.

According to an embodiment, the fusing the current node vector corresponding to the second node and the current edge vector corresponding to the first connecting edge into the current node vector of the first node according to the second contribution degree includes: summing the current node vector corresponding to the second node and the current edge vector corresponding to the first connecting edge to obtain a second sum vector; and fusing the vectors of the neighbor nodes of the first node and of the corresponding connecting edges, with the second contribution degree as the weight of the second sum vector, to obtain a neighbor fusion vector for the first node.

According to an embodiment, the fusing the current node vector corresponding to the second node and the current edge vector corresponding to the first connecting edge into the current node vector of the first node according to the second contribution degree includes: forming a first splicing vector by using the neighbor fusion vector and the current node vector of the first node; and performing dimensionality reduction on the first splicing vector to obtain a current node vector of the first node.
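The single fusion step described in the preceding embodiments can be sketched as follows. This is only an illustrative sketch under stated assumptions, not the claimed implementation: the "first auxiliary matrix" is assumed to be a single row (`aux_row`) that maps a splicing vector to a scalar score, the normalization is assumed to be a softmax, and the dimensionality reduction is assumed to be a plain linear map (`reduce_matrix`). All function and parameter names are hypothetical.

```python
import math


def softmax(scores):
    """Normalize raw scores into weights that sum to 1 (assumed normalization)."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def matvec(matrix, vec):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(w * x for w, x in zip(row, vec)) for row in matrix]


def fuse_once(h_first, neighbors, aux_row, reduce_matrix):
    """One fusion step for the first node.

    neighbors: list of (neighbor node vector, edge vector) pairs, each of the
    same dimension d as h_first.
    aux_row: hypothetical auxiliary row of length 3 * d.
    reduce_matrix: hypothetical d x 2d matrix used for dimensionality reduction.
    """
    # Splice [first node, neighbor node, edge] and score each splicing vector.
    scores = []
    for h_nbr, e_edge in neighbors:
        spliced = h_first + h_nbr + e_edge
        scores.append(sum(w * x for w, x in zip(aux_row, spliced)))
    contributions = softmax(scores)

    # Weight each sum vector (neighbor node vector + edge vector) by its contribution.
    dim = len(h_first)
    neighbor_fusion = [0.0] * dim
    for weight, (h_nbr, e_edge) in zip(contributions, neighbors):
        for k in range(dim):
            neighbor_fusion[k] += weight * (h_nbr[k] + e_edge[k])

    # Splice the neighbor fusion vector with the first node's own vector, then reduce.
    first_splice = neighbor_fusion + h_first
    return contributions, matvec(reduce_matrix, first_splice)


# Toy example with d = 2 and two neighbors; all values are made up.
h_first = [1.0, 0.0]
neighbors = [([0.0, 1.0], [1.0, 1.0]), ([2.0, 0.0], [0.0, 0.0])]
aux_row = [0.0] * 6                      # zero scores give equal contributions
reduce_matrix = [[1, 0, 1, 0], [0, 1, 0, 1]]
contributions, updated = fuse_once(h_first, neighbors, aux_row, reduce_matrix)
```

With a zero auxiliary row both neighbors contribute equally; a trained auxiliary row would let neighbors and edges with more informative vectors receive larger weights.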

According to an embodiment, the fusing the first expression vector and each other expression vector corresponding to each other association relationship to obtain the prediction vector of the first user includes: detecting a first importance degree of the first association relation for the first user by using the first expression vector and other expression vectors corresponding to other association relations respectively; and according to the first importance, fusing the first expression vector and each other expression vector together to obtain the prediction vector of the first user.

According to an embodiment, the detecting a first importance of the first association relation for the first user by using the first expression vector and each other expression vector corresponding to each other association relation respectively includes: detecting a first influence degree of the first expression vector for the first user by utilizing a processing result of a second auxiliary matrix on the first expression vector; and determining the first importance of the first association relation to the first user according to the normalization result of the first influence degree relative to each influence degree corresponding to each expression vector.
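The relation-level fusion described above can be sketched in the same spirit. In this hedged sketch, the "second auxiliary matrix" is assumed to be a single row (`aux_row`) producing one influence degree per expression vector, the normalization is assumed to be a softmax, and the final fusion is assumed to be a weighted sum; the names `relation_importances` and `fuse_relations` are hypothetical.

```python
import math


def relation_importances(expr_vectors, aux_row):
    """Score each relation's expression vector and normalize the scores."""
    influences = [sum(w * x for w, x in zip(aux_row, z)) for z in expr_vectors]
    peak = max(influences)
    exps = [math.exp(s - peak) for s in influences]
    total = sum(exps)
    return [e / total for e in exps]


def fuse_relations(expr_vectors, aux_row):
    """Weighted sum of the per-relation expression vectors (assumed fusion rule)."""
    weights = relation_importances(expr_vectors, aux_row)
    dim = len(expr_vectors[0])
    prediction = [
        sum(w * z[k] for w, z in zip(weights, expr_vectors)) for k in range(dim)
    ]
    return weights, prediction


# Three association relationships, one 2-dimensional expression vector each.
expr_vectors = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
uniform_weights, uniform_pred = fuse_relations(expr_vectors, [0.0, 0.0])
skewed_weights, skewed_pred = fuse_relations(expr_vectors, [1.0, 0.0])
```

With a zero auxiliary row every relation is equally important; with `[1.0, 0.0]` the relations whose expression vectors have a large first component receive more weight.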

According to a second aspect, an apparatus for determining a service attribute of a user is provided, configured to determine a predetermined service attribute of the user based on a heterogeneous graph, where the heterogeneous graph includes a plurality of nodes, each node corresponds to each user, the plurality of nodes have multiple association relationships that are independent of each other, and every two nodes having an association relationship are connected by a connecting edge, the multiple association relationships include a first association relationship in which a single node corresponds to a single initial node vector determined based on multiple user attributes of the corresponding user, and a single connecting edge corresponds to a single initial edge vector determined based on the association attributes between the corresponding two users; the apparatus includes means for, for a first user to be determined a predetermined service attribute corresponding to a first node in the heterogeneous graph:

a determining unit, configured to perform, based on the first association relationship, iterative fusion of the node vectors of the neighbor nodes of a first node and the edge vectors of the corresponding connecting edges for a predetermined number of times, to obtain a first expression vector of the first node under the first association relationship, where the initial vectors of the iterative fusion are the initial node vectors corresponding to the nodes and the initial edge vectors corresponding to the connecting edges, a single vector fusion for the first node is performed according to the contribution degrees corresponding to the respective neighbor nodes/connecting edges, and the contribution degree of a single neighbor node/connecting edge is determined based on the current node vectors corresponding to the neighbor nodes and the current edge vector of the corresponding connecting edge;

a fusion unit, configured to fuse the first expression vector with the other expression vectors respectively corresponding to the other association relationships to obtain a prediction vector of the first user;

a prediction unit configured to determine a predetermined service attribute of the first user according to the prediction vector.

According to a third aspect, there is provided a computer readable storage medium having stored thereon a computer program which, when executed in a computer, causes the computer to perform the method of the first aspect.

According to a fourth aspect, there is provided a computing device comprising a memory and a processor, wherein the memory has stored therein executable code, and wherein the processor, when executing the executable code, implements the method of the first aspect.

According to the method and the apparatus provided by the embodiments of this specification, on the one hand, the prediction vector of the user is determined based on the heterogeneous graph by fusing the expression vectors of the user obtained under each association relationship. This integrates the available information, enriches the user information along multiple dimensions, and uses information complementarity to explore the rich semantics under multiple relationships, thereby avoiding the situation in which the service attribute of a user cannot be predicted because a single kind of missing information prevents the user from being described accurately. On the other hand, when determining the expression vector of the user under a single association relationship, not only the influence of the associations between the user and other users is considered, but also the influence of the attributes corresponding to the connecting edges on the association relationship, and the local structure information of the user is fully used to enhance the user's representation, so that the accuracy of predicting the service attribute of the user is improved.

Drawings

In order to illustrate the technical solutions of the embodiments of the present invention more clearly, the drawings needed for the description of the embodiments are briefly introduced below. Obviously, the drawings described below show only some embodiments of the present invention, and those skilled in the art can obtain other drawings from them without creative effort.

FIG. 1 is a schematic diagram of an architecture for implementing the present description;

FIG. 2 illustrates a flow diagram of a method of determining a business attribute of a user, according to one embodiment;

FIG. 3 illustrates a diagram of neighbor node vector fusion for a node, according to a specific example;

FIG. 4 illustrates a detailed flow diagram for determining a business attribute of a user according to one embodiment;

FIG. 5 shows a schematic block diagram of an apparatus for determining a service attribute of a user, according to an embodiment.

Detailed Description

The scheme provided by the specification is described below with reference to the accompanying drawings.

First, a description is given with reference to the embodiment shown in FIG. 1. As shown in FIG. 1, the specific implementation scenario is that a service attribute of one user is determined according to a heterogeneous graph formed by nodes corresponding to multiple users. The service attribute illustrated in FIG. 1 is a financial default risk attribute; in other implementation scenarios, other service attributes may be determined, such as a gender attribute or an income attribute. In the heterogeneous graph of this implementation scenario, one user may correspond to one node. A user, as referred to herein, may correspond to a user ID on the platform. Each user may interact with the server through a terminal application. The server may record various information about the user, for example, the registration information the user provided when registering on the platform, browsing/clicking information generated on the platform, and service information generated by handling services through the platform (e.g., activating a ring-back tone service). User information used when handling services through the platform may be referred to as service attributes, such as a gender attribute or a default risk attribute.

In the conventional technology, the association relationships between users are usually described by graph data with a single structure, or only the influence of neighbor nodes is considered. In practice, however, the factors affecting a specific service attribute of a user (e.g., a financial default risk attribute) may be manifold. Taking the financial default risk attribute as an example, a user may engage in frequent transfers or transactions before an agreed deadline, while a new user may have little user information and little interaction information with other users.

Therefore, under the implementation framework of this specification, a heterogeneous graph is used to mine more of the available user information and improve the accuracy of service attribute prediction. Literally, a "heterogeneous graph" can be understood as a graph that includes different structures. Here, the heterogeneous graph may be graph data describing the connection relationships between multiple users under multiple association types, and the association types and their corresponding connection relationships may be independent of each other. That is, each connection relationship may constitute an independent graph structure (hereinafter also referred to as graph data). The graph data may be stored separately for each connection relationship, or may be fused together for storage. Note that fusing for storage may involve merging nodes (e.g., nodes corresponding to the same entity are represented by the same node), but not merging connecting edges. For example, in one embodiment, user a and user b correspond to node a and node b respectively, and their connections under connection relationships r₁ and r₂ may be represented by the records (a, r₁, b) and (a, r₂, b). In the heterogeneous graph, these two records appear as two connecting edges between node a and node b, one under each of the two association relationships.
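The fused storage described above, in which nodes are merged but connecting edges are kept separate per relation, might be sketched as follows. This is a minimal illustration, assuming undirected edges for simplicity (some relations, such as address-book storage, could be directed); the class `HeteroGraph` and its methods are hypothetical names.

```python
from collections import defaultdict


class HeteroGraph:
    """Stores (node, relation, node) records; one adjacency map per relation."""

    def __init__(self):
        # relation -> node -> set of neighbor nodes under that relation
        self._adj = defaultdict(lambda: defaultdict(set))

    def add_edge(self, u, relation, v):
        """Record the triple (u, relation, v); assumed undirected here."""
        self._adj[relation][u].add(v)
        self._adj[relation][v].add(u)

    def neighbors(self, node, relation):
        """Neighbors of a node under one relation only."""
        return set(self._adj.get(relation, {}).get(node, set()))

    def relations_between(self, u, v):
        """All relations under which u and v are directly connected."""
        return sorted(r for r, adj in self._adj.items() if v in adj.get(u, ()))


graph = HeteroGraph()
graph.add_edge("a", "r1", "b")   # record (a, r1, b)
graph.add_edge("a", "r2", "b")   # record (a, r2, b): a second, separate edge
graph.add_edge("a", "r1", "c")
```

Nodes a and b are each stored once, while the two records remain two distinct connecting edges that can be queried per relation.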

In an alternative embodiment, the association relationship between the nodes may be described by a predetermined meta path under each association relationship. For example, for a business scenario of a financial platform, the association relationship can be described by the following meta-paths:

(a) userA-(save)-userB: user address book path; if the address book of user A contains user B, a meta path A-save-B is formed;

(b) userA-(saved)-userB: user stored path; if user A is stored in the address book of user B, a meta path A-saved-B is formed;

(c) userA-APP D-userB: terminal application sharing path; if user A and user B both use terminal application D, a meta path A-APP D-B is formed;

(d) userA-Wi-Fi C-userB: network sharing path; if user A and user B both access the Internet through wireless network Wi-Fi C, a meta path A-Wi-Fi C-B is formed;

(e) userA-(interact)-userB: interaction path; if user A and user B have an interaction relationship, a meta path A-interact-B is formed;

and so on.

The interaction relationship in (e) may be a connection arising from interactive behaviors such as chatting with each other, transferring money, or sending red envelopes.
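Meta paths of types (c) and (d) connect two users through a shared resource such as an application or a Wi-Fi network. The following sketch shows one way such connecting edges could be derived from per-user usage records; the function name `shared_resource_edges` and the sample data are hypothetical.

```python
from itertools import combinations


def shared_resource_edges(usage):
    """Build user-user edges from a mapping user -> set of resources used.

    Returns a dict mapping an ordered user pair (u, v) to the set of
    resources the two users share.
    """
    users_by_resource = {}
    for user, resources in usage.items():
        for resource in resources:
            users_by_resource.setdefault(resource, set()).add(user)

    edges = {}
    for resource, users in users_by_resource.items():
        # Every pair of users sharing this resource forms one meta path.
        for u, v in combinations(sorted(users), 2):
            edges.setdefault((u, v), set()).add(resource)
    return edges


# Made-up usage records for the application sharing path.
app_usage = {
    "A": {"appD", "appE"},
    "B": {"appD"},
    "C": {"appD", "appE"},
}
app_edges = shared_resource_edges(app_usage)
```

The size of each shared set, for example `len(app_edges[("A", "C")])`, is the kind of value that could serve as an attribute of the connecting edge.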

The meta paths above correspond to different connection relationship types. The association relationships of the users under the various connection relationship types jointly form the heterogeneous graph. It is worth mentioning that the same user may have the same user identifier in every path, for example, a unique identifier of the terminal device or the user name (userID) registered by the user on the current platform. Thus, although the heterogeneous graph contains relationships among users described by a plurality of meta paths, the correspondence between users across the meta paths remains clear because a consistent user identifier is used. When the same user is described by different identifiers, the correspondence of that user across the meta paths may be recorded, for example, in a table.

It will be appreciated that each user may have corresponding user attributes. For example: in the user address book path, the user attributes may include the number of people in the user's address book; in the user stored path, they may include the number of times the user is stored and the type of storage (tagging) relationship; in the terminal application sharing path, they may include the number of terminal applications used by the user and the number of users sharing those terminal applications; in the network sharing path, they may include how frequently the user connects to the network and how frequently the network the user connects to changes; in the interaction path, they may include the number of users the corresponding user interacts with; and so on.

Therefore, such a heterogeneous graph may also be referred to as an attribute multiplex graph, i.e., a graph describing multiple associations between user attributes. FIG. 1 illustrates such an attribute multiplex graph (heterogeneous graph) among a plurality of users under association relationships such as a common APP connection relationship, a transfer connection relationship, and an address book connection relationship. In this specification, the names of the connection relationships in the heterogeneous graph shown in FIG. 1 do not limit the actual meanings to be expressed. In the heterogeneous graph, since each node corresponds to a user, a meta path can be described by a connecting edge. That is, the attributes of a meta path are embodied by the attributes of the connecting edge.

In the conventional technology, processing of graph data usually considers only the interaction between neighbor nodes. In fact, a node usually contains information about the entity (user) corresponding to it, while the connection relationship between entities is usually recorded by a connecting edge, and the connecting edge also contains rich information. For example, in the address book path or the stored path, the connecting edge may record information such as how long the stored party has appeared in the address book of the storing party; in the terminal application sharing path, the connecting edge may record information such as the number of APPs shared by the two users and the nature of the shared APPs; in the network sharing path, it may record information such as the frequency, time, and number of networks shared by the two users; in the interaction path, it may record information such as the frequency of interaction between the two users; and so on.

That is, the attribute corresponding to the user may be more biased toward the attribute describing the user itself, and the information recorded by the connection edge may be more biased toward the attribute describing the association relationship between the users, and therefore, the attribute described by the connection edge may be referred to as an association attribute. In the data processing process, data is generally converted into a character form which can be processed by a machine, for example, various user attributes are abstracted into numerical values, and a vector representation is formed by a plurality of numerical values, so that a user is represented by a node vector. In the implementation architecture of the present specification, the association attribute corresponding to the connection edge may be described by a numerical value, so that the connection edge is represented by an edge vector. In addition, in the process of determining the vector expression of the nodes, the vector description of the connecting edges is considered, so that richer characteristic information is mined for the user, and the accuracy of predicting the service attributes of the user is improved. The technical idea of the present specification is described in detail below.

FIG. 2 is a flow diagram illustrating a process for determining a service attribute of a user according to one embodiment of this specification. The execution subject of the flow may be a computer, device, or server with a certain computing capability, such as the computing platform shown in FIG. 1. The process of determining the service attribute of the user may be performed based on a heterogeneous graph. The heterogeneous graph may include a plurality of nodes corresponding to a plurality of users, and the nodes may have a plurality of mutually independent association relationships. Every two nodes having an association relationship may be connected by a connecting edge, as in the heterogeneous graph shown in FIG. 1, which is not described again here.

In the heterogeneous graph, under a single association relationship, a single node corresponds to a single initial node vector determined based on the user attributes under that association relationship, and a single connecting edge corresponds to a single initial edge vector determined based on the association attributes under the corresponding connection relationship. The initial node vector or the initial edge vector may be used to describe the characteristics that the corresponding node or connecting edge exhibits in the corresponding service data. The initial node vector or the initial edge vector is, for example, a service feature vector obtained by extracting feature values according to predetermined service features of the node or the connecting edge. It will be appreciated that a single user may be described by different user attributes under different association relationships with other users. For example, in the foregoing example, the common APP association may involve user attributes such as the number of APPs currently used by the user (i.e., installed on the terminal) and the proportions of APP categories, while the common network association may involve user attributes such as the network names and IP addresses used historically. When a user attribute is related to a service, it may also be referred to as a service attribute. Accordingly, the vector representation of the same node may differ under different association relationships.

In an alternative embodiment, the initial vector representation of the node may be obtained by representing the corresponding service attribute by a numerical value through feature extraction. Correspondingly, the connection edge can also extract the characteristic value corresponding to the corresponding correlation attribute according to the correlation relationship to obtain the edge initial vector representation.

In an alternative embodiment, a single node in the heterogeneous graph may correspond to a unified initial node vector, and the unified initial node vector may describe user attributes used by corresponding users under various association relationships.

In a further optional embodiment, various service features may be extracted uniformly for a single user, and feature transformation may then be performed under each association relationship to obtain an initial expression vector in the feature space of that association relationship. For example:

$$h_i^{r} = W_n^{r}\, x_i, \qquad e_{ij}^{r} = W_e^{r}\, x_{ij}$$

where $x_i$ and $x_{ij}$ are respectively the original node feature vector and the original edge feature vector, for example consisting of feature values of service features extracted directly from service data, and $W_n^{r}$ and $W_e^{r}$ are parameter matrices used to extract the initial node vector $h_i^{r}$ and the initial edge vector $e_{ij}^{r}$ under the corresponding association relationship $r$ from the original node feature vector and the original edge feature vector. The parameter matrices $W_n^{r}$ and $W_e^{r}$ may extract a vector through the settings of their element values at the corresponding positions; for example, the element value at a position corresponding to a service feature used under the current association relationship is 1, and the element values at other positions are 0.
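The extraction by a 0/1 parameter matrix can be illustrated with a small sketch. The feature layout, the chosen indices, and the names `selection_matrix` and `matvec` are assumptions made only for illustration.

```python
def selection_matrix(used_indices, num_features):
    """Build a 0/1 matrix with one row per feature used under a relation.

    Row k has a 1 at the position of the k-th used feature and 0 elsewhere.
    """
    return [
        [1.0 if col == idx else 0.0 for col in range(num_features)]
        for idx in used_indices
    ]


def matvec(matrix, vec):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(w * x for w, x in zip(row, vec)) for row in matrix]


# Hypothetical original node feature vector with four service features.
original_features = [3.0, 7.0, 0.5, 12.0]

# Suppose only features 0 and 2 are used under the current association relationship.
w_relation = selection_matrix([0, 2], num_features=4)
initial_node_vector = matvec(w_relation, original_features)
```

The same construction applies to edge feature vectors; a learned, non-binary parameter matrix would generalize this selection into a feature transformation.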

Take as an example a first user whose predetermined service attribute is to be determined, and assume that the first user corresponds to a first node in the heterogeneous graph. As shown in FIG. 2, the process includes: step 201, determining, based on the heterogeneous graph, the expression vectors corresponding to the first user under the various association relationships, wherein under a single association relationship, the single expression vector corresponding to the first user is determined based on the initial node vectors corresponding to the neighbor nodes of the first node under that association relationship and the initial edge vectors corresponding to the connecting edges between the first node and those neighbor nodes; step 202, fusing the expression vectors to obtain a prediction vector of the first user; and step 203, determining the predetermined service attribute of the first user according to the prediction vector.

First, in step 201, respective expression vectors corresponding to the first user in various association relationships are determined based on the heterogeneous graph. It can be understood that the heterogeneous graph contains multiple independent association relations among users, and the same node can correspond to different neighbor nodes and different connection edges between the same node and the neighbor nodes under different association relations, so that an expression vector of the user under the current association relation is determined respectively according to each association relation, and a more accurate result can be obtained.

Under the technical concept of this specification, under a single association relationship, the single expression vector corresponding to the first user may be determined based on the initial node vectors corresponding to the neighbor nodes of the first node under that association relationship and the initial edge vectors corresponding to the connecting edges between the first node and those neighbor nodes. It can be understood that, for a single node, each connecting edge is specific to one neighbor and can accurately describe the association between the corresponding user and the user corresponding to the node at the other end of the connecting edge. Therefore, under a single association relationship, the corresponding connecting edges may be considered together with the neighbor nodes when determining the expression vector of the current node.

The following describes the process of determining a first expression vector corresponding to the first user under a single association relationship, taking an arbitrary first association relationship among the plurality of association relationships of the heterogeneous graph as an example. Here, the word "first" in "first association relationship" and "first expression vector" indicates the correspondence between the two, and places no limitation on the specific relationship or vector, their order, or the like.

Specifically, based on the first association relationship, a node vector of a neighbor node of the first node and an edge vector of a corresponding connection edge may be subjected to iterative fusion for a predetermined number of times, so as to obtain a first expression vector of the first node in the first association relationship. Wherein the predetermined number of times is at least 1. Under the condition that the preset times are 1, vector fusion is carried out on the neighbor nodes and the connecting edges (each neighbor node corresponds to one connecting edge) of the first node to obtain the expression vector of the first node, and the iteration times are 1, which can also be understood as no iteration. When the predetermined number of times is 2 or more, in a single iteration, vector update may be performed on each node separately in a manner corresponding to the first node.

In a possible embodiment, under a single association relationship, the service features of a connecting edge correspond one to one to the service features of a node, and the node vector of a neighbor node and the edge vector of the corresponding connecting edge may be fused into the current node by element-wise multiplication (one-to-one multiplication of corresponding elements) of the edge vector and the node vector. For example, user a corresponds to node a in the heterogeneous graph, and its neighbor nodes under the first association relationship include nodes b, c, and d, which correspond to connecting edges B, C, and D respectively. Let $h_a^{(0)}$ denote the initial node vector of node a, $h_b^{(0)}, h_c^{(0)}, h_d^{(0)}$ denote the initial vectors of nodes b, c, and d respectively, and $e_B, e_C, e_D$ denote the initial vectors of connecting edges B, C, and D respectively. After one vector fusion, the neighbor fusion vector of node a may be:

$$h_{N(a)}^{(1)} = e_B \odot h_b^{(0)} + e_C \odot h_c^{(0)} + e_D \odot h_d^{(0)}$$

Further, the current node vector of node a may be:

$$h_a^{(1)} = h_a^{(0)} + h_{N(a)}^{(1)}$$

Here, the expression may also be understood as treating the current node itself as a special neighbor node whose corresponding connecting edge has an edge vector with all elements equal to 1. In this embodiment, the vector expression of the current node may be regarded as the sum, over its neighbor nodes, of the product vectors of each neighbor node vector and the edge vector of the corresponding connecting edge. The above shows only a simple vector fusion process; in other embodiments, other vector fusion modes may also be used, for example:

and so on.
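The element-wise fusion described above can be sketched as follows. This is only an illustration: the function name `fuse_once` and the 2-dimensional numeric vectors are hypothetical and do not come from the specification.

```python
from typing import List, Tuple

Vector = List[float]


def fuse_once(node_vec: Vector, neighbors: List[Tuple[Vector, Vector]]) -> Vector:
    """Return the node's own vector plus the sum of edge * neighbor products.

    `neighbors` holds (neighbor_node_vector, edge_vector) pairs of equal length.
    """
    fused = list(node_vec)  # the node acts as its own neighbor with an all-ones edge
    for h, e in neighbors:
        for k in range(len(fused)):
            fused[k] += h[k] * e[k]  # element-wise product, then accumulate
    return fused


# Hypothetical vectors for node a and its neighbors b, c, d.
h_a = [1.0, 0.0]
neighbors_of_a = [
    ([2.0, 1.0], [0.5, 1.0]),   # node b via edge B
    ([0.0, 3.0], [1.0, 2.0]),   # node c via edge C
    ([4.0, 4.0], [0.25, 0.0]),  # node d via edge D
]
print(fuse_once(h_a, neighbors_of_a))  # [3.0, 7.0]
```

Because the product is element-wise, this variant requires node vectors and edge vectors to have the same dimension, which matches the one-to-one correspondence of service features assumed in this embodiment.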

In an alternative embodiment, all nodes may be traversed and the current node vector of each node updated, so that through multiple iterations information from more neighbor nodes is aggregated at the current node (e.g., node a). It can be understood that after all nodes have been traversed once, the current node vector of each node has been updated by aggregating the node vectors of its own neighbor nodes, while the edge vectors of the connecting edges remain unchanged. This is equivalent to each node aggregating the features of its first-order neighbor nodes while the association relationship between any two nodes stays unchanged. The next traversal updates the node vectors of all nodes again, which amounts to aggregating the features of second-order neighbor nodes, and so on; through multiple iterations, the features of multi-order neighbor nodes are aggregated into the expression vectors of the nodes.
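A minimal sketch of this traversal, assuming a synchronous update (every node is updated from the previous round's vectors) and the element-wise fusion of the preceding example. The names `propagate`, `initial`, and `adjacency` and the three-node path graph a-b-c are illustrative assumptions.

```python
from typing import Dict, List, Tuple

Vector = List[float]


def propagate(node_vecs: Dict[str, Vector],
              adjacency: Dict[str, List[Tuple[str, Vector]]],
              iterations: int) -> Dict[str, Vector]:
    """Update every node `iterations` times; edge vectors stay fixed."""
    current = {name: list(vec) for name, vec in node_vecs.items()}
    for _ in range(iterations):
        updated = {}
        for node, vec in current.items():
            new_vec = list(vec)
            for neighbor, edge_vec in adjacency.get(node, []):
                for k in range(len(new_vec)):
                    new_vec[k] += current[neighbor][k] * edge_vec[k]
            updated[node] = new_vec
        current = updated  # all nodes switch to their new vectors together
    return current


ones = [1.0, 1.0, 1.0]
initial = {"a": [1.0, 0.0, 0.0], "b": [0.0, 1.0, 0.0], "c": [0.0, 0.0, 1.0]}
adjacency = {"a": [("b", ones)], "b": [("a", ones), ("c", ones)], "c": [("b", ones)]}

print(propagate(initial, adjacency, 1)["a"])  # [1.0, 1.0, 0.0]
print(propagate(initial, adjacency, 2)["a"])  # [2.0, 2.0, 1.0]
```

Node c is a second-order neighbor of node a, so its one-hot feature (the third component) reaches node a only after the second traversal.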

In another possible implementation, the node vector of each neighbor node of the first node may be spliced (concatenated) with the edge vector of the corresponding connecting edge to obtain a corresponding splicing vector, and the vector expression of the current node is obtained by processing the splicing vectors. For example, in one embodiment, the splicing vectors may be superimposed to obtain the expression vector of the current node. In another embodiment, each splicing vector may first be processed with an auxiliary matrix to reduce its dimension and thereby avoid an excessive data volume, after which the dimension-reduced splicing vectors are superimposed. Optionally, the current node may be regarded as a neighbor node of itself, with a corresponding connecting edge vector whose every dimension is 1.
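The splicing variant with dimension reduction might look like the sketch below. The auxiliary matrix values are hypothetical (in practice they would be learned), and `fuse_by_splicing` is an illustrative name.

```python
from typing import List, Tuple

Vector = List[float]
Matrix = List[List[float]]


def matvec(matrix: Matrix, vec: Vector) -> Vector:
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(m * x for m, x in zip(row, vec)) for row in matrix]


def fuse_by_splicing(neighbors: List[Tuple[Vector, Vector]], aux_matrix: Matrix) -> Vector:
    """Splice each (node vector, edge vector) pair, reduce it, then superimpose."""
    total = [0.0] * len(aux_matrix)
    for h, e in neighbors:
        spliced = h + e                        # list concatenation: dimension 2 + 2 = 4
        reduced = matvec(aux_matrix, spliced)  # auxiliary matrix maps 4 -> 2 dimensions
        total = [t + r for t, r in zip(total, reduced)]
    return total


# Hypothetical auxiliary matrix; this particular choice adds node and edge features.
aux_matrix = [[1.0, 0.0, 1.0, 0.0],
              [0.0, 1.0, 0.0, 1.0]]
neighbors = [([1.0, 2.0], [3.0, 4.0]), ([0.5, 0.5], [1.0, 1.0])]
print(fuse_by_splicing(neighbors, aux_matrix))  # [5.5, 7.5]
```

Unlike element-wise multiplication, splicing does not require node vectors and edge vectors to share a dimension; only the column count of the auxiliary matrix has to match the spliced length.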

In one particular example, this may be implemented, for example, by a graph convolutional neural network. Assuming that the current association relationship is r, in the $l$-th iteration (e.g., the $l$-th layer of the graph convolutional neural network), the vector expression of the current node (the first node) is determined by the expression vectors of its neighbor nodes at the $(l-1)$-th layer, together with the edge vectors of the corresponding connecting edges.

In a further optional embodiment, for each neighbor node, the contribution of its expression vector at the $(l-1)$-th layer to the current node may also be determined. Denote by $c_p$ the contribution of neighbor node $p$ to the current node; $c_p$ is determined from $h_p^{(l-1)}$, the expression vector of neighbor node $p$ at the $(l-1)$-th layer, and $e_p^{(l-1)}$, the expression vector of the connecting edge corresponding to neighbor node $p$ at the $(l-1)$-th layer.

Further, the attention value of each neighbor node may be determined according to its contribution, for example as the exponential normalization of the contributions over all neighbor nodes:

$$\alpha_p = \frac{\exp(c_p)}{\sum_{q \in \mathcal{N}} \exp(c_q)}$$

where $\mathcal{N}$ is the neighbor node set of the current node. The expression vector of the current node at the $l$-th layer may be the sum of the expression vectors of the neighbor nodes at the $(l-1)$-th layer weighted by the attention values, such as:

$$h^{(l)} = \sum_{p \in \mathcal{N}} \alpha_p \, h_p^{(l-1)}$$

Through one layer of the graph convolutional neural network, the information of the first-order neighbor nodes of the current node can be aggregated into the expression vector of the current node, and through multiple layers, the information of multi-order neighbor nodes can be aggregated at the current node, thereby providing a better expression for the current node.
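The attention-weighted layer can be illustrated with a short sketch. The specification's own contribution formula is not reproduced in this text, so the code assumes, purely for illustration, that a neighbor's contribution is the dot product of its node vector and its edge vector; the function names are likewise hypothetical.

```python
import math
from typing import List, Tuple

Vector = List[float]


def softmax(scores: List[float]) -> List[float]:
    """Exponential normalization; subtracting the max keeps exp() stable."""
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [x / total for x in exps]


def attention_layer(neighbors: List[Tuple[Vector, Vector]]) -> Tuple[List[float], Vector]:
    """Return (attention values, attention-weighted sum of neighbor vectors).

    ASSUMPTION: contribution of neighbor p = dot(node vector, edge vector).
    """
    contributions = [sum(hk * ek for hk, ek in zip(h, e)) for h, e in neighbors]
    attention = softmax(contributions)
    dim = len(neighbors[0][0])
    fused = [0.0] * dim
    for weight, (h, _edge) in zip(attention, neighbors):
        for k in range(dim):
            fused[k] += weight * h[k]
    return attention, fused


attention, fused = attention_layer([
    ([1.0, 0.0], [1.0, 1.0]),
    ([0.0, 1.0], [1.0, 1.0]),
])
print(attention, fused)
```

Here both neighbors have the same contribution, so they receive equal attention; a neighbor with a larger contribution would receive a proportionally larger share of the weighted sum.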

In yet another possible implementation, information from the neighbor nodes of the current node may be aggregated based on the concatenation of the current vectors of the current node, its neighbor nodes, and the connecting edges, and the aggregation result may then be fused with the expression vector of the current node to update it. Specifically, assume that under the first association relationship the neighbor nodes of the first node include a second node, and the first node and the second node are connected by a first connecting edge. A single iterative fusion performed on the node vector of the first node may then include: splicing the current node vectors corresponding to the first node and the second node with the current edge vector corresponding to the first connecting edge to obtain a second splicing vector; processing the second splicing vector through a first auxiliary matrix to obtain a second processing result; determining a second contribution degree corresponding to the second node based on the second processing result; and fusing the current node vector corresponding to the second node and the current edge vector corresponding to the first connecting edge into the current node vector of the first node according to the second contribution degree.

As a specific example, let the first node be node $u$ and the second node be node $i$. The second splicing vector may then be the concatenation $h_u \,\|\, h_i \,\|\, e_{ui}$ of the current node vectors of node $u$ and node $i$ with the current edge vector of the connecting edge between them. Assuming that the first auxiliary matrix is $W$, the second splicing vector is processed through the first auxiliary matrix to obtain the second processing result. Here, $W$ is a weight matrix or auxiliary matrix, $\|$ denotes vector concatenation, $h_u$ and $h_i$ denote the node vectors of node $u$ and node $i$ respectively, $e_{ui}$ denotes the edge vector of the corresponding connecting edge, and $a$ is another auxiliary vector that may be used in the processing.

Each neighbor node of the first node may correspond to one such processing result. The processing results may be summed and averaged so as to fuse the node vectors and edge vectors of the neighbor nodes of the first node into the node vector of the first node. In an alternative embodiment, an attention mechanism may also be used to determine the respective contribution degrees (e.g., attention values) of the neighbor nodes of the first node, and the node vectors and edge vectors of the neighbor nodes are fused into the node vector of the first node according to these contribution degrees (attention values).

In a further optional embodiment, the contribution degree corresponding to neighbor node $i$ may be the result of normalizing the processing result of node $i$ against the processing results of all neighbor nodes of node $u$. For example, denoting by $s_{ui}^{r}$ the processing result for neighbor node $i$ of node $u$ under association relationship $r$:

$$\alpha_{ui}^{r} = \frac{\exp\left(s_{ui}^{r}\right)}{\sum_{j \in \mathcal{N}_u^{r}} \exp\left(s_{uj}^{r}\right)}$$

where $\mathcal{N}_u^{r}$ is the neighbor node set of node $u$ under association relationship $r$, $j$ denotes a node in the neighbor node set of node $u$, and $\alpha_{ui}^{r}$ represents the contribution value (also referred to as the attention value, exponential normalization result, etc.) of node $i$ relative to node $u$ under association relationship $r$.

The current node vectors of the neighbor nodes of node $u$ and the edge vectors of the corresponding connecting edges are then aggregated according to the obtained contribution degrees to obtain the current aggregate vector of node $u$. One aggregation mode is a weighted average of the expression vectors of the neighbor nodes with the contribution values as weights, such as:

$$h_{\mathcal{N}_u}^{r} = \sum_{i \in \mathcal{N}_u^{r}} \alpha_{ui}^{r} \, h_i$$

where $h_i$ is the current node vector of neighbor node $i$. Alternatively, the aggregation mode is a weighted average, with the contribution values as weights, of the sum vector of each neighbor node's expression vector and the edge vector $e_{ui}$ of the corresponding connecting edge, such as:

$$h_{\mathcal{N}_u}^{r} = \sum_{i \in \mathcal{N}_u^{r}} \alpha_{ui}^{r} \left(h_i + e_{ui}\right)$$

The neighbor nodes may also be aggregated in various other reasonable modes, which is not limited in this specification.
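The splicing, scoring, normalization, and aggregation steps can be sketched together as follows. The exact form of the processing result is not recoverable from this text, so the code assumes a common attention-style choice: the spliced vector is multiplied by an auxiliary matrix `W`, passed through LeakyReLU, and projected onto an auxiliary vector `a`. All names and numeric values are hypothetical.

```python
import math
from typing import List, Tuple

Vector = List[float]
Matrix = List[List[float]]


def matvec(matrix: Matrix, vec: Vector) -> Vector:
    return [sum(m * x for m, x in zip(row, vec)) for row in matrix]


def leaky_relu(x: float, slope: float = 0.2) -> float:
    return x if x > 0 else slope * x


def softmax(scores: List[float]) -> List[float]:
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [x / total for x in exps]


def aggregate_neighbors(h_u: Vector, neighbors: List[Tuple[Vector, Vector]],
                        W: Matrix, a: Vector) -> Tuple[List[float], Vector]:
    """neighbors: (h_i, e_ui) pairs; all vectors share one dimension."""
    scores = []
    for h_i, e_ui in neighbors:
        spliced = h_u + h_i + e_ui  # splicing (list concatenation)
        hidden = [leaky_relu(v) for v in matvec(W, spliced)]  # ASSUMED scoring form
        scores.append(sum(ak * hk for ak, hk in zip(a, hidden)))
    weights = softmax(scores)  # contribution degrees (attention values)
    aggregate = [0.0] * len(h_u)
    for w, (h_i, e_ui) in zip(weights, neighbors):
        for k in range(len(aggregate)):
            aggregate[k] += w * (h_i[k] + e_ui[k])  # weight the sum vector h_i + e_ui
    return weights, aggregate


h_u = [1.0, 0.0]
W = [[0.1, 0.2, 0.3, 0.1, 0.2, 0.3],
     [-0.1, 0.0, 0.1, -0.2, 0.0, 0.2]]
a = [1.0, -1.0]
same = ([1.0, 2.0], [0.5, 0.5])
weights, aggregate = aggregate_neighbors(h_u, [same, same], W, a)
print(weights, aggregate)
```

Two identical neighbors receive identical contribution degrees, so the aggregate equals their common sum vector; this is a quick sanity check that the weights behave as a normalized distribution.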

Then, the neighbor node aggregation result is fused into the current node vector of node $u$ to obtain the updated node vector of node $u$. For example, the neighbor node aggregation result may be added to the current node vector of node $u$. In one implementation, the neighbor node aggregation result may instead be spliced with the current node vector of node $u$ and then subjected to dimension reduction, in which an auxiliary matrix is used to reduce the dimension of the splicing vector and an activation function is then applied (the activation function may be replaced by other activation functions).
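A minimal sketch of this update step, assuming LeakyReLU as the activation function and a hypothetical dimension-reducing matrix `W_reduce`; neither choice is confirmed by the text.

```python
from typing import List

Vector = List[float]
Matrix = List[List[float]]


def update_node(h_u: Vector, neighbor_aggregate: Vector,
                W_reduce: Matrix, slope: float = 0.2) -> Vector:
    """Splice the node vector with the aggregate, reduce dimension, activate."""
    spliced = h_u + neighbor_aggregate  # dimension doubles after splicing
    reduced = [sum(m * x for m, x in zip(row, spliced)) for row in W_reduce]
    return [x if x > 0 else slope * x for x in reduced]  # LeakyReLU (assumed)


h_u = [1.0, -1.0]
neighbor_aggregate = [0.5, 0.5]
# Hypothetical 2 x 4 auxiliary matrix mapping the spliced vector back to 2 dimensions.
W_reduce = [[1.0, 0.0, 1.0, 0.0],
            [0.0, 1.0, 0.0, 1.0]]
updated = update_node(h_u, neighbor_aggregate, W_reduce)
print(updated)
```

Keeping the node's own vector as a separate block of the spliced input lets the auxiliary matrix weight it independently of the neighbor aggregate, instead of fixing a one-to-one ratio as plain addition does.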

In this process, each neighbor node vector is first fused with the current vector of its connecting edge, and the result is then fused with the vector of the current node to obtain an expression vector with which the node vector of the current node can be updated, as shown in fig. 3. In fig. 3, node u is assumed to be the current first node, nodes 1, 2, 3, ... are the neighbor nodes of node u with their corresponding node vectors, and connecting edges 1, 2, 3, ... have their corresponding edge vectors. First, the node vectors of nodes 1, 2, 3, ... are fused with the corresponding edge vectors to obtain a fused vector. Then the current node vector of the current node u is further fused with this fused vector to obtain the updated expression vector of node u. Because the connecting edge information is fully used when the neighbor node vectors are fused, the resulting expression is more accurate. In addition, because the neighbor node vectors and the current node vector are fused at different levels, the proportion of the current node's own vector in the new node vector can be increased, which is consistent with how the node vector of the current node should be expressed and thereby enhances the expressive capability of the node vector.

Those skilled in the art will appreciate that a single vector aggregation may not be sufficient to capture more complex interactions in the association relationships, such as interactions involving multi-level transfer relationships. To improve the expressiveness of the model, multiple layers of node expression vector updates may be stacked to explore high-order connectivity information; that is, the current node vector corresponding to each node is updated iteratively. For example, in the $l$-th iteration, the expression vectors of the $l$-th iteration are determined from the node vectors and edge vectors obtained after the $(l-1)$-th iteration.

Therefore, through multiple iterations of propagation, the feature information of higher-order neighbor nodes can be aggregated at the current node to obtain the expression vector of the current node.
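The stacking described above can be written compactly as a layer-wise recurrence. The following is only a sketch under assumed notation (none of these symbols are taken from the original formulas): $h_u^{(l)}$ is the vector of node $u$ after the $l$-th iteration, $e_{ui}^{(l)}$ the edge vector between nodes $u$ and $i$, $\alpha_{ui}^{(l)}$ the contribution degree, $W^{(l)}$ an auxiliary matrix, $\sigma$ an activation function, and $K$ the predetermined number of iterations.

```latex
h_{\mathcal{N}_u}^{(l-1)} = \sum_{i \in \mathcal{N}_u} \alpha_{ui}^{(l-1)} \left( h_i^{(l-1)} + e_{ui}^{(l-1)} \right),
\qquad
h_u^{(l)} = \sigma\!\left( W^{(l)} \left[ h_u^{(l-1)} \,\Vert\, h_{\mathcal{N}_u}^{(l-1)} \right] \right),
\qquad l = 1, \dots, K
```

With $h_u^{(0)}$ taken as the initial node vector, $h_u^{(K)}$ would serve as the expression vector of node $u$ under the association relationship considered.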

In other possible embodiments, under a single association relationship, the expression vector of the node corresponding to the current user may also be determined in other reasonable manners, which is not described herein again. It should be noted that each activation function mentioned in the above embodiments may be replaced by another activation function, and the activation functions in the above formulas are only examples.

In a similar manner, an expression vector may be determined for the user currently being processed (e.g., the first user) under each association relationship. When an expression vector is determined, the fusion of the neighbor nodes' expression vectors depends not only on the neighbor nodes but also on the expression vectors of the connecting edges, so that the final expression vector describes the current user more accurately.

In step 202, the expression vectors are fused to obtain a prediction vector for the first user. It can be understood that a comprehensive expression vector can be determined for the current user from the individual expression vectors; this comprehensive expression vector represents the various information related to the service attribute to be predicted for the current user and is referred to in this specification as the prediction vector. The prediction vector of the first user is a comprehensive description of the first user with respect to the predetermined service attribute, while each expression vector describes the first user under one association relationship of the heterogeneous graph.

There are many ways to fuse the individual expression vectors, such as averaging or taking the maximum value. It will be appreciated that the features of entities under different association relationships differ in importance for a particular service. For example, in a user risk prediction service, the relationship network whose association type is the interaction between users is more important, while the relationship networks whose association types are a shared terminal application (APP) or a shared network are less important. Therefore, the expression vectors may also be fused by weighted summation. In a weighted summation, the weights may be set in advance or computed according to a predetermined method. In one embodiment, the importance coefficients of the relationship networks may be preset empirically. For example, the importance coefficient of the relationship network describing the interaction between users is 0.5, and the importance coefficient of the relationship network describing a shared terminal application or shared network is 0.1.
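The three simple fusion modes mentioned here (averaging, taking the maximum, and weighted summation with preset coefficients) can be sketched as below. The function names and the two example expression vectors are hypothetical; the coefficients 0.5 and 0.1 are the ones given in the text, and the maximum is assumed to be taken per element.

```python
from typing import List

Vector = List[float]


def fuse_mean(vectors: List[Vector]) -> Vector:
    """Element-wise average of the expression vectors."""
    return [sum(col) / len(vectors) for col in zip(*vectors)]


def fuse_max(vectors: List[Vector]) -> Vector:
    """Element-wise maximum of the expression vectors (assumed interpretation)."""
    return [max(col) for col in zip(*vectors)]


def fuse_weighted(vectors: List[Vector], weights: List[float]) -> Vector:
    """Weighted sum using preset importance coefficients."""
    dim = len(vectors[0])
    return [sum(w * v[k] for w, v in zip(weights, vectors)) for k in range(dim)]


# Hypothetical expression vectors of one user under two association relationships.
interaction_vec = [1.0, 2.0]   # interaction between users, coefficient 0.5
shared_app_vec = [10.0, 0.0]   # shared terminal application, coefficient 0.1
vectors = [interaction_vec, shared_app_vec]

print(fuse_mean(vectors))                   # [5.5, 1.0]
print(fuse_max(vectors))                    # [10.0, 2.0]
print(fuse_weighted(vectors, [0.5, 0.1]))
```

With preset coefficients the weights need not sum to 1; whether to rescale them is a design choice the text leaves open.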

In an optional implementation, the expression vectors may be used to determine the importance of each association relationship for the first user, and the expression vectors are then fused according to these importances to obtain the prediction vector of the first user. For the first association relationship, a first importance of the first association relationship for the first user may be determined using the first expression vector and the other expression vectors corresponding to the other association relationships, and the first expression vector and the other expression vectors are then fused according to the first importance to obtain the prediction vector of the first user. Here, the weight of each expression vector may be determined according to its importance, and the importance of each expression vector can be determined in various ways.

In one embodiment, the expression vectors may be averaged to obtain an average vector, and then the corresponding importance may be determined according to the distance between each expression vector and the average vector. For a single expression vector, its importance measure value may be inversely related to distance, i.e., the closer to the average vector, the greater the importance.
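One way to turn distances into importances is sketched below. The text only requires that importance decrease with distance from the average vector; the specific mapping $1/(1+d)$ followed by normalization is an assumption made for illustration, as is the function name.

```python
import math
from typing import List

Vector = List[float]


def importance_from_distance(vectors: List[Vector]) -> List[float]:
    """Give higher importance to expression vectors closer to the average vector."""
    dim = len(vectors[0])
    mean = [sum(v[k] for v in vectors) / len(vectors) for k in range(dim)]
    distances = [math.sqrt(sum((v[k] - mean[k]) ** 2 for k in range(dim)))
                 for v in vectors]
    raw = [1.0 / (1.0 + d) for d in distances]  # ASSUMED decreasing mapping
    total = sum(raw)
    return [r / total for r in raw]             # normalize so the weights sum to 1


# Hypothetical expression vectors under three association relationships.
expr_vectors = [[0.0, 0.0], [1.0, 0.0], [5.0, 0.0]]
importances = importance_from_distance(expr_vectors)
print(importances)
```

In this example the average vector is [2.0, 0.0], so the second expression vector (distance 1) receives the largest importance and the third (distance 3) the smallest.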

In another embodiment, since the importance may depict preferences for various associations (meta-paths) in the current business process, such preferences may be described by attention values.

As an example, the importance (attention value) of the current user (the first user) under one of the association relationships may be determined as follows.

First, under association relationship $r$, an influence degree $w_u^{r}$ of node $u$ is determined by processing the expression vector of node $u$ under $r$ with an auxiliary matrix. The influence degree here is related only to the current expression vector itself, and not to the expression vectors under other association relationships. One such influence degree may be determined for the expression vector under each association relationship.

The importance (attention value) of the expression vector of node $u$ under association relationship $r$ is the exponential normalization of the influence degrees corresponding to the respective expression vectors, such as:

$$\beta_u^{r} = \frac{\exp\left(w_u^{r}\right)}{\sum_{r' \in R} \exp\left(w_u^{r'}\right)}$$

where $R$ is the set of association relationships. The attention value here can be used as the importance coefficient of the expression vector of node $u$ under association relationship $r$.

Further, the expression vectors can be fused according to the importance degrees to obtain the prediction vector of the first user. For example, denoting by $z_u^{r}$ the expression vector of node $u$ under association relationship $r$, the prediction vector $z_u$ obtained by weighted averaging with the importance degrees as weights may be:

$$z_u = \sum_{r \in R} \beta_u^{r} \, z_u^{r}$$

It should be noted that the process of fusing the expression vectors according to importance to obtain the prediction vector is described above by way of specific examples, which do not limit the specific fusion process. In practice, the expression vectors can be fused in various other reasonable ways to obtain the prediction vector.
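The relation-level attention can be sketched as follows. The exact influence formula is not recoverable from this text; the code assumes, for illustration only, that the influence degree is a projection vector `q` applied to `tanh(W2 z)`, where `W2` plays the role of the auxiliary matrix. All names and values are hypothetical.

```python
import math
from typing import List, Tuple

Vector = List[float]
Matrix = List[List[float]]


def softmax(scores: List[float]) -> List[float]:
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [x / total for x in exps]


def fuse_relations(expr_vectors: List[Vector], W2: Matrix,
                   q: Vector) -> Tuple[List[float], Vector]:
    """Return (importance per association relationship, prediction vector)."""
    influences = []
    for z in expr_vectors:
        projected = [sum(m * x for m, x in zip(row, z)) for row in W2]
        hidden = [math.tanh(x) for x in projected]  # ASSUMED nonlinearity
        influences.append(sum(qk * hk for qk, hk in zip(q, hidden)))
    importance = softmax(influences)  # exponential normalization over relationships
    dim = len(expr_vectors[0])
    prediction = [sum(w * z[k] for w, z in zip(importance, expr_vectors))
                  for k in range(dim)]
    return importance, prediction


W2 = [[0.5, -0.5], [0.25, 0.75]]
q = [1.0, 1.0]
# Hypothetical expression vectors of one user under three association relationships.
expr_vectors = [[1.0, 2.0], [0.0, 1.0], [3.0, -1.0]]
importance, prediction = fuse_relations(expr_vectors, W2, q)
print(importance, prediction)
```

Each influence degree depends only on its own expression vector; the coupling between association relationships enters solely through the normalization step.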

Next, in step 203, the predetermined service attribute of the first user is determined based on the prediction vector. The predetermined service attribute may be any of various service-related attributes to be evaluated, such as the gender of the user, the likelihood that the user purchases a product, the user's financial default risk level, the likelihood that the user views a pushed document, and the like.

The process of determining the user's service attribute from the prediction vector may be implemented by a machine learning model. It will be appreciated that a user service attribute may generally be one category among a plurality of categories (e.g., gender being male or female), or a score for having a certain attribute (e.g., a risk score), and so on.

For clarity, the flow described in the above steps may be presented in the form of fig. 4. As shown in fig. 4, the heterogeneous graph is constructed in advance from the related information of N users under association relationship 1, association relationship 2, and association relationship 3. In the service processing of predicting the predetermined service attribute of the first user, the expression vectors of the first user under the respective association relationships, namely expression vector 1, expression vector 2, and expression vector 3, can be determined from the heterogeneous graph. The expression vectors are then fused to obtain a prediction vector, and the predetermined service attribute of the first user is predicted from the prediction vector.

In a process implemented by a machine learning model, a single training sample may correspond to a single user. Step 203 may be implemented, for example, by a fully connected network. For example, the element values of the prediction vector serve as the input values of the neurons of the fully connected neural network, and at least one output value is obtained at the output layer through the weights of the neurons. When the output layer has a single neuron, the output value corresponds to the likelihood that the user to be predicted belongs to a certain category of the predetermined service attribute. When the output layer includes a plurality of neurons, the likelihoods that the user to be predicted belongs to the respective categories of the predetermined service attribute can be obtained. In practice, those skilled in the art may also implement the prediction from the prediction vector to the predetermined attribute in this step by other machine learning methods, which will not be described herein again.
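A single fully connected output layer of this kind can be sketched as below, with a sigmoid for the single-neuron case and a softmax for the multi-neuron case. The function names and all weight values are hypothetical; a real model would learn them.

```python
import math
from typing import List

Vector = List[float]
Matrix = List[List[float]]


def dense(x: Vector, weights: Matrix, bias: Vector) -> Vector:
    """One fully connected layer: each output neuron is a weighted sum plus bias."""
    return [sum(w * xi for w, xi in zip(row, x)) + b for row, b in zip(weights, bias)]


def predict_binary(pred_vec: Vector, weight_row: Vector, bias: float) -> float:
    """Single output neuron: likelihood of belonging to one category."""
    score = dense(pred_vec, [weight_row], [bias])[0]
    return 1.0 / (1.0 + math.exp(-score))


def predict_multiclass(pred_vec: Vector, weights: Matrix, bias: Vector) -> List[float]:
    """Several output neurons: one likelihood per category (softmax)."""
    scores = dense(pred_vec, weights, bias)
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


prediction_vector = [1.0, 2.0]  # hypothetical prediction vector of one user
print(predict_binary(prediction_vector, [0.4, -0.1], 0.0))
print(predict_multiclass(prediction_vector,
                         [[0.2, 0.1], [0.0, 0.3], [-0.1, 0.0]], [0.0, 0.0, 0.0]))
```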

In one possible design, only the process of step 203 is implemented by a machine learning model. In this case, the prediction vector corresponding to each training sample (whose elements serve as the features of the machine learning model) may be determined in the manner of step 201 and step 202, and the sample label corresponding to each training sample may be obtained. During training, the elements of the prediction vector corresponding to each training sample are used as features, the obtained output result is compared with the sample label of that training sample, and the weight parameters mapping the prediction vector to the output result are adjusted in the direction of reducing the loss.
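Training only the prediction head on precomputed prediction vectors can be sketched as a small logistic regression fitted by stochastic gradient descent. The toy prediction vectors, labels, learning rate, and epoch count are all hypothetical; the text does not prescribe a particular loss or optimizer.

```python
import math
from typing import List, Tuple

Vector = List[float]


def sigmoid(t: float) -> float:
    return 1.0 / (1.0 + math.exp(-t))


def predict(w: Vector, b: float, x: Vector) -> float:
    return sigmoid(sum(wk * xk for wk, xk in zip(w, x)) + b)


def train_head(samples: List[Vector], labels: List[int],
               lr: float = 0.5, epochs: int = 200) -> Tuple[Vector, float]:
    """Fit weights mapping prediction vectors to labels by reducing log loss."""
    w = [0.0] * len(samples[0])
    b = 0.0
    for _ in range(epochs):
        for x, y in zip(samples, labels):
            grad = predict(w, b, x) - y  # d(log loss)/d(score) for a sigmoid output
            w = [wk - lr * grad * xk for wk, xk in zip(w, x)]
            b -= lr * grad
    return w, b


# Hypothetical prediction vectors (from steps 201 and 202) with binary sample labels.
samples = [[0.0, 0.1], [0.2, 0.0], [1.0, 0.9], [0.8, 1.0]]
labels = [0, 0, 1, 1]
w, b = train_head(samples, labels)
print(predict(w, b, [0.1, 0.1]), predict(w, b, [0.9, 0.9]))
```

In this design the prediction vectors are fixed inputs, so only the weights from the prediction vector to the output are adjusted.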

In another possible design, step 201, step 202, and step 203 may be implemented as a whole by a machine learning model. In this case, the machine learning model may cover the various processes involved in step 201, step 202, and step 203, such as the weight calculation and the graph convolutional neural network parameters. Since the heterogeneous graph can be established in advance, each training sample can correspond to one user, and the association relationships between users can be updated in real time, the heterogeneous graph may be updated during training according to the time corresponding to each training sample; the prediction vector of step 202 is determined based on the data of the heterogeneous graph at the corresponding time, and an output result is then obtained from the prediction vector. Based on the comparison between the output result and the sample label, the weights applied to the prediction vector can be adjusted, and the adjustment can be propagated further backward to the model parameters used in determining the prediction vector or the expression vectors, such as the parameters of the auxiliary matrices.

In an optional implementation, the predetermined service attribute includes a plurality of categories, each category may correspond to one category vector, and in step 203 the service attribute of the first user may be determined according to the category vector closest to the prediction vector. In this case, during training of the machine learning model, each sample label may correspond to a category vector; the output result of the machine learning model is compared with the category vector corresponding to the sample label, and the model parameters are adjusted in the direction of reducing the loss.
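Choosing the closest category vector can be sketched as a nearest-neighbor lookup. Squared Euclidean distance, the category names, and the vectors below are assumptions for illustration; the text does not specify the distance measure.

```python
from typing import Dict, List

Vector = List[float]


def squared_distance(x: Vector, y: Vector) -> float:
    return sum((a - b) ** 2 for a, b in zip(x, y))


def nearest_category(pred_vec: Vector, category_vectors: Dict[str, Vector]) -> str:
    """Return the name of the category whose vector is closest to pred_vec."""
    return min(category_vectors,
               key=lambda name: squared_distance(pred_vec, category_vectors[name]))


# Hypothetical category vectors for a risk-level attribute.
category_vectors = {"low_risk": [0.0, 0.0], "high_risk": [1.0, 1.0]}
print(nearest_category([0.9, 0.8], category_vectors))  # high_risk
```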

Reviewing the above process, the method provided by the embodiment of the specification: on one hand, a prediction vector of a user is determined based on a heterogeneous graph, expression vectors of the user obtained under each incidence relation are fused, various possible information is integrated, the information of the user is enriched from multiple dimensions, and the rich semantics under multiple relations are explored by using information complementarity, so that the situation that the service attribute of the user cannot be predicted due to the fact that the user cannot be accurately described due to single information loss is avoided; on the other hand, in the process of determining the expression vector of the user under a single incidence relation, not only incidence influence between the user and other users is considered, but also influence of the service attribute corresponding to the connecting edge on the incidence relation is considered, and the local structure information of the user is fully utilized to enhance the representation capability of the user, so that the accuracy of predicting the service attribute of the user is improved.

According to another embodiment, there is also provided an apparatus for determining a service attribute of a user, which is configured to determine a predetermined service attribute of the user based on a heterogeneous graph. The heterogeneous graph may include a plurality of nodes, each node corresponds to one user, and a plurality of mutually independent association relationships exist among the nodes. Every two nodes having an association relationship are connected by a connecting edge. Under each association relationship, a single node corresponds to a single initial node vector determined based on a plurality of user attributes of the corresponding user, and a single connecting edge corresponds to a single initial edge vector determined based on the association attributes between the two corresponding users. Assuming that any one of the plurality of association relationships is a first association relationship, as shown in fig. 5, for a first user whose predetermined service attribute is to be determined and who corresponds to a first node in the heterogeneous graph, the apparatus 500 may include:

a determining unit 51, configured to perform iterative fusion on the node vectors of the neighbor nodes of the first node and the edge vectors of the corresponding connecting edges for a predetermined number of times based on the first association relationship, to obtain a first expression vector of the first node under the first association relationship, where the initial vectors of the iterative fusion are the initial node vectors corresponding to the respective nodes and the initial edge vectors corresponding to the respective connecting edges, a single iterative fusion of the vector of the first node is performed according to the contribution degrees corresponding to the respective neighbor nodes/connecting edges, and the contribution degree of a single neighbor node/connecting edge is determined based on the current node vector corresponding to that neighbor node and the current edge vector of the corresponding connecting edge;

a fusion unit 52 configured to fuse the first expression vector and each other expression vector corresponding to each other association relationship, respectively, to obtain a prediction vector of the first user;

a prediction unit 53 configured to determine a predetermined service attribute of the first user based on the prediction vector.

According to an embodiment of one aspect, under the first association relationship, the neighbor nodes of the first node include a second node, the first node and the second node are connected by a first connecting edge, and the determining unit 51 includes the following subunits, which operate in a single iterative fusion process performed on the node vector of the first node:

the splicing subunit is configured to splice current node vectors corresponding to the first node and the second node respectively and current edge vectors corresponding to the first connecting edge to obtain a second splicing vector;

the detection subunit is configured to detect a second contribution degree corresponding to the second node according to a second processing result of the first auxiliary matrix on the second splicing vector;

and the fusion subunit is configured to fuse the current node vector corresponding to the second node and the current edge vector corresponding to the first connecting edge into the current node vector of the first node according to the second contribution degree.

In a further optional implementation, the detection subunit is further configured to:

acquiring the other splicing vectors respectively corresponding to the other neighbor nodes of the first node, and processing each of them through the first auxiliary matrix to obtain other processing results;

and determining a second contribution degree corresponding to the second node according to the second processing result and the normalization result of each other processing result.

In an alternative embodiment, the fusion subunit is further configured to:

summing the current node vector corresponding to the second node and the current edge vector corresponding to the first connecting edge to obtain a second sum vector;

and fusing each neighbor node of the first node and the vector of the corresponding connecting edge by taking the second contribution degree as the weight of the second sum vector to obtain a neighbor fusion vector for the first node.

In a further alternative embodiment, the fusion subunit is further configured to:

forming a first splicing vector by using the neighbor fusion vector and the current node vector of the first node;

and performing dimensionality reduction on the first splicing vector to obtain a current node vector of the first node.

According to an embodiment of another aspect, the fusion unit 52 is further configured to:

detecting a first importance degree of the first association relation to the first user by using the first expression vector and each other expression vector corresponding to each other association relation;

and according to the first importance, fusing the first expression vector and each other expression vector together to obtain a prediction vector of the first user.

In one embodiment, the fusion unit 52 may be further configured to:

detecting a first influence degree of the first expression vector aiming at the first user by utilizing a processing result of the second auxiliary matrix to the first expression vector;

and determining the first importance of the first association relation to the first user according to the normalization result of the first influence degree relative to each influence degree corresponding to each expression vector.

It should be noted that the apparatus 500 shown in fig. 5 is an apparatus embodiment corresponding to the method embodiment shown in fig. 2, and the corresponding description in the method embodiment shown in fig. 2 is also applicable to the apparatus 500, and is not repeated herein.

According to an embodiment of another aspect, there is also provided a computer-readable storage medium having stored thereon a computer program which, when executed in a computer, causes the computer to perform the method described in connection with fig. 2.

According to an embodiment of yet another aspect, there is also provided a computing device comprising a memory and a processor, the memory having stored therein executable code, the processor, when executing the executable code, implementing the method described in connection with fig. 2.

Those skilled in the art will recognize that, in one or more of the examples described above, the functions described in the embodiments of this specification may be implemented in hardware, software, firmware, or any combination thereof. When implemented in software, the functions may be stored on or transmitted over as one or more instructions or code on a computer-readable medium.

The above-mentioned embodiments are intended to explain the technical idea, technical solutions and advantages of the present specification in further detail, and it should be understood that the above-mentioned embodiments are merely specific embodiments of the technical idea of the present specification, and are not intended to limit the scope of the technical idea of the present specification, and any modification, equivalent replacement, improvement, etc. made on the basis of the technical solutions of the embodiments of the present specification should be included in the scope of the technical idea of the present specification.

Claims

1. A method for determining a service attribute of a user, used for determining a predetermined service attribute of a user based on a heterogeneous graph, wherein the heterogeneous graph comprises a plurality of nodes, each node corresponds to one user, a plurality of mutually independent association relationships exist among the nodes, every two nodes having an association relationship are connected by a connecting edge, the plurality of association relationships comprise a first association relationship, and under the first association relationship, a single node corresponds to a single initial node vector determined based on a plurality of user attributes of the corresponding user, and a single connecting edge corresponds to a single initial edge vector determined based on the association attributes between the two corresponding users; for a first user whose predetermined service attribute is to be determined and who corresponds to a first node in the heterogeneous graph, the method comprises:

performing, based on the first association relationship, iterative fusion a predetermined number of times on the node vectors of the neighbor nodes of the first node and the edge vectors of the corresponding connecting edges, to obtain a first expression vector of the first node under the first association relationship, wherein the initial vectors of the iterative fusion are the initial node vectors corresponding to the nodes and the initial edge vectors corresponding to the connecting edges, respectively, a single iterative fusion performed on the node vector of the first node is carried out according to the contribution degrees respectively corresponding to the neighbor nodes/connecting edges, and the contribution degree of a single neighbor node/connecting edge is determined based on the current node vector of that neighbor node and the current edge vector of the corresponding connecting edge;

fusing the first expression vector with the other expression vectors respectively corresponding to the other association relationships, to obtain a prediction vector of the first user;

and determining the predetermined service attribute of the first user according to the prediction vector.

2. The method of claim 1, wherein, in the first association relationship, the neighbor nodes of the first node include a second node, the first node and the second node are connected by a first connecting edge, and a single iterative fusion process performed on a node vector of the first node includes:

splicing the current node vectors respectively corresponding to the first node and the second node with the current edge vector corresponding to the first connecting edge, to obtain a second spliced vector;

detecting a second contribution degree corresponding to the second node according to a second processing result of a first auxiliary matrix on the second spliced vector;

and fusing the current node vector corresponding to the second node and the current edge vector corresponding to the first connecting edge into the current node vector of the first node according to the second contribution degree.
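
The single fusion step recited above, repeated for the predetermined number of times of claim 1, might be sketched as follows. This is a minimal illustration under several assumptions that the claims do not fix: node vectors and edge vectors share one dimension, the first auxiliary matrix is reduced to a single row (`aux`) so that each spliced vector yields a scalar score, contribution degrees are obtained by softmax normalization over the neighbors, edge vectors are held fixed across rounds, and the weighted neighbor information is added to the current node vector. The function name `propagate` and all parameter names are hypothetical.

```python
import math
from typing import Dict, List, Tuple

Vector = List[float]
Edge = Tuple[str, str]


def propagate(
    node_vecs: Dict[str, Vector],
    edge_vecs: Dict[Edge, Vector],
    rounds: int,
    aux: Vector,
) -> Dict[str, Vector]:
    """Run `rounds` synchronous fusion passes under one association relationship."""
    # Undirected neighbor list: node -> [(neighbor id, edge vector), ...].
    neighbors: Dict[str, List[Tuple[str, Vector]]] = {n: [] for n in node_vecs}
    for (u, v), e in edge_vecs.items():
        neighbors[u].append((v, e))
        neighbors[v].append((u, e))

    current = {n: list(vec) for n, vec in node_vecs.items()}
    for _ in range(rounds):
        updated: Dict[str, Vector] = {}
        for node, nbrs in neighbors.items():
            if not nbrs:
                # Isolated node: nothing to fuse in this round.
                updated[node] = list(current[node])
                continue
            # Splice [node ; neighbor ; edge] and score it with the auxiliary row.
            scores = [
                sum(a * x for a, x in zip(aux, current[node] + current[nbr] + e))
                for nbr, e in nbrs
            ]
            # Assumed normalization: softmax turns scores into contribution degrees.
            top = max(scores)
            exps = [math.exp(s - top) for s in scores]
            total = sum(exps)
            # Add each neighbor's (node vector + edge vector), weighted by its
            # contribution degree, to the current node vector.
            fused = list(current[node])
            for weight, (nbr, e) in zip(exps, nbrs):
                for i in range(len(fused)):
                    fused[i] += (weight / total) * (current[nbr][i] + e[i])
            updated[node] = fused
        current = updated
    return current
```

On a three-node path a-b-c, one round lets node a see only b, while two rounds also carry information from c to a through b, which is why the number of rounds determines how much of the local structure reaches the first node.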

3. The method of claim 2, wherein the detecting a second contribution degree corresponding to the second node according to the second processing result of the first auxiliary matrix on the second spliced vector comprises:

obtaining other spliced vectors respectively corresponding to the other neighbor nodes of the first node, and processing the other spliced vectors respectively with the first auxiliary matrix to obtain other processing results;

4. The method according to claim 2 or 3, wherein the fusing the current node vector corresponding to the second node and the current edge vector corresponding to the first connecting edge into the current node vector of the first node according to the second contribution degree comprises:

and fusing the neighbor nodes of the first node and the current edge vectors of the corresponding connecting edges, with the second contribution degree taken as the weight of the second sum vector, to obtain a neighbor fusion vector for the first node.

5. The method of claim 4, wherein the fusing the current node vector corresponding to the second node and the current edge vector corresponding to the first connecting edge into the current node vector of the first node according to the second contribution degree comprises:

6. The method according to claim 1, wherein the fusing the first expression vector with the other expression vectors respectively corresponding to the other association relationships to obtain the prediction vector of the first user comprises:

detecting a first importance degree of the first association relationship for the first user by using the first expression vector and the other expression vectors respectively corresponding to the other association relationships;

and fusing the first expression vector with the other expression vectors according to the first importance degree, to obtain the prediction vector of the first user.
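
The relationship-level fusion recited above might be sketched as follows. This is an illustration only, under assumptions the claims do not fix: each association relationship is scored by the dot product of its expression vector with a single auxiliary row (`aux2`, a hypothetical stand-in for an auxiliary matrix), the importance degrees are obtained by softmax normalization over the relationships, and the prediction vector is the importance-weighted sum of the expression vectors. The function name `fuse_relations` is hypothetical.

```python
import math
from typing import List, Tuple

Vector = List[float]


def fuse_relations(expr_vecs: List[Vector], aux2: Vector) -> Tuple[Vector, List[float]]:
    """Fuse per-relationship expression vectors into one prediction vector."""
    # Score each relationship's expression vector with the auxiliary row.
    scores = [sum(a * x for a, x in zip(aux2, z)) for z in expr_vecs]
    # Assumed normalization: softmax over relationships gives importance degrees.
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    importance = [e / total for e in exps]
    # Prediction vector: importance-weighted sum of the expression vectors.
    dim = len(expr_vecs[0])
    prediction = [
        sum(w * z[i] for w, z in zip(importance, expr_vecs)) for i in range(dim)
    ]
    return prediction, importance
```

With a zero auxiliary row every relationship receives the same importance degree and the prediction vector is the plain average of the expression vectors; a learned row would instead let the relationships that are more informative for the first user dominate.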

7. The method according to claim 6, wherein the detecting a first importance degree of the first association relationship for the first user by using the first expression vector and the other expression vectors respectively corresponding to the other association relationships comprises:

detecting a first influence degree of the first expression vector for the first user by using a processing result of a second auxiliary matrix on the first expression vector;

8. An apparatus for determining a service attribute of a user, used for determining a predetermined service attribute of a user based on a heterogeneous graph, wherein the heterogeneous graph comprises a plurality of nodes, each node corresponds to a respective user, the nodes have a plurality of mutually independent association relationships, every two nodes having an association relationship are connected by a connecting edge, and the plurality of association relationships comprise a first association relationship; under the first association relationship, a single node corresponds to a single initial node vector determined based on a plurality of user attributes of the corresponding user, and a single connecting edge corresponds to a single initial edge vector determined based on the association attributes between the corresponding two users; for a first user whose predetermined service attribute is to be determined, the first user corresponding to a first node in the heterogeneous graph, the apparatus comprises:

9. The apparatus according to claim 8, wherein, under the first association relationship, the neighbor nodes of the first node include a second node, the first node and the second node are connected by a first connecting edge, and, for a single iterative fusion process performed on the node vector of the first node, the determining unit comprises the following subunits:

10. The apparatus of claim 9, wherein the detection subunit is further configured to:

11. The apparatus of claim 9 or 10, wherein the fusion subunit is further configured to:

and fusing the neighbor nodes of the first node and the vectors of the corresponding connecting edges, with the second contribution degree taken as the weight of the second sum vector, to obtain a neighbor fusion vector for the first node.

12. The apparatus of claim 11, wherein the fusion subunit is further configured to:

13. The apparatus of claim 8, wherein the fusion unit is further configured to:

14. The apparatus of claim 13, wherein the fusion unit is further configured to:

15. A computer-readable storage medium, on which a computer program is stored which, when executed in a computer, causes the computer to carry out the method of any one of claims 1-7.

16. A computing device comprising a memory and a processor, wherein the memory has stored therein executable code that, when executed by the processor, implements the method of any of claims 1-7.