CN111460171A - Target user identification method and device for server - Google Patents

Target user identification method and device for server

Publication number
CN111460171A
Authority
CN
China
Prior art keywords
node
vector
entity
connecting edge
current
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202010238154.XA
Other languages
Chinese (zh)
Other versions
CN111460171B (en)
Inventor
曾威龙
王膂
钱隽夫
刘丹丹
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Alipay Hangzhou Information Technology Co Ltd
Original Assignee
Alipay Hangzhou Information Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Alipay Hangzhou Information Technology Co Ltd
Priority to CN202010238154.XA
Publication of CN111460171A
Application granted
Publication of CN111460171B
Legal status: Active
Anticipated expiration

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/36 Creation of semantic tools, e.g. ontology or thesauri
    • G06F16/367 Ontology
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q40/00 Finance; Insurance; Tax strategies; Processing of corporate or income taxes
    • G06Q40/04 Trading; Exchange, e.g. stocks, commodities, derivatives or currency exchange

Landscapes

  • Engineering & Computer Science (AREA)
  • Business, Economics & Management (AREA)
  • Theoretical Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Finance (AREA)
  • Accounting & Taxation (AREA)
  • Physics & Mathematics (AREA)
  • Databases & Information Systems (AREA)
  • Economics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Data Mining & Analysis (AREA)
  • Computational Linguistics (AREA)
  • Animal Behavior & Ethology (AREA)
  • Development Economics (AREA)
  • General Engineering & Computer Science (AREA)
  • Marketing (AREA)
  • Strategic Management (AREA)
  • Technology Law (AREA)
  • General Business, Economics & Management (AREA)
  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

In the identification method, a knowledge graph comprising a plurality of nodes is obtained, wherein each node represents an entity and corresponds to an entity vector. Among the plurality of nodes, nodes having an association relationship are connected by connecting edges, and each connecting edge corresponds to an association vector. For a first node representing any first-class service party in the knowledge graph, when a target user is to be identified, a second node representing a second-class service party and a third node representing the target user of that second-class service party are determined in the knowledge graph. A reference entity vector is determined based on the entity vector of the first node and the association vector of the connecting edge between the second node and the third node. The distance between the reference entity vector and the entity vector of the node of each individual user is calculated. A target node is selected based on the distances, and the individual user represented by the target node is taken as the target user of the first-class service party.

Description

Target user identification method and device for server
Technical Field
One or more embodiments of the present disclosure relate to the field of computer technologies, and in particular, to a method and an apparatus for identifying a target user of a server.
Background
In the internet field, the identity of a user often needs to be identified for risk control. Users here include both individual users and unit users (also referred to as service parties). For an individual user, identity is typically identified from descriptive information about the user, such as age, salary, and occupation. For a service party, identity is established by identifying the target user of the service party. Target users may include, but are not limited to, ultimate beneficial owners (end beneficiaries), that is, the natural persons who ultimately control the unit, directly or indirectly.
In the conventional technology, an equity penetration method is generally used to find the target user of a service party. However, because the shareholding data of a unit may be incomplete, among other reasons, the conventional method may fail to identify the target user of the service party.
Therefore, there is a need to provide a scheme to effectively identify the target user of the service party.
Disclosure of Invention
One or more embodiments of the present disclosure describe a method and an apparatus for identifying a target user of a server, which can effectively identify the target user of the server.
In a first aspect, a method for identifying a target user of a service provider is provided, including:
acquiring a knowledge graph, wherein the knowledge graph comprises a plurality of nodes, and each node represents an entity and corresponds to an entity vector; the entity is any one of the following: a first-class service party whose target user has not been identified, a second-class service party whose target user has been identified, and an individual user; among the plurality of nodes, nodes having an association relationship are connected by connecting edges, wherein each connecting edge corresponds to an association vector; the association vector is used for representing the association relationship between the two nodes joined by the connecting edge;
for a first node in the knowledge graph whose represented entity is any first-class service party, when a target user is to be identified for that first-class service party, determining in the knowledge graph a second node whose represented entity is a second-class service party and a third node whose represented entity is the target user of that second-class service party; wherein the target user of the second-class service party is one of the individual users;
determining a reference entity vector based on the entity vector of the first node and the association vector of the connecting edge of the second node and the third node;
calculating the distance between the reference entity vector and the entity vector of each node of which the represented entity is the individual user;
and selecting a target node from the nodes of which the represented entity is the individual user based on the distance, and taking the individual user represented by the target node as a target user of the first class of service party.
In a second aspect, an apparatus for identifying a target user of a service provider is provided, including:
an acquisition unit, configured to acquire a knowledge graph, wherein the knowledge graph comprises a plurality of nodes, and each node represents an entity and corresponds to an entity vector; the entity is any one of the following: a first-class service party whose target user has not been identified, a second-class service party whose target user has been identified, and an individual user; among the plurality of nodes, nodes having an association relationship are connected by connecting edges, wherein each connecting edge corresponds to an association vector; the association vector is used for representing the association relationship between the two nodes joined by the connecting edge;
a determining unit, configured to: for a first node in the knowledge graph acquired by the acquisition unit whose represented entity is any first-class service party, when a target user is to be identified for that first-class service party, determine in the knowledge graph a second node whose represented entity is a second-class service party and a third node whose represented entity is the target user of that second-class service party; wherein the target user of the second-class service party is one of the individual users;
the determining unit is further configured to determine a reference entity vector based on the entity vector of the first node and an association vector of a connecting edge of the second node and the third node;
a calculation unit, configured to calculate a distance between the reference entity vector determined by the determination unit and an entity vector of each node whose represented entity is an individual user;
a selecting unit, configured to select a target node from the nodes whose represented entity is an individual user based on the distance calculated by the calculation unit, and to take the individual user represented by the target node as the target user of the first-class service party.
In a third aspect, there is provided a computer storage medium having stored thereon a computer program which, when executed in a computer, causes the computer to perform the method of the first aspect.
In a fourth aspect, there is provided a computing device comprising a memory having stored therein executable code and a processor that, when executing the executable code, implements the method of the first aspect.
In the method and the apparatus for identifying a target user of a service party provided in one or more embodiments of the present specification, for any first-class service party in a knowledge graph whose target user has not been identified, a reference entity vector for identifying that target user may be determined based on two inputs: the entity vector corresponding to the first-class service party, and the association vector corresponding to the association relationship between a second-class service party whose target user has been identified and that target user. The target user of the first-class service party is then identified from the individual users in the knowledge graph based on the reference entity vector. In this way, the target user of a service party can be identified effectively.
Drawings
To illustrate the technical solutions of the embodiments of the present disclosure more clearly, the drawings used in the description of the embodiments are briefly introduced below. The drawings described below show only some embodiments of the present disclosure, and those skilled in the art can obtain other drawings from them without creative effort.
FIG. 1 is an exemplary schematic diagram of a knowledge graph provided herein;
FIG. 2 is a flowchart of a vectorization method for nodes and connecting edges in a knowledge graph provided in the present specification;
FIG. 3 is a flowchart of a method for identifying a target user of a server according to an embodiment of the present disclosure;
FIG. 4 is a schematic diagram of an apparatus for identifying a target user of a service party according to an embodiment of the present disclosure.
Detailed Description
The scheme provided by the specification is described below with reference to the accompanying drawings.
To identify the target user of a service party effectively, the inventors of the present application propose to identify the target user based on a knowledge graph. The service party described in this specification may be, for example, an organization such as a business, a school, or a hospital. Target users of the service party include, but are not limited to, ultimate beneficial owners.
The knowledge graph described in this specification may be constructed based on several service parties and the association relationships between each service party and individual users. The service parties may include, but are not limited to, first-class service parties whose target user has not been identified and second-class service parties whose target user has been identified. The association relationships may include, but are not limited to, holding, funding, media, address, and ultimate beneficial ownership (UBO) relationships. The UBO relationship can be obtained with the traditional equity penetration method. It can be understood that the association relationship between a second-class service party and its target user is a UBO relationship. The target user here is one of the individual users mentioned above.
For the above knowledge graph, it can be generally organized in the form of a node connection graph, which includes a plurality of nodes, each node representing an entity. The entities here may include any of the following: a first class of service party for which the target user is not identified, a second class of service party for which the target user is identified, and an individual user. In addition, among the plurality of nodes, nodes having an association relationship are connected by a connecting edge.
In one example, the knowledge graph organized in the form of a node connection graph may be as shown in FIG. 1. In FIG. 1, the knowledge graph includes 8 nodes: e1-e8. The entity represented by node e1 is a second-class service party whose target user has been identified, and the entity represented by node e2 is a first-class service party whose target user has not been identified. The entities represented by nodes e3-e8 are individual users, and the individual user corresponding to node e3 is the target user of the second-class service party. Further, node e1 is connected to each of nodes e3-e6 by connecting edges, and node e2 is connected to each of nodes e6-e8 by connecting edges.
It should be appreciated that in FIG. 1, where the target user is an ultimate beneficial owner, the association relationship between the entity represented by node e1 (i.e., the second-class service party) and the entity represented by node e3 (i.e., the target user of the second-class service party) is a UBO relationship.
In addition, fig. 1 is only an example of the knowledge graph given in this specification, and in practical applications, the knowledge graph may include a plurality of nodes representing different first-type service providers and may also include a plurality of nodes representing different second-type service providers, which is not limited in this specification.
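For illustration, the example graph of FIG. 1 might be written down as a list of (head, relation, tail) triples. This is only a sketch: the relation labels `r3` to `r7`, the entity-kind strings, and the helper `neighbors` are hypothetical names introduced here, and the text fixes only that the edge between e1 and e3 is the target-user (UBO) relationship.

```python
# Example graph of FIG. 1 as (head, relation, tail) triples.
# Relation labels are hypothetical placeholders for illustration.
FIG1_TRIPLES = [
    ("e1", "r1", "e3"),  # UBO edge: e3 is the target user of e1
    ("e1", "r2", "e4"),
    ("e1", "r3", "e5"),
    ("e1", "r4", "e6"),
    ("e2", "r5", "e6"),
    ("e2", "r6", "e7"),
    ("e2", "r7", "e8"),
]

# Kind of entity each node represents (labels are hypothetical).
ENTITY_KIND = {
    "e1": "second_class_service_party",
    "e2": "first_class_service_party",
    **{f"e{i}": "individual_user" for i in range(3, 9)},
}


def neighbors(node, triples):
    """Return the set of nodes joined to `node` by a connecting edge."""
    out = set()
    for h, _, t in triples:
        if h == node:
            out.add(t)
        elif t == node:
            out.add(h)
    return out


print(sorted(neighbors("e6", FIG1_TRIPLES)))  # ['e1', 'e2']
```

Node e6 is the only individual user linked to both service parties, which is why it appears as a neighbor of both e1 and e2.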
It should be noted that after the knowledge graph is arranged in the form of a node connection graph, the entity vector of each node and the association vector of each connecting edge in the knowledge graph may also be determined. That is, the nodes and the connecting edges in the knowledge graph are vectorized, which is referred to as graph vectorization for short. The association vector is used for representing the association relationship between the two nodes of the corresponding connecting edge.
In summary, the graph-vectorization operation may include the following: an array set corresponding to the knowledge graph is acquired; positive sample arrays and their corresponding negative sample arrays are determined based on the array set; a prediction loss is determined based on each positive sample array and its corresponding negative sample array; and the nodes and the connecting edges in the knowledge graph are vectorized based on the prediction loss.
In practical applications, the other steps of the above-described vectorization operation, except for the step of acquiring the array set, are executed iteratively, and the iteration stop conditions thereof are described later. After the iteration is finished, the entity vector of each node in the knowledge graph and the association vector of each connecting edge can be obtained. The graph-vectorization operation is described in detail below with reference to fig. 2.
FIG. 2 is a flowchart of a vectorization method for nodes and connecting edges in a knowledge graph provided in this specification. As shown in FIG. 2, the method may include the following steps:
Step 202, an array set corresponding to the knowledge graph is obtained.
Typically, the knowledge graph is stored in the form of an array set. Each array in the array set includes nodes and a connecting edge. In one example, the nodes in each array include a head node and a tail node. The head node is a node whose represented entity is a first-class or second-class service party, for example, node e1 or node e2 in FIG. 1. The tail node is a node whose represented entity is an individual user, for example, any one of nodes e3-e8 in FIG. 1. For each array in the array set, when the head node is denoted by h, the connecting edge by r, and the tail node by t, the array can be represented as the triple {h, r, t}. The array set is thus a set of triples.
After the array set of the knowledge graph is obtained, the node set E = {e1, e2, …, em} may be determined based on h and t of each triple in it. Each node in the node set represents a different entity. Further, based on r of each triple, the edge set R = {r1, r2, …, rn} may be determined. The association relationships represented by the connecting edges in the edge set differ from one another.
It should be noted that, since the acquired array set corresponds to the knowledge graph, the node set E may also be understood as being determined based on a plurality of nodes in the knowledge graph. The edge set R may be determined based on each connected edge in the knowledge graph.
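As a rough sketch of how the node set and the edge set might be collected from the triples, the snippet below gathers distinct nodes from the head and tail positions and distinct relations from the middle position. The function name `node_and_edge_sets` and the relation labels are hypothetical.

```python
def node_and_edge_sets(triples):
    """Collect distinct nodes (from h and t) and distinct relations (from r).

    Lists are used instead of sets so the order of first appearance is kept.
    """
    nodes, relations = [], []
    for h, r, t in triples:
        for n in (h, t):
            if n not in nodes:
                nodes.append(n)
        if r not in relations:
            relations.append(r)
    return nodes, relations


triples = [
    ("e1", "ubo", "e3"),
    ("e1", "holding", "e4"),
    ("e2", "holding", "e6"),
]
E, R = node_and_edge_sets(triples)
print(E)  # ['e1', 'e3', 'e4', 'e2', 'e6']
print(R)  # ['ubo', 'holding']
```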
For each node ei (1 ≦ i ≦ m) in the node set E described above, it may be initialized to a vector containing k elements, where k is a positive integer. In one example, the range of values for each of the k elements may be:
Figure BDA0002431701260000061
In another embodiment, the value range of each of the k elements may be (-1, 1). The vector containing k elements may then be normalized to obtain the current entity vector of each node ei.
Likewise, each connecting edge ri (1 ≦ i ≦ n) in the edge set R may be initialized to a vector containing k elements, with each element taking values in the range described above. The vector is then normalized to obtain the current association vector of each connecting edge ri.
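The initialization described above can be sketched as follows, using the (-1, 1) range. The text does not say which normalization is applied; scaling to unit L2 norm is an assumption made here, and the helper name `init_unit_vector` is hypothetical.

```python
import math
import random


def init_unit_vector(k, rng):
    """Draw k values from the range (-1, 1) and scale to unit L2 norm.

    L2 normalization is an assumption; the source only says "normalized".
    """
    v = [rng.uniform(-1.0, 1.0) for _ in range(k)]
    norm = math.sqrt(sum(x * x for x in v))
    # An all-zero draw has negligible probability and is not handled here.
    return [x / norm for x in v]


rng = random.Random(0)  # fixed seed so the sketch is repeatable
entity_vec = {n: init_unit_vector(4, rng) for n in ["e1", "e2", "e3"]}
assoc_vec = {r: init_unit_vector(4, rng) for r in ["r1", "r2"]}
```

Entity vectors and association vectors share the same dimension k, which is what later allows an entity vector and an association vector to be summed.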
Step 204, steps a-d are executed iteratively based on the array set until an iteration stop condition is met.
In a preferred example, before steps a-d below are performed, the array set may be screened as follows: the triples whose head node represents a second-class service party are selected from the array set, and the array set formed by the selected triples is used in the following steps. In that case, the entities represented by the head nodes of the triples in the array set described below are all second-class service parties.
Further, the iteration stop condition may include, but is not limited to, the number of iterations reaching a predetermined number (e.g., 500) and the prediction loss being less than a predetermined threshold (e.g., 0.05).
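A minimal sketch of such a stop check is given below. It treats the two conditions as alternatives, so that either one ends the loop, which is an assumption; `run_until_stopped` and `step` are hypothetical names, with `step` standing in for one pass of steps a-d that returns the prediction loss.

```python
def run_until_stopped(step, max_iterations=500, loss_threshold=0.05):
    """Call `step()` until the iteration cap or the loss threshold is hit.

    `step` is a hypothetical callable that performs steps a-d once and
    returns the prediction loss of that iteration.
    """
    loss = float("inf")
    iterations = 0
    while iterations < max_iterations:
        loss = step()
        iterations += 1
        if loss < loss_threshold:
            break
    return iterations, loss


# Stand-in for steps a-d: replay a fixed sequence of losses.
losses = iter([0.9, 0.4, 0.04, 0.01])
print(run_until_stopped(lambda: next(losses)))  # (3, 0.04)
```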
Step a, selecting a plurality of arrays from the array set as positive sample arrays.
Taking FIG. 1 as an example, the triples {e1, r1, e3} and {e1, r2, e4} corresponding to the knowledge graph in FIG. 1 are positive sample arrays.
In one example, the selected positive sample arrays may constitute a set of positive sample arrays, denoted as S1.
Step b, changing the nodes in each positive sample array to obtain a negative sample array corresponding to each positive sample array.
Taking any first positive sample array among the positive sample arrays as an example, its head node or tail node may be replaced by any other node in the knowledge graph to obtain the negative sample array corresponding to the first positive sample array. Continuing the example of FIG. 1, where the positive sample array is {e1, r1, e3}, the corresponding negative sample array may be {e1, r1, e7}, in which the tail node e3 of the positive sample array is replaced. The corresponding negative sample array may also be {e2, r1, e3}, and so on; the possibilities are not enumerated here.
In one example, the negative sample arrays corresponding to the respective positive sample arrays may constitute a set of negative sample arrays, denoted as S2.
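Step b can be sketched as below. Choosing between head and tail with equal probability is an assumption, and the function name `corrupt` is hypothetical. The sketch does not filter out corrupted triples that happen to exist in the graph.

```python
import random


def corrupt(triple, nodes, rng):
    """Replace the head or the tail of a positive triple with another node."""
    h, r, t = triple
    if rng.random() < 0.5:
        # Replace the head node with any node other than the original head.
        candidates = [n for n in nodes if n != h]
        return (rng.choice(candidates), r, t)
    # Otherwise replace the tail node with any node other than the original tail.
    candidates = [n for n in nodes if n != t]
    return (h, r, rng.choice(candidates))


rng = random.Random(0)
nodes = [f"e{i}" for i in range(1, 9)]  # e1 .. e8 as in FIG. 1
positive = ("e1", "r1", "e3")
negative = corrupt(positive, nodes, rng)
print(negative)
```

Exactly one endpoint changes in each negative sample, and the connecting edge is left as it was.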
Step c, determining the prediction loss based on each positive sample array and the corresponding negative sample array.
The step of determining the predicted loss may specifically be: for each positive sample array in the positive sample arrays, summing the current entity vector of the head node and the current association vector of the connecting edge, and calculating a first distance between the summation result and the current entity vector of the tail node. For the negative sample array corresponding to the positive sample array, summing the current entity vector of the changed head node and the current association vector of the connecting edge, and calculating a second distance between the summation result and the current entity vector of the tail node; or, summing the current entity vector of the head node and the current association vector of the connecting edge, and calculating a second distance between the summation result and the current entity vector of the changed tail node. And obtaining the vector distance difference of the positive sample array and the corresponding negative sample array by calculating the difference of the first distance and the second distance. And determining the prediction loss based on the vector distance difference of each positive sample array and the corresponding negative sample array.
In one example, the predicted loss may be determined based on the following equation:
$$\mathrm{loss} = \sum_{(h,r,t)\in S_1} \sum_{(h',r,t')\in S_2} \max\left(0,\; \gamma + D(h + r,\, t) - D(h' + r,\, t')\right)$$
where S1 is the set of positive sample arrays and (h, r, t) is a positive sample array in that set; S2 is the set of negative sample arrays and (h', r, t') is a negative sample array in that set. D() is a function for calculating the Euclidean distance, and γ is a hyperparameter.
In addition, h' in the above formula is the changed head node of the negative sample array, and t' is the changed tail node of the negative sample array. To simplify the formula, the negative sample array is uniformly written as (h', r, t'). In practical applications, since only the head node or the tail node is replaced, the negative sample array takes the form (h', r, t) or (h, r, t').
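The loss computation of step c can be sketched as follows. The sketch assumes that the per-pair loss is max(0, γ + D(h + r, t) − D(h' + r, t')) and that each positive sample array is paired with its own negative sample array. The names `translate` and `prediction_loss` are hypothetical.

```python
import math


def euclidean(u, v):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def translate(ent, rel, h, r):
    """Sum of the head entity vector and the association vector."""
    return [a + b for a, b in zip(ent[h], rel[r])]


def prediction_loss(pairs, ent, rel, gamma=1.0):
    """Sum of max(0, gamma + first distance - second distance) over pairs.

    The hinge form is an assumption, as is the pairing of each positive
    sample with exactly one negative sample.
    """
    total = 0.0
    for (h, r, t), (h2, _, t2) in pairs:
        first = euclidean(translate(ent, rel, h, r), ent[t])
        second = euclidean(translate(ent, rel, h2, r), ent[t2])
        total += max(0.0, gamma + first - second)
    return total


ent = {"e1": [0.0, 0.0], "e3": [1.0, 0.0], "e7": [0.0, 3.0]}
rel = {"r1": [1.0, 0.0]}
pairs = [(("e1", "r1", "e3"), ("e1", "r1", "e7"))]
print(prediction_loss(pairs, ent, rel, gamma=1.0))  # 0.0
```

In this toy case the positive triple fits exactly (first distance 0) and the negative triple is more than γ farther away, so the pair contributes no loss.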
Step d, adjusting the current entity vector of each node and the current association vector of each connecting edge in the knowledge graph based on the prediction loss.
Specifically, the current entity vector of each node in the node set may be adjusted, and the current association vector of each connecting edge in the edge set may be adjusted based on the prediction loss.
In one example, the adjustment process of the current entity vector of each node in the node set may be: for each node ei in the node set E, a gradient value is calculated for each element based on the determined prediction loss and the element value of the respective element of the node. And adjusting the element value of each element of the node based on the gradient value of each element to obtain the adjusted current entity vector of the node.
In one example, the element values of the respective elements of each node ei may be adjusted based on the following formula:
$$e_{ij} \leftarrow e_{ij} - \eta\,\frac{\partial\,\mathrm{loss}}{\partial e_{ij}}, \qquad i = 1, \dots, m,\; j = 1, \dots, k$$
where m is the number of nodes, k is the number of elements of each node, eij is the jth element of node ei, η is a hyperparameter, loss is the prediction loss, and $\partial\,\mathrm{loss}/\partial e_{ij}$ is the gradient value of the jth element of node ei.
In one example, the adjustment process of the current association vector of each connecting edge in the edge set may be: for each connecting edge ri in the edge set, a gradient value is calculated for each element based on the prediction loss and the element values of the respective elements of the connecting edge. The element value of each element of the connecting edge is then adjusted based on the gradient value of that element to obtain the adjusted current association vector of the connecting edge.
In one example, the element values of the respective elements of each connecting edge ri may be adjusted based on the following formula:
$$r_{ij} \leftarrow r_{ij} - \eta\,\frac{\partial\,\mathrm{loss}}{\partial r_{ij}}, \qquad i = 1, \dots, n,\; j = 1, \dots, k$$
where n is the number of connecting edges, k is the number of elements of each connecting edge, rij is the jth element of connecting edge ri, η is a hyperparameter, loss is the prediction loss, and $\partial\,\mathrm{loss}/\partial r_{ij}$ is the gradient value of the jth element of connecting edge ri.
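One adjustment of step d can be sketched for a single pair of samples as below. The sketch assumes plain gradient descent (each element is reduced by η times its gradient) and the per-pair loss max(0, γ + D(h + r, t) − D(h' + r, t')) with Euclidean D. All function names are hypothetical.

```python
import math


def _diff(ent, rel, triple):
    """Element-wise h + r - t for one triple."""
    h, r, t = triple
    return [a + b - c for a, b, c in zip(ent[h], rel[r], ent[t])]


def _norm(v):
    return math.sqrt(sum(x * x for x in v))


def pair_loss(pos, neg, ent, rel, gamma):
    """Assumed per-pair loss: max(0, gamma + D(h+r, t) - D(h'+r, t'))."""
    return max(0.0, gamma + _norm(_diff(ent, rel, pos)) - _norm(_diff(ent, rel, neg)))


def sgd_step(pos, neg, ent, rel, gamma=1.0, eta=0.1):
    """Adjust entity and association vectors in place for one sample pair."""
    d_pos, d_neg = _diff(ent, rel, pos), _diff(ent, rel, neg)
    n_pos, n_neg = _norm(d_pos), _norm(d_neg)
    if gamma + n_pos - n_neg <= 0 or n_pos == 0 or n_neg == 0:
        return  # zero loss, or gradient undefined at zero distance: skip
    g_pos = [x / n_pos for x in d_pos]  # gradient of D(h+r, t) w.r.t. h and r
    g_neg = [x / n_neg for x in d_neg]  # gradient of D(h'+r, t') w.r.t. h' and r
    (h, r, t), (h2, _, t2) = pos, neg
    # Accumulate d(loss)/d(vector) per node; a node may occur in both triples.
    grads = {}
    for name, g, sign in ((h, g_pos, 1), (t, g_pos, -1), (h2, g_neg, -1), (t2, g_neg, 1)):
        acc = grads.setdefault(name, [0.0] * len(g))
        for j, x in enumerate(g):
            acc[j] += sign * x
    rel_grad = [p - q for p, q in zip(g_pos, g_neg)]
    # x_j <- x_j - eta * d(loss)/d(x_j) for every element of every vector.
    for name, g in grads.items():
        ent[name] = [e - eta * x for e, x in zip(ent[name], g)]
    rel[r] = [e - eta * x for e, x in zip(rel[r], rel_grad)]


ent = {"e1": [0.0, 0.0], "e3": [2.0, 0.0], "e7": [1.0, 1.0]}
rel = {"r1": [1.0, 0.0]}
pos, neg = ("e1", "r1", "e3"), ("e1", "r1", "e7")
before = pair_loss(pos, neg, ent, rel, gamma=1.0)
sgd_step(pos, neg, ent, rel, gamma=1.0, eta=0.1)
after = pair_loss(pos, neg, ent, rel, gamma=1.0)
print(before, after)  # the loss on this pair goes down after the step
```

The sketch does not renormalize the vectors after the update, since the text describes normalization only at initialization.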
Step 206, after the iteration ends, the current entity vector of each node is taken as its entity vector, and the current association vector of each connecting edge is taken as its association vector.
In this specification, after vectorizing the nodes and the connecting edges in the knowledge graph, the relationship between the entity vectors of the nodes may maintain the structural relationship between the nodes in the knowledge graph. In addition, the association vector of each connection edge can represent the association relationship corresponding to each connection edge.
Furthermore, after the graph-vectorization operation described above is completed, the target user of a service party may be identified based on the vectorized knowledge graph. For example, the target users of the first-class service parties in the knowledge graph are identified. This identification process is explained in detail below.
FIG. 3 is a flowchart of a method for identifying a target user of a service party according to an embodiment of the present disclosure. The method may be executed by any device with processing capabilities, such as a server, a system, or an apparatus. As shown in FIG. 3, the method may specifically include the following steps:
Step 302, a knowledge graph is obtained.
The knowledge graph obtained here may refer to the graph-vectorized knowledge graph, that is, each node corresponds to an entity vector, and each connecting edge corresponds to an association vector.
Step 304, for a first node in the knowledge graph whose represented entity is any first-class service party, when a target user is to be identified for that first-class service party, a second node whose represented entity is a second-class service party and a third node whose represented entity is the target user of that second-class service party are determined in the knowledge graph.
It should be noted that when multiple nodes in the knowledge graph represent second-class service parties, multiple second nodes and multiple third nodes are determined here.
Taking the knowledge graph shown in FIG. 1 as an example, for node e2, when the target user is to be identified for the corresponding first-class service party, node e1, whose represented entity is the second-class service party, and node e3, whose represented entity is the target user of that second-class service party, may be determined in the knowledge graph.
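A minimal sketch of step 304 is shown below. It assumes that UBO edges carry a recognizable relation label and that each node is tagged with the kind of entity it represents. The label strings and the function name `ubo_pairs` are hypothetical.

```python
def ubo_pairs(triples, entity_kind, ubo_relation="ubo"):
    """Return (second_node, relation, third_node) for every known target user."""
    return [
        (h, r, t)
        for h, r, t in triples
        if r == ubo_relation
        and entity_kind.get(h) == "second_class"
        and entity_kind.get(t) == "individual"
    ]


triples = [
    ("e1", "ubo", "e3"),
    ("e1", "holding", "e4"),
    ("e2", "holding", "e6"),
]
kind = {
    "e1": "second_class",
    "e2": "first_class",
    "e3": "individual",
    "e4": "individual",
    "e6": "individual",
}
print(ubo_pairs(triples, kind))  # [('e1', 'ubo', 'e3')]
```

With several second-class service parties in the graph, the returned list would contain one entry per identified target user.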
Step 306, a reference entity vector is determined based on the entity vector of the first node and the association vector of the connecting edge of the second node and the third node.
It should be noted that the reference entity vector is a vector referred to in the process of selecting the target user of the first class of service provider.
In one example, the entity vector of the first node and the association vector of the connecting edge between the second node and the third node may be summed, and the summation result is taken as the reference entity vector. Continuing the previous example, the entity vector of node e2 and the association vector of the connecting edge between node e1 and node e3 may be summed to obtain the reference entity vector.
It should be understood that when the number of second nodes and third nodes determined in step 304 is plural, then the number of reference entity vectors determined herein is also plural.
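The summation of step 306 can be sketched as below, producing one reference entity vector per connecting edge between a second node and a third node. The vector values are made up for illustration, and `reference_vectors` is a hypothetical name.

```python
def reference_vectors(first_vec, ubo_edge_vectors):
    """One reference entity vector per UBO edge: first_vec + association vector."""
    return [[a + b for a, b in zip(first_vec, r)] for r in ubo_edge_vectors]


e2_vec = [0.5, -0.25, 1.0]    # entity vector of the first node (illustrative)
r_e1_e3 = [0.25, 0.5, -0.5]   # association vector of the e1-e3 edge (illustrative)
refs = reference_vectors(e2_vec, [r_e1_e3])
print(refs)  # [[0.75, 0.25, 0.5]]
```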
Step 308, the distance between the reference entity vector and the entity vector of each node whose represented entity is an individual user is calculated.
If there are a plurality of reference entity vectors, the distance between each of the reference entity vectors and the entity vector of each node whose represented entity is an individual user may be calculated. The distance here may be any of the following: Euclidean distance, Manhattan distance, Chebyshev distance, Minkowski distance, and the like.
Continuing the example of the knowledge graph shown in FIG. 1, the distance between the reference entity vector and the entity vector of each of nodes e3-e8 may be calculated.
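The four distances named above can be sketched in a few lines. Euclidean and Manhattan distance are the Minkowski distance with p = 2 and p = 1, and Chebyshev distance is its limit as p grows.

```python
def minkowski(u, v, p):
    """Minkowski distance of order p between two equal-length vectors."""
    return sum(abs(a - b) ** p for a, b in zip(u, v)) ** (1.0 / p)


def euclidean(u, v):
    return minkowski(u, v, 2)


def manhattan(u, v):
    return minkowski(u, v, 1)


def chebyshev(u, v):
    """Largest absolute difference over all elements."""
    return max(abs(a - b) for a, b in zip(u, v))


u, v = [0.0, 0.0], [3.0, 4.0]
print(euclidean(u, v), manhattan(u, v), chebyshev(u, v))  # 5.0 7.0 4.0
```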
Step 310, a target node is selected, based on the calculated distance, from the nodes whose represented entity is an individual user, and the individual user represented by the selected target node is taken as the target user of the first-class service party.
In one example, among the nodes whose represented entity is an individual user, the node corresponding to the minimum distance may be taken as the target node. Continuing the previous example, assume that there is one reference entity vector and that the distances between it and the entity vectors of nodes e3-e8 are D1-D6, respectively. If D6 is the smallest, the individual user represented by node e8 may be taken as the target user of the first-class service party.
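The minimum-distance selection can be sketched as follows, using Euclidean distance. Taking the overall minimum when there are several reference entity vectors is an assumption, since the text describes selection only for the single-vector case. The vectors are made up, and `select_target` is a hypothetical name.

```python
import math


def select_target(reference_vecs, user_vecs):
    """Return (user, distance) with the smallest Euclidean distance.

    With several reference vectors, the overall minimum is taken
    (an assumption made for this sketch).
    """
    best_user, best_dist = None, math.inf
    for ref in reference_vecs:
        for user, vec in user_vecs.items():
            d = math.dist(ref, vec)
            if d < best_dist:
                best_user, best_dist = user, d
    return best_user, best_dist


user_vecs = {
    "e3": [0.0, 0.0],
    "e6": [1.0, 0.0],
    "e7": [0.0, 2.0],
    "e8": [1.0, 1.0],
}
target, dist = select_target([[1.0, 1.25]], user_vecs)
print(target)  # e8
```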
In summary, according to the scheme provided by the embodiments of the present disclosure, the corresponding target user can be identified accurately and effectively for any first-class service party in the knowledge graph whose target user has not been identified. The scheme is also easy to extend. For example, when a new service party whose target user needs to be identified is added, it only needs to be added to the knowledge graph. Once the entity vectors of the new service party and its associated individual users have been determined based on the vectorized knowledge graph, the target user of the new service party can be identified.
Corresponding to the method for identifying a target user of a service party, an embodiment of the present specification further provides an apparatus for identifying a target user of a service party, as shown in fig. 4, the apparatus may include:
an obtaining unit 402, configured to obtain a knowledge graph. The knowledge-graph includes a plurality of nodes, each of which represents an entity and corresponds to an entity vector. The entity includes any one of: a first class of service party for which the target user is not identified, a second class of service party for which the target user is identified, and an individual user. In the plurality of nodes, the nodes with the association relationship are connected through a connecting edge, wherein each connecting edge corresponds to an association vector, and the association vector is used for representing the association relationship of two nodes corresponding to the connecting edge.
A determining unit 404, configured to, for any first node in the knowledge graph acquired by the acquiring unit 402 whose represented entity is a first class service party, when identifying a target user for the corresponding first class service party, determine in the knowledge graph a second node whose represented entity is a second class service party and a third node whose represented entity is a target user of the second class service party. The target user of the second class service party is one of the individual users.
The determining unit 404 is further configured to determine a reference entity vector based on the entity vector of the first node and the association vector of the connecting edge of the second node and the third node.
The determining unit 404 may specifically be configured to:
and summing the entity vector of the first node and the association vector of the connecting edge of the second node and the third node, and taking the summation result as a reference entity vector.
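The summation described here can be sketched as an element-wise vector addition. The function name and the three-element vectors below are hypothetical; the specification only states that the two vectors are summed.

```python
def reference_entity_vector(first_node_vec, association_vec):
    """Sum the first node's entity vector and the association vector of the
    connecting edge between the second and third nodes, element by element."""
    if len(first_node_vec) != len(association_vec):
        raise ValueError("vectors must have the same number of elements")
    return [a + b for a, b in zip(first_node_vec, association_vec)]

# Hypothetical 3-element vectors.
first_node = [0.5, -1.0, 2.0]      # entity vector of the first node
edge_2_to_3 = [0.25, 0.5, -1.0]    # association vector of the second-to-third edge

reference = reference_entity_vector(first_node, edge_2_to_3)
```

The intuition is that the edge vector learned from a known "service party to target user" pair is reused as an offset from the first node, so the result points at where a target user of the first class service party would be expected to lie.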
A calculating unit 406, configured to calculate a distance between the reference entity vector determined by the determining unit 404 and the entity vector of each node of which the represented entity is an individual user.
The distance may include any one of: the Euclidean distance, the Manhattan distance, the Chebyshev distance, and the Minkowski distance.
A selecting unit 408, configured to select a target node from the nodes whose represented entity is an individual user based on the distance calculated by the calculating unit 406, and take the individual user represented by the target node as a target user of the first class service party.
Optionally, the apparatus may further include: an execution unit (not shown in the figure).
An obtaining unit 402, configured to obtain an array set corresponding to the knowledge-graph, where each array in the array set includes a node and a connecting edge.
An execution unit, configured to iteratively perform the following steps based on the array set acquired by the acquisition unit 402 until an iteration stop condition is satisfied:
a number of arrays are selected from the set of arrays as positive sample arrays.
And changing the nodes in each positive sample array to obtain a negative sample array corresponding to each positive sample array.
Based on each positive sample array and the corresponding negative sample array, a prediction loss is determined.
Based on the prediction loss, the current entity vector of each node in the knowledge-graph and the current association vector of each connecting edge are adjusted.
A determining unit 404, configured to, after the iteration is finished, take the current entity vector of each node as its entity vector, and take the current association vector of each connecting edge as its association vector.
The above-mentioned iteration stop condition includes any one of: the number of iterations reaches a predetermined number, or the prediction loss is less than a predetermined threshold.
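The iteration with its two stop conditions can be sketched as a small driver loop. Here `step` is a hypothetical callable standing in for one round of sampling, loss computation, and vector adjustment; it is assumed to return the prediction loss of that round.

```python
def train(step, max_iterations, loss_threshold):
    """Call step() repeatedly until the iteration count reaches max_iterations
    or the returned prediction loss drops below loss_threshold."""
    loss = float("inf")
    iterations = 0
    while iterations < max_iterations and loss >= loss_threshold:
        loss = step()          # one round: sample, compute loss, adjust vectors
        iterations += 1
    return iterations, loss

# Usage with a stand-in step that replays a fixed sequence of losses.
_losses = iter([1.0, 0.5, 0.05, 0.01])
result = train(lambda: next(_losses), max_iterations=10, loss_threshold=0.1)
```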
Optionally, the nodes in each array include a head node and a tail node, each positive sample array includes a first sample array, and the execution unit is specifically configured to: and for the first sample array, replacing the head node or the tail node of the first sample array with any other node in the knowledge graph to obtain a negative sample array corresponding to the first sample array.
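Negative sampling by head or tail replacement can be sketched as below. Each array is represented as a `(head, edge, tail)` tuple. Choosing between head and tail with equal probability is an assumption made for this sketch, since the text only says that one of the two is replaced with another node.

```python
import random

def corrupt(triple, all_nodes, rng=random):
    """Build a negative sample by replacing the head or the tail of a positive
    (head, edge, tail) array with a different node from the knowledge graph."""
    head, edge, tail = triple
    if rng.random() < 0.5:  # assumed 50/50 choice between head and tail
        candidates = [n for n in all_nodes if n != head]
        return (rng.choice(candidates), edge, tail)
    candidates = [n for n in all_nodes if n != tail]
    return (head, edge, rng.choice(candidates))

# Hypothetical node names and edge label.
nodes = ["e1", "e2", "e3", "e4"]
positive = ("e1", "r1", "e3")
negative = corrupt(positive, nodes, random.Random(0))
```

Note that this sketch does not check whether the corrupted array happens to be a true array of the graph; a production implementation would likely filter those out.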
Optionally, the apparatus may further include: an initialization unit (not shown in the figure).
A determining unit 404, further configured to determine a node set based on a plurality of nodes in the knowledge-graph, and determine an edge set based on each connecting edge in the knowledge-graph;
an initializing unit, configured to initialize each node in the node set determined by the determining unit 404 to a first vector containing k elements and use the first vector as the current entity vector of the node, and to initialize each connecting edge in the edge set to a second vector containing k elements and use the second vector as the current association vector of the connecting edge, where k is a positive integer;
the execution unit is specifically configured to:
based on the prediction loss, adjusting the current entity vector of each node in the node set, and adjusting the current association vector of each connecting edge in the edge set.
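The initialization of the node set and the edge set can be sketched as follows. The specification does not say how the k elements are initialized, so drawing them uniformly from [-1, 1] is an assumption of this sketch, as are the names `init_embeddings`, `entity_vectors`, and `association_vectors`.

```python
import random

def init_embeddings(names, k, rng):
    """Map each name to a k-element vector with elements drawn from [-1, 1].
    The uniform random initialization is an assumption, not from the source."""
    return {name: [rng.uniform(-1.0, 1.0) for _ in range(k)] for name in names}

rng = random.Random(42)
node_set = ["e1", "e2", "e3"]   # hypothetical node set
edge_set = ["r1", "r2"]         # hypothetical edge set
k = 4

entity_vectors = init_embeddings(node_set, k, rng)       # current entity vectors
association_vectors = init_embeddings(edge_set, k, rng)  # current association vectors
```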
Optionally, the execution unit is further specifically configured to:
for each node in the node set, a gradient value is calculated for each element of the node based on the prediction loss and the element value of that element. The element value of each element of the node is then adjusted based on its gradient value to obtain the adjusted current entity vector of the node.
For each connecting edge in the edge set, a gradient value is calculated for each element of the connecting edge based on the prediction loss and the element value of that element. The element value of each element of the connecting edge is then adjusted based on its gradient value to obtain the adjusted current association vector of the connecting edge.
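The per-element adjustment can be sketched as a plain gradient-descent step. The text does not name the update rule, so the subtraction of `learning_rate * gradient` and the `learning_rate` parameter itself are assumptions; the gradient values below are made up rather than derived from a real loss.

```python
def sgd_update(vector, gradient, learning_rate):
    """Adjust each element by its gradient value (assumed gradient-descent rule):
    new_value = old_value - learning_rate * gradient_value."""
    return [x - learning_rate * g for x, g in zip(vector, gradient)]

# Hypothetical current entity vector and per-element gradient values.
current_entity_vector = [1.0, 2.0]
gradient_values = [0.5, -1.0]

adjusted = sgd_update(current_entity_vector, gradient_values, learning_rate=0.5)
```

The same function applies unchanged to the current association vector of a connecting edge.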
Optionally, the nodes in each array include a head node and a tail node, and the execution unit is further specifically configured to:
for each positive sample array in the positive sample arrays, summing the current entity vector of the head node and the current association vector of the connecting edge, and calculating a first distance between the summation result and the current entity vector of the tail node.
For the negative sample array corresponding to the positive sample array, summing the current entity vector of the changed head node and the current association vector of the connecting edge, and calculating a second distance between the summation result and the current entity vector of the tail node; or, summing the current entity vector of the head node and the current association vector of the connecting edge, and calculating a second distance between the summation result and the current entity vector of the changed tail node.
And obtaining the vector distance difference between the positive sample array and the corresponding negative sample array by calculating the difference between the first distance and the second distance.
The prediction loss is determined based on the vector distance difference of each positive sample array and the corresponding negative sample array.
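The loss computation described in the preceding steps can be sketched as below. The text only states that the prediction loss is based on the difference between the first and second distances; the hinge form `max(0, margin + d_pos - d_neg)`, the `margin` parameter, and the use of the Euclidean distance are assumptions borrowed from common translation-based embedding practice, and all names are hypothetical.

```python
import math

def triple_distance(head_vec, edge_vec, tail_vec):
    """Distance between (head vector + edge vector) and the tail vector."""
    summed = [h + r for h, r in zip(head_vec, edge_vec)]
    return math.sqrt(sum((s - t) ** 2 for s, t in zip(summed, tail_vec)))

def prediction_loss(pairs, entity, association, margin=1.0):
    """Sum an assumed hinge loss over (positive, negative) array pairs."""
    total = 0.0
    for pos, neg in pairs:
        d_pos = triple_distance(entity[pos[0]], association[pos[1]], entity[pos[2]])
        d_neg = triple_distance(entity[neg[0]], association[neg[1]], entity[neg[2]])
        # Vector distance difference, shifted by the assumed margin.
        total += max(0.0, margin + d_pos - d_neg)
    return total

# Hypothetical 2-element vectors.
entity = {"a": [0.0, 0.0], "b": [1.0, 0.0], "c": [4.0, 0.0]}
association = {"r": [1.0, 0.0]}

good = prediction_loss([(("a", "r", "b"), ("a", "r", "c"))], entity, association)
bad = prediction_loss([(("a", "r", "c"), ("a", "r", "b"))], entity, association)
```

In the first call the positive array already fits better than the negative one by more than the margin, so it contributes no loss; in the second call the roles are swapped and the loss is positive.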
The functions of each functional module of the device in the above embodiments of the present description may be implemented through each step of the above method embodiments, and therefore, a specific working process of the device provided in one embodiment of the present description is not repeated herein.
The apparatus for identifying a target user of a service party provided by one embodiment of the present specification can effectively identify the target user of a service party.
In another aspect, embodiments of the present specification provide a computer-readable storage medium having stored thereon a computer program, which, when executed in a computer, causes the computer to perform the method shown in fig. 2 or fig. 3.
In another aspect, embodiments of the present description provide a computing device comprising a memory having stored therein executable code, and a processor that, when executing the executable code, implements the method shown in fig. 2 or fig. 3.
The embodiments in the present specification are described in a progressive manner, and the same and similar parts among the embodiments are referred to each other, and each embodiment focuses on the differences from the other embodiments. In particular, for the apparatus embodiment, since it is substantially similar to the method embodiment, the description is relatively simple, and for the relevant points, reference may be made to the partial description of the method embodiment.
The steps of a method or algorithm described in connection with the disclosure herein may be embodied in hardware or may be embodied in software instructions executed by a processor. The software instructions may consist of corresponding software modules that may be stored in RAM memory, flash memory, ROM memory, EPROM memory, EEPROM memory, registers, a hard disk, a removable disk, a CD-ROM, or any other form of storage medium known in the art. An exemplary storage medium is coupled to the processor such that the processor can read information from, and write information to, the storage medium. Of course, the storage medium may also be integral to the processor. The processor and the storage medium may reside in an ASIC. Additionally, the ASIC may reside in a server. Of course, the processor and the storage medium may reside as discrete components in a server.
Those skilled in the art will recognize that, in one or more of the examples described above, the functions described in this specification may be implemented in hardware, software, firmware, or any combination thereof. When implemented in software, the functions may be stored on, or transmitted as one or more instructions or code on, a computer-readable medium. Computer-readable media include both computer storage media and communication media, the latter including any medium that facilitates transfer of a computer program from one place to another. A storage medium may be any available medium that can be accessed by a general purpose or special purpose computer.
The foregoing description has been directed to specific embodiments of this disclosure. Other embodiments are within the scope of the following claims. In some cases, the actions or steps recited in the claims may be performed in a different order than in the embodiments and still achieve desirable results. In addition, the processes depicted in the accompanying figures do not necessarily require the particular order shown, or sequential order, to achieve desirable results. In some embodiments, multitasking and parallel processing may also be possible or may be advantageous.
The specific embodiments above further describe the objects, technical solutions, and advantages of the present specification in detail. It should be understood that they are only specific embodiments of the present specification and are not intended to limit its scope; any modification, equivalent substitution, improvement, or the like made on the basis of the technical solutions of the present specification shall fall within the scope of the present specification.

Claims (20)

1. A target user identification method of a server side comprises the following steps:
acquiring a knowledge graph, wherein the knowledge graph comprises a plurality of nodes, each node represents an entity and corresponds to an entity vector; the entity comprises any one of the following: a first class service party for which no target user has been identified, a second class service party for which a target user has been identified, and an individual user; among the plurality of nodes, nodes having an association relationship are connected through connecting edges, wherein each connecting edge corresponds to an association vector; the association vector is used for representing the association relationship between the two nodes corresponding to the connecting edge;
for a first node of which any represented entity in the knowledge graph is a first class service party, when a target user is identified for the corresponding first class service party, determining a second node of which the represented entity is a second class service party and a third node of which the represented entity is the target user of the second class service party in the knowledge graph; wherein the target user of the second service party is one of the individual users;
determining a reference entity vector based on the entity vector of the first node and the association vector of the connecting edge of the second node and the third node;
calculating the distance between the reference entity vector and the entity vector of each node of which the represented entity is the individual user;
and selecting a target node from the nodes of which the represented entity is the individual user based on the distance, and taking the individual user represented by the target node as a target user of the first class of service party.
2. The method of claim 1, wherein the entity vector of each node in the knowledge-graph and the association vector of each connecting edge are obtained by:
acquiring an array set corresponding to the knowledge graph, wherein each array in the array set comprises a node and a connecting edge;
iteratively performing the following steps based on the set of arrays until an iteration stop condition is satisfied:
selecting a plurality of arrays from the array set as positive sample arrays;
changing nodes in each positive sample array to obtain a negative sample array corresponding to each positive sample array;
determining a prediction loss based on the positive sample arrays and the corresponding negative sample arrays;
based on the prediction loss, adjusting the current entity vector of each node in the knowledge graph and the current association vector of each connecting edge;
and after the iteration is finished, taking the current entity vector of each node as the entity vector thereof, and taking the current association vector of each connecting edge as the association vector thereof.
3. The method of claim 2, further comprising, prior to performing the iterating step:
determining a set of nodes based on a plurality of nodes in the knowledge-graph, and determining a set of edges based on each connecting edge in the knowledge-graph;
initializing each node in the node set into a first vector containing k elements, and taking the first vector as a current entity vector of the node; for each connecting edge in the edge set, initializing the connecting edge into a second vector containing k elements, and taking the second vector as a current association vector of the connecting edge; wherein k is a positive integer;
adjusting the current entity vector of each node and the current association vector of each connecting edge in the knowledge-graph based on the prediction loss comprises:
based on the prediction loss, adjusting the current entity vector of each node in the node set, and adjusting the current association vector of each connecting edge in the edge set.
4. The method of claim 3, wherein adjusting the current entity vector of each node in the set of nodes and adjusting the current association vector of each connected edge in the set of edges based on the prediction loss comprises:
for each node in the set of nodes, calculating a gradient value of each element based on the predicted loss and the element value of each element of the node; based on the gradient value of each element, adjusting the element value of each element of the node to obtain the adjusted current entity vector of the node;
for each connected edge in the edge set, calculating a gradient value of each element based on the prediction loss and the element value of each element of the connected edge; and adjusting the element values of the elements of the connecting edge based on the gradient values of the elements to obtain the adjusted current association vector of the connecting edge.
5. The method of claim 2, the nodes in each array comprising a head node and a tail node; each positive sample array comprises a first sample array;
the changing nodes in each positive sample array includes:
and for the first sample array, replacing a head node or a tail node of the first sample array with any other node in the knowledge graph to obtain a negative sample array corresponding to the first sample array.
6. The method of claim 2, the nodes in each array comprising a head node and a tail node; determining a prediction loss based on the positive sample arrays and the corresponding negative sample arrays, comprising:
for each positive sample array in the positive sample arrays, summing the current entity vector of the head node and the current association vector of the connecting edge, and calculating a first distance between the summation result and the current entity vector of the tail node;
for the negative sample array corresponding to the positive sample array, summing the current entity vector of the changed head node and the current association vector of the connecting edge, and calculating a second distance between the summation result and the current entity vector of the tail node; or, summing the current entity vector of the head node and the current association vector of the connecting edge, and calculating a second distance between the summation result and the changed current entity vector of the tail node;
obtaining a vector distance difference between the positive sample array and the corresponding negative sample array by calculating the difference between the first distance and the second distance;
and determining the prediction loss based on the vector distance difference of each positive sample array and the corresponding negative sample array.
7. The method of claim 2, the iteration stop condition comprising any of: the number of iterations reaches a predetermined number, or the prediction loss is less than a predetermined threshold.
8. The method of claim 1, the determining a reference entity vector based on an entity vector of the first node and an association vector of a connecting edge of the second node and the third node, comprising:
and summing the entity vector of the first node and the association vector of the connecting edge of the second node and the third node, and taking the summation result as the reference entity vector.
9. The method of claim 1, the distance comprising any of: the Euclidean distance, the Manhattan distance, the Chebyshev distance, and the Minkowski distance.
10. A target user identification device of a server, comprising:
an acquisition unit, configured to acquire a knowledge graph, wherein the knowledge graph comprises a plurality of nodes, each node represents an entity and corresponds to an entity vector; the entity comprises any one of the following: a first class service party for which no target user has been identified, a second class service party for which a target user has been identified, and an individual user; among the plurality of nodes, nodes having an association relationship are connected through connecting edges, wherein each connecting edge corresponds to an association vector; the association vector is used for representing the association relationship between the two nodes corresponding to the connecting edge;
a determining unit, configured to, for any first node in the knowledge graph acquired by the acquisition unit whose represented entity is a first class service party, when identifying a target user for the corresponding first class service party, determine in the knowledge graph a second node whose represented entity is a second class service party and a third node whose represented entity is a target user of the second class service party; wherein the target user of the second class service party is one of the individual users;
the determining unit is further configured to determine a reference entity vector based on the entity vector of the first node and an association vector of a connecting edge of the second node and the third node;
a calculation unit, configured to calculate a distance between the reference entity vector determined by the determination unit and an entity vector of each node whose represented entity is an individual user;
and a selecting unit, configured to select a target node from the nodes whose represented entity is an individual user based on the distance calculated by the calculation unit, and to take the individual user represented by the target node as a target user of the first class service party.
11. The apparatus of claim 10, further comprising: an execution unit;
the acquisition unit is used for acquiring an array set corresponding to the knowledge graph, and each array in the array set comprises a node and a connecting edge;
the execution unit is configured to iteratively execute the following steps based on the array set acquired by the acquisition unit until an iteration stop condition is satisfied:
selecting a plurality of arrays from the array set as positive sample arrays;
changing nodes in each positive sample array to obtain a negative sample array corresponding to each positive sample array;
determining a prediction loss based on the positive sample arrays and the corresponding negative sample arrays;
based on the prediction loss, adjusting the current entity vector of each node in the knowledge graph and the current association vector of each connecting edge;
and the determining unit is used for taking the current entity vector of each node as the entity vector thereof and taking the current association vector of each connecting edge as the association vector thereof after the iteration is finished.
12. The apparatus of claim 11, further comprising: an initialization unit;
the determining unit is further configured to determine a node set based on a plurality of nodes in the knowledge graph, and determine an edge set based on each connecting edge in the knowledge graph;
the initialization unit is configured to initialize each node in the node set determined by the determination unit to a first vector containing k elements, and use the first vector as a current entity vector of the node; for each connecting edge in the edge set, initializing the connecting edge into a second vector containing k elements, and taking the second vector as a current association vector of the connecting edge; wherein k is a positive integer;
the execution unit is specifically configured to:
based on the prediction loss, adjusting the current entity vector of each node in the node set, and adjusting the current association vector of each connecting edge in the edge set.
13. The apparatus of claim 12, the execution unit further specifically configured to:
for each node in the set of nodes, calculating a gradient value of each element based on the predicted loss and the element value of each element of the node; based on the gradient value of each element, adjusting the element value of each element of the node to obtain the adjusted current entity vector of the node;
for each connected edge in the edge set, calculating a gradient value of each element based on the prediction loss and the element value of each element of the connected edge; and adjusting the element values of the elements of the connecting edge based on the gradient values of the elements to obtain the adjusted current association vector of the connecting edge.
14. The apparatus of claim 11, the nodes in each array comprising a head node and a tail node; each positive sample array comprises a first sample array;
the execution unit is specifically configured to:
and for the first sample array, replacing a head node or a tail node of the first sample array with any other node in the knowledge graph to obtain a negative sample array corresponding to the first sample array.
15. The apparatus of claim 11, the nodes in each array comprising a head node and a tail node; the execution unit is specifically configured to:
for each positive sample array in the positive sample arrays, summing the current entity vector of the head node and the current association vector of the connecting edge, and calculating a first distance between the summation result and the current entity vector of the tail node;
for the negative sample array corresponding to the positive sample array, summing the current entity vector of the changed head node and the current association vector of the connecting edge, and calculating a second distance between the summation result and the current entity vector of the tail node; or, summing the current entity vector of the head node and the current association vector of the connecting edge, and calculating a second distance between the summation result and the changed current entity vector of the tail node;
obtaining a vector distance difference between the positive sample array and the corresponding negative sample array by calculating the difference between the first distance and the second distance;
and determining the prediction loss based on the vector distance difference of each positive sample array and the corresponding negative sample array.
16. The apparatus of claim 11, the iteration stop condition comprising any of: the number of iterations reaches a predetermined number, or the prediction loss is less than a predetermined threshold.
17. The apparatus according to claim 10, wherein the determining unit is specifically configured to:
and summing the entity vector of the first node and the association vector of the connecting edge of the second node and the third node, and taking the summation result as the reference entity vector.
18. The apparatus of claim 10, the distance comprising any of: the Euclidean distance, the Manhattan distance, the Chebyshev distance, and the Minkowski distance.
19. A computer-readable storage medium, on which a computer program is stored which, when executed in a computer, causes the computer to carry out the method of any one of claims 1-9.
20. A computing device comprising a memory having executable code stored therein and a processor that, when executing the executable code, implements the method of any of claims 1-9.
CN202010238154.XA 2020-03-30 2020-03-30 Target user identification method and device for server Active CN111460171B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010238154.XA CN111460171B (en) 2020-03-30 2020-03-30 Target user identification method and device for server

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202010238154.XA CN111460171B (en) 2020-03-30 2020-03-30 Target user identification method and device for server

Publications (2)

Publication Number Publication Date
CN111460171A true CN111460171A (en) 2020-07-28
CN111460171B CN111460171B (en) 2023-04-07

Family

ID=71682426

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010238154.XA Active CN111460171B (en) 2020-03-30 2020-03-30 Target user identification method and device for server

Country Status (1)

Country Link
CN (1) CN111460171B (en)

Also Published As

Publication number Publication date
CN111460171B (en) 2023-04-07

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
REG Reference to a national code (Ref country code: HK; Ref legal event code: DE; Ref document number: 40034045; Country of ref document: HK)
GR01 Patent grant