CN114064705A - User information fusion method, terminal, storage medium and system under multilayer association - Google Patents

Info

Publication number
CN114064705A
Authority
CN
China
Prior art keywords
user
graph
user information
entity
channel type
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202111216588.0A
Other languages
Chinese (zh)
Inventor
胡嘉宏
徐亚波
李旭日
古嘉宏
苏淦
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Guangzhou Datastory Information Technology Co ltd
Original Assignee
Guangzhou Datastory Information Technology Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Guangzhou Datastory Information Technology Co ltd filed Critical Guangzhou Datastory Information Technology Co ltd
Priority to CN202111216588.0A priority Critical patent/CN114064705A/en
Publication of CN114064705A publication Critical patent/CN114064705A/en
Priority to PCT/CN2022/098808 priority patent/WO2023065691A1/en
Pending legal-status Critical Current

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/24Querying
    • G06F16/245Query processing
    • G06F16/2453Query optimisation
    • G06F16/24534Query rewriting; Transformation
    • G06F16/2454Optimisation of common expressions
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90Details of database functions independent of the retrieved data types
    • G06F16/901Indexing; Data structures therefor; Storage structures
    • G06F16/9024Graphs; Linked lists
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90Details of database functions independent of the retrieved data types
    • G06F16/95Retrieval from the web
    • G06F16/953Querying, e.g. by the use of web search engines
    • G06F16/9535Search customisation based on user profiles and personalisation
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06QINFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q30/00Commerce
    • G06Q30/02Marketing; Price estimation or determination; Fundraising
    • G06Q30/0201Market modelling; Market analysis; Collecting market data

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Databases & Information Systems (AREA)
  • Data Mining & Analysis (AREA)
  • Physics & Mathematics (AREA)
  • Business, Economics & Management (AREA)
  • General Physics & Mathematics (AREA)
  • Accounting & Taxation (AREA)
  • General Engineering & Computer Science (AREA)
  • Development Economics (AREA)
  • Finance (AREA)
  • Strategic Management (AREA)
  • Entrepreneurship & Innovation (AREA)
  • Computational Linguistics (AREA)
  • Software Systems (AREA)
  • Game Theory and Decision Science (AREA)
  • Economics (AREA)
  • Marketing (AREA)
  • General Business, Economics & Management (AREA)
  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)

Abstract

The invention provides a user information fusion method, terminal, storage medium and system under multilayer association, which solve the problem that current user information fusion methods cannot fuse user information under multilayer association. First, the data sources of the user information to be integrated are selected, each data source corresponding to a material table, and channel type information is obtained from the association relationships in the material tables. A user information graph is constructed from the channel type information and split into independent user connected graphs. This expands the linear relationship between material tables into a graph relationship on a two-dimensional plane, which conveniently supports multilayer complex association and finds the records belonging to the same natural user within the complex associations of a large amount of data. On the basis of the user connected graphs, an entity unique ID connected graph is further constructed in combination with the historical material table, so that two user entities that were independent in the historical material table data are fused with the help of the new association provided by the newly added user information data.

Description

User information fusion method, terminal, storage medium and system under multilayer association
Technical Field
The present invention relates to the technical field of user information fusion, and more particularly, to a method, a terminal, a storage medium, and a system for user information fusion under multi-layer association.
Background
A Customer Data Platform (CDP) contains the data about audiences, users or customers that an enterprise generates and accumulates during operation. In a CDP, member (user entity) data is spread across multiple information systems/data sources of the business, such as Customer Relationship Management (CRM), POS transactions, behavioral log collection data, and so forth. The same natural user has different unique identifiers in different information systems or channels. For example, a purchase record in the Jingdong (JD) mall includes a Jingdong user ID and a mobile phone number, while the CRM of the enterprise has its own user ID; because the user IDs of the same natural user in the Jingdong mall and in the CRM differ, subsequent data processing, labeling, marketing activity pushing and the like are difficult. Moreover, a natural user may hold multiple accounts and be marked as different users in the system. A unique identifier is therefore required to identify a natural user: user data from different systems and channels is connected, the user data that actually belongs to the same natural user is identified, and an entity unique ID (a fused entity unique identifier) is assigned to it for subsequent operations such as data processing, user tag system construction, user operation statistical analysis and marketing activity pushing.
On March 14, 2015, a Chinese invention patent (publication number: CN104394118A) disclosed a method and system for identifying user identities. Basic information formed at user registration, including user ID, user name, Email, telephone, computer IP and the like, is used to extract website user behavior data; information such as user ID, user name, Email, telephone number, Cookie and computer IP is associated across the combined behavior data; and a user information association between the user ID and the user name is established and given a unique identity. Users of a B2B website can thus be identified uniformly, an identity feature relationship is established, new and old users are distinguished, user behavior can be tracked effectively, a series of applications can be built for users, and user experience is improved. However, that patent cannot handle user information fusion under multilayer association. For example, a user buys a discount coupon with mobile phone number A (corresponding to user ID1) and then buys the coupon with a spare mobile phone number B (corresponding to user ID2). Mobile phone numbers A and B were originally recorded under two different user IDs, and the purchasing behavior indicates that the two user IDs belong to the same natural user. Because the method uses the user ID as the unique user identifier, no association can be generated for this scenario and the two pieces of user information cannot be fused.
Disclosure of Invention
In order to solve the problem that the current user information fusion method cannot realize user information fusion under multilayer association, the invention provides a user information fusion method, a terminal, a storage medium and a system under multilayer association.
In order to achieve the technical effects, the technical scheme of the invention is as follows:
A method for fusing user information under multilayer association comprises at least the following steps:
S1. Selecting the data sources of the user information to be integrated, wherein each data source corresponds to a material table, and determining the fields in the material tables that can identify users and the channel types corresponding to those fields, to obtain channel type information;
S2. Determining vertices and edges according to the channel type information, constructing a user information graph based on the vertices and edges, and then splitting the user information graph into independent user connected graphs;
S3. Querying, by means of a channel type association table, the historical entity unique ID corresponding to each vertex of each user connected graph to obtain all entity unique IDs corresponding to that user connected graph, and then constructing an entity unique ID connected graph with the entity unique IDs as vertices and the co-occurrence of multiple entity unique IDs in one user connected graph as edges, so that the user connected graphs are associated by means of the historical data;
S4. Judging whether each entity unique ID appears in only one user connected graph; if so, fusing user information for each user connected graph as required; otherwise, constructing the entity unique ID connected graph and executing step S5;
S5. Determining the entity unique IDs that are connected together in the entity unique ID connected graph, and associating the user connected graphs corresponding to those connected entity unique IDs;
S6. Reading the entity unique IDs corresponding to all vertices of each user connected graph, and deduplicating the entity unique IDs;
S7. Updating the channel type association table with the deduplicated entity unique IDs.
In this technical scheme, constructing and splitting the user connected graph supports multilayer complex association relationships rather than fixed simple association rules. The user only needs to attend to the pairwise association of material tables, not to how all the material tables are associated, and operations on newly added user information are supported. On the basis of the user connected graph, an entity unique ID connected graph is further constructed in combination with the channel type association table (a table that reflects the mapping between channel type values and entity unique IDs in the historical data), so that two user entities that were independent in the historical data can be fused with the help of the new association provided by the incremental data. The application range is not limited to specific scenarios or business logic: channel type relationships can be arranged according to business reality to suit different scenarios, and beyond fusing user information the method can be extended to information fusion for commodities and places.
Preferably, in the Customer Data Platform (CDP) of an enterprise, the user information is distributed over a plurality of data sources of the enterprise, each data source corresponding to a material table. The material tables comprise an enterprise member information table, a mall user information table and a coupon use record table. The channel types in the material tables comprise user mobile phone number, member ID, Email, WeChat unionID and WeChat openID; records that have the same value for a channel type in different material tables correspond to the same natural user. A channel type field is the combination of a channel type and the material table in which it appears.
Preferably, the channel type and the channel type field value are taken as the vertex attribute, and the vertex attribute is hashed to obtain a vertex ID; a vertex records the attribute of a natural user on a certain channel type;
the multiple channel type field values appearing in one record of a material table are taken as edges, where an edge is a connecting line between vertices and records the association of a natural user across different channel types;
the vertices and edges are connected to construct the user information graph; the user information graph is then split using a connected component algorithm, which yields for each vertex the minimum vertex ID of the connected component in which that vertex is located; grouping and aggregating by these minimum vertex IDs yields all the vertices of each independent user connected graph, that is, each independent user connected graph.
Because the channel type and the channel type field value serve as the vertex attribute, the material table in which they occur does not need to be recorded, which keeps the operation simple. The linear relationship between material tables is expanded into a graph relationship on a two-dimensional plane, which conveniently supports multilayer complex association.
Preferably, if each entity unique ID appears in only one user connected graph, the natural users generated by the newly added user information are also independent natural users in the historical material table, and user information is fused for each user connected graph as required.
Preferably, the entity unique ID connected graph is constructed by taking each entity unique ID as a vertex and taking the co-occurrence of multiple entity unique IDs in the same user connected graph as an edge, which greatly reduces the data volume that needs to participate in calculation and effectively improves calculation efficiency.
Preferably, the method used in step S5 to determine the entity unique IDs that are connected together in the entity unique ID connected graph is a connected component algorithm; after the user connected graphs corresponding to the connected entity unique IDs have been associated, each user connected graph corresponds to one natural user.
Preferably, the deduplication performed on the entity unique IDs in step S6 covers the following cases:
a. The user connected graph has no entity unique ID: the current natural user is a new user, and an entity unique ID is generated for the user connected graph, using a UUID to ensure its uniqueness;
b. The user connected graph has exactly one entity unique ID: the new and old user information data belong to the same natural user, and the old entity unique ID is used;
c. The user connected graph has multiple entity unique IDs: the new user information causes user information to be fused, and only one of the entity unique IDs is kept.
The invention provides a terminal, which comprises a processor, a memory and a computer program stored on the memory, wherein the processor executes the computer program stored on the memory so as to realize the steps of the user information fusion method under the multilayer association.
The invention provides a computer storage medium, wherein computer program instructions are stored on the computer readable storage medium, and when the instructions are executed by a processor, the steps of the user information fusion method under the multilayer association are realized.
The invention also provides a user information fusion system under multilayer association, which is used for realizing the user information fusion method under multilayer association and comprises the following steps:
the channel type information acquisition module is used for selecting data sources of user information to be integrated, each data source corresponds to a material table, and fields capable of identifying users in the material tables and corresponding channel types of the fields are determined to obtain channel type information;
the user connection graph building module is used for determining a vertex and an edge according to the channel type information, building a user information graph based on the vertex and the edge, and then splitting the user information graph into independent user connection graphs;
the historical data association module is used for inquiring historical data of entity unique IDs corresponding to each vertex of the user connected graph by using the channel type association table to obtain all entity unique IDs corresponding to the user connected graph, further using the entity unique IDs as the vertices and using the relation of a plurality of entity unique IDs in the user connected graph as edges to construct the entity unique ID connected graph, and thus associating the user connected graph by using the historical data;
the judging module is used for judging whether the unique ID of each user entity only appears in one user connected graph or not, and if yes, fusing user information to each user connected graph as required; otherwise, constructing an entity unique ID connected graph;
the user connection graph association module is used for determining the entity unique IDs which are connected together in the entity unique ID connection graph and associating the user connection graphs corresponding to the entity unique IDs which are connected together;
the duplication elimination processing module is used for reading entity unique IDs corresponding to all vertexes of the user connected graph and carrying out duplication elimination processing on the entity unique IDs;
and the updating module is used for updating the channel type association table by using the entity unique ID after the deduplication processing.
Compared with the prior art, the technical scheme of the invention has the beneficial effects that:
The invention provides a user information fusion method, terminal, storage medium and system under multilayer association. The data sources of the user information to be integrated are selected, each corresponding to a material table, and channel type information is obtained from the association relationships in the material tables. A user information graph is constructed from the channel type information and then split into independent user connected graphs, which expands the linear relationship between material tables into a graph relationship on a two-dimensional plane, conveniently supports multilayer complex association, and finds the records belonging to the same natural user within the complex associations of a large amount of data, so that they can be used for subsequent operations such as data processing, user tag system construction, user operation statistical analysis and marketing activity pushing. On the basis of the user connected graphs, the entity unique ID connected graph is further constructed in combination with the historical material table, and two user entities that were independent in the historical material table data are fused with the help of the new association provided by the newly added user information data; this greatly reduces the data volume that needs to participate in calculation and effectively improves calculation efficiency.
Drawings
Fig. 1 is a schematic flow chart of the user information fusion method under multilayer association according to embodiment 1 of the present invention;
Fig. 2 shows the independent user connected graphs obtained by splitting the user information graph according to embodiment 1 of the present invention;
Fig. 3 shows the user connected graph corresponding to the historical data in the historical material table according to embodiment 1 of the present invention;
Fig. 4 is a schematic diagram of looking up, in the channel type association table, the entity unique ID corresponding to each vertex of the current user connected graph according to embodiment 1 of the present invention;
Fig. 5 is a schematic diagram of associating the user connected graphs corresponding to entity unique IDs that are connected together according to embodiment 1 of the present invention;
Fig. 6 shows the final user connected graph according to embodiment 1 of the present invention;
Fig. 7 is a schematic structural diagram of the user information fusion system under multilayer association according to embodiment 3 of the present invention.
Detailed Description
The drawings are for illustrative purposes only and are not to be construed as limiting the patent;
it will be understood by those skilled in the art that certain well-known descriptions of the figures may be omitted.
The technical solution of the present invention is further described below with reference to the accompanying drawings and examples.
The positional relationships depicted in the drawings are for illustrative purposes only and are not to be construed as limiting the present patent;
Example 1
As shown in fig. 1, in this embodiment, a method for fusing user information under multi-layer association is first proposed, where the method includes:
S1. Selecting the data sources of the user information to be integrated, wherein each data source corresponds to a material table, and determining the fields in the material tables that can identify users and the channel types corresponding to those fields, to obtain channel type information;
Taking the Customer Data Platform (CDP) of an enterprise as the background, user information is distributed across a plurality of data sources of the enterprise, each data source corresponding to a material table. The material tables comprise an enterprise member information table, a mall user information table and a coupon use record table. The channel types in the material tables comprise user mobile phone number, member ID, Email, WeChat unionID and WeChat openID; records that have the same value for a channel type in different material tables correspond to the same natural user. A channel type field is the combination of a channel type and the material table in which it appears. First, the data sources that need information integration are found in the system, the material tables corresponding to them are analyzed, and the fields that can identify users are found. A given value of such a field corresponds to one user, but one user can have multiple values in that field of a material table. Take the coupon ID: one user can buy and use multiple coupons, but one coupon is bought and used by only one user. Such a field is a channel type field, for example coupon ID, Email or mobile phone number. A channel type represents a scenario in which the user entity relates to the outside world. The channel type fields that can be associated across material tables are determined, and each such set of fields is marked as a channel type. For example, the "mobile phone number" field of the coupon use record table, the "mobile phone number" field of the user information table, and the fields representing the consumer's mobile phone number in other material tables together constitute one channel type. Similarly, there may be channel types such as identity card, Email, etc.
S2, determining a vertex and an edge according to the channel type information, constructing a user information graph based on the vertex and the edge, and then splitting the user information graph into independent user connected graphs;
Like a graph in the data-structure sense, the user information graph consists of vertices and the lines connecting them. By the definition of channel type in step S1, records with the same value of a channel type in different material tables should correspond to the same natural user; for example, the record with mobile phone number 13344445555 in the coupon use record table and the record with mobile phone number 13344445555 in the mall user information table belong to the same natural user. The channel type and the channel type field value are therefore taken as the vertex attribute, and the vertex attribute is hashed to obtain the vertex ID; a vertex records the attribute of one natural user on a certain channel type. The hash is computed with the publicly available MurmurHash algorithm, which is a conventional technique in the art and is not described further here.
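The vertex ID derivation can be sketched as follows. This is a minimal illustration, not the patented implementation: the function name `vertex_id` is hypothetical, and BLAKE2b from the Python standard library is swapped in for MurmurHash, which the standard library does not provide.

```python
import hashlib

def vertex_id(channel_type: str, value: str) -> int:
    """Return a 64-bit integer ID for a (channel type, value) vertex attribute.

    Assumption: BLAKE2b stands in for the MurmurHash function named in the text.
    """
    # A separator that is unlikely to occur in either part keeps
    # ("ab", "c") and ("a", "bc") from producing the same key.
    key = f"{channel_type}\x1f{value}".encode("utf-8")
    digest = hashlib.blake2b(key, digest_size=8).digest()
    return int.from_bytes(digest, "big")

# The same phone number seen in two material tables maps to one vertex,
# because the table name is not part of the vertex attribute.
from_coupon_table = vertex_id("mobile phone number", "13344445555")
from_mall_table = vertex_id("mobile phone number", "13344445555")
other_channel = vertex_id("member ID", "13344445555")
```

Because only the channel type and the value enter the hash, two records from different material tables collapse onto the same vertex without any table lookup.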
The multiple channel type field values appearing in one record of a material table are taken as edges; an edge is a connecting line between vertices and records the association of a natural user across different channel types. In addition, the edge weight can be the update time of the record, which can be used to resolve conflicts in the subsequent user connected graphs.
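One way to derive weighted edges from a single record is sketched below. The column names, the `channel_fields` configuration and the choice to link every pair of channel values in a record are assumptions made for illustration; the text does not fix these details.

```python
from itertools import combinations

def record_edges(record, channel_fields, updated_at):
    """Return weighted edges linking the channel values found in one record.

    record:         dict of column name -> value for one row of a material table
    channel_fields: dict of column name -> channel type (hypothetical S1 output)
    updated_at:     record update time, used as the edge weight
    """
    vertices = sorted(
        (channel_fields[col], val)
        for col, val in record.items()
        if col in channel_fields and val  # skip non-channel columns and empty values
    )
    # Assumption: every pair of channel values in the record is linked.
    return [(a, b, updated_at) for a, b in combinations(vertices, 2)]

row = {"mobile": "13344445555", "coupon_no": "T00003", "member_id": "C001"}
fields = {"mobile": "mobile phone number", "member_id": "member ID"}
edges = record_edges(row, fields, updated_at=1634700000)
```

Here the coupon number column is ignored because it is not configured as a channel type, so the row yields a single edge between the member ID vertex and the phone number vertex.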
The vertices and edges are connected to construct the user information graph. Ideally, an independent user connected graph corresponds to a unique natural user and has no edges connecting it to any other user connected graph. The user information graph is split using a connected component algorithm, for which mature implementations that can be applied directly already exist, such as Spark GraphX. The direct result of the calculation is, for each vertex, the minimum vertex ID of the connected component in which that vertex is located. Grouping and aggregating by these minimum vertex IDs, implemented with a GROUP function, yields all the vertices of each independent user connected graph and hence the information (vertices and corresponding edges) of each independent user connected graph.
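The split into user connected graphs can be sketched in plain Python. This is a single-machine illustration under simplifying assumptions (small integer vertex IDs, unweighted edges), not the distributed implementation the text refers to; the function names are hypothetical.

```python
from collections import defaultdict

def min_id_labels(vertices, edges):
    """Label each vertex with the smallest vertex ID in its connected component."""
    label = {v: v for v in vertices}
    changed = True
    while changed:  # repeat until no label can be lowered any further
        changed = False
        for a, b in edges:
            low = min(label[a], label[b])
            if label[a] != low or label[b] != low:
                label[a] = label[b] = low
                changed = True
    return label

def group_by_label(label):
    """Group vertices by component label (the grouping step in the text)."""
    groups = defaultdict(list)
    for v, root in label.items():
        groups[root].append(v)
    return {root: sorted(vs) for root, vs in groups.items()}

vertices = [1, 2, 3, 4, 5, 6]
edges = [(1, 2), (2, 3), (5, 6)]
user_graphs = group_by_label(min_id_labels(vertices, edges))
```

Vertices 1, 2 and 3 end up under label 1 even though 1 and 3 share no record, which is the multilayer association the method relies on; vertex 4 has no edges and forms its own graph.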
S3, querying historical data of entity unique IDs corresponding to each vertex of the user connected graph by using a channel type association table to obtain all entity unique IDs corresponding to the user connected graph, and further constructing the entity unique ID connected graph by using the entity unique IDs as the vertices and using the relation of a plurality of entity unique IDs in the user connected graph as edges, so that the user connected graph is associated by using the historical data;
This step is the historical data association process. If the method provided by the invention is being executed for the first time, no historical association is needed and the step can be skipped. Otherwise, the data source of the user information to be integrated belongs to a newly added calculation task, and the mapping can be assumed to exist; the user connected graphs formed from the new data alone do not represent complete user channel type information, and accurate user information fusion can be computed only after the historical data is associated;
S4. Judging whether each entity unique ID appears in only one user connected graph; if so, fusing user information for each user connected graph as required; otherwise, constructing the entity unique ID connected graph and executing step S5. If each entity unique ID appears in only one user connected graph, the natural users generated by the newly added user information are also independent natural users in the historical material table, so user information only needs to be fused for each user connected graph as required. If several user connected graphs share the same entity unique ID, the several natural users generated by the newly added user information belong to the same natural user in the old data and can be fused; the user connected graphs that share an entity unique ID need to be connected together, which is achieved by constructing the entity unique ID connected graph.
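The historical lookup of step S3 can be sketched as below. The nested dictionary `association` is an in-memory stand-in for the channel type association tables, and the entity IDs "E1" and "E2" are invented for illustration.

```python
def entity_ids_per_graph(user_graphs, association):
    """Collect the historical entity unique IDs found in each user connected graph.

    user_graphs: dict of graph label -> list of (channel type, value) vertices
    association: dict of channel type -> {value: entity unique ID}
    """
    result = {}
    for label, vertices in user_graphs.items():
        found = set()
        for channel_type, value in vertices:
            entity_id = association.get(channel_type, {}).get(value)
            if entity_id is not None:  # vertices with no history contribute nothing
                found.add(entity_id)
        result[label] = found
    return result

association = {"mobile phone number": {"13344445555": "E1", "13455556666": "E2"}}
user_graphs = {
    "g1": [("mobile phone number", "13344445555"), ("member ID", "C001"),
           ("mobile phone number", "13455556666")],
    "g2": [("mobile phone number", "13566667777"), ("member ID", "C002")],
}
history = entity_ids_per_graph(user_graphs, association)
```

On a first run the association would be empty and every graph would map to an empty set, which corresponds to skipping the step as the text describes.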
The entity unique ID connected graph is constructed by taking each entity unique ID as a vertex and taking the relation of a plurality of entity unique IDs appearing in the same user connected graph as an edge, so that the data volume needing to participate in calculation is greatly reduced, and the calculation efficiency is effectively improved.
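The check in step S4 and the construction of the entity unique ID connected graph can be sketched as follows. The input shape (graph label mapped to a set of entity IDs) and the function names are assumptions for illustration.

```python
from collections import Counter
from itertools import combinations

def needs_entity_graph(history):
    """True if some entity unique ID appears in more than one user connected graph."""
    counts = Counter(eid for ids in history.values() for eid in ids)
    return any(n > 1 for n in counts.values())

def entity_id_edges(history):
    """Edges between entity unique IDs that occur in the same user connected graph."""
    edges = set()
    for ids in history.values():
        edges.update(combinations(sorted(ids), 2))
    return sorted(edges)

history = {"g1": {"E1", "E2"}, "g2": {"E2", "E3"}, "g3": {"E4"}}
shared = needs_entity_graph(history)  # E2 occurs in both g1 and g2
edges = entity_id_edges(history)
```

The resulting graph has one vertex per entity unique ID rather than one per channel value, which is why it is much smaller than the user information graph.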
S5, determining the entity unique IDs which are connected together in the entity unique ID connected graph, and associating the user connected graphs corresponding to the entity unique IDs which are connected together;
S6. Reading the entity unique IDs corresponding to all vertices of each user connected graph, and deduplicating the entity unique IDs. The method used in step S5 to determine which entity unique IDs are connected together in the entity unique ID connected graph is likewise a connected component algorithm, which is a graph algorithm. If any two vertices in a graph G are connected, G is called a connected graph; otherwise G is called a disconnected graph. After the user connected graphs corresponding to the connected entity unique IDs have been associated, each user connected graph corresponds to one natural user;
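The association of user connected graphs in step S5 can be sketched with a union-find structure, which is one common way to compute connected components. The function name and input shape are hypothetical, and treating graphs without any historical ID as separate users is an assumption consistent with case a of the deduplication.

```python
def merge_user_graphs(history):
    """Group user connected graphs whose entity unique IDs are connected.

    history: dict of user graph label -> set of entity unique IDs
    Returns a sorted list of sorted label lists, one list per natural user.
    """
    parent = {}

    def find(x):
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for ids in history.values():
        ordered = sorted(ids)
        for other in ordered[1:]:
            parent[find(other)] = find(ordered[0])  # union IDs seen in one graph

    merged = {}
    for label, ids in history.items():
        # Assumption: graphs with no historical ID stay on their own.
        key = find(min(ids)) if ids else ("new", label)
        merged.setdefault(key, []).append(label)
    return sorted(sorted(labels) for labels in merged.values())

history = {"g1": {"E1", "E2"}, "g2": {"E2", "E3"}, "g3": {"E4"}, "g4": set()}
natural_users = merge_user_graphs(history)
```

Graphs g1 and g2 are associated because both contain E2, so E1 and E3 end up belonging to the same natural user even though they never appear in the same graph.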
Deduplicating the entity unique IDs covers the following cases:
a. The user connected graph has no entity unique ID: the current natural user is a new user, and an entity unique ID is generated for the user connected graph, using a UUID to ensure its uniqueness. UUID is an abbreviation of universally unique identifier, a conventional technique in the prior art.
b. The user connected graph has exactly one entity unique ID: the new and old user information data belong to the same natural user, and the old entity unique ID is used;
c. The user connected graph has multiple entity unique IDs: the new user information causes user information to be fused, and only one entity unique ID is kept; because which one is kept has little influence on the calculation result, the first is generally kept.
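The three cases can be sketched in a few lines. The function name is hypothetical, and generating the new ID with `uuid.uuid4().hex` is an assumption; the text only states that a UUID is used.

```python
import uuid

def resolve_entity_id(entity_ids):
    """Pick the entity unique ID for one merged user connected graph.

    entity_ids: historical entity unique IDs found in the graph, in read order
    """
    unique = list(dict.fromkeys(entity_ids))  # drop duplicates, keep first-seen order
    if not unique:
        return uuid.uuid4().hex  # case a: new user, generate a fresh ID
    return unique[0]             # case b: the only ID; case c: keep the first one

new_id = resolve_entity_id([])
kept_single = resolve_entity_id(["33f5ec7daf694f9ca09c66767272d415"])
kept_first = resolve_entity_id(["E2", "E1", "E2"])
```

Cases b and c share one branch because keeping the first ID of a one-element list is the same as keeping the only ID.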
S7, updating a channel type association table by using the entity unique ID after the deduplication processing;
specifically, traversing all vertexes, wherein attributes of the vertexes include channel types and information of values of the channel types, a relationship between the channel types and the entity unique ID is stored in a channel type association table, and a table name of the association table is channel _ entry _ relationship _ < channel type >, such as channel _ entry _ relationship _ mobile phone number, and the structure is as follows:
Channel type field value | Entity unique ID
13344445555 | 33f5ec7daf694f9ca09c66767272d415
…… | ……
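The update in step S7 can be sketched as follows. This is only an illustration under stated assumptions: the association tables are modeled as in-memory dictionaries (one per channel type) rather than stored tables, and the function name `update_association_tables` is hypothetical.

```python
from collections import defaultdict


def update_association_tables(tables, vertices, entity_id):
    """Record entity_id for every vertex of one user connected graph.

    tables: {channel type: {channel type field value: entity unique ID}}
    vertices: iterable of (channel type, channel type field value) pairs
    """
    for channel_type, value in vertices:
        # Each channel type has its own association table.
        tables[channel_type][value] = entity_id
    return tables


tables = defaultdict(dict)
update_association_tables(
    tables,
    [("mobile phone number", "13344445555")],
    "33f5ec7daf694f9ca09c66767272d415",
)
```

Writing the same entity unique ID for every vertex of a graph is what allows a later run to find that ID again from any of the user's channel values.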
In summary, by constructing and splitting a user connected graph, the method supports multilayer complex association relationships rather than only fixed, simple association rules. The user only needs to consider the pairwise associations between material tables, not how all material tables are associated as a whole, and newly added user information is also supported.
The method provided in this embodiment is further described below with reference to a specific implementation scenario. Taking a CDP system as an example, the scenario is simplified as follows: a user picks up a coupon in the membership system and then uses the coupon to shop in the Jingdong mall.
Assume that the data source contains only 3 material tables: a coupon use record table, a Jingdong user information table and an internal member information table.
The association relationships of the above material tables are analyzed and expressed as channel types and channel type fields, as shown in Table 1.
TABLE 1
Figure BDA0003310888690000091
The coupon use record table has records shown in table 2.
TABLE 2
Mobile phone number | Coupon number | Member ID | ……
13344445555 | T00003 | C001
13455556666 | T00004 | C001
13344445555 | T00005 | C001
13566667777 | T00006 | C002
The Jingdong user information table has the records shown in Table 3.
TABLE 3
Jingdong user ID | Mobile phone number | ……
JD001 | 13344445555
JD002 | 13455556666
JD003 | 13566667777
The internal member information table has records shown in table 4.
TABLE 4
Figure BDA0003310888690000092
Figure BDA0003310888690000101
All channel types are: "mobile phone number", "member ID", and "Jingdong user ID".
A user information graph is constructed from the sorted channel type information and then split into independent user connected graphs. The vertices of the user information graph are calculated from the material table information, as shown in Table 5. To make the example easier to follow, the actual hash values of the vertex attributes are replaced by simpler numerical values.
TABLE 5
Vertex ID | Vertex attributes
1111 | [mobile phone number, 13344445555]
2222 | [mobile phone number, 13455556666]
3333 | [mobile phone number, 13566667777]
4444 | [member ID, C001]
5555 | [member ID, C002]
6666 | [Jingdong user ID, JD001]
7777 | [Jingdong user ID, JD002]
8888 | [Jingdong user ID, JD003]
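In a real run the vertex IDs are hash values of the vertex attributes rather than the simplified numbers of Table 5. A minimal sketch of such a derivation follows. The text does not name a hash function, so the choice of SHA-256 truncated to 64 bits, the separator byte, and the function name `vertex_id` are all assumptions made for illustration.

```python
import hashlib


def vertex_id(channel_type, value):
    """Derive a stable integer vertex ID from a vertex attribute.

    The vertex attribute is "channel type + channel type field value".
    """
    # A separator keeps ("ab", "c") and ("a", "bc") from colliding.
    key = f"{channel_type}\x1f{value}".encode("utf-8")
    # Assumption: SHA-256 truncated to 64 bits, so the ID fits a long integer.
    return int.from_bytes(hashlib.sha256(key).digest()[:8], "big")


phone_vertex = vertex_id("mobile phone number", "13344445555")
member_vertex = vertex_id("member ID", "C001")
```

Because the ID depends only on the channel type and its value, the same mobile phone number found in two different material tables maps to the same vertex, which is what links the tables together.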
A connected component algorithm is used to calculate the minimum vertex ID of the connected graph in which each vertex is located. The result is shown in Table 6:
TABLE 6
Vertex ID | Minimum vertex ID of connected graph | Vertex attributes
1111 | 1111 | [mobile phone number, 13344445555]
2222 | 1111 | [mobile phone number, 13455556666]
3333 | 3333 | [mobile phone number, 13566667777]
4444 | 1111 | [member ID, C001]
5555 | 3333 | [member ID, C002]
6666 | 1111 | [Jingdong user ID, JD001]
7777 | 1111 | [Jingdong user ID, JD002]
8888 | 3333 | [Jingdong user ID, JD003]
According to the above results, the user connected graphs shown in fig. 2 are obtained. As can be seen from fig. 2, according to the column "minimum vertex ID of connected graph", the entire user information graph is split into two independent user connected graphs, user connected graph 1 and user connected graph 2, which correspond to two natural users.
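The split can be reproduced with a small union-find sketch. It is an illustration only: the text does not say which connected component implementation is used, the function name `min_vertex_labels` is hypothetical, and the edges are taken only from the rows of Table 2 and Table 3, because Table 4 is available only as an image.

```python
def min_vertex_labels(vertices, edges):
    """Label every vertex with the minimum vertex ID of its component."""
    parent = {v: v for v in vertices}

    def find(v):
        while parent[v] != v:
            parent[v] = parent[parent[v]]  # path halving
            v = parent[v]
        return v

    for a, b in edges:
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            # Attach the larger root under the smaller one, so that
            # each root is always the minimum vertex ID of its component.
            parent[max(root_a, root_b)] = min(root_a, root_b)
    return {v: find(v) for v in vertices}


vertices = [1111, 2222, 3333, 4444, 5555, 6666, 7777, 8888]
edges = [
    # Table 2: mobile phone number and member ID in the same record
    (1111, 4444), (2222, 4444), (1111, 4444), (3333, 5555),
    # Table 3: Jingdong user ID and mobile phone number in the same record
    (6666, 1111), (7777, 2222), (8888, 3333),
]
labels = min_vertex_labels(vertices, edges)

# Group by label to obtain the vertices of each user connected graph.
groups = {}
for vertex, label in labels.items():
    groups.setdefault(label, []).append(vertex)
```

With these edges, `groups` contains two entries, keyed 1111 and 3333, which matches the two user connected graphs of Table 6.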
If this is not the first calculation, the historical data in the historical material tables has already been calculated and the channel type association tables have been generated. Assume the historical data is a coupon use record table with the records shown in Table 7:
TABLE 7
Mobile phone number | Coupon number | Member ID | ……
13566667777 | T00001 | C001
13344445555 | T00002 | C002
The result of the previous calculation is shown in fig. 3: the two records shown in Table 7 correspond to the entity unique IDs of two natural users, so the content of the "mobile phone number" channel type association table is as shown in Table 8.
TABLE 8
Mobile phone number | Entity unique ID
13566667777 | 5109b4d4e0e7411c992da7d5ea112538
13344445555 | 73c823904daa4c20a4c2035132d0e7c2
The contents of the "Member ID" channel type association table are shown in Table 9.
TABLE 9
Member ID | Entity unique ID
C001 | 5109b4d4e0e7411c992da7d5ea112538
C002 | 73c823904daa4c20a4c2035132d0e7c2
At this point, the entity unique IDs of all vertex attributes of each user connected graph are looked up in the corresponding channel type association tables and recorded as attributes of the user connected graph, as shown in fig. 4. An entity unique ID connected graph is constructed from these entity unique IDs, connections are generated between the corresponding user connected graphs, and a new user connected graph is obtained, as shown in fig. 5.
After the historical data in the historical material tables is associated, some vertices of the two new user connected graphs turn out to belong to the same user connected graph in the historical data, so the two user connected graphs are associated together. As shown in fig. 5, all the data now forms one natural user, and after deduplication the entity unique IDs owned by this natural user are:
5109b4d4e0e7411c992da7d5ea112538
73c823904daa4c20a4c2035132d0e7c2
In this calculation, the new coupon use records associate two users that were independent in the old user information; the two old users are in fact the same user, so user information fusion occurs.
Only one entity unique ID is retained at this point, for example 5109b4d4e0e7411c992da7d5ea112538, as shown in fig. 6.
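The association through historical data can be sketched as follows, using the vertices of the two user connected graphs and the association tables of Table 8 and Table 9. This is a hedged illustration: the function name `fuse_with_history`, the in-memory dictionaries, and the rule of retaining the smallest ID (chosen only to make the result deterministic) are assumptions, while the text itself only requires that one ID be retained.

```python
def fuse_with_history(graphs, association):
    """Associate user connected graphs through shared historical IDs.

    graphs: {graph name: [(channel type, value), ...]}
    association: {(channel type, value): entity unique ID}
    """
    # Distinct historical entity unique IDs found for each graph.
    ids_of = {
        name: sorted({association[v] for v in vertices if v in association})
        for name, vertices in graphs.items()
    }

    # Entity unique ID connected graph, kept as a union-find forest.
    parent = {eid: eid for ids in ids_of.values() for eid in ids}

    def find(eid):
        while parent[eid] != eid:
            parent[eid] = parent[parent[eid]]
            eid = parent[eid]
        return eid

    # IDs occurring in the same user connected graph are joined by an edge.
    for ids in ids_of.values():
        for other in ids[1:]:
            root_a, root_b = find(ids[0]), find(other)
            if root_a != root_b:
                parent[root_b] = root_a

    # Graphs whose IDs share a root belong to the same natural user.
    merged = {}
    for name, ids in ids_of.items():
        key = find(ids[0]) if ids else ("no history", name)
        group = merged.setdefault(key, {"graphs": [], "ids": set()})
        group["graphs"].append(name)
        group["ids"].update(ids)

    result = []
    for group in merged.values():
        ids = sorted(group["ids"])
        # Assumption: retain the smallest ID; None means a new user.
        result.append({"graphs": group["graphs"], "ids": ids,
                       "retained": ids[0] if ids else None})
    return result


OLD_A = "5109b4d4e0e7411c992da7d5ea112538"
OLD_B = "73c823904daa4c20a4c2035132d0e7c2"
association = {
    ("mobile phone number", "13566667777"): OLD_A,  # Table 8
    ("mobile phone number", "13344445555"): OLD_B,  # Table 8
    ("member ID", "C001"): OLD_A,                   # Table 9
    ("member ID", "C002"): OLD_B,                   # Table 9
}
graphs = {
    "user connected graph 1": [
        ("mobile phone number", "13344445555"),
        ("mobile phone number", "13455556666"),
        ("member ID", "C001"),
        ("Jingdong user ID", "JD001"),
        ("Jingdong user ID", "JD002"),
    ],
    "user connected graph 2": [
        ("mobile phone number", "13566667777"),
        ("member ID", "C002"),
        ("Jingdong user ID", "JD003"),
    ],
}
fused = fuse_with_history(graphs, association)
```

Both graphs contain both historical IDs, so the two IDs are connected and `fused` holds a single group covering both graphs. A graph with no historical ID stays in its own group with `retained` set to `None`, which corresponds to the new-user case.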
Finally, the channel type association tables are updated to facilitate subsequent calculations. The updated channel type association tables include the "mobile phone number" channel type association table shown in Table 10, the "member ID" channel type association table shown in Table 11, and the "Jingdong user ID" channel type association table shown in Table 12.
Subsequent use of the CDP data is based on the entity unique ID. For example, when user portraits are calculated, they are calculated per entity unique ID, yielding a user (entity) portrait for each entity unique ID. If a marketing campaign later needs to be run in the member system, the target audience can be screened by user portrait to obtain the corresponding entity unique IDs, and the corresponding Jingdong user IDs can then be looked up in the association table of the Jingdong user ID channel so that outreach operations such as marketing campaign pushes can be carried out.
Embodiment 2
The invention provides a terminal comprising a processor, a memory, and a computer program stored in the memory. The processor executes the computer program stored in the memory to implement the steps of the user information fusion method under multilayer association described in embodiment 1. The memory may be a magnetic disk, a flash memory, or any other non-volatile storage medium. The processor is connected to the memory and may be implemented as one or more integrated circuits, specifically a microprocessor or a microcontroller; when it executes the computer program stored in the memory, user information fusion under multilayer association is implemented.
The invention also provides a computer storage medium on which computer program instructions are stored; when the instructions are executed by a processor, the steps of the user information fusion method under multilayer association described in embodiment 1 are implemented.
Embodiment 3
Referring to fig. 7, the present invention further provides a system for fusing user information under multilayer association. The system is configured to implement the method for fusing user information under multilayer association described in embodiment 1, and includes:
the channel type information acquisition module, used for selecting the data sources of the user information to be integrated, each data source corresponding to a material table, and for determining the channel types and channel type fields that identify the association relationships in the material tables, so as to obtain channel type information;
the user connected graph building module, used for determining vertices and edges according to the channel type information, building a user information graph based on the vertices and edges, and then splitting the user information graph into independent user connected graphs;
the historical data association module, used for collecting the historical material tables, determining the mapping between each channel type in the historical material tables and the user entity unique ID to form an initial channel type association table, and then looking up, in the channel type association table, the user entity unique ID corresponding to each vertex of the user connected graph;
the judging module, used for judging whether each user entity unique ID appears in only one user connected graph; if so, user information is fused for each user connected graph as required; otherwise, an entity unique ID connected graph is constructed;
the user connected graph association module, used for determining the entity unique IDs that are connected together in the entity unique ID connected graph and associating the user connected graphs corresponding to those entity unique IDs;
the deduplication module, used for reading the entity unique IDs corresponding to all vertices of the user connected graph and deduplicating the entity unique IDs;
and the updating module, used for updating the channel type association table by using the deduplicated entity unique IDs.
It should be understood that the above embodiments of the present invention are merely examples given to illustrate the invention clearly and are not intended to limit its embodiments. Other variations and modifications will be apparent to persons skilled in the art in light of the above description. It is neither necessary nor possible to enumerate all embodiments here. Any modification, equivalent replacement, or improvement made within the spirit and principle of the present invention shall fall within the protection scope of the claims of the present invention.

Claims (10)

1. A method for fusing user information under multi-layer association is characterized by at least comprising the following steps:
S1, selecting data sources of user information to be integrated, wherein each data source corresponds to a material table, and determining fields capable of identifying users in the material tables and the corresponding channel types of the fields to obtain channel type information;
S2, determining vertices and edges according to the channel type information, constructing a user information graph based on the vertices and edges, and then splitting the user information graph into independent user connected graphs;
S3, querying, by using a channel type association table, the historical entity unique ID corresponding to each vertex of the user connected graph to obtain all entity unique IDs corresponding to the user connected graph, and further constructing an entity unique ID connected graph by using the entity unique IDs as vertices and the relationship of a plurality of entity unique IDs appearing in the same user connected graph as edges, so that the historical data is used for associating the user connected graphs;
S4, judging whether each user entity unique ID appears in only one user connected graph, and if so, fusing user information for each user connected graph as required; otherwise, constructing an entity unique ID connected graph, and executing step S5;
S5, determining the entity unique IDs that are connected together in the entity unique ID connected graph, and associating the user connected graphs corresponding to the entity unique IDs that are connected together;
S6, reading the entity unique IDs corresponding to all vertices of the user connected graph, and deduplicating the entity unique IDs;
and S7, updating the channel type association table by using the entity unique ID after the deduplication processing.
2. The method according to claim 1, wherein in the customer data platform (CDP) of an enterprise, the user information is distributed over a plurality of data sources of the enterprise, each data source corresponding to a material table, and the material tables comprise: an enterprise member information table, a mall user information table and a coupon use record table; the channel types in the material tables comprise: user mobile phone number, member ID, Email, WeChat unionID and WeChat openID; records having the same value of a channel type in different material tables correspond to the same natural user; and the channel type field is the combination of each channel type and the material table corresponding to that channel type.
3. The user information fusion method under multilayer association according to claim 2, wherein in step S2, "channel type + channel type field value" is taken as a vertex attribute, and the vertex attribute is hashed to obtain a vertex ID, the vertex recording an attribute of a natural user on a certain channel type;
a plurality of channel type field values appearing in the same record of a material table are taken as an edge, wherein an edge is a connecting line between vertices and records the association of a natural user across different channel types;
the vertices and the edges are connected to construct a user information graph; the user information graph is then split by using a connected component algorithm to obtain, for each vertex, the minimum vertex ID of the connected graph in which the vertex is located, and the vertices are grouped and aggregated by minimum vertex ID to obtain all vertices of each independent user connected graph, that is, each independent user connected graph is obtained.
4. The method for fusing user information under multi-layer association according to claim 3, wherein in step S4, if the unique ID of each user entity appears in only one user connectivity graph, the plurality of natural users generated by the new user information are also independent natural users in the history material table, and the user information is fused for each user connectivity graph as required.
5. The method as claimed in claim 4, wherein the entity unique ID connection graph is constructed by using each entity unique ID as a vertex and using a relationship of a plurality of entity unique IDs appearing in the same user connection graph as an edge.
6. The method for fusing user information under multi-layer association according to claim 5, wherein the method for determining the entity unique IDs linked together in the entity unique ID connected graph in step S5 is a connected component algorithm; and after associating the user connection graphs corresponding to the unique IDs of the connected entities, each user connection graph corresponds to a natural user.
7. The method for fusing user information under multi-layer association according to claim 6, wherein the deduplication of the entity unique IDs in step S6 includes:
a. the user connected graph has no entity unique ID: the current natural user is a new user, an entity unique ID is generated for the user connected graph, and a UUID is used to ensure the uniqueness of the entity unique ID;
b. the user connected graph has only one entity unique ID: the new and old user information of the current natural user belongs to the same natural user, and the old entity unique ID continues to be used;
c. the user connected graph has a plurality of entity unique IDs: the new user information causes user information fusion, and only one of the entity unique IDs, chosen arbitrarily, is retained.
8. A terminal, comprising a processor, a memory and a computer program stored in the memory, wherein the processor executes the computer program stored in the memory to implement the steps of the method for fusing user information under multi-layer association according to any one of claims 1 to 7.
9. A computer storage medium, characterized in that the computer readable storage medium stores computer program instructions, which when executed by a processor, implement the steps of the method for fusing user information under multi-layer association according to any one of claims 1 to 7.
10. A system for fusing user information under multi-layer association, the system being configured to implement the method for fusing user information under multi-layer association according to claim 1, the system comprising:
the channel type information acquisition module is used for selecting data sources of user information to be integrated, each data source corresponds to a material table, and fields capable of identifying users in the material tables and corresponding channel types of the fields are determined to obtain channel type information;
the user connection graph building module is used for determining a vertex and an edge according to the channel type information, building a user information graph based on the vertex and the edge, and then splitting the user information graph into independent user connection graphs;
the historical data association module is used for inquiring historical data of entity unique IDs corresponding to each vertex of the user connected graph by using the channel type association table to obtain all entity unique IDs corresponding to the user connected graph, further using the entity unique IDs as the vertices and using the relation of a plurality of entity unique IDs in the user connected graph as edges to construct the entity unique ID connected graph, and thus associating the user connected graph by using the historical data;
the judging module is used for judging whether the unique ID of each user entity only appears in one user connected graph or not, and if yes, fusing user information to each user connected graph as required; otherwise, constructing an entity unique ID connected graph;
the user connection graph association module is used for determining the entity unique IDs which are connected together in the entity unique ID connection graph and associating the user connection graphs corresponding to the entity unique IDs which are connected together;
the duplication elimination processing module is used for reading entity unique IDs corresponding to all vertexes of the user connected graph and carrying out duplication elimination processing on the entity unique IDs;
and the updating module is used for updating the channel type association table by using the entity unique ID after the deduplication processing.
CN202111216588.0A 2021-10-19 2021-10-19 User information fusion method, terminal, storage medium and system under multilayer association Pending CN114064705A (en)

Priority Applications (2)

Application Number Priority Date Filing Date Title
CN202111216588.0A CN114064705A (en) 2021-10-19 2021-10-19 User information fusion method, terminal, storage medium and system under multilayer association
PCT/CN2022/098808 WO2023065691A1 (en) 2021-10-19 2022-06-15 User information fusion method and system under multilayer association, and terminal and storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202111216588.0A CN114064705A (en) 2021-10-19 2021-10-19 User information fusion method, terminal, storage medium and system under multilayer association

Publications (1)

Publication Number Publication Date
CN114064705A true CN114064705A (en) 2022-02-18

Family

ID=80234917

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202111216588.0A Pending CN114064705A (en) 2021-10-19 2021-10-19 User information fusion method, terminal, storage medium and system under multilayer association

Country Status (2)

Country Link
CN (1) CN114064705A (en)
WO (1) WO2023065691A1 (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114676288A (en) * 2022-03-17 2022-06-28 北京悠易网际科技发展有限公司 ID pull-through method and device
WO2023065691A1 (en) * 2021-10-19 2023-04-27 广州数说故事信息科技有限公司 User information fusion method and system under multilayer association, and terminal and storage medium

Families Citing this family (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116501726B (en) * 2023-06-20 2023-09-29 中国人寿保险股份有限公司上海数据中心 Information creation cloud platform data operation system based on GraphX graph calculation
CN117591705B (en) * 2024-01-19 2024-05-24 北京志翔科技股份有限公司 Sub-table association method and device based on graph search

Family Cites Families (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
EP1843258A1 (en) * 2006-04-06 2007-10-10 Microsoft Corporation Modeling data from disparate data sources
CN107193894B (en) * 2017-05-05 2020-06-16 北京星选科技有限公司 Data processing method, individual identification method and related device
CN107577787B (en) * 2017-09-15 2020-02-07 广东万丈金数信息技术股份有限公司 Method and system for storing associated data information
US20190236597A1 (en) * 2018-01-26 2019-08-01 Walmart Apollo, Llc Systems and methods for associating a user's shopping experiences across multiple channels
CN108322473B (en) * 2018-02-12 2020-05-01 京东数字科技控股有限公司 User behavior analysis method and device
CN110543586B (en) * 2019-09-04 2022-11-15 北京百度网讯科技有限公司 Multi-user identity fusion method, device, equipment and storage medium
CN114064705A (en) * 2021-10-19 2022-02-18 广州数说故事信息科技有限公司 User information fusion method, terminal, storage medium and system under multilayer association

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2023065691A1 (en) * 2021-10-19 2023-04-27 广州数说故事信息科技有限公司 User information fusion method and system under multilayer association, and terminal and storage medium
CN114676288A (en) * 2022-03-17 2022-06-28 北京悠易网际科技发展有限公司 ID pull-through method and device
CN114676288B (en) * 2022-03-17 2024-06-28 北京悠易网际科技发展有限公司 ID pull-through method and device

Also Published As

Publication number Publication date
WO2023065691A1 (en) 2023-04-27

Similar Documents

Publication Publication Date Title
CN114064705A (en) User information fusion method, terminal, storage medium and system under multilayer association
KR102114765B1 (en) How to discover the social account of the target object, server and storage media
CN110472068B (en) Big data processing method, equipment and medium based on heterogeneous distributed knowledge graph
CN111459985B (en) Identification information processing method and device
CN107451831B (en) Task pushing method and device and storage medium
CN105893526A (en) Multi-source data fusion system and method
CN104933049A (en) Method and system for generating digital human
CN111046237B (en) User behavior data processing method and device, electronic equipment and readable medium
US10467636B2 (en) Implementing retail customer analytics data model in a distributed computing environment
CN110929105B (en) User ID (identity) association method based on big data technology
US20210081171A1 (en) Effectively fusing database tables
CN112506925A (en) Data retrieval system and method based on block chain
CN113722520A (en) Graph data query method and device
CN105678323A (en) Image-based-on method and system for analysis of users
CN110990403A (en) Business data storage method, system, computer equipment and storage medium
US11797487B2 (en) Maintaining stable record identifiers in the presence of updated data records
Sreemathy et al. Data integration and ETL: a theoretical perspective
CN110705297A (en) Enterprise name-identifying method, system, medium and equipment
CN117235285B (en) Method and device for fusing knowledge graph data
CN102193988A (en) Method and system for retrieving node data in graphic database
WO2016119276A1 (en) Large-scale object recognition method based on hadoop frame
CN116186286A (en) International logistics information recommendation method, system and medium based on enterprise knowledge graph
CN102193986B (en) Method of implementing online transaction in graphic database
JP6457290B2 (en) Method for pruning a graph, non-transitory computer-readable storage medium storing instructions for causing a computer to perform the method for pruning the graph, and a computer system for pruning a graph
CN107705135A A kind of method that potential commercial value is evaluated based on company's storage contact data

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination