CN116362737A - Account clustering method and device, computer readable storage medium and terminal

Info

Publication number
CN116362737A
Authority
CN
China
Prior art keywords
account
account information
preliminary
clustered
association
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202310625405.3A
Other languages
Chinese (zh)
Other versions
CN116362737B (en)
Inventor
方园
宋向平
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Hangzhou Shuyun Information Technology Co ltd
Original Assignee
Hangzhou Shuyun Information Technology Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Hangzhou Shuyun Information Technology Co ltd
Priority to CN202310625405.3A
Publication of CN116362737A
Application granted
Publication of CN116362737B

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q20/00 Payment architectures, schemes or protocols
    • G06Q20/08 Payment architectures
    • G06Q20/10 Payment architectures specially adapted for electronic funds transfer [EFT] systems; specially adapted for home banking systems
    • G06Q20/102 Bill distribution or payments
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/23 Clustering techniques
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q30/00 Commerce
    • G06Q30/06 Buying, selling or leasing transactions
    • G06Q30/0601 Electronic shopping [e-shopping]
    • G06Q30/0633 Lists, e.g. purchase orders, compilation or processing
    • G06Q30/0635 Processing of requisition or of purchase orders

Landscapes

  • Business, Economics & Management (AREA)
  • Engineering & Computer Science (AREA)
  • Accounting & Taxation (AREA)
  • Finance (AREA)
  • Theoretical Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • Physics & Mathematics (AREA)
  • Strategic Management (AREA)
  • Development Economics (AREA)
  • Economics (AREA)
  • General Business, Economics & Management (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Marketing (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Evolutionary Biology (AREA)
  • Evolutionary Computation (AREA)
  • General Engineering & Computer Science (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

An account clustering method and device, a computer readable storage medium, and a terminal. The method comprises the following steps: determining a plurality of pieces of account information to be clustered; performing preliminary grouping on the account information to be clustered to obtain a plurality of preliminary account groups, wherein the account information contained in each preliminary account group belongs to the same user; performing intra-group pairing on the account information in at least a part of the preliminary account groups, wherein each preliminary account group subjected to intra-group pairing yields one or more corresponding account pairs; inputting each account pair into a preset graph calculation model to generate an account association relationship graph; and splitting the account association relationship graph to obtain a plurality of initialized account association subgraphs, then performing iterative operations until no connection exists between any node of one account association subgraph and any node of another, at which point iteration stops and the clustered account association subgraphs are obtained. With this scheme, more accurate and complete account clustering results can be obtained.

Description

Account clustering method and device, computer readable storage medium and terminal
Technical Field
The present invention relates to the field of information processing technologies, and in particular, to a method and apparatus for account clustering, a computer readable storage medium, and a terminal.
Background
With the development of Internet technology and the rise of e-commerce platforms, the same natural person (user) often uses different member accounts (or identity accounts) when conducting online or offline business with different business service parties (e.g., merchants or stores of different brands). For example, when user A enrolls as a member or places an order at online store a1 of e-commerce platform A, a first member account is used; when the user enrolls or places an order at online store B1 of e-commerce platform B, a second member account is used … If the association between different identity accounts can be accurately determined, that is, if multiple different identity accounts can be accurately attributed to the user they jointly belong to, business service parties can be assisted in carrying out multi-channel operation and marketing activities, and the data island problem can be addressed.
In the prior art, different identity accounts are generally associated or clustered according to information shared across different business data (e.g., transaction orders between users and different merchants), such as devices, communication numbers, and co-occurring geographic locations. The resulting association or clustering result is a set of identity accounts belonging to the same user.
However, the limitation of the above prior art is that, after a plurality of clusters are obtained, the account association between different clusters cannot be identified. For example, consider an account cluster Q1 obtained based on a shared communication number and an account cluster Q2 obtained based on a co-occurring geographic location. Not only does the account information within Q1 (or within Q2) belong to the same user, but the account information in Q1 may also be associated with that in Q2 (i.e., all account information in Q1 and Q2 may belong to the same user). Existing clustering schemes cannot identify this association, so their accuracy and reliability need to be improved.
Disclosure of Invention
The technical problem solved by the embodiment of the invention is how to obtain more accurate and complete account clustering results.
In order to solve the technical problems, an embodiment of the present invention provides an account clustering method, including the following steps: determining a plurality of account information to be clustered; preliminary grouping is carried out on the account information to be clustered to obtain a plurality of preliminary account groups, wherein the account information contained in each preliminary account group belongs to the same user; performing intra-group pairing on account information in at least a part of the preliminary account groups, wherein each preliminary account group subjected to intra-group pairing obtains one or more corresponding account pairs; inputting the obtained account pairs into a preset graph calculation model to generate an account association relationship graph; splitting the account association relationship graph to obtain a plurality of initialization account association subgraphs, and then carrying out iterative operation based on each initialization account association subgraph until no connection relationship exists between any two nodes between every two account association subgraphs, stopping iteration and obtaining clustered account association subgraphs, wherein in each iterative operation, the plurality of account association subgraphs with the nodes with the connection relationship are combined into a single account association subgraph; the nodes are used for indicating the account information, and each node with a connection relationship is used for indicating the account information belonging to the same user.
Optionally, the determining the plurality of account information to be clustered includes: acquiring a plurality of business data from different business platforms, wherein each business data comprises a main identity of a user to which each business data belongs, and each business platform comprises a plurality of off-line entity stores and/or a plurality of on-line virtual stores; and for each service data, extracting the main identity of the user belonging to the service data, and taking the extracted main identities as the account information to be clustered.
Optionally, the service data is selected from: transaction order data, member enrollment data, and interaction data.
Optionally, each service data further includes one or more secondary identities of the users to which each service data belongs; preliminary grouping is carried out on the account information to be clustered to obtain a plurality of preliminary account groups, and the method comprises the following steps: and respectively determining service data of account information sources to be clustered, and dividing the account information extracted from each service data containing the same secondary identity into a group to obtain a plurality of preliminary account groups.
Optionally, the secondary identity is selected from: communication number, social software account number, identity of service platform.
Optionally, performing intra-group pairing on account information in at least a part of the preliminary account number group includes: and for at least a part of the preliminary account groups, selecting one piece of account information to be paired from each preliminary account group, and forming paired account pairs by the rest account information in the preliminary account groups and the account information to be paired respectively.
Optionally, the graph calculation model is a Spark-graph model.
Optionally, merging the plurality of account association subgraphs having nodes with a connection relationship into a single account association subgraph includes: merging them using the graph traversal algorithm Pregel, so as to determine the account association subgraphs.
Optionally, after obtaining the clustered account association subgraphs, the method further includes: and generating a unique identity identifier OneID of the user to which the account associated subgraph belongs for each clustered account associated subgraph.
The embodiment of the invention also provides an account clustering device, which comprises: an account-information-to-be-clustered determining module, configured to determine a plurality of account information to be clustered; a preliminary grouping module, configured to perform preliminary grouping on the account information to be clustered to obtain a plurality of preliminary account groups, wherein the account information contained in each preliminary account group belongs to the same user; an intra-group pairing module, configured to perform intra-group pairing on account information in at least a part of the preliminary account groups, wherein each preliminary account group subjected to intra-group pairing obtains one or more corresponding account pairs; a graph generating module, configured to input the obtained account pairs into a preset graph calculation model to generate an account association relationship graph; and a clustering module, configured to split the account association relationship graph to obtain a plurality of initialized account association subgraphs, and then perform iterative operations based on the initialized account association subgraphs until no connection relationship exists between any two nodes belonging to different account association subgraphs, whereupon iteration stops and the clustered account association subgraphs are obtained, wherein in each iterative operation, the account association subgraphs having nodes with a connection relationship are merged into a single account association subgraph; the nodes are used for indicating the account information, and nodes with a connection relationship indicate account information belonging to the same user.
The embodiment of the invention also provides a computer readable storage medium, on which a computer program is stored, which when being run by a processor, performs the steps of the account clustering method.
The embodiment of the invention also provides a terminal, which comprises a memory and a processor, wherein the memory stores a computer program capable of running on the processor, and the processor executes the steps of the account clustering method when running the computer program.
Compared with the prior art, the technical scheme of the embodiment of the invention has the following beneficial effects:
In the embodiment of the invention, the account information to be clustered is first preliminarily grouped (also called primary clustering), so that account information belonging to the same user is initially placed in one group; intra-group pairing is then performed on the preliminary grouping result to obtain account pairs; and secondary clustering is performed using a graph calculation model and an iterative operation to obtain the clustered account association subgraphs (i.e., the final account clustering result). By contrast, existing account clustering methods generally cluster based on information such as commonly used devices, communication numbers, and co-occurring geographic locations, and cannot determine whether account information in different resulting clusters belongs to the same user. The embodiment can identify, through iterative operations on the graph, the association between account information in different preliminary clusters (i.e., whether it belongs to the same user) and thereby optimize the preliminary clustering result. For a given set of account information to be clustered, it can also enlarge the number of accounts in a single account cluster, yielding a more accurate and complete account clustering result.
Further, determining the plurality of account information to be clustered includes: acquiring a plurality of pieces of business data from different business platforms, wherein each piece of business data contains the primary identity of the user it belongs to, and each business platform comprises a plurality of offline physical stores and/or a plurality of online virtual stores; and, for each piece of business data, extracting the primary identity of the user it belongs to and taking the extracted primary identities as the account information to be clustered. Whereas the prior art generally clusters account information obtained from a single channel (e.g., a single business platform), the embodiment of the invention clusters account information obtained from multiple business platforms, so the clustering result supports cross-channel user identification; that is, the account information belonging to the same user in each account cluster may originate from different platforms. The embodiment therefore helps business service parties carry out cross-platform, multi-channel operation and marketing activities aimed at customers or consumers.
Drawings
FIG. 1 is a flowchart of an account clustering method in an embodiment of the present invention;
FIG. 2 is a schematic structural diagram of an account clustering device in an embodiment of the present invention.
Detailed Description
In order to make the above objects, features and advantages of the present invention more comprehensible, embodiments accompanied with figures are described in detail below.
Referring to fig. 1, fig. 1 is a flowchart of an account clustering method in an embodiment of the present invention. The method may include steps S11 to S15:
step S11: determining a plurality of account information to be clustered;
step S12: preliminary grouping is carried out on the account information to be clustered to obtain a plurality of preliminary account groups, wherein the account information contained in each preliminary account group belongs to the same user;
step S13: performing intra-group pairing on account information in at least a part of the preliminary account groups, wherein each preliminary account group subjected to intra-group pairing obtains one or more corresponding account pairs;
step S14: inputting the obtained account pairs into a preset graph calculation model to generate an account association relationship graph;
step S15: splitting the account association relation graph to obtain a plurality of initialization account association subgraphs, and then carrying out iterative operation based on each initialization account association subgraph until no connection relation exists between any two nodes between every two account association subgraphs, stopping iteration and obtaining clustered account association subgraphs, wherein in each iterative operation, the plurality of account association subgraphs with the nodes with the connection relation are combined into a single account association subgraph.
The nodes are used for indicating the account information, and each node with a connection relationship is used for indicating the account information belonging to the same user.
In a specific implementation of step S11, the determining a plurality of account information to be clustered may include: acquiring a plurality of business data from different business platforms, wherein each business data comprises a main identity of a user to which each business data belongs, and each business platform comprises a plurality of off-line entity stores and/or a plurality of on-line virtual stores; and for each service data, extracting the main identity of the user belonging to the service data, and taking the extracted main identities as the account information to be clustered.
In particular, the business platform may be selected from, for example, different forms of e-commerce platforms including, but not limited to, an online shopping platform, an online meal ordering platform, an online financial business platform, an online medical services platform, an online educational training platform, and various online life/entertainment services platforms, and the like. The form of the e-commerce platform may include: application (APP), applet, public number, website, etc. For another example, the service platform may also be an off-line entity service platform including, but not limited to, an off-line shopping mall, an off-line dining platform, an off-line financial service center, an off-line medical facility, an off-line educational training facility, an off-line beauty/hair/health maintenance platform, and the like.
Without limitation, the service data may be selected from: transaction order data, member enrollment data, and interaction data (e.g., user login data, course selection data, comment data, etc.). The service data includes a primary identity of the user/customer (also referred to as a service object). The primary identity is usually a member identity and can be used to uniquely identify the service object during its interaction with the service platform.
For example, after customer A places an order on online shopping platform A, transaction order data is created or generated, and this data should include at least customer A's primary identity (typically an order identity account or member account). In addition, it may include one or more secondary identities (e.g., mobile phone number, email address, commonly used social software account) and other transaction-related information.
As another example, after customer B becomes a member of offline beauty institution B, member enrollment data (e.g., a membership agreement) is created or generated, and this data should include at least customer B's primary identity (typically a member account). In addition, it may include one or more secondary identities (e.g., mobile phone number, email address, commonly used social software account) and other enrollment-related information.
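The extraction in step S11 can be sketched as follows. This is only a minimal illustration under stated assumptions: the record layout and the field names `primary_id`, `secondary_ids`, `platform`, and `kind` are hypothetical and are not specified by the embodiment.

```python
from typing import Iterable


def extract_primary_ids(records: Iterable[dict]) -> list[str]:
    """Collect the distinct primary identities, preserving first-seen order."""
    seen: dict[str, None] = {}
    for record in records:
        primary = record.get("primary_id")  # hypothetical field name
        if primary:  # skip records that carry no primary identity
            seen.setdefault(primary, None)
    return list(seen)


# Hypothetical business data gathered from two different platforms.
records = [
    {"platform": "A", "kind": "order", "primary_id": "ID_A", "secondary_ids": ["m1"]},
    {"platform": "B", "kind": "enrollment", "primary_id": "ID_B", "secondary_ids": ["m1", "wx_9"]},
    {"platform": "A", "kind": "interaction", "primary_id": "ID_A", "secondary_ids": []},
]
accounts_to_cluster = extract_primary_ids(records)
```

Here `accounts_to_cluster` holds each primary identity once, regardless of how many records mention it.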
Whereas the prior art clusters account information obtained from a single channel (e.g., a single service platform), the embodiment of the invention clusters account information obtained from multiple service platforms, so the clustering result supports cross-channel user identification; that is, the account information belonging to the same user in each account cluster may originate from different platforms. The embodiment therefore helps business service parties carry out cross-platform, multi-channel operation and marketing activities aimed at customers or consumers.
It should be noted that, besides being able to be derived from different service data, the account information to be clustered may also be extracted from other data sources including user account information, for example, network traffic data.
In specific implementation, the traffic data within a preset network range can be obtained by packet capture at the main network interface corresponding to that range, where the main network interface comprises the main internal-network port or main external-network port of the preset network range, and a switch or adapter connecting the preset network range with the external network is arranged between the two ports. For example, a data backup device or program is set at the main network interface to back up the traffic data passing through it, thereby obtaining the traffic data within the network range; account information of different users is then extracted from the traffic data and used as the account information to be clustered.
Specifically, keywords used to identify account information of various communication software, social software, and websites during network data transmission can be stored; if a stored keyword is found in the traffic data, the account information can be determined.
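One possible reading of this keyword matching is sketched below. It assumes, purely for illustration, that the captured traffic has already been decoded to text and that an account value directly follows its keyword; the keywords `member_id=` and `wx_account=` and the sample payload are hypothetical.

```python
import re

# Hypothetical stored keywords that precede an account value in traffic data.
ACCOUNT_KEYWORDS = ("member_id=", "wx_account=")


def extract_accounts(payload: str, keywords=ACCOUNT_KEYWORDS) -> list[str]:
    """Return the values that follow any stored keyword in the payload."""
    found = []
    for keyword in keywords:
        # Assumption: the account value is the run of word characters
        # immediately after the keyword.
        for match in re.finditer(re.escape(keyword) + r"(\w+)", payload):
            found.append(match.group(1))
    return found


payload = "GET /shop?member_id=ID_7&item=42 wx_account=alice_01"
accounts = extract_accounts(payload)
```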
In the implementation of step S12, the account information to be clustered is initially grouped to obtain a plurality of preliminary account groups, where the account information included in each preliminary account group belongs to the same user.
Further, in the step S12, the preliminary grouping is performed on the account information to be clustered to obtain a plurality of preliminary account groups, including: respectively determining service data of account information sources to be clustered, and dividing account information extracted from each service data containing the same secondary identity into a group to obtain a plurality of preliminary account groups; wherein each business data also contains one or more secondary identities of the users to which the business data belong.
The secondary identity may include, but is not limited to, a communication number (e.g., a mobile phone number or email address), a commonly used social software account, an identity of a service platform, and the like.
For example, service data A contains the primary identity ID_A of a certain user and a communication number m1; service data B contains the primary identity ID_B of a certain user and also the communication number m1. Since service data A and service data B contain the same communication number m1, the primary identity ID_A in service data A and the primary identity ID_B in service data B can be considered to belong to the same user.
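The grouping rule in this example can be sketched as follows. The record layout and the field names `primary_id` and `secondary_ids` are assumptions made for illustration; the embodiment does not prescribe a data format.

```python
from collections import defaultdict


def preliminary_groups(records):
    """Group primary identities by each secondary identity they share."""
    groups = defaultdict(list)
    for record in records:
        for secondary in record["secondary_ids"]:
            # Avoid listing the same primary identity twice in one group.
            if record["primary_id"] not in groups[secondary]:
                groups[secondary].append(record["primary_id"])
    return dict(groups)


# Hypothetical service data: m1 is a communication number, wx_9 a social account.
records = [
    {"primary_id": "ID_A", "secondary_ids": ["m1"]},
    {"primary_id": "ID_B", "secondary_ids": ["m1", "wx_9"]},
    {"primary_id": "ID_C", "secondary_ids": ["wx_9"]},
]
groups = preliminary_groups(records)
```

In this sketch ID_B lands in two preliminary groups. That overlap is what the later graph-based steps can exploit to conclude that ID_A, ID_B, and ID_C all belong to one user.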
Specifically, the "identity of a service platform" may refer to an identity (may be referred to as a "primary identity") given to a client by a certain service platform. Since the business platform may include multiple stores, each customer may have a member identification (which may be referred to as a "secondary identification") at each store. Thus, each "primary identity" may correspond to a plurality of "secondary identities".
It should be noted that, in specific implementation, the preliminary grouping may also be performed in other suitable ways depending on the source of the account information to be clustered. For example, where the account information to be clustered is extracted from traffic data as described for step S11 above, the preliminary grouping may include: determining the traffic data from which each piece of account information to be clustered originates, and placing the account information extracted from traffic data containing the same keyword (e.g., IP address, communication number, social software account) into one group, thereby obtaining a plurality of preliminary account groups.
In the implementation of step S13, intra-group pairing is performed on the account information in at least a part of the preliminary account groups, and each preliminary account group subjected to intra-group pairing yields one or more corresponding account pairs.
Further, in the step S13, performing intra-group pairing on account information in at least a part of the preliminary account number groups includes: and for at least a part of the preliminary account groups, selecting one piece of account information to be paired from each preliminary account group, and forming paired account pairs by the rest account information in the preliminary account groups and the account information to be paired respectively.
The at least a part of the preliminary account groups may be those preliminary account groups, among the plurality obtained by the preliminary grouping in step S12, that contain at least two pieces of account information (i.e., the number of pieces of account information is greater than 1).
Selecting one piece of account information to be paired from each preliminary account group may specifically include: randomly selecting one piece of account information from each preliminary account group as the account information to be paired; or assigning serial numbers to the account information in each preliminary account group and then selecting the account information with a preset serial number as the account information to be paired.
For example, suppose a preliminary account group contains the account information ID_1, ID_2, ID_3, and ID_4, and ID_1 is randomly selected as the account information to be paired. The result of intra-group pairing for this group is then: account pair 1 (ID_1 and ID_2), account pair 2 (ID_1 and ID_3), and account pair 3 (ID_1 and ID_4).
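The pairing in this example can be sketched as below. The function name and the `anchor_index` parameter are hypothetical; the sketch uses the "preset serial number" variant with the first item as the account information to be paired.

```python
def pair_within_group(group, anchor_index=0):
    """Pair one chosen account with every other account in the group."""
    if len(group) < 2:
        return []  # groups with a single account produce no pairs
    anchor = group[anchor_index]
    return [(anchor, other) for i, other in enumerate(group) if i != anchor_index]


pairs = pair_within_group(["ID_1", "ID_2", "ID_3", "ID_4"])
```

A group of n accounts yields n - 1 pairs rather than all n(n - 1)/2 combinations, and every account in the group stays reachable from the chosen one, so the group remains connected once the pairs become graph edges.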
In the implementation of step S14, the obtained account pairs are input into a preset graph calculation model to generate an account association relationship graph.
The account association relationship graph is a graph data structure, i.e., a mesh data structure composed of a set of vertices (or nodes) and a set of relationships (edges) between vertices. Each vertex or node of the account association relationship graph is used to indicate account information (nodes correspond one-to-one with account information), and nodes having a connection relationship (i.e., an edge) indicate account information belonging to the same user. For example, if two nodes in the graph are joined by an edge, the account information indicated by those two nodes belongs to the same user.
In a specific implementation, the graph calculation model may be a Spark-graph model. However, this is not limiting; other graph calculation models that achieve the same or similar functions may also be adopted.
In the implementation of step S15, splitting the account association relationship graph to obtain the plurality of initialized account association subgraphs may specifically include: randomly splitting the account association relationship graph into a first preset number of initialized association subgraphs; or assigning node serial numbers to all nodes in the account association relationship graph and, in ascending order of node serial number, forming one subgraph from every second preset number of nodes to obtain the plurality of initialized association subgraphs.
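As a rough illustration of this data structure, account pairs can be turned into an undirected adjacency map. This is a plain Python stand-in chosen for readability, not the graph calculation model used by the embodiment, and `build_graph` is a hypothetical helper name.

```python
from collections import defaultdict


def build_graph(pairs):
    """Build an undirected adjacency map: node -> set of connected nodes."""
    adjacency = defaultdict(set)
    for a, b in pairs:
        # An edge means both accounts are taken to belong to the same user.
        adjacency[a].add(b)
        adjacency[b].add(a)
    return dict(adjacency)


pairs = [("ID_1", "ID_2"), ("ID_1", "ID_3"), ("ID_3", "ID_5")]
graph = build_graph(pairs)
```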
The specific values of the first preset number and the second preset number may be set in combination with actual needs, which is not limited in the embodiment of the present invention.
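Both splitting options can be sketched as follows. The function names are hypothetical, and treating a node's serial number as its position in sorted order is an assumption; the embodiment does not say how serial numbers are assigned.

```python
import random


def split_by_serial(nodes, chunk_size):
    """Form one subgraph from every `chunk_size` nodes in ascending order."""
    ordered = sorted(nodes)  # assumption: serial number = sorted position
    return [set(ordered[i:i + chunk_size]) for i in range(0, len(ordered), chunk_size)]


def split_randomly(nodes, num_subgraphs, seed=0):
    """Randomly distribute the nodes over `num_subgraphs` subgraphs."""
    shuffled = list(nodes)
    random.Random(seed).shuffle(shuffled)
    # Some subgraphs are empty when there are fewer nodes than subgraphs.
    return [set(shuffled[i::num_subgraphs]) for i in range(num_subgraphs)]


nodes = ["ID_3", "ID_1", "ID_4", "ID_2", "ID_5"]
subgraphs = split_by_serial(nodes, chunk_size=2)
random_parts = split_randomly(nodes, num_subgraphs=3)
```

In both cases every node ends up in exactly one initial subgraph, which is the property the later merging step relies on.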
It can be understood that in each clustered account association subgraph, the account information indicated by each node belongs to the same user, that is, the account information set formed by the account information indicated by each node is used as an account cluster.
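The iterative merging of step S15 might look like the sketch below. It assumes the initial subgraphs contain one node each, so that merging alone yields groups of mutually connected accounts; how nodes that share a larger initial subgraph without being linked are handled is not detailed in the source and is not covered here. The function name is hypothetical.

```python
def merge_until_stable(subgraphs, edges):
    """Merge subgraphs until no edge joins nodes of two different subgraphs."""
    clusters = [set(s) for s in subgraphs if s]
    while True:
        # Map every node to the index of the subgraph that currently holds it.
        owner = {node: idx for idx, cluster in enumerate(clusters) for node in cluster}
        crossing = next(
            ((owner[a], owner[b]) for a, b in edges if owner[a] != owner[b]), None
        )
        if crossing is None:
            return clusters  # stop: no connection between any two subgraphs
        i, j = sorted(crossing)
        clusters[i] |= clusters[j]  # merge the two connected subgraphs
        del clusters[j]


edges = [("ID_1", "ID_2"), ("ID_1", "ID_3"), ("ID_3", "ID_5"), ("ID_6", "ID_7")]
initial = [{n} for n in ["ID_1", "ID_2", "ID_3", "ID_5", "ID_6", "ID_7"]]
clusters = merge_until_stable(initial, edges)
```

This sketch merges one pair of subgraphs per iteration for clarity; merging several pairs per iteration would reach the same final result in fewer rounds.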
Further, in step S15, merging the plurality of account association subgraphs having nodes with a connection relationship into a single account association subgraph includes: merging them using the graph traversal algorithm Pregel, so as to determine the account association subgraphs.
In the embodiment of the invention, by contrast with existing account clustering methods, which generally cluster based on information such as commonly used devices, communication numbers, and co-occurring geographic locations and cannot determine whether account information in different resulting clusters belongs to the same user, the preliminary clustering result can be optimized and the number of accounts in a preliminary cluster can be expanded, so that a more accurate and complete account clustering result is obtained.
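The source names Pregel but does not describe the vertex program or the messages exchanged. The sketch below therefore only imitates Pregel-style supersteps in plain Python, using minimum-label propagation as an assumed vertex program; it is not the Spark API, and the function name is hypothetical.

```python
def pregel_style_components(adjacency):
    """Label every node with the smallest node id reachable from it."""
    labels = {node: node for node in adjacency}  # each node starts as its own label
    active = set(adjacency)
    while active:  # one loop pass corresponds to one superstep
        # Message phase: active nodes send their label to their neighbours,
        # and each receiver keeps only the smallest incoming label.
        inbox = {}
        for node in active:
            for neighbour in adjacency[node]:
                message = labels[node]
                if neighbour not in inbox or message < inbox[neighbour]:
                    inbox[neighbour] = message
        # Compute phase: a node that adopts a smaller label stays active.
        active = set()
        for node, message in inbox.items():
            if message < labels[node]:
                labels[node] = message
                active.add(node)
    return labels


# Assumption: the adjacency map is symmetric and lists every node as a key.
adjacency = {
    "ID_1": {"ID_2", "ID_3"}, "ID_2": {"ID_1"}, "ID_3": {"ID_1", "ID_5"},
    "ID_5": {"ID_3"}, "ID_6": {"ID_7"}, "ID_7": {"ID_6"},
}
labels = pregel_style_components(adjacency)
```

Nodes that end with the same label form one clustered account association subgraph. The loop stops when no node changes its label, which mirrors the stopping condition of having no remaining connection between different subgraphs.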
Further, after the clustered account association subgraphs are obtained, the method further includes: for each clustered account association subgraph, generating a unique identity identifier (OneID) of the user to which that account association subgraph belongs.
In the embodiment of the invention, because the account information contained in each clustered account association subgraph can originate from a plurality of different channels (for example, a plurality of different online and/or offline service platforms), a OneID of the user is generated for each clustered account association subgraph, and the OneID can be used to identify customers whose data comes from multiple channels. In this way, business objects can be identified and their data linked across channels, the problem of data silos is addressed, and business service providers are helped to carry out omni-channel operation and marketing activities.
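The text does not specify how a OneID is generated. As one hedged possibility, the sketch below derives a deterministic identifier from the sorted members of a cluster using SHA-256; the function names and the 16-character truncation are assumptions made for illustration only.

```python
import hashlib


def make_one_id(cluster):
    """Derive a deterministic identifier from the members of one cluster.

    Assumption: hashing the sorted member list is only one possible scheme.
    """
    canonical = "\n".join(sorted(cluster))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()[:16]


def assign_one_ids(clusters):
    """Map every account in every cluster to that cluster's OneID."""
    return {
        account: make_one_id(cluster)
        for cluster in clusters
        for account in cluster
    }
```

An identifier derived from cluster membership changes whenever the cluster gains a member, so a deployed system would probably also need a stored mapping that keeps previously issued identifiers stable.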
Referring to fig. 2, fig. 2 is a schematic structural diagram of an account clustering device in an embodiment of the present invention. The account clustering device may include:
the account information to be clustered determining module 21 is configured to determine a plurality of account information to be clustered;
a preliminary grouping module 22, configured to perform preliminary grouping on the account information to be clustered to obtain a plurality of preliminary account groups, where account information included in each preliminary account group belongs to the same user;
an intra-group pairing module 23, configured to perform intra-group pairing on account information in at least a portion of the preliminary account groups, where each preliminary account group on which intra-group pairing is performed yields one or more corresponding account pairs;
the graph generating module 24 is configured to input the obtained account pairs into a preset graph calculation model, so as to generate an account association relationship graph;
the clustering module 25 is configured to split the account association relationship graph to obtain a plurality of initialized account association subgraphs, and then perform iterative operations based on the initialized account association subgraphs, stopping the iteration when no connection relationship exists between any node of one account association subgraph and any node of another, so as to obtain the clustered account association subgraphs, where in each iterative operation, the account association subgraphs between which nodes having a connection relationship exist are merged into a single account association subgraph;
wherein the nodes indicate the account information, and nodes having a connection relationship indicate account information belonging to the same user.
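As a rough illustration of what the preliminary grouping module 22 and the intra-group pairing module 23 might compute, the sketch below groups main identities by a shared secondary identity and then pairs one selected account with every other account in its group. The function names, the record layout, and the choice of the first sorted account as the one to be paired are assumptions, not details taken from the embodiment.

```python
from collections import defaultdict


def preliminary_groups(records):
    """Group account information by shared secondary identity.

    `records` is an iterable of (main_identity, secondary_identity) tuples,
    a hypothetical layout for data extracted from business records.
    """
    groups = defaultdict(set)
    for main_identity, secondary_identity in records:
        groups[secondary_identity].add(main_identity)
    return [sorted(members) for members in groups.values()]


def star_pairs(group):
    """Pair one selected account with each remaining account in the group.

    A group of n accounts yields n - 1 pairs, which is enough to keep the
    whole group connected in the resulting graph.
    """
    selected, *rest = group
    return [(selected, other) for other in rest]
```

The resulting pairs are the edges that would be fed to the graph calculation model; a group with a single account produces no pairs.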
Regarding the principle, implementation and beneficial effects of the account clustering device, please refer to the foregoing and the related description about the account clustering method shown in fig. 1, which are not repeated herein.
The embodiment of the invention also provides a computer readable storage medium, on which a computer program is stored, the computer program executing the steps of the account clustering method shown in fig. 1 when being run by a processor. The computer readable storage medium may include non-volatile memory or non-transitory memory, and may also include optical disks, mechanical hard disks, solid state disks, and the like.
Specifically, in the embodiment of the present invention, the processor may be a central processing unit (CPU), or may be another general-purpose processor, a digital signal processor (DSP), an application-specific integrated circuit (ASIC), a field-programmable gate array (FPGA) or other programmable logic device, a discrete gate or transistor logic device, a discrete hardware component, or the like. A general-purpose processor may be a microprocessor, or the processor may be any conventional processor or the like.
It should also be appreciated that the memory in embodiments of the present application may be volatile memory or non-volatile memory, or may include both volatile and non-volatile memory. The non-volatile memory may be a read-only memory (ROM), a programmable ROM (PROM), an erasable PROM (EPROM), an electrically erasable PROM (EEPROM), or a flash memory. The volatile memory may be a random access memory (RAM), which acts as an external cache. By way of example but not limitation, many forms of RAM are available, such as static RAM (SRAM), dynamic RAM (DRAM), synchronous DRAM (SDRAM), double data rate SDRAM (DDR SDRAM), enhanced SDRAM (ESDRAM), synchronous link DRAM (SLDRAM), and direct Rambus RAM (DR RAM).
The embodiment of the invention also provides a terminal, which comprises a memory and a processor, wherein the memory stores a computer program capable of running on the processor, and the processor executes the steps of the account clustering method shown in fig. 1 when running the computer program. The terminal may include, but is not limited to, terminal equipment such as a mobile phone, a computer, a tablet computer, a server, a cloud platform, and the like.
It should be understood that the term "and/or" merely describes an association relationship between associated objects and indicates that three relationships may exist; for example, A and/or B may mean: A exists alone, both A and B exist, or B exists alone. In this context, the character "/" indicates that the associated objects before and after it are in an "or" relationship.
The term "plurality" as used in the embodiments herein refers to two or more.
Descriptions such as "first" and "second" in the embodiments of the present application are used only to illustrate and distinguish the described objects; they imply no order, do not indicate any particular limitation on the number of devices in the embodiments of the present application, and should not be construed as limiting the embodiments of the present application in any way.
It should be noted that the serial numbers of the steps in the present embodiment do not represent a limitation on the execution sequence of the steps.
Although the present invention is disclosed above, the present invention is not limited thereto. Various changes and modifications may be made by one skilled in the art without departing from the spirit and scope of the invention, and the scope of protection of the invention shall therefore be determined by the appended claims.

Claims (12)

1. An account clustering method is characterized by comprising the following steps:
determining a plurality of account information to be clustered;
performing preliminary grouping on the account information to be clustered to obtain a plurality of preliminary account groups, wherein the account information contained in each preliminary account group belongs to the same user;
performing intra-group pairing on account information in at least a part of the preliminary account groups, wherein each preliminary account group subjected to intra-group pairing obtains one or more corresponding account pairs;
inputting the obtained account pairs into a preset graph calculation model to generate an account association relationship graph;
splitting the account association relationship graph to obtain a plurality of initialized account association subgraphs, and then performing iterative operations based on the initialized account association subgraphs, stopping the iteration when no connection relationship exists between any node of one account association subgraph and any node of another, so as to obtain the clustered account association subgraphs, wherein in each iterative operation, the account association subgraphs between which nodes having a connection relationship exist are merged into a single account association subgraph;
wherein the nodes indicate the account information, and nodes having a connection relationship indicate account information belonging to the same user.
2. The method of claim 1, wherein the determining a plurality of account information to be clustered comprises:
acquiring a plurality of pieces of business data from different business platforms, wherein each piece of business data comprises a main identity of the user to which it belongs, and each business platform comprises a plurality of offline physical stores and/or a plurality of online virtual stores;
for each piece of business data, extracting the main identity of the user to which the business data belongs, and taking the extracted main identities as the account information to be clustered.
3. The method of claim 2, wherein the business data is selected from the group consisting of:
trade order data, member meeting data, interaction data.
4. The method of claim 2, wherein each piece of business data further comprises one or more secondary identities of the user to which it belongs;
and performing preliminary grouping on the account information to be clustered to obtain a plurality of preliminary account groups comprises:
determining, for each piece of account information to be clustered, the business data from which it was extracted, and placing the account information extracted from pieces of business data containing the same secondary identity into one group, so as to obtain the plurality of preliminary account groups.
5. The method of claim 4, wherein the secondary identity is selected from the group consisting of:
communication number, social software account number, identity of service platform.
6. The method of claim 1, wherein performing intra-group pairing on account information in at least a part of the preliminary account groups comprises:
for at least a part of the preliminary account groups, selecting one piece of account information from each preliminary account group as the account information to be paired, and pairing each remaining piece of account information in that preliminary account group with the account information to be paired to form account pairs.
7. The method of claim 1, wherein the graph calculation model is a Spark-graph model.
8. The method according to claim 1, wherein merging the plurality of account association subgraphs between which nodes having a connection relationship exist into a single account association subgraph comprises:
merging those account association subgraphs by using the graph traversal algorithm Pregel, so as to determine the plurality of account association subgraphs after merging.
9. The method of claim 1, wherein after the clustered account association subgraphs are obtained, the method further comprises:
for each clustered account association subgraph, generating a unique identity identifier (OneID) of the user to which the account association subgraph belongs.
10. An account clustering device, comprising:
the account information to be clustered determining module is used for determining a plurality of account information to be clustered;
the preliminary grouping module is used for performing preliminary grouping on the account information to be clustered to obtain a plurality of preliminary account groups, wherein the account information contained in each preliminary account group belongs to the same user;
the intra-group pairing module is used for performing intra-group pairing on account information in at least a part of the preliminary account groups, wherein each preliminary account group on which intra-group pairing is performed yields one or more corresponding account pairs;
the graph generating module is used for inputting the obtained account pairs into a preset graph calculation model so as to generate an account association relationship graph;
the clustering module is used for splitting the account association relationship graph to obtain a plurality of initialized account association subgraphs, and then performing iterative operations based on the initialized account association subgraphs, stopping the iteration when no connection relationship exists between any node of one account association subgraph and any node of another, so as to obtain the clustered account association subgraphs, wherein in each iterative operation, the account association subgraphs between which nodes having a connection relationship exist are merged into a single account association subgraph;
wherein the nodes indicate the account information, and nodes having a connection relationship indicate account information belonging to the same user.
11. A computer readable storage medium having stored thereon a computer program, which when executed by a processor performs the steps of the account clustering method of any one of claims 1 to 9.
12. A terminal comprising a memory and a processor, the memory having stored thereon a computer program executable on the processor, characterized in that the processor executes the steps of the account clustering method according to any one of claims 1 to 9 when the computer program is executed.
CN202310625405.3A 2023-05-29 2023-05-29 Account clustering method and device, computer readable storage medium and terminal Active CN116362737B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202310625405.3A CN116362737B (en) 2023-05-29 2023-05-29 Account clustering method and device, computer readable storage medium and terminal

Publications (2)

Publication Number Publication Date
CN116362737A (en) 2023-06-30
CN116362737B (en) 2023-10-13

Family

ID=86910677

Country Status (1)

Country Link
CN (1) CN116362737B (en)

Citations (10)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20100057580A1 (en) * 2008-08-28 2010-03-04 Radha Raghunathan Unified payment card
CN105630904A (en) * 2015-12-21 2016-06-01 中国电子科技集团公司第十五研究所 Internet account information mining method and device
US20170116315A1 (en) * 2015-10-21 2017-04-27 International Business Machines Corporation Fast path traversal in a relational database-based graph structure
CN109447177A (en) * 2018-11-12 2019-03-08 南京中孚信息技术有限公司 Account clustering method, device and server
CN110852739A (en) * 2018-08-20 2020-02-28 北京嘀嘀无限科技发展有限公司 Account number merging method, device, equipment and computer readable storage medium
CN111125469A (en) * 2019-12-09 2020-05-08 重庆邮电大学 User clustering method and device for social network and computer equipment
CN111368013A (en) * 2020-06-01 2020-07-03 深圳市卡牛科技有限公司 Unified identification method, system, equipment and storage medium based on multiple accounts
CN111701247A (en) * 2020-07-13 2020-09-25 腾讯科技(深圳)有限公司 Method and equipment for determining unified account
CN113641657A (en) * 2021-08-23 2021-11-12 苏州良医汇网络科技有限公司 Method, device and equipment for merging user accounts
CN114254278A (en) * 2021-11-19 2022-03-29 中国建设银行股份有限公司 User account merging method and device, computer equipment and storage medium

Also Published As

Publication number Publication date
CN116362737B (en) 2023-10-13

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant