CN106708844A - User group partitioning method and device

Info

Publication number: CN106708844A
Application number: CN201510772638.1A
Authority: CN
Inventors: 黄光远
Original assignee: Alibaba Group Holding Ltd
Current assignee: Alibaba Group Holding Ltd
Priority date: 2015-11-12
Filing date: 2015-11-12
Publication date: 2017-05-24
Also published as: WO2017080398A1

Abstract

An embodiment of the invention provides a user group partitioning method and device. The method comprises: capturing user identifiers; establishing common behavior relationships between the user identifiers to obtain a user node graph; identifying, in the user node graph, one or more core user groups according to the common behavior relationships; and dividing, within the one or more core user groups, one or more target user groups according to the common behavior relationships. On the one hand, manual rule setting is avoided: different user groups have different distribution features and individual behavior varies widely, but the underlying relationships between users are more stable, so graph-based partitioning improves the accuracy of user group division. On the other hand, coarsely partitioning the core user groups first greatly reduces the data size, which improves both partitioning efficiency and accuracy.

Description

User group dividing method and device

Technical Field

The present application relates to the field of computer processing technologies, and in particular, to a user group partitioning method and a user group partitioning apparatus.

Background

With the rapid development of the Internet, the amount of information online has grown rapidly. This excess of information prevents people from efficiently finding the part they need, which reduces the efficiency with which information is used.

Therefore, large websites generally divide their users into different user groups in order to provide more refined services.

In addition, in some security detection scenarios, it is also necessary to divide users into different user groups.

For example, on an e-commerce website, a lawbreaker may maliciously inflate a store's rating points through fictitious transactions and similar means, a practice commonly called "drill brushing". To maintain order, the website needs to identify the "drill brushing" groups.

At present, two ways of dividing the user group are generally provided, one is to manually set rules, and the other is to use a community discovery algorithm.

Manually set rules often fail to cover the different characteristics of different groups. The behavior patterns of user groups are varied and change easily, and manually set rules inevitably introduce bias, so the accuracy of user group division is low.

Taking the identification of "drill brushing" groups as an example, common rules include "the number of similar commodities the user browsed before purchasing", "the length of time the user took to place the order", "the time interval between the user's purchases of multiple items", and the like.

Different "drill brushing" groups often behave differently. For example, one "drill brushing" group purchases the specified commodity directly after receiving a request, while another "drill brushing" group purchases the specified commodity only after browsing several similar commodities.

These two "drill brushing" groups behave differently under the rule "the number of similar commodities the user browsed before purchasing", so it is difficult to identify both with the same threshold.

With a community discovery algorithm, data that does not fit the specific application scenario is easily introduced into the result, so the data volume becomes too large, division efficiency is low, and the accuracy of user group division is low.

For example, when identifying "drill brushing" groups, the purchasing relationships between users are first abstracted during modeling. In the simplest approach, two users are considered related whenever both have purchased the same commodity. The resulting graph is too large, division efficiency is low, and some users who simply purchase in large quantities are mistakenly identified as "drill brushing" users.

Disclosure of Invention

In view of the above problems, embodiments of the present application are proposed to provide a user group partitioning method and a corresponding user group partitioning apparatus that overcome or at least partially solve the above problems.

In order to solve the above problem, an embodiment of the present application discloses a method for dividing a user group, including:

capturing a user identifier;

establishing a common behavior relation among the user identifications to obtain a user node graph;

identifying, in the user node graph, one or more core user populations according to the common behavioral relationships;

and in the one or more core user groups, dividing one or more target user groups according to the common behavior relationship.

Optionally, the step of establishing a common behavior relationship between the user identifiers and obtaining the user node graph includes:

searching the behavior data of the user identification;

identifying common behavior data from the behavior data;

and establishing a common behavior relation for the user identifications to which the common behavior data belong.

Optionally, the step of searching the behavior data of the user identifier includes:

and extracting the behavior data of the user identification in a preset time period from a preset database.

Optionally, the step of establishing a common behavior relationship for the user identifiers to which the common behavior data belongs includes:

configuring weights for the common behavior data according to the types of the behavior data;

and when the sum of the weights is greater than a preset weight threshold value, establishing a common behavior relation for the user identification to which the common behavior data belongs.

Optionally, in the user node graph, the step of identifying one or more core user groups according to the common behavioral relationship includes:

calculating a core degree value of the user identifier in the user node graph;

and when the core degree value is larger than a preset core threshold value, determining that the user identifier corresponding to the core degree value belongs to a core user group.

Optionally, the step of calculating the core degree value of the user identifier in the user node map includes:

setting a global core degree value of the current iteration;

in the user node graph, counting the number of user identifications connected through a common behavior relation aiming at each user identification to obtain a node value;

in the user node graph, judging whether the node value of each user identifier is smaller than or equal to a global core degree value;

if so, removing the user identification of which the node value is smaller than or equal to the global core degree value;

assigning the global core degree value to the removed user identifier as the core degree value of the removed user identifier;

deleting, in the user node graph, the common behavior relations connected to the removed user identifier, and returning to the step of judging whether the node value of each user identifier is smaller than or equal to the global core degree value;

if not, returning to the step of setting the global core degree value of the current iteration, until the user node graph has been traversed.

Optionally, the step of setting the global core degree value of the current iteration includes:

setting an initial global core degree value to be 1 during first iteration;

or,

and when the iteration is not performed for the first time, adding 1 on the basis of the last global core degree value to be used as the current global core degree value.

Optionally, the step of dividing, in the one or more core user groups, one or more target user groups according to the common behavioral relationship includes:

configuring a tag for each user identification in the one or more core user groups, the tag having a numerical value;

transmitting the label of each user identifier to the connected user identifiers;

selecting one label from the labels received by each user identifier as an owned label according to the numerical value of the label;

judging whether the labels owned by the user identifications in the one or more core user groups are changed or not;

if yes, returning to the step of transmitting the label of each user identifier to the connected user identifier;

if not, dividing the user identifications with the same label into target user groups.

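Optionally, the step of dividing, in the one or more core user groups, one or more target user groups according to the common behavioral relationship includes:

configuring a tag for each user identity in the one or more core user groups;

transmitting the label of each user identifier to the connected user identifiers;

selecting one label as an owned label from the labels received by each user identifier according to the number of the labels;

judging whether the labels owned by the user identifications in the one or more core user groups are changed or not, or whether the current iteration count is smaller than the preset maximum number of iterations;

if yes, returning to the step of transmitting the label of each user identifier to the connected user identifier;

if not, dividing the user identifications with the same label into target user groups.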

The embodiment of the present application further discloses a device for dividing a user group, which includes:

the user identification acquisition module is used for capturing user identifications;

the user node graph building module is used for building a common behavior relation among the user identifications to obtain a user node graph;

the core user group identification module is used for identifying one or more core user groups according to the common behavior relationship in the user node graph;

and the target user group division module is used for dividing one or more target user groups in the one or more core user groups according to the common behavior relationship.

Optionally, the user node graph building module includes:

the behavior data searching submodule is used for searching the behavior data of the user identification;

the common behavior data identification submodule is used for identifying common behavior data from the behavior data;

and the common behavior relation establishing submodule is used for establishing a common behavior relation for the user identification to which the common behavior data belongs.

Optionally, the behavior data search sub-module includes:

and the time section data searching unit is used for extracting the behavior data of the user identifier in a preset time section from a preset database.

Optionally, the common behavior relationship establishing sub-module includes:

the weight configuration unit is used for configuring weights for the common behavior data according to the types of the behavior data;

and the relationship establishing unit is used for establishing a common behavior relationship for the user identification to which the common behavior data belongs when the sum of the weights is greater than a preset weight threshold.

Optionally, the core user group identification module includes:

the core degree value calculation submodule is used for calculating the core degree value of the user identifier in the user node graph;

and the core user group determining submodule is used for determining that the user identifier corresponding to the core degree value belongs to the core user group when the core degree value is greater than a preset core threshold value.

Optionally, the core degree value calculation submodule includes:

the global core degree value setting unit is used for setting a global core degree value of the current iteration;

a node value counting unit, configured to count, for each user identifier, the number of user identifiers connected in a common behavior relationship in the user node graph, to obtain a node value;

the quantity comparison unit is used for judging whether the node value of each user identifier is smaller than or equal to the global core degree value or not in the user node graph; if yes, calling a user identification removing unit, otherwise, returning to and calling the global core degree value setting unit until the user node graph is traversed;

a user identifier removing unit, configured to remove, from the user node graph, a user identifier whose node value is less than or equal to the global core degree value;

the core degree value assigning unit is used for assigning the global core degree value to the removed user identifier as the core degree value of the removed user identifier;

and the common behavior relation deleting unit is used for deleting, in the user node graph, the common behavior relation connected with the removed user identifier and returning to call the quantity comparison unit.

Optionally, the global core degree value setting unit includes:

the initial setting subunit is used for setting an initial global core degree value to be 1 during first iteration;

or,

and the value-added subunit is used for adding 1 to the last global core degree value as the current global core degree value when the iteration is not performed for the first time.

Optionally, the target user group partitioning module includes:

a first tag configuration submodule configured to configure a tag for each user identity in the one or more core user groups, the tag having a numerical value;

the first label transmission submodule is used for transmitting the label of each user identifier to the connected user identifier;

the first label selection submodule is used for selecting one label from the labels received by each user identifier as an owned label according to the numerical value of the label;

the first judgment submodule is used for judging whether the labels owned by the user identifications change in the one or more core user groups; if yes, returning to call the first label transmission submodule; if not, calling a first target user group division submodule;

and the first target user group division submodule is used for dividing the user identifications with the same label into target user groups.

Optionally, the target user group partitioning module includes:

a second tag configuration submodule, configured to configure a tag for each user identifier in the one or more core user groups;

the second label transmission submodule is used for transmitting the label of each user identifier to the connected user identifier;

the second label selection submodule is used for selecting one label as the owned label from the labels received by each user identification according to the number of the labels;

a second judgment submodule, configured to judge whether a tag owned by a user identifier in the one or more core user groups changes, or whether the current iteration count is smaller than a preset maximum number of iterations; if yes, returning to call the second label transmission submodule; if not, calling a second target user group division submodule;

and the second target user group division submodule is used for dividing the user identifications with the same label into target user groups.

The embodiment of the application has the following advantages:

according to the method and the device, the user node graph is constructed through the common behavior relation of the users, the core user group is roughly divided in the user node graph, and the target user group is finely divided in the core user group.

Drawings

FIG. 1 is a flowchart illustrating the steps of an embodiment of a method for partitioning a user group according to the present application;

FIGS. 2A-2C are diagrams of an example of a user node graph according to the present application;

FIGS. 3A-3D are exemplary diagrams of identification of a core user population of the present application;

FIG. 4 is a diagram of an example of identification of a target user population of the present application;

FIG. 5 is a block diagram illustrating an embodiment of a user group partitioning apparatus according to the present application.

Detailed Description

In order to make the aforementioned objects, features and advantages of the present application more comprehensible, the present application is described in further detail with reference to the accompanying drawings and the detailed description.

Referring to fig. 1, a flowchart illustrating steps of an embodiment of a user group partitioning method according to the present application is shown, and specifically may include the following steps:

step 101, capturing a user identifier;

The user identifier may be any information that can represent a particular user, such as a user ID (Identity), a cookie, or a MAC (Media Access Control) address.

In the embodiment of the application, the server can record user data through website logs and store the data in a database.

When dividing user groups, the user identifiers may be retrieved from the database.

Step 102, establishing a common behavior relationship among the user identifications to obtain a user node graph;

a common behavioral relationship may refer to the existence of common behaviors between users (characterized by user identities).

In the user node graph, nodes represent users (characterized by user identifiers) and edges represent relationships between nodes (i.e., common behavior relationships). The user node graph is a strong-relationship network indicating that users have performed common operations.

In one embodiment of the present application, step 102 may comprise the sub-steps of:

substep S11, finding behavior data of the user identifier;

A typical website log may record the IP address of the user's computer, the access time, the operating system, browser, and display used, which page of the website was accessed, and whether the access succeeded.

However, analyzing user behavior requires not machine data such as the IP address, operating system, and browser of the user's computer, but behavior data that reflects the user's interests and preferences, such as what information the user browsed and actions that express how strongly the user prefers something.

In particular implementations, the website log may be filtered to obtain structured behavior data, such as the user ID, the ID of the product accessed by the user, the access time, and the user behavior (e.g., click, purchase, rating).

For example, the website log may be:

118.112.27.164---[24/Oct/2012:11:00:00+0800]"GET/b.jpg?cD17Mn0mdT17L2NoaW5hLmFsaWJhYmEuY29tL30mbT17R0VUfSZzPXsyMDB9JnI9e2h0dHA6Ly9mdy50bWFsbC5jb20vP3NwbT0zLjE2OTQwNi4xOTg0MDEufSZhPXtzaWQ9MTdjMDM2MjEtZTk2MC00NDg0LWIwNTYtZDJkMDcwM2NkYmE4fHN0aW1lPTEzNTEwNDc3MDU3OTZ8c2RhdGU9MjR8YWxpX2FwYWNoZV9pZD0xMTguMTEyLjI3LjE2NC43MjU3MzI0NzU5ODMzMS43fGNuYT0tfSZiPXstfSZjPXtjX3NpZ25lZD0wfQ==&pageid=7f0000017f00000113511803054674156071647816&sys=ie6.0|windowsXP|1366*768|zh-cn&ver=43&t=1351047705828HTTP/1.0"200-"Mozilla/4.0(compatible;MSIE 6.0;Windows NT 5.1;SV1;.NET CLR 2.0.50727)"118.112.27.164.135104760038.61^sid%3D17c03621-e960-4484-b056-d2d0703cdba8%7Cstime%3D1351047705796%7Csdate%3D24|cna=-^-^aid=118.112.27.164.72573247598331.7

the structured behavior data obtained after filtering may be:

1,b2b-1633112210,1215596848,1,07/Aug/2013:08:27:22
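
As a minimal sketch, a structured record like the one above could be parsed as follows. The field names and their order (`record_type`, `user_id`, `product_id`, `behavior_code`, `access_time`) are assumptions made for illustration; the source does not state which column holds which value.

```python
from datetime import datetime
from typing import NamedTuple


class BehaviorRecord(NamedTuple):
    # All field names are hypothetical; the column meanings are assumed.
    record_type: str
    user_id: str
    product_id: str
    behavior_code: str
    access_time: datetime


def parse_record(line: str) -> BehaviorRecord:
    """Split one comma-separated record and parse its timestamp."""
    record_type, user_id, product_id, behavior_code, raw_time = line.strip().split(",")
    # Timestamp layout as it appears in the sample: day/month-abbreviation/year:H:M:S
    access_time = datetime.strptime(raw_time, "%d/%b/%Y:%H:%M:%S")
    return BehaviorRecord(record_type, user_id, product_id, behavior_code, access_time)


record = parse_record("1,b2b-1633112210,1215596848,1,07/Aug/2013:08:27:22")
```

Keeping the timestamp as a `datetime` makes the later time-window filtering a simple comparison.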

It should be noted that user behavior is time-sensitive (for example, buying popsicles in summer and down jackets in winter), so the time dimension is generally taken into account when establishing common behavior relationships.

Therefore, in the embodiment of the present application, behavior data of a user identifier within a preset time period may be extracted from a preset database.
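
The extraction by time period can be sketched as a simple filter. This example assumes the behavior data has already been loaded into memory as dictionaries with a `time` key (a hypothetical layout); in practice the same condition would more likely be expressed in the database query.

```python
from datetime import datetime


def within_period(records, start, end):
    """Keep only records whose timestamp falls in the half-open range [start, end)."""
    return [r for r in records if start <= r["time"] < end]


# Hypothetical sample data.
records = [
    {"user": "A", "time": datetime(2015, 10, 5, 9, 0)},
    {"user": "B", "time": datetime(2015, 11, 2, 14, 30)},
    {"user": "C", "time": datetime(2015, 12, 1, 0, 0)},
]
november = within_period(records, datetime(2015, 11, 1), datetime(2015, 12, 1))
```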

A substep S12 of identifying common behavior data from the behavior data;

in practical applications, common behavior data refers to behavior data that is the same among users (characterized by user identities).

On an e-commerce website, behavior data for purchasing, collecting (favoriting), commenting, and adding to the shopping cart over a period of time can be taken, and records can be counted of commodities that two users both purchased, both collected, both commented on, or both added to the shopping cart within a certain time interval.

For example, suppose purchase records from one month are taken and the time interval is one week. If buyer A purchases at a certain store on a weekday and buyer B purchases at the same store three days later, the two purchases fall within the one-week interval, so buyer A and buyer B share one piece of common behavior data.
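
The buyer A / buyer B example can be sketched as follows. This is an illustrative sketch only: the `(user, store, time)` tuple layout and the function name are assumptions, and the pairwise comparison is the simplest way to express the idea rather than an efficient one.

```python
from collections import defaultdict
from datetime import datetime, timedelta
from itertools import combinations


def common_behavior_pairs(purchases, interval):
    """Return user pairs who purchased at the same store within `interval`."""
    by_store = defaultdict(list)
    for user, store, when in purchases:
        by_store[store].append((user, when))

    pairs = set()
    for events in by_store.values():
        for (u1, t1), (u2, t2) in combinations(events, 2):
            if u1 != u2 and abs(t1 - t2) <= interval:
                pairs.add(tuple(sorted((u1, u2))))  # store each pair once
    return pairs


# Hypothetical data: B buys three days after A; C buys much later.
purchases = [
    ("A", "store-1", datetime(2015, 11, 2)),
    ("B", "store-1", datetime(2015, 11, 5)),
    ("C", "store-1", datetime(2015, 11, 20)),
]
pairs = common_behavior_pairs(purchases, timedelta(weeks=1))
```

Changing the grouping key from the store to the commodity gives the finer-grained definition of "common" used for single-commodity scenarios.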

It should be noted that common behavior data with different granularities can be adopted according to the requirements of different service scenarios.

Taking the common purchase data as an example, the "common" relationship can be flexibly realized according to different application scenes and investigation objects.

To identify a "drill brushing" group for a single commodity, the "common behavior data" should be defined as "two users have both purchased the same commodity", because the object under investigation is a single commodity.

In a scenario that divides groups by store, the relationship with the store is what matters, so the "common behavior data" can be defined as "two users have each purchased any product in the same store".

And a substep S13, establishing a common behavior relationship for the user identifications to which the common behavior data belong.

In a specific implementation, different behaviors express different strengths of user intent. For example, purchasing a commodity expresses the strongest intent, collecting (favoriting) is second, and browsing is weaker. Weights can therefore be configured for the common behavior data according to the type of behavior data.

In addition, a weight threshold is preset. The weight threshold is set in direct proportion to the strength of user intent expressed by the behavior and is generally between 0 and 1. When the sum of the weights is greater than the weight threshold, a common behavior relationship is established for the user identifiers to which the common behavior data belongs.
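
A minimal sketch of the weighting step is given below. The concrete weights in `BEHAVIOR_WEIGHTS` and the threshold value are hypothetical; the source only states that purchasing outweighs collecting, which outweighs browsing, and that the threshold is generally between 0 and 1.

```python
from collections import defaultdict

# Hypothetical weights: purchase > collect > browse.
BEHAVIOR_WEIGHTS = {"purchase": 0.6, "collect": 0.3, "browse": 0.1}


def strong_relations(common_behaviors, weight_threshold):
    """Sum the weights of each pair's common behavior data and keep pairs above the threshold."""
    totals = defaultdict(float)
    for user_a, user_b, behavior_type in common_behaviors:
        pair = tuple(sorted((user_a, user_b)))  # undirected: (A, B) == (B, A)
        totals[pair] += BEHAVIOR_WEIGHTS[behavior_type]
    return {pair for pair, total in totals.items() if total > weight_threshold}


# Hypothetical common behavior data: (user, user, behavior type).
common_behaviors = [
    ("A", "B", "purchase"),
    ("B", "A", "collect"),
    ("A", "C", "browse"),
]
edges = strong_relations(common_behaviors, weight_threshold=0.5)
```

Each returned pair corresponds to one edge of the user node graph.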

As shown in fig. 2A, when constructing the user node graph, if user A and user B have common behavior data, a dotted line may be drawn between user A and user B.

As shown in fig. 2B, if it is determined that user A and user B have a strong common behavior relationship, a solid line may be drawn between user A and user B.

As shown in fig. 2C, performing the same operation for every user yields the user node graph, for example a user node graph built from users A to Q.

Step 103, identifying one or more core user groups according to the common behavior relationship in the user node graph;

The core user group may refer to a group formed by the main users the server is concerned with, such as users whose behavior is more active and who are more closely associated with one another.

In a specific implementation, the graph algorithm Kcore may be used to filter out edge nodes, find the nodes (i.e. user identifiers) at core positions in the user node graph, and find the associations among them.

In one embodiment of the present application, step 103 may comprise the following sub-steps:

a substep S21, calculating a core degree value of the user identifier in the user node graph;

in this embodiment, the core degree value may represent the importance degree of the user, and a higher core degree value represents a higher importance degree of the user.

In one embodiment of the present application, the sub-step S21 may further include the sub-steps of:

a substep S211 of setting a global core degree value;

In a specific implementation, at the first iteration the initial global core degree value may be set to 1. Denoting the global core degree value by k, k = 1 initially.

In any later iteration, 1 is added to the previous global core degree value to give the current global core degree value, i.e., k = k + 1, so k = 2 in the second iteration, k = 3 in the third iteration, and so on.

Substep S212, in the user node map, counting the number of user identifiers connected through a common behavior relationship for each user identifier to obtain a node value;

In the user node graph, if a node (i.e., a user identifier) is connected to other nodes by N edges (i.e., common behavior relationships), its node value is N, where N is a positive integer.

For example, as shown in FIG. 2C, node A is connected to nodes B, C, D, E, F, and J, so the node value of node A is 6; node J is connected only to node A, so the node value of node J is 1.
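
Counting node values can be sketched as follows. The edge list is hypothetical and is not the graph of FIG. 2C; the sketch assumes the user node graph is given as a list of undirected edges.

```python
from collections import defaultdict


def node_values(edges):
    """Return, for each node, the number of distinct nodes it is connected to."""
    neighbors = defaultdict(set)
    for a, b in edges:
        neighbors[a].add(b)
        neighbors[b].add(a)  # edges are undirected
    return {node: len(adjacent) for node, adjacent in neighbors.items()}


# Hypothetical edges for illustration.
values = node_values([("A", "B"), ("A", "C"), ("A", "J"), ("B", "C")])
```

Using a set per node means a duplicated edge is counted only once.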

Substep S213, in the user node map, determining whether the node value of each user identifier is less than or equal to the global core degree value; if yes, executing substep S214, otherwise, returning to substep S211 until traversing the user node graph is completed;

a substep S214, in the user node graph, removing the user identifier whose node value is less than or equal to the global core degree value;

substep S215, assigning the global core degree value to the removed user identifier as the core degree value (coreness) of the removed user identifier;

and a substep S216, in the user node graph, deleting the common behavior relations connected to the removed user identifier, and returning to execute substep S213 until the user node graph has been traversed.

In the embodiment of the application, the graph algorithm Kcore supports a distributed system and can process massive data.

In each iteration, nodes and edges are removed to form a new user node map, and processing is performed in the next iteration, namely in the new user node map.
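
Substeps S211 to S216 can be sketched on a single machine as follows. This is an illustrative sketch, not the distributed implementation mentioned above; the adjacency-dictionary representation, the function name, and the sample graph are assumptions.

```python
def core_degree_values(adjacency):
    """Peel the graph layer by layer and return each node's core degree value."""
    # Work on a copy so the caller's graph is left untouched.
    graph = {node: set(neighbors) for node, neighbors in adjacency.items()}
    coreness = {}
    k = 0
    while graph:
        k += 1  # S211: k is 1 in the first iteration, then grows by 1
        while True:
            # S212/S213: a node's value is its number of remaining neighbors
            to_remove = [n for n, nbrs in graph.items() if len(nbrs) <= k]
            if not to_remove:
                break  # no node value is <= k, so move on to k + 1
            for node in to_remove:
                coreness[node] = k  # S215: the removed node receives k
                for neighbor in graph.pop(node):  # S214: remove the node
                    if neighbor in graph:
                        graph[neighbor].discard(node)  # S216: delete its edges
    return coreness


# Hypothetical graph: a 4-clique A-B-C-D, E attached to A and B, J attached to A.
adjacency = {
    "A": {"B", "C", "D", "E", "J"},
    "B": {"A", "C", "D", "E"},
    "C": {"A", "B", "D"},
    "D": {"A", "B", "C"},
    "E": {"A", "B"},
    "J": {"A"},
}
coreness = core_degree_values(adjacency)
```

In this sample, J is removed when k = 1, E when k = 2, and the clique when k = 3.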

For the user node graph shown in fig. 2C, at the first iteration k = 1, and the node values of the nodes are as follows:

Node value	Nodes
1	J, K, L, M, N, O, P, Q
2	E, F
4	B, C, G, H, I
5	D
7	A

In the first iteration, the node values of nodes J, K, L, M, N, O, P, and Q are equal to k (1), so these nodes and their connected edges are removed, and k is assigned to each of them, giving them a core degree value (coreness) of 1.

As shown in the user node graph in fig. 3A, after nodes J, K, L, M, N, O, P, and Q and their connected edges are removed, the node values of the remaining nodes change; for example, the node value of node I becomes 1. The node values are now as follows:

Node value	Nodes
1	I
2	E, F, G, H
4	B, C
5	D
6	A

The node value of node I is equal to k (1), so node I and its connected edges are removed, and k is assigned to node I, giving it a core degree value (coreness) of 1.

As shown in the user node graph in fig. 3B, after node I and its connected edges are removed, no further nodes qualify for removal: the node values of all remaining nodes are greater than the global core degree value k (1) of the current iteration. The node values are as follows:

Node value	Nodes
2	E, F, G, H
4	B, C
5	D
6	A

Therefore, the second iteration begins, with k = k + 1 = 2.

In the second iteration, the node values of nodes E, F, G, and H are less than or equal to k (2), so these nodes and their connected edges are removed, and k is assigned to each of them, giving them a core degree value (coreness) of 2.

As shown in the user node graph in fig. 3C, after nodes E, F, G, and H and their connected edges are removed, no further nodes qualify for removal: the node values of all remaining nodes are greater than the global core degree value k (2) of the current iteration. The node values are as follows:

Node value	Nodes
3	A, B, C, D

Thus, the third iteration begins, with k = k + 1 = 3.

In the third iteration, the node values of nodes A, B, C, and D are equal to k (3), so these nodes and their connected edges are removed, and k is assigned to each of them, giving them a core degree value (coreness) of 3. At this point, traversal of the user node graph is complete.

As shown in fig. 3D, nodes J, K, L, M, N, O, P, Q, and I have a core degree value of 1 (coreness = 1) and lie in the outermost layer; nodes E, F, G, and H have a core degree value of 2 (coreness = 2) and lie in the second outer layer; and nodes A, B, C, and D have a core degree value of 3 (coreness = 3) and lie in the middle layer.

And a substep S22, determining that the user identifier corresponding to the core degree value belongs to the core user group when the core degree value is greater than a preset core threshold value.

In this embodiment of the present application, the set of nodes whose core degree values (coreness) are greater than a certain core threshold may be taken, and the corresponding users form a core user group of the user node graph.

The core threshold depends on the scale of the user node graph; for example, for a user node graph on the ten-million scale, the core threshold is typically above 100.

Generally speaking, connectivity does not need to be checked separately for the core user group, because under the processing procedure of the graph algorithm Kcore, the set of nodes with core degree values (coreness) greater than a certain core threshold forms several subgraphs, with no isolated single nodes.

That is, at this stage several coarse-grained user groups are divided out on the basis of the core users.
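
Selecting the core user group by threshold can be sketched as follows. The sketch assumes the core degree values have already been computed and are passed in as a dictionary; the function name, sample graph, and threshold are hypothetical.

```python
def core_user_subgraph(adjacency, coreness, core_threshold):
    """Keep nodes whose core degree value exceeds the threshold, plus the edges among them."""
    core_nodes = {node for node, value in coreness.items() if value > core_threshold}
    # Restrict each neighbor set to the surviving core nodes.
    return {node: adjacency[node] & core_nodes for node in core_nodes}


# Hypothetical graph and precomputed core degree values.
adjacency = {
    "A": {"B", "C", "D", "E", "J"},
    "B": {"A", "C", "D", "E"},
    "C": {"A", "B", "D"},
    "D": {"A", "B", "C"},
    "E": {"A", "B"},
    "J": {"A"},
}
coreness = {"A": 3, "B": 3, "C": 3, "D": 3, "E": 2, "J": 1}
core_graph = core_user_subgraph(adjacency, coreness, core_threshold=2)
```

The resulting subgraph is the much smaller input on which the fine-grained division then runs.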

Of course, besides the graph algorithm Kcore, other methods may also be used to identify the core user group. For example, a degree-based (node value) algorithm may be used: the calculation is simpler, and a higher value indicates that the user has strong common operation relationships with more other users.

Step 104, dividing one or more target user groups in the one or more core user groups according to the common behavior relation.

In the embodiment of the present application, the fine division may be further performed on the basis of the coarse-range user group (i.e., the core user group).

In an embodiment of the present application, if the structure of the user node graph is relatively simple, or the requirement on the partition accuracy of the user group is not high, the target user group may be partitioned on the basis of the core user group by using a connected graph algorithm.

In an undirected graph, if there is a path from vertex v_i to vertex v_j, then v_i and v_j are said to be connected; in a connected graph, every pair of nodes is connected.

For example, in the scenario of identifying "drill brushing" groups, relatively strict criteria are already applied during data modeling and data cleansing, so a preliminary screening with the connected graph algorithm may be performed at this stage.

In the connected graph algorithm, if two users belong to different user groups, no strong common operation relationship exists between them, that is, there is no edge between the two corresponding nodes in the user node graph.
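
For comparison, the same grouping can be obtained with a plain breadth-first search over the core user graph. This sketch is a swapped-in textbook technique, not the label-based procedure of the sub-steps that follow; it assumes every neighbor also appears as a key of the adjacency dictionary.

```python
from collections import deque


def connected_groups(adjacency):
    """Return the connected components of an undirected graph as a list of sets."""
    seen = set()
    groups = []
    for start in adjacency:
        if start in seen:
            continue
        seen.add(start)
        queue = deque([start])
        group = set()
        while queue:
            node = queue.popleft()
            group.add(node)
            for neighbor in adjacency[node]:
                if neighbor not in seen:
                    seen.add(neighbor)
                    queue.append(neighbor)
        groups.append(group)
    return groups


# Hypothetical core user graph with two separate groups.
adjacency = {
    "R": {"S", "T"}, "S": {"R"}, "T": {"R"},
    "X": {"Y"}, "Y": {"X"},
}
groups = connected_groups(adjacency)
```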

Then, in the embodiment of the present application, step 104 may include the following sub-steps:

a substep S31 of configuring a tag for each user identity in the one or more core user groups;

in a specific implementation, for convenience of calculation, the tag may be its user ID, and of course, the tag may also be configured in other ways, such as random configuration, as long as the uniqueness of the tag is maintained, which is not limited in this embodiment of the application.

In the embodiments of the present application, the tag has a numerical value, such as 1, 2, etc.

A substep S32 of transferring the label of each user identity to the connected user identity;

in the embodiment of the present application, the label of each user identifier may be transferred to its neighbor, and as such, the user identifier may receive the label transferred by its neighbor.

For example, as shown in fig. 4, in the core user group, the node R transmits its label to the node S and the node T, and receives the label transmitted by the node S and the node T.

Substep S33, selecting one label from the labels received by each user identifier as the owned label according to the value of the label;

in a specific implementation, either the tag with the largest numerical value or the tag with the smallest numerical value may be selected, provided that the same update policy is applied consistently; the embodiment of the present application does not impose any limitation on this.

A substep S34 of determining whether a tag owned by the user identifier has changed in the one or more core user groups; if yes, returning to execute the substep S32, otherwise, executing the substep S35;

and a substep S35, dividing the user identifications having the same label into target user groups.

Because the labels are unique, nodes within the same user group are connected, and nodes in different user groups are not connected, a label can flow only within its own user group during the iteration, and the labels within each user group gradually become stable. Once the labels are stable, nodes with the same label belong to the same connected graph, that is, the users corresponding to those nodes belong to the same user group, and the label of the nodes can be used as the identification label of the user group.

For example, as shown in fig. 4, assuming that the labels of the nodes R, S, T, and U have values of 1, 2, 3, and 4, respectively, and that the label with the smallest value is selected, the iterative process is as follows:

after the 3rd iteration, the labels owned by the user identifiers are all 1 and no longer change, so that the nodes R, S, T, and U belong to the same connected graph, and the users corresponding to the nodes R, S, T, and U belong to the same user group.
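Sub-steps S31 to S35 can be sketched as follows, using the "smallest value wins" policy. This is an illustrative sketch, not the applicant's implementation: the function name `min_label_components` is hypothetical, and it is assumed that a node compares the received labels together with its own label, so that a label value never increases.

```python
from collections import defaultdict


def min_label_components(edges, labels):
    """Group nodes by propagating the smallest label (sub-steps S31 to S35)."""
    neighbors = defaultdict(set)
    for a, b in edges:
        neighbors[a].add(b)
        neighbors[b].add(a)

    current = dict(labels)  # S31: every node starts with its own unique label
    rounds = 0
    while True:
        rounds += 1
        updated = {}
        for node, own in current.items():
            # S32: collect the labels sent by all connected nodes.
            received = [current[n] for n in neighbors[node]]
            # S33: keep the smallest value (own label included, an assumption).
            updated[node] = min([own] + received)
        if updated == current:  # S34: stop once no label has changed
            break
        current = updated

    groups = defaultdict(set)  # S35: nodes sharing a label form one group
    for node, label in current.items():
        groups[label].add(node)
    return dict(groups), rounds
```

As a usage example, assume the edges R-S, R-T, and T-U (the edge T-U is a guess, since only the links from R to S and T are stated in the text). With labels 1, 2, 3, and 4 for R, S, T, and U, the call `min_label_components([("R", "S"), ("R", "T"), ("T", "U")], {"R": 1, "S": 2, "T": 3, "U": 4})` returns a single group labelled 1 containing all four nodes, and the third round is the one in which no label changes.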

In another embodiment of the present application, if the structure of the user node graph is complex, or different user groups need to be partitioned more accurately, a community discovery algorithm may be used to partition the different user groups.

In a scene of identifying a microblog group, the number of related users is large, the user node graph is complex, and the community discovery algorithm can obtain high accuracy.

In the community discovery algorithm, the edges among nodes belonging to the same user group are dense, while the edges between nodes of different user groups are sparse; that is, the users corresponding to the nodes in the same user group are more closely related, so the "group" attribute of the user group is well reflected.

In the embodiment of the application, the community discovery algorithm supports a distributed system and can process massive data.

Then, in the embodiment of the present application, step 104 may include the following sub-steps:

a substep S41 of configuring a tag for each user identity in the one or more core user groups;

A substep S42 of transferring the label of each user identity to the connected user identity;

substep S43, selecting one label as the owned label according to the number of labels from the labels received by each user identifier;

in a specific implementation, the tags with the largest number may be selected, and if the number of the tags is the same, the tags may be randomly selected.

A substep S44 of determining whether a tag owned by a user identifier in the one or more core user groups has changed and whether the current number of iterations is still less than a preset maximum number of iterations; if both conditions hold, returning to execute the substep S42, otherwise, executing the substep S45;

and a substep S45, dividing the user identifications having the same label into target user groups.

In the first iteration, the labels are in effect selected at random. Because a core node is connected with many peripheral nodes, its label is more likely to be picked at random by those nodes, and in the subsequent iterations the number of nodes carrying the label of a core node increases and gradually stabilizes.

When the labels are stable or the maximum number of iterations is reached, the nodes with the same label belong to the same user group, and the label of the nodes can be used as the identification label of the user group.

For example, as shown in fig. 4, the names of the nodes are used as the labels of the nodes, i.e., the labels of the nodes R, S, T, and U are R, S, T, and U, respectively, and the iterative process is as follows:

after the 3rd iteration, the labels owned by the user identifiers are all R and no longer change, so that the users corresponding to the nodes R, S, T, and U belong to the same user group.
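Sub-steps S41 to S45 correspond to a label propagation style of community discovery, which can be sketched as below. This is a hedged illustration only: the function name `label_propagation`, the `max_iterations` default, and the seeded random generator are assumptions, and the update is synchronous (all nodes send, then all nodes select), as the sub-steps suggest.

```python
import random
from collections import Counter, defaultdict


def label_propagation(edges, nodes, max_iterations=20, seed=0):
    """Group nodes by adopting the most frequent neighbor label (S41 to S45)."""
    rng = random.Random(seed)  # seeded only to make the sketch repeatable
    neighbors = defaultdict(set)
    for a, b in edges:
        neighbors[a].add(b)
        neighbors[b].add(a)

    labels = {n: n for n in nodes}  # S41: the node name serves as its label
    for _ in range(max_iterations):  # S44: cap on the number of iterations
        updated = {}
        for node in nodes:
            # S42: count the labels received from connected nodes.
            received = Counter(labels[n] for n in neighbors[node])
            if not received:  # an isolated node keeps its own label
                updated[node] = labels[node]
                continue
            # S43: take the most numerous label, breaking ties at random.
            top = max(received.values())
            candidates = sorted(l for l, c in received.items() if c == top)
            updated[node] = rng.choice(candidates)
        if updated == labels:  # S44: stop once no label has changed
            break
        labels = updated

    groups = defaultdict(set)  # S45: nodes sharing a label form one group
    for node, label in labels.items():
        groups[label].add(node)
    return list(groups.values())
```

Because ties are broken at random, the exact grouping can vary with the seed, and a synchronous update may oscillate on some graphs, which is one reason a maximum number of iterations is useful. A label can only travel along edges, so nodes in different connected graphs never end up in the same group.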

Of course, other community discovery algorithms, such as GN algorithm, Louvain algorithm, etc., may be used besides the above community discovery algorithm, which is not limited in this embodiment of the present application.

It should be noted that, for simplicity of description, the method embodiments are described as a series of acts or combinations of acts, but those skilled in the art will recognize that the embodiments are not limited by the order of acts described, as some steps may occur in other orders or concurrently according to the embodiments. Further, those skilled in the art will also appreciate that the embodiments described in the specification are all preferred embodiments, and that the acts involved are not necessarily required by the embodiments of the application.

Referring to fig. 5, a block diagram of an embodiment of a device for dividing a user group according to the present application is shown; the device may specifically include the following modules:

a user identifier obtaining module 501, configured to capture a user identifier;

a user node graph building module 502, configured to build a common behavior relationship between the user identifiers, and obtain a user node graph;

a core user group identification module 503, configured to identify, in the user node map, one or more core user groups according to the common behavior relationship;

a target user group division module 504, configured to divide one or more target user groups according to the common behavior relationship among the one or more core user groups.
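The way modules 501 to 504 feed one another can be pictured as a four-stage pipeline. The sketch below is only one possible arrangement: the class name `UserGroupDivider`, its field names, and the edge-list data types are all hypothetical and are not taken from the application.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, List, Set, Tuple

Edge = Tuple[str, str]  # assumed representation of a common behavior relationship


@dataclass
class UserGroupDivider:
    obtain_ids: Callable[[], Iterable[str]]                 # module 501
    build_graph: Callable[[Iterable[str]], List[Edge]]      # module 502
    identify_core: Callable[[List[Edge]], List[Edge]]       # module 503
    divide_targets: Callable[[List[Edge]], List[Set[str]]]  # module 504

    def run(self) -> List[Set[str]]:
        """Chain the four modules in the order given in the description."""
        ids = self.obtain_ids()
        graph = self.build_graph(ids)
        core = self.identify_core(graph)
        return self.divide_targets(core)
```

Passing each module in as a callable keeps the sketch neutral about how any single stage is implemented; for instance, either of the two division approaches described for step 104 could be supplied as `divide_targets`.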

In an embodiment of the present application, the user node map building module 502 may include the following sub-modules:

In an example of the embodiment of the present application, the behavior data search sub-module may include the following units:

In an example of the embodiment of the present application, the common behavior relationship establishing sub-module may include the following units:

In an embodiment of the present application, the core user group identification module 503 may include the following sub-modules:

In an embodiment of the present application, the kernel level value operator module may include the following units:

In an example of the embodiment of the present application, the global kernel level value setting unit may include a subunit:

or,

In an embodiment of the present application, the target user group division module 504 may include the following sub-modules:

In another embodiment of the present application, the target user group division module 504 may include the following sub-modules:

Since the device embodiment is basically similar to the method embodiment, it is described relatively briefly; for relevant points, refer to the corresponding description of the method embodiment.

The embodiments in the present specification are described in a progressive manner, each embodiment focuses on differences from other embodiments, and the same and similar parts among the embodiments are referred to each other.

As will be appreciated by one of skill in the art, embodiments of the present application may be provided as a method, apparatus, or computer program product. Accordingly, embodiments of the present application may take the form of an entirely hardware embodiment, an entirely software embodiment or an embodiment combining software and hardware aspects. Furthermore, embodiments of the present application may take the form of a computer program product embodied on one or more computer-usable storage media (including, but not limited to, disk storage, CD-ROM, optical storage, and the like) having computer-usable program code embodied therein.

In a typical configuration, the computer device includes one or more processors (CPUs), input/output interfaces, network interfaces, and memory. The memory may include volatile memory in a computer-readable medium, such as Random Access Memory (RAM), and/or non-volatile memory, such as Read Only Memory (ROM) or flash memory (flash RAM). Memory is an example of a computer-readable medium. Computer-readable media include permanent and non-permanent, removable and non-removable media, and may implement information storage by any method or technology. The information may be computer-readable instructions, data structures, modules of a program, or other data. Examples of computer storage media include, but are not limited to, phase change memory (PRAM), Static Random Access Memory (SRAM), Dynamic Random Access Memory (DRAM), other types of Random Access Memory (RAM), Read Only Memory (ROM), Electrically Erasable Programmable Read Only Memory (EEPROM), flash memory or other memory technology, compact disc read only memory (CD-ROM), Digital Versatile Discs (DVD) or other optical storage, magnetic cassettes, magnetic tape, magnetic disk storage or other magnetic storage devices, or any other non-transmission medium that can be used to store information that can be accessed by a computing device. As defined herein, computer-readable media do not include transitory computer-readable media (transitory media), such as modulated data signals and carrier waves.

Embodiments of the present application are described with reference to flowchart illustrations and/or block diagrams of methods, terminal devices (systems), and computer program products according to embodiments of the application. It will be understood that each flow and/or block of the flow diagrams and/or block diagrams, and combinations of flows and/or blocks in the flow diagrams and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general purpose computer, special purpose computer, embedded processor, or other programmable data processing terminal to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing terminal, create means for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.

These computer program instructions may also be stored in a computer-readable memory that can direct a computer or other programmable data processing terminal to function in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means which implement the function specified in the flowchart flow or flows and/or block diagram block or blocks.

These computer program instructions may also be loaded onto a computer or other programmable data processing terminal to cause a series of operational steps to be performed on the computer or other programmable terminal to produce a computer implemented process such that the instructions which execute on the computer or other programmable terminal provide steps for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.

While preferred embodiments of the present application have been described, additional variations and modifications of these embodiments may occur to those skilled in the art once they learn of the basic inventive concepts. Therefore, it is intended that the appended claims be interpreted as including the preferred embodiment and all such alterations and modifications as fall within the true scope of the embodiments of the application.

Finally, it should also be noted that, herein, relational terms such as first and second, and the like may be used solely to distinguish one entity or action from another entity or action without necessarily requiring or implying any actual such relationship or order between such entities or actions. Also, the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or terminal that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or terminal. Without further limitation, an element qualified by the term "comprising" does not exclude the presence of additional identical elements in the process, method, article, or terminal that comprises the element.

The method for dividing the user group and the device for dividing the user group provided by the application have been described in detail above. Specific examples are used herein to explain the principle and the implementation of the application, and the description of the above embodiments is only intended to help understand the method and the core ideas of the application. Meanwhile, for a person skilled in the art, there may be variations in the specific embodiments and the application scope according to the idea of the present application. In summary, the content of the present specification should not be construed as a limitation on the present application.

Claims

1. A method for dividing a user group, comprising:

capturing a user identifier;

2. The method according to claim 1, wherein the step of establishing a common behavior relationship between the user identifiers and obtaining a user node graph comprises:

searching the behavior data of the user identification;

identifying common behavior data from the behavior data;

3. The method of claim 2, wherein the step of searching the behavior data of the user identifier comprises:

4. The method of claim 2, wherein the step of establishing a common behavioral relationship for the user identities to which the common behavioral data belongs comprises:

5. The method of claim 1 or 2 or 3 or 4, wherein the step of identifying one or more core user populations according to the common behavioral relationship in the user node graph comprises:

calculating a core degree value of the user identifier in the user node graph;

6. The method of claim 5, wherein the step of calculating the core degree value of the user identifier in the user node graph comprises:

setting a global core degree value of the current iteration;

7. The method of claim 6, wherein the step of setting the global core degree value of the current iteration comprises:

setting an initial global core degree value to be 1 during first iteration;

or,

8. The method according to claim 1, 2, 3, 4, 6 or 7, wherein the step of partitioning one or more target user groups according to the common behavioral relationship among the one or more core user groups comprises:

9. The method according to claim 1, 2, 3, 4, 6 or 7, wherein the step of partitioning one or more target user groups according to the common behavioral relationship among the one or more core user groups comprises:

configuring a tag for each user identity in the one or more core user groups;

10. An apparatus for dividing a user group, comprising:

11. The apparatus of claim 10, wherein the user node graph building module comprises:

12. The apparatus of claim 11, wherein the behavior data lookup sub-module comprises:

13. The apparatus of claim 11, wherein the common behavioral relationship establishment submodule comprises:

14. The apparatus of claim 10, 11, 12 or 13, wherein the core user group identification module comprises:

15. The apparatus of claim 14, wherein the kernel level value operator module comprises:

16. The apparatus of claim 15, wherein the global kernel level value setting unit comprises:

or,

17. The apparatus of claim 10, 11, 12, 13, 15 or 16, wherein the target user group partitioning module comprises:

18. The apparatus of claim 10, 11, 12, 13, 15 or 16, wherein the target user group partitioning module comprises: