CN102202012A - Group dividing method and system of communication network - Google Patents
- Publication number
- CN102202012A (application number CN201110141970XA)
- Authority
- CN
- China
- Prior art keywords
- node
- community
- communication
- edge
- module
- Legal status: Granted (as listed by Google Patents; not a legal conclusion)
Landscapes
- Information Transfer Between Computers (AREA)
- Data Exchanges In Wide-Area Networks (AREA)
Abstract
The invention provides a community (group) division method for a communication network. The method comprises the following steps. First, preprocess the communication data. Second, build a contact-relationship network from the preprocessing result, in which nodes represent communication senders and receivers and edges represent the communication relationship between a sender and a receiver. Third, construct a demand text vector and a communication text vector from query words supplied by the user. Fourth, calculate the centrality of each node in the contact-relationship network. Fifth, for nodes that are in contact, calculate the contact-relationship strength, the similarity between edges, and the user's satisfaction with each edge. Sixth, cluster the edges of the network to generate multiple communities. Seventh, find the core members of each community according to node centrality and communication topic. Eighth, expand the members of each community. Finally, divide the expanded members to generate new communities.
Description
Technical field
The present invention relates to the field of data mining. In particular, it relates to a community division method and system for a communication network.
Background art
Fetion, e-mail, MSN, QQ and similar communication tools have gradually become important means of exchanging information. Because they make contact so convenient, they are used more and more widely. A communication network is the embodiment of a social network on the Internet, and communication data provide research samples for discovering social patterns.

Analyzing communication data to find, according to a user's needs, the groups and core members that interest the user is known as community division. The resulting communities map onto real-world groups and therefore have practical significance.

Existing community division methods for communication networks fall mainly into two categories.

The first category divides the communication network using complex-network theory, for example spectral methods, hierarchical methods, and modularity-based methods. Community division for complex networks focuses on network topology, and its results reflect that topology well. However, a communication network contains a large amount of irrelevant data. These data slow down the division. In addition, even when a result forms one group topologically, that group's communication content may be of no interest to the user. To obtain communities that meet the user's needs, the communication data must be screened according to those needs.

The second category divides the network based on communication content, for example with the k-means algorithm or Bayesian methods, and places communications with similar content in one community. Communities obtained this way have similar content and can be screened to meet the user's needs. However, a group that shares the same content may correspond to several different real-world groups.

Community division that considers both communication content and user needs must meet two requirements. It must be fast enough to handle communication data that are updated daily. It must also account for how communication text influences node and edge attributes, so that the analysis result meets the user's needs.
Summary of the invention
The object of the invention is to overcome the shortcomings of existing community division methods. These methods are biased toward one aspect (topology or content) during division, cannot meet the user's needs, and do not reflect node characteristics well.
To achieve this object, the invention provides a community division method for a communication network, comprising:
Step 1): preprocess the communication data to obtain information about each communication, including the communication data ID, sender information, receiver information, communication time, and communication content;
Step 2): from the preprocessing result of step 1), build a contact-relationship network that reflects the structure of the communication network. In this network, nodes represent the communication senders and receivers, and edges represent the communication relationship between a sender and a receiver;
Step 3): construct a demand text vector and a communication text vector from the query words supplied by the user;
Step 4): calculate the centrality of each node in the contact-relationship network; node centrality comprises node betweenness, node closeness, and node contact degree;
Step 5): for nodes that have a contact relationship in the network, calculate the contact-relationship strength between them, the similarity between edges, and the user's satisfaction with each edge;
Step 6): cluster the edges of the contact-relationship network based on the communication content to generate multiple communities;
Step 7): within each community, find its core members according to node centrality and communication topic;
Step 8): starting from the core members, expand the membership of each community;
Step 9): divide the expanded members of each community to generate new communities.
In the above technical solution, step 6) comprises:
Step 6-1): determine the number of communities that the edge clustering is to generate;
Step 6-2): generate an initial core for each community;
Step 6-3): for each edge in the communication network, calculate its similarity to the initial core of each community;
Step 6-4): based on the results of step 6-3), add each edge to the community whose initial core is most similar to it;
Step 6-5): adjust the cluster center of each community;
Step 6-6): repeat steps 6-3) to 6-5) until a stop condition is met.
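Steps 6-3) to 6-6) describe a k-means-style loop that runs over edges rather than nodes. The Python sketch below illustrates that loop under several assumptions:
- each edge is a sparse text vector ({term: weight});
- similarity is cosine similarity;
- the adjusted center of step 6-5) is the mean vector of the member edges;
- the stop condition is "assignments no longer change", with a hypothetical max_iter cap.

The names cluster_edges, cosine and centroid are illustrative and do not come from the patent.

```python
import math
from collections import defaultdict

def cosine(u, v):
    """Cosine similarity of two sparse vectors stored as {term: weight} dicts."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    nu = math.sqrt(sum(w * w for w in u.values()))
    nv = math.sqrt(sum(w * w for w in v.values()))
    return dot / (nu * nv) if nu and nv else 0.0

def centroid(vectors):
    """Mean of sparse vectors, used here as the adjusted cluster center (step 6-5)."""
    acc = defaultdict(float)
    for vec in vectors:
        for t, w in vec.items():
            acc[t] += w / len(vectors)
    return dict(acc)

def cluster_edges(edge_vecs, seeds, max_iter=20):
    """k-means-style edge clustering (steps 6-3 to 6-6).

    edge_vecs: {edge_id: {term: weight}} text vectors of the edges.
    seeds: one initial core edge per community (from step 6-2).
    Returns {edge_id: community_index}.
    """
    centers = [dict(edge_vecs[s]) for s in seeds]
    assign = {}
    for _ in range(max_iter):
        # Steps 6-3/6-4: put each edge in the community with the most similar center.
        new_assign = {
            e: max(range(len(centers)), key=lambda c: cosine(vec, centers[c]))
            for e, vec in edge_vecs.items()
        }
        if new_assign == assign:  # assumed stop condition: nothing moved
            break
        assign = new_assign
        # Step 6-5: recompute each center as the mean of its member edges.
        for c in range(len(centers)):
            members = [edge_vecs[e] for e, k in assign.items() if k == c]
            if members:
                centers[c] = centroid(members)
    return assign
```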
In the above technical solution, step 6-2) comprises:
Step 6-2-1): based on the similarity between edges, if $s_{ij}=0$, store edge i and edge j as a pair in set A;
Step 6-2-2): for each pair in set A, calculate the class-degree value of edge i
and the class-degree value of edge j,
and check whether both values exceed a preset threshold. A pair (i, j) is an isolated-edge pair only when both class-degree values are below the threshold. Isolated-edge pairs are deleted from set A;
Step 6-2-3): apply a bitwise AND operation to edge i and edge j of each pair in set A,
and store the pair (i, j) that yields the minimum value in the cluster center set, center = (i, j);
Step 6-2-4): find the edge k with the lowest similarity to all edges in center and take it as a new cluster center. If no such k exists, return the cluster centers found so far; these are the initial cluster centers. If one or more such k exist, add k to the set center and return to step 6-2-3).
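The seeding in steps 6-2-1) to 6-2-4) resembles farthest-first (max-min) initialization. The sketch below is a simplified interpretation, not the patent's exact procedure. It makes three substitutions:
- The class-degree formula is not given here, so a hypothetical stand-in is used: the number of other edges with positive similarity.
- The bitwise-AND criterion of step 6-2-3) is replaced by picking the qualifying pair with the largest combined degree.
- "k does not exist" is read as "every remaining edge has similarity above tol to some center".

```python
def initial_centers(S, threshold, tol=0.0):
    """Farthest-first seeding over an edge-similarity matrix S (list of lists)."""
    m = len(S)
    # Hypothetical stand-in for the class degree: number of edges similar to edge i.
    degree = [sum(1 for j in range(m) if j != i and S[i][j] > 0) for i in range(m)]
    # Steps 6-2-1/6-2-2: dissimilar pairs (s_ij = 0); drop a pair only if both
    # of its edges fall below the threshold (an isolated-edge pair).
    A = [(i, j) for i in range(m) for j in range(i + 1, m)
         if S[i][j] == 0 and (degree[i] >= threshold or degree[j] >= threshold)]
    if not A:
        return []
    # Step 6-2-3 (simplified): start from the pair with the largest combined degree.
    center = list(max(A, key=lambda p: degree[p[0]] + degree[p[1]]))
    while True:
        rest = [k for k in range(m) if k not in center]
        if not rest:
            return center
        # Step 6-2-4: the edge whose highest similarity to any center edge is lowest.
        k = min(rest, key=lambda e: max(S[e][c] for c in center))
        if max(S[k][c] for c in center) > tol:  # no sufficiently dissimilar edge left
            return center
        center.append(k)
```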
In the above technical solution, step 7) comprises:
Step 7-1): calculate the node centrality of each member of the community;
Step 7-2): calculate a node weight for each member of the community based on the communication topic;
Step 7-3): sort the nodes by node centrality and node weight, and take the core members from the ranking.
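Step 7-3) leaves open how centrality and topic weight are combined. One simple reading is sketched below. Each member is scored as a weighted sum, alpha * centrality + (1 - alpha) * topic_weight, and the top-ranked nodes are kept. The sketch assumes that each node's centrality vector has already been reduced to a single number where higher means more central. The parameters alpha and top are hypothetical.

```python
def core_members(members, centrality, topic_weight, alpha=0.5, top=1):
    """Rank community members and return the highest-scoring ones (step 7-3).

    centrality: {node: scalar centrality, higher = more central} (assumed input).
    topic_weight: {node: topic-based node weight from step 7-2}.
    """
    def score(v):
        # Hypothetical linear combination of the two ranking criteria.
        return alpha * centrality[v] + (1 - alpha) * topic_weight[v]
    return sorted(members, key=score, reverse=True)[:top]
```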
In the above technical solution, step 8) comprises:
Step 8-1): take m nodes whose shortest distance from node i is greater than 2 to form the node set $\{v_1, v_2, \ldots, v_m\}$. Use a variable fnum to record, for each node, the number of times it belongs to the same community as node i;
Step 8-2): choose an unprocessed subset of the node set produced in the previous step, and judge whether each node in the subset belongs to the same community as node i;
Step 8-3): repeat step 8-2), then calculate each node's frequency p from its fnum. If p exceeds a second threshold, the node is considered to belong to the same community as node i; otherwise it is not.
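Steps 8-1) to 8-3) amount to a voting scheme. The sketch below makes two assumptions. First, the frequency p of a candidate is its fnum divided by the number of subsets in which it was examined. Second, the per-subset judgement of step 8-2) is supplied by the caller as a function named same_community (a hypothetical name), because the text does not specify how that judgement is made.

```python
def expand(i, candidates, subsets, same_community, p_threshold=0.5):
    """Frequency-based member expansion around node i (steps 8-1 to 8-3).

    candidates: the node set {v_1, ..., v_m} from step 8-1.
    subsets: the node subsets processed in step 8-2, each examined once.
    same_community(i, subset): hypothetical judge returning the nodes of
        `subset` considered to share node i's community.
    """
    fnum = {v: 0 for v in candidates}  # times v was judged to share i's community
    seen = {v: 0 for v in candidates}  # times v was examined
    for subset in subsets:
        judged = same_community(i, subset)
        for v in subset:
            seen[v] += 1
            if v in judged:
                fnum[v] += 1
    # Step 8-3: p = fnum / seen; keep nodes whose frequency exceeds the threshold.
    return {v for v in candidates if seen[v] and fnum[v] / seen[v] > p_threshold}
```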
In the above technical solution, step 9) comprises:
Step 9-1): divide the communication network into n communities, with each node forming an independent community. The initial modularity value is Q = 0, and the initial $a_i$ and the intermediate variable $b_{ij}$ satisfy:
where $e_{ij}=1$ when an edge connects node i and node j, and $e_{ij}=0$ when there is no edge between them; $w_{ij}$ is the weight of edge $e_{ij}$. Initially, the elements of the modularity increment matrix satisfy:
Step 9-2): select the largest $\Delta Q_{ij}$ from the max-heap H, merge the corresponding communities i and j, and label the merged community j. Then update the modularity increments $\Delta Q_{ij}$, the max-heap H, and the auxiliary vector $a_i$. This step comprises:
Step 9-2-1): update $\Delta Q_{ij}$ by deleting the elements of row i and column i and updating the elements of row j and column j, which gives:
Step 9-2-2): update the max-heap H. Each time $\Delta Q_{ij}$ is updated, update the largest element of the corresponding row and column in the heap;
Step 9-2-3): update the auxiliary vector:
$$a'_j = a_i + a_j, \qquad a'_i = 0$$
At the same time, record the modularity value after the merge, $Q + \Delta Q_{ij}$.
Step 9-3): repeat step 9-2) until the merge termination condition is met.
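Step 9) follows the greedy modularity agglomeration of Clauset, Newman and Moore (CNM). The initial formulas for $a_i$, $b_{ij}$ and $\Delta Q_{ij}$ are not reproduced in this text, so the sketch below assumes the standard weighted CNM quantities:
- $a_i = s_i/(2W)$, where $s_i$ is the total weight of node i's edges and W is the total edge weight;
- $e_{ij} = w_{ij}/(2W)$;
- $\Delta Q_{ij} = 2(e_{ij} - a_i a_j)$.

There are two further simplifications. A linear scan replaces the max-heap H, which gives the same result but runs slower. Merging stops as soon as the best $\Delta Q$ is no longer positive. Finally, Q starts from the modularity of the all-singleton partition, $-\sum_i a_i^2$, rather than from 0 as stated in step 9-1). As a result, the returned value is the actual modularity of the final partition.

```python
from collections import defaultdict

def greedy_modularity(weights):
    """CNM-style greedy merging on a weighted, undirected graph (step 9).

    weights: {(i, j): w_ij}, each undirected edge listed once, no self-loops.
    Returns (communities, Q), where communities maps a label to a set of nodes.
    """
    W = sum(weights.values())
    strength = defaultdict(float)
    for (i, j), w in weights.items():
        strength[i] += w
        strength[j] += w
    # Step 9-1: singleton communities, a_i = s_i / 2W, dQ_ij = 2(e_ij - a_i a_j).
    a = {v: s / (2 * W) for v, s in strength.items()}
    dQ = defaultdict(dict)
    for (i, j), w in weights.items():
        dQ[i][j] = dQ[j][i] = 2 * (w / (2 * W) - a[i] * a[j])
    Q = -sum(x * x for x in a.values())  # modularity of the all-singleton partition
    comms = {v: {v} for v in a}
    while True:
        # Step 9-2: largest increment; a linear scan stands in for the max-heap H.
        pairs = [(d, i, j) for i in dQ for j, d in dQ[i].items()]
        if not pairs:
            break
        d, i, j = max(pairs)
        if d <= 0:  # assumed termination condition: no merge raises Q
            break
        # Step 9-2-1: fold row/column i into row/column j.
        for k in (set(dQ[i]) | set(dQ[j])) - {i, j}:
            if k in dQ[i] and k in dQ[j]:
                new = dQ[i][k] + dQ[j][k]
            elif k in dQ[i]:
                new = dQ[i][k] - 2 * a[j] * a[k]
            else:
                new = dQ[j][k] - 2 * a[i] * a[k]
            dQ[j][k] = dQ[k][j] = new
            dQ[k].pop(i, None)
        del dQ[i]
        dQ[j].pop(i, None)
        # Step 9-2-3: a'_j = a_i + a_j, a'_i = 0.
        a[j] += a[i]
        a[i] = 0.0
        comms[j] |= comms.pop(i)
        Q += d
    return comms, Q
```

On two triangles joined by a single edge, the sketch recovers the two triangles, and Q = 5/14.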
The invention also provides a community division system for a communication network. The system comprises a data preprocessing module, a contact-relationship network construction module, a text vector construction module, a node centrality calculation module, an edge attribute calculation module, an edge clustering module, a core member search module, a member expansion module, and a member division module, wherein:
The data preprocessing module preprocesses the communication data to obtain information about each communication, including the communication data ID, sender information, receiver information, communication time, and communication content;
The contact-relationship network construction module builds, from the preprocessing result, a contact-relationship network that reflects the structure of the communication network. In this network, nodes represent the communication senders and receivers, and edges represent the communication relationship between a sender and a receiver;
The text vector construction module constructs a demand text vector and a communication text vector from the query words supplied by the user;
The node centrality calculation module calculates the centrality of each node in the contact-relationship network; node centrality comprises node betweenness, node closeness, and node contact degree;
The edge attribute calculation module calculates, for nodes that have a contact relationship, the contact-relationship strength between them, the similarity between edges, and the user's satisfaction with each edge;
The edge clustering module clusters the edges of the contact-relationship network based on the communication content to generate multiple communities;
The core member search module finds the core members of each community according to node centrality and communication topic;
The member expansion module expands the membership of each community, starting from the core members;
The member division module divides the expanded members of each community to generate new communities.
The advantages of the invention are:
1. The method and system extract a range of information from the communication network: nodes representing communication senders and receivers, edges representing the communication relationships between them, node centrality, the contact-relationship strength between nodes, the similarity between edges, and the user's satisfaction with each edge. This information provides rich support for subsequent mining and analysis of communication data.
2. The method and system achieve community division through two clustering passes (edge clustering and the division of communities) and one diffusion pass (the expansion of community members), and the division results are accurate and reliable.
Brief description of the drawings
Fig. 1 is a flowchart of the community division method of the invention in one embodiment;
Fig. 2 is a schematic diagram of the tables used in one embodiment to store the preprocessed data;
Fig. 3 is a flowchart of the step of expanding community members in the community division method of the invention;
Fig. 4 is a schematic diagram of the community division system of the invention in one embodiment.
Detailed description of the embodiments
The invention is described below with reference to the drawings and specific embodiments.
Before the embodiments are described in detail, the concepts used in the invention are explained.
1. Node set N
The node set N is the set of all communication nodes in the communication network.
2. Edge set E
The edge set E records the communication relationships between the nodes that act as senders and the nodes that act as receivers. It is usually represented as a 0/1 matrix, where $e_{ij}=1$ means an edge connects node i and node j, and $e_{ij}=0$ means no edge connects them.
3. User demand Q
Communication networks are very large, so to improve accuracy the user must supply demand text that narrows the target range. For example, a user who wants information about "securities" supplies keywords such as "securities" and "stocks" as demand text, and everyone who has discussed these words is then located. User demand usually takes the form of words. Even when the demand is clear, inconsistent wording can introduce ambiguity. For example, the abbreviation "人大" may refer to Renmin University of China or to the National People's Congress. The demand text therefore also has to be expanded before the user query vector Q is built.
4. Node attribute set $L_N$
The attribute set $L_N$ of node i comprises the following three items:
1) Communication account: records the mapping between the node and its communication account.
2) Neighbor information table: if an edge connects node i and node j, node i is called a neighbor of node j. Each node has its own neighbor information table, which stores information about its neighbor nodes.
3) Node centrality C: nodes hold different positions in the communication network because of differences in topology. Node centrality C is an index of a communication node's importance that combines node closeness, betweenness, and contact degree, and it is usually represented as a matrix.
5. Edge attribute set $L_E$
The attribute set $L_E$ of edge $e_{ij}$ comprises the following three items:
1) Contact-strength matrix W: in a communication network, the communication contact strength between nodes (contact strength for short) must be assessed. If two nodes communicate directly, contact strength reflects how closely they are in contact in reality. If they do not, it reflects how likely they are to exchange information in reality. The contact-strength matrix W can be built by combining information such as communication time, communication frequency, and topology.
2) Similarity matrix S: each edge is represented as a vector that carries semantics, and the similarity between edges is calculated from these vectors. The similarity matrix S supports cluster analysis.
3) User satisfaction CE: each edge can be assigned a user satisfaction CE based on the user's demand text. User satisfaction is used to judge whether the edge falls within the user's area of interest.
This concludes the explanation of the concepts used in the invention. The following embodiment takes an e-mail network as an example. It describes how information in the e-mail network is mined and how community division is then performed. In other embodiments, information mining and community division can be implemented analogously for other communication networks, such as landline telephone or mobile terminal networks.
Analyzing an e-mail network requires data about e-mail communication. These data can be obtained from a communication network such as the Internet using existing techniques and are not described further here. The process of mining information from the e-mail communication data and then performing community division is described below with reference to Fig. 1.
Preprocessing the e-mail communication data mainly obtains the following information:
1) Communication data ID
Each communication is numbered, and its ID uniquely identifies it. In this embodiment, each e-mail is usually given one ID. In other embodiments, such as instant messaging with MSN or QQ, each conversation is given one ID.
2) Sender information
The information about the sender of the communication. In this embodiment it can be the sender's e-mail address. In other embodiments it can be the sender's account, IP address, or similar, as long as it uniquely identifies the sender.
3) Receiver information
The information about the receiver of the communication. In this embodiment it can be the receiver's e-mail address. In other embodiments it can be the receiver's account, IP address, or similar, as long as it uniquely identifies the receiver.
4) Communication time
The time at which the communication occurred. In this embodiment it can be the time the sender sent the e-mail or the time the receiver received it. In other embodiments, such as instant messaging, other ways of identifying communication time can be used, for example taking the start time of an online chat as the communication time.
5) Communication content
The communication content is the text of the communication, such as the subject and body of an e-mail. In this embodiment, information in e-mail attachments is not treated as communication content. In other embodiments, text in attachments can also be read with suitable software and used as communication content. Chinese has no explicit boundaries between words, so in a preferred implementation the text of each communication is word-segmented to obtain communication content composed of individual words.
Each communication in the network yields the five kinds of information above. Collecting this information for all or some of the communications in the whole network over a period of time forms the basic data that describe the e-mail communication network. In a preferred implementation, these basic data are classified, and the classification results are stored in several tables.
In this embodiment, referring to Fig. 2, the classified data are stored in the following tables:
A. Mapping table: querying this table gives the node name that corresponds to a communication account;
B. E-mail message table (message): the communication content table. Its primary key is the mail number mid, and every communication has a unique mid as its identifier. For e-mail, the table mainly records the subject and body; for other communication formats, it records the chat log;
C. Recipient information table (recipient info): the table of communication receipt information. The basic information in the message table can be looked up through the field mid;
D. Contact table: records the messages sent and received between communication accounts;
E. Weight table: stores the weight information for contacts between communication accounts;
F. Interaction information table: stores the interaction information between communication accounts, including text message vectors and user satisfaction.
The previous step obtained data from actual e-mail communication. These data do not directly show the overall state of the e-mail network, so this step builds a contact-relationship network from the e-mail data.
When the contact-relationship network is built, a communication node is created for each communication account. The contents of the tables produced by preprocessing then determine whether an edge is created between two nodes. If two accounts have communicated, an edge exists between their corresponding nodes; otherwise there is no edge.
Building the contact-relationship network from the e-mail communication data yields the node set N and the edge set E. Their composition and data structures were described above and are not repeated here.
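A minimal Python sketch of steps 1) and 2) follows. CommRecord is a hypothetical record type that holds the five preprocessed fields. The function build_network treats the contact network as undirected and creates one edge for each pair of accounts that have communicated. Each edge keeps the IDs of the messages behind it, which loosely mirrors the contact table of Fig. 2.

```python
from collections import defaultdict
from dataclasses import dataclass

@dataclass
class CommRecord:
    """One preprocessed communication (hypothetical layout of the step 1 output)."""
    msg_id: str
    sender: str
    receivers: list
    timestamp: int  # e.g. a day number; the unit is an assumption
    words: list     # word-segmented communication content

def build_network(records):
    """Create node set N and an undirected edge set E from preprocessed records."""
    nodes, edges = set(), defaultdict(list)
    for r in records:
        nodes.add(r.sender)
        for rcv in r.receivers:
            nodes.add(rcv)
            key = tuple(sorted((r.sender, rcv)))  # undirected edge key
            edges[key].append(r.msg_id)           # messages supporting this edge
    return nodes, dict(edges)
```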
As mentioned for the preprocessing of step 10, preprocessing yields the text of each communication (the communication content), and that text is word-segmented. The text is then processed by the following operations.
Step 31: build an inverted index
An inverted index is built from the word-segmentation result using an index dictionary and a stop-word list. Index dictionaries, stop-word lists, and the process of building an inverted index from them are common knowledge in the field and are not repeated here.
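The inverted index is treated as common knowledge, but a small sketch fixes the data structure that the later steps rely on. The sketch assumes that segmented texts are given as word lists keyed by mail ID. The optional vocabulary argument plays the role of the index dictionary.

```python
from collections import defaultdict

def build_inverted_index(docs, stopwords, vocabulary=None):
    """Map each term to {doc_id: term frequency}, skipping stop words.

    docs: {doc_id: [segmented words]}.
    vocabulary: optional index dictionary restricting which terms are indexed.
    """
    index = defaultdict(dict)
    for doc_id, words in docs.items():
        for w in words:
            if w in stopwords or (vocabulary is not None and w not in vocabulary):
                continue
            index[w][doc_id] = index[w].get(doc_id, 0) + 1
    return dict(index)
```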
Step 32: create the demand text vector and the communication text vector
Communication texts contain many kinds of content. This includes content related to the demand supplied by the user, which is usually expressed as query words. Texts related to the user's demand are called demand texts, and the vector built from the demand text is called the demand text vector. The demand text vector Q has the form:
$$Q = \{(t_1, tw_1), (t_2, tw_2), \ldots, (t_m, tw_m)\}$$
where $t_1, t_2, \ldots, t_m$ are the query terms, arranged in ascending order, and $tw_1, tw_2, \ldots, tw_m$ are weights that describe how important each query term is to the user.
The query terms of the demand text are also used to build the communication text vector $\{(t_1, tw_1), (t_2, tw_2), \ldots, (t_m, tw_m)\}$. The weight $tw_{ji}$ of query term $t_i$ in mail j can be calculated by the following formula:
where $f_{ij}$ is the number of occurrences of word $t_i$ in mail j of the communication text collection, and N is the number of texts in the collection.
Once $tw_{ji}$ has been calculated with this formula, the weights $tw_1, tw_2, \ldots, tw_m$ of the query terms $t_1, t_2, \ldots, t_m$ over the whole communication text collection can be obtained by a weighted calculation. The query-term weights in the demand text vector and in the communication text vector are both written as tw, but their meanings differ. In the demand text vector, the weight reflects how important the query term is to the user. In the communication text vector, the weight depends on how often the query term occurs in the communication texts.
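The weight formula for $tw_{ji}$ is not reproduced in this text. The sketch below assumes a standard tf-idf weighting, $tw_{ji} = f_{ij} \cdot \log(N / n_i)$, where $n_i$ is the number of texts that contain $t_i$. It also takes the collection-level weight $tw_i$ to be the mean over all N texts. Both choices are assumptions and may differ from the patent's formula. The sketch reuses an inverted index of the form {term: {mail_id: f_ij}}.

```python
import math

def term_weights(index, query_terms, n_docs):
    """Per-mail weights tw_ji and collection-level weights tw_i for query terms.

    index: inverted index {term: {doc_id: f_ij}}; n_docs: N.
    Assumes tf-idf (tw_ji = f_ij * log(N / n_i)) and tw_i = mean of tw_ji over N.
    """
    per_doc, vector = {}, []
    for t in sorted(query_terms):  # query terms kept in ascending order
        postings = index.get(t, {})
        idf = math.log(n_docs / len(postings)) if postings else 0.0
        per_doc[t] = {d: f * idf for d, f in postings.items()}
        vector.append((t, sum(per_doc[t].values()) / n_docs))
    return per_doc, vector
```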
Step 33: expand the demand text
Users word their queries in different ways. For example, in a query about computers, different users may use different words for the same concept (in Chinese, both 计算机 and 电脑 mean "computer"). To make the query results more accurate and complete, the demand text must be expanded.
During expansion, related terms are added according to a defined strategy, so that the expanded text fully describes the implicit concept or topic.
Expanding the demand text can comprise the following steps:
Step 33-1: first calculate the co-occurrence frequency of a term t and a query term q in text j:
$$\mathrm{cof}(t, q \mid j) = \log(\mathrm{tf}(t, j) + 1.0) \times \log(\mathrm{tf}(q, j) + 1.0)$$
where tf(t, j) and tf(q, j) are the numbers of occurrences of t and q in text j.
Step 33-2: once the co-occurrence frequency of a term and a query term is known, the degree of association between them can be calculated.
Assume that the words in the initial demand text Q are mutually independent. The degree of association between term t and Q can then be measured by the product of the co-occurrence frequencies of t with each word of Q in the local text set S. The degree of association of term t and Q in S is defined as:
where idf(·|C) is defined as:
df(·|C) is the number of texts in corpus C in which a given term occurs, and μ is an adjustable parameter greater than 0 with a default value of 100.
Step 33-3: calculate an evaluation function from the degree of association, and use its result to decide whether term t is added to the demand text.
Taking logarithms of both sides of the degree-of-association formula above gives the evaluation function score(t):
Define $\mathrm{lodd}_{Q,C}(t, q \mid S)$ as the local dependence degree of term t and query word q in the local document set S, given the global text set C and the user demand text vector Q:
$$\mathrm{lodd}_{Q,C}(t, q \mid S) = \mathrm{idf}(q \mid C)\,\mathrm{idf}(t \mid C)\,\log(\mathrm{cood}(t, q \mid S) + 1.0)$$
The evaluation function above then simplifies to:
Once the evaluation function has been computed, the highest-scoring terms are selected to expand the demand text. Terms that frequently co-occur with the words of query vector Q in the local text set S receive higher scores. Terms that are frequent across the global mail collection are penalized to some degree, and the parameter μ in the idf formula tunes the strength of this penalty. As a result, the highest-scoring terms selected are strongly correlated with the topic of the user's demand text.
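The cof and lodd formulas above can be computed directly, but cood(t, q | S), idf(·|C), and the simplified score(t) are not spelled out here. The sketch below therefore makes three assumptions:
- cood is the sum of cof over the local set S;
- idf values are passed in precomputed, rather than reconstructed from the μ-based formula;
- candidates are ranked by $\mathrm{score}(t) = \sum_{q \in Q} \mathrm{lodd}_{Q,C}(t, q \mid S)$, which is what taking the logarithm of a product over query words gives.

```python
import math

def cof(t, q, doc):
    """Co-occurrence of term t and query term q in one word list (step 33-1)."""
    return math.log(doc.count(t) + 1.0) * math.log(doc.count(q) + 1.0)

def expansion_scores(candidates, query, local_docs, idf):
    """Rank candidate expansion terms (steps 33-2 and 33-3, under assumptions).

    local_docs: list of word lists forming the local set S.
    idf: {term: idf(term | C)}, precomputed by the caller.
    """
    scores = {}
    for t in candidates:
        total = 0.0
        for q in query:
            cood = sum(cof(t, q, d) for d in local_docs)  # assumed aggregation over S
            total += idf[q] * idf[t] * math.log(cood + 1.0)  # lodd_{Q,C}(t, q | S)
        scores[t] = total
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)
```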
As stated in the definitions above, node centrality comprises three indices: node betweenness, node closeness, and node contact degree. The calculation of each index is described below.
Step 41: calculate node betweenness
The mean number of shortest paths that pass through node k is called the betweenness coefficient of node k, denoted $C_A(k)$:
where $g_{ij}(k)$ is a binary variable that equals 1 if the shortest path between nodes i and j passes through node k, and 0 otherwise.
Step 42: calculate node contact degree
The mean number of nodes directly connected to node k is called the contact-degree coefficient of node k, denoted $C_B(k)$:
where n is the number of nodes in the network, and a(i, k) is a binary variable that equals 1 if nodes i and k are directly connected and 0 otherwise.
Step 43: calculate node closeness
The mean of the shortest path lengths between node k and all other nodes in the network is called the closeness coefficient of k, denoted $C_C(k)$:
where l(i, k) is the shortest path length between nodes i and k.
Once node betweenness, closeness, and contact degree are known, the centrality vector of node k is $C(k) = (C_A(k), C_B(k), C_C(k))$.
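The normalizations of $C_A$, $C_B$ and $C_C$ are not reproduced here. The sketch below is one reading of the verbal definitions, for a connected, unweighted, undirected graph:
- $C_A(k)$ is the fraction of node pairs (i, j), both different from k, for which k lies on at least one shortest path;
- $C_B(k)$ is k's degree divided by n - 1;
- $C_C(k)$ is the mean shortest-path length from k to the other nodes, so a smaller value means a more central node.

```python
from collections import deque
from itertools import combinations

def bfs_dist(adj, s):
    """Hop distances from s in an unweighted graph given as {node: set(neighbours)}."""
    dist, queue = {s: 0}, deque([s])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist

def centrality(adj):
    """Return {k: (C_A(k), C_B(k), C_C(k))} for a connected graph with n > 1."""
    nodes = list(adj)
    n = len(nodes)
    d = {v: bfs_dist(adj, v) for v in nodes}
    result = {}
    for k in nodes:
        others = [v for v in nodes if v != k]
        pairs = list(combinations(others, 2))
        # g_ij(k) = 1 when k lies on a shortest i-j path: d(i,k) + d(k,j) == d(i,j).
        g = sum(1 for i, j in pairs if d[i][k] + d[k][j] == d[i][j])
        c_a = g / len(pairs) if pairs else 0.0
        c_b = len(adj[k]) / (n - 1)                   # mean of a(i, k) over other nodes
        c_c = sum(d[k][i] for i in others) / (n - 1)  # mean shortest-path length l(i, k)
        result[k] = (c_a, c_b, c_c)
    return result
```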
To node i, the communications and liaison relationship strength assessment between the j comprises four indexs: number of communications, call duration time span, shortest path length, shared neighbours' number.Respectively the computational process of these indexs is described below.
Step 51: compute the number of communications
The more communications two nodes exchange, the more frequent their contact and the closer their relationship. The number of communications between nodes i and j is:
$$\text{comm\_num}_{ij} = \text{send}_{ij} + \text{receive}_{ij}$$
where $\text{send}_{ij}$ is the number of communications node i initiates to node j, and $\text{receive}_{ij}$ is the number of communications initiated by node j that node i receives.
Step 52: compute the communication time span
The longer the span of communication between two nodes, the longer their contact history and the closer their relationship. The communication time span of nodes i and j is:
$$\text{dur\_day}_{ij} = \text{latest\_day}_{ij} - \text{earliest\_day}_{ij}$$
where $\text{latest\_day}_{ij}$ is the most recently observed communication time between nodes i and j, and $\text{earliest\_day}_{ij}$ is their first communication time.
Step 53: compute the shortest-path length
The shorter the shortest path between two nodes, the more direct their contact and the closer their relationship. The shortest-path length between nodes i and j is denoted $\text{shortest\_len}_{ij}$; it is the number of edges on the path with the fewest edges among all paths from i to j.
Step 54: compute the number of shared neighbours
The more neighbours two nodes share, the more likely they belong to a common circle of contacts and the closer their relationship. Scanning the neighbour sets of nodes i and j gives the number of shared neighbours:
$$\text{sharenode\_num}_{ij} = \lvert \text{neighbor}_i \cap \text{neighbor}_j \rvert$$
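The four indices of steps 51 to 54 can be computed from a flat communication log. The sketch assumes each record is a `(sender, receiver, day)` tuple with an integer day number; this record format and the function name `pair_indices` are assumptions for illustration.

```python
from collections import defaultdict, deque


def pair_indices(log, i, j):
    """Steps 51-54 for nodes i and j; log holds (sender, receiver, day) records."""
    log = list(log)
    send = sum(1 for s, r, _ in log if s == i and r == j)     # i -> j
    receive = sum(1 for s, r, _ in log if s == j and r == i)  # j -> i
    days = [d for s, r, d in log if (s, r) in ((i, j), (j, i))]
    dur_day = max(days) - min(days) if days else 0

    neighbors = defaultdict(set)  # undirected contact graph
    for s, r, _ in log:
        if s != r:
            neighbors[s].add(r)
            neighbors[r].add(s)

    dist, queue = {i: 0}, deque([i])  # BFS hop counts from i
    while queue:
        u = queue.popleft()
        for v in neighbors[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)

    return {
        "comm_num": send + receive,
        "dur_day": dur_day,
        "shortest_len": dist.get(j),  # None if j is unreachable from i
        "sharenode_num": len(neighbors[i] & neighbors[j]),
    }


log = [("a", "b", 1), ("b", "a", 5), ("a", "c", 2), ("b", "c", 3)]
print(pair_indices(log, "a", "b"))
# {'comm_num': 2, 'dur_day': 4, 'shortest_len': 1, 'sharenode_num': 1}
```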
Step 55: once the number of communications, communication time span, shortest-path length and number of shared neighbours are available, the function closeness(i, j) used to assess the contact-relationship strength of two nodes can be computed; the values of closeness(i, j) over all node pairs form the contact-strength matrix W. closeness(i, j) is computed as follows:
where Max_num is the maximum number of communications between any pair of nodes; Max_day is the maximum communication time span between any pair of nodes; Max_node is the maximum number of shared neighbours between any pair of nodes; Max_len is the longest shortest-path length between any pair of nodes; and $k_i$ are weight coefficients.
Step 61: represent each edge uniformly with the vector space model, so that every edge is a vector. The edge vector between nodes i and j is defined as the mean of all communication text vectors exchanged between them:
$$\vec{e}_{ij} = \frac{1}{N_{ij}} \sum_{k=1}^{N_{ij}} \big( w(m_k, t_1), w(m_k, t_2), \ldots, w(m_k, t_T) \big)$$
where $N_{ij}$ is the number of communication texts between nodes i and j, $T$ is the number of feature words, and $w(m_k, t_l)$ is the weight of feature word $t_l$ in communication text $m_k$.
Step 62: compute the similarity between any two edges
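The cosine formula is used to compute the similarity between the vectors $\vec{e}_i$ and $\vec{e}_j$ of any two edges:
$$s_{ij} = \cos(\vec{e}_i, \vec{e}_j) = \frac{\vec{e}_i \cdot \vec{e}_j}{\lVert \vec{e}_i \rVert \, \lVert \vec{e}_j \rVert}$$
The larger $s_{ij}$, the smaller the angle between the two vectors and the higher the similarity. If $s_{ij} \geq \theta$, edges $e_i$ and $e_j$ are considered similar; otherwise they are dissimilar. Here $\theta$ is the similarity threshold.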
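Steps 61 and 62 in a few lines of Python, assuming each communication text has already been turned into a dense term-weight vector of a common length; `edge_vector`, `cosine` and `is_similar` are illustrative names.

```python
import math


def edge_vector(text_vectors):
    """Mean of the term-weight vectors of all texts exchanged on one edge (step 61)."""
    n = len(text_vectors)
    return [sum(col) / n for col in zip(*text_vectors)]


def cosine(u, v):
    """Cosine similarity s_ij (step 62); 0.0 if either vector is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def is_similar(u, v, theta):
    """Two edges are similar when their cosine similarity reaches the threshold."""
    return cosine(u, v) >= theta


e1 = edge_vector([[1.0, 0.0], [0.0, 1.0]])  # [0.5, 0.5]
e2 = edge_vector([[2.0, 2.0]])              # [2.0, 2.0]
print(e1, is_similar(e1, e2, theta=0.9))    # [0.5, 0.5] True
```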
Step 63: construct the similarity matrix S
Computing the pairwise similarity of all edges as above yields the similarity matrix $S = (s_{ij})$. Given the threshold $\theta$, edges i and j are similar if their similarity is at least $\theta$ and dissimilar otherwise, which gives the filtered matrix S, where
$$s_{ij} = \begin{cases} 1, & \cos(\vec{e}_i, \vec{e}_j) \geq \theta \\ 0, & \text{otherwise} \end{cases}$$
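A sketch of step 63 that builds the filtered binary matrix S, which steps 82-1 to 82-3 later test for zeros and combine with bitwise AND. Setting the diagonal to 0 is an assumption made here so that row sums count only other similar edges; `filtered_similarity_matrix` is a hypothetical name.

```python
import math


def cosine(u, v):
    """Cosine similarity; 0.0 if either vector is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def filtered_similarity_matrix(edge_vectors, theta):
    """Binary S: S[i][j] = 1 if edges i and j are similar (cosine >= theta), else 0.

    The diagonal is set to 0 (an assumption) so row sums count other similar edges.
    """
    n = len(edge_vectors)
    return [[1 if i != j and cosine(edge_vectors[i], edge_vectors[j]) >= theta else 0
             for j in range(n)] for i in range(n)]


vectors = [[1.0, 0.0], [1.0, 0.1], [0.0, 1.0]]
print(filtered_similarity_matrix(vectors, theta=0.9))
# [[0, 1, 0], [1, 0, 0], [0, 0, 0]]
```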
Expanding the user's demand text brings the communication content into the evaluation. The detailed process is as follows:
Step 71: compute the weights of the demand text
To obtain user satisfaction, the weight of each query term in the user's eyes must first be determined. Before computing the weights of the demand text, the following are defined:
R denotes the set of texts that meet the user's demand;
C denotes the set of all texts;
N_C denotes the number of texts in C;
N_sim denotes the number of texts in C that meet the user's demand, that is, the size of R.
The weights of the demand text can be computed with existing techniques. In this embodiment, following Rocchio's relevance-feedback experiments, the demand text is treated as a query vector, and the values on each dimension of the ideal query vector $\vec{q}_{opt}$, which separates the texts that meet the demand from those that do not, are used as the weights of the demand text. The ideal query vector is:
$$\vec{q}_{opt} = \frac{1}{N_{sim}} \sum_{\vec{d}_j \in R} \vec{d}_j - \frac{1}{N_C - N_{sim}} \sum_{\vec{d}_j \notin R} \vec{d}_j$$
where $\vec{d}_j$ is the vector of text j, and the value of $\vec{q}_{opt}$ on each dimension is the weight of the corresponding term.
In practice, the set of texts that meet the demand is not known in advance. An initial query vector is therefore constructed first: the user assigns each term a value in [0, 1] expressing its importance. The vector is then revised step by step according to the texts the user marks as meeting the demand, until a satisfactory result is reached. The classic algorithm proposed by Rocchio is:
$$\vec{q}_m = \alpha \vec{q}_0 + \frac{\beta}{\lvert D_r \rvert} \sum_{\vec{d}_j \in D_r} \vec{d}_j - \frac{\gamma}{\lvert D_n \rvert} \sum_{\vec{d}_j \in D_n} \vec{d}_j$$
where α, β and γ are tuning constants, $\vec{q}_0$ is the initial query vector, $D_r$ is the set of texts marked as meeting the demand, and $D_n$ is the set of texts marked as not meeting it.
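The Rocchio update of step 71 can be sketched as follows. The defaults α = 1.0, β = 0.75 and γ = 0.15 are illustrative values, not taken from the patent, and clipping negative term weights to 0 is an assumed convention that keeps the weights non-negative; `rocchio` is a hypothetical name.

```python
def rocchio(q0, relevant, non_relevant, alpha=1.0, beta=0.75, gamma=0.15):
    """One Rocchio revision of the query vector q0 from user-marked texts."""
    dim = len(q0)

    def centroid(docs):
        if not docs:
            return [0.0] * dim
        return [sum(d[t] for d in docs) / len(docs) for t in range(dim)]

    rel, non = centroid(relevant), centroid(non_relevant)
    # negative term weights are clipped to 0 (an assumed convention)
    return [max(0.0, alpha * q0[t] + beta * rel[t] - gamma * non[t]) for t in range(dim)]


print(rocchio([1.0, 0.0], [[0.0, 1.0]], [[1.0, 0.0]], beta=1.0, gamma=0.5))  # [0.5, 1.0]
```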
Step 72: compute the user satisfaction of text m
The satisfaction $s_m$ of text m is the similarity between the vector $T_m$ of text m and the user's demand text vector $T_Q$:
$$s_m = \text{sim}(T_m, T_Q)$$
Step 73: compute edge user satisfaction
The mean satisfaction of all texts exchanged between nodes i and j is called the edge user satisfaction CE:
$$CE_{ij} = \frac{1}{N_k} \sum_{m=1}^{N_k} s_m$$
where $N_k$ is the number of texts exchanged between nodes i and j.
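Steps 72 and 73 reduce to averaging per-text similarities. The text does not name the similarity measure, so cosine similarity is assumed here; `edge_satisfaction` is a hypothetical helper.

```python
import math


def cosine(u, v):
    """Cosine similarity; 0.0 if either vector is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def edge_satisfaction(text_vectors, t_q):
    """CE for one edge: mean of s_m = cosine(T_m, T_Q) over the N_k texts on the edge."""
    scores = [cosine(t_m, t_q) for t_m in text_vectors]
    return sum(scores) / len(scores) if scores else 0.0


t_q = [1.0, 0.0]
print(edge_satisfaction([[1.0, 0.0], [0.0, 1.0]], t_q))  # 0.5
```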
The preceding steps describe how relevant information is mined from the mail communication network; this information is then used to divide the network into communities.
Edge clustering divides all edges of the communication network into several communities such that, in terms of communication content, edges in different communities differ clearly while edges in the same community are close. The purpose of edge clustering is to narrow down the scope of the user's demand quickly. Edge clustering can be implemented in many ways, such as hierarchical, partitioning or grid-based methods; this embodiment uses the k-means method, whose steps are described below.
Step 81: determine the number of communities to be generated by edge clustering, denoted n;
Step 82: generate an initial core for each community;
Step 83: for every edge in the communication network, compute its similarity to the initial core of each community in turn;
Step 84: based on the results of step 83, add each edge to the community whose initial core is most similar to it;
Step 85: adjust the cluster centres. Any common prior-art method can be used, for example computing the mean of the members of each class and taking that mean as the new cluster centre;
Step 86: repeat steps 83 to 85 until a stop condition is satisfied; the communities obtained at that point are the result of edge clustering. Several stop conditions are possible, for example that, while the cluster centres are being adjusted, the difference between the centres of two successive iterations falls below a preset threshold.
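A compact sketch of steps 83 to 86: k-means over edge vectors, using cosine similarity for assignment and the member mean as the new centre. The initial centres are supplied by the caller (step 82), and the tolerance `tol` on centre movement is one possible stop condition; all names are illustrative. `math.dist` requires Python 3.8 or later.

```python
import math


def cosine(u, v):
    """Cosine similarity; 0.0 if either vector is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def mean(vectors):
    return [sum(col) / len(vectors) for col in zip(*vectors)]


def kmeans_edges(edge_vectors, init_centers, tol=1e-6, max_iter=100):
    """k-means over edge vectors with cosine similarity (steps 83-86)."""
    centers = [list(c) for c in init_centers]
    labels = []
    for _ in range(max_iter):
        # Steps 83-84: put each edge in the community whose centre is most similar
        labels = [max(range(len(centers)), key=lambda c: cosine(v, centers[c]))
                  for v in edge_vectors]
        # Step 85: new centre = mean of the member edges (keep the old centre if empty)
        new_centers = []
        for c, old in enumerate(centers):
            members = [v for v, lab in zip(edge_vectors, labels) if lab == c]
            new_centers.append(mean(members) if members else old)
        shift = max(math.dist(a, b) for a, b in zip(centers, new_centers))
        centers = new_centers
        # Step 86: stop once the centres move less than tol
        if shift < tol:
            break
    return labels, centers


edges = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]]
labels, _ = kmeans_edges(edges, init_centers=[[1.0, 0.0], [0.0, 1.0]])
print(labels)  # [0, 0, 1, 1]
```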
Step 82 involves generating the initial cores, which are created according to the following criteria.
Criterion 1: the similarity between initial cores should be as small as possible, so that the similarity between the communities they seed is as small as possible.
Criterion 2: to ensure that an initial core vector is not an isolated edge, the number of edges similar to it is counted and must exceed a given threshold.
Criterion 3: the less the edges related to two selected cluster centres overlap, the better.
The initial cores are selected as follows:
Step 82-1: in the similarity matrix S, if $s_{ij} = 0$, store the pair formed by edges i and j in set A;
Step 82-2: for every pair in set A, compute the class degree of edge i, $d_i = \sum_k s_{ik}$, and the class degree of edge j, $d_j = \sum_k s_{jk}$, and check whether both exceed a preset threshold (e.g., 2). A pair that fails this check contains an isolated edge and is deleted from set A.
Step 82-3: perform a bitwise AND of the rows of S for edges i and j of each pair in set A, and store the pair that minimizes the number of shared similar edges in the cluster centre set center = (i, j):
$$(i, j) = \arg\min_{(i,j) \in A} \sum_k \left( s_{ik} \wedge s_{jk} \right)$$
Step 82-4: search for the edge k with the lowest similarity to all edges in center and take it as a new cluster centre. If no such k exists, return the cluster centres found so far; these are the initial cluster centres. If there are several such k, store them in center and execute step 82-3 again.
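The selection of initial cores can be sketched on the binary matrix S. This is a simplified reading, not the patent's exact procedure: step 82-4 is interpreted as repeatedly adding one non-isolated edge that is dissimilar to every current centre, preferring the least overlap, until none remains. The default threshold of 2 follows the example in step 82-2; `select_initial_centers` is a hypothetical name.

```python
def select_initial_centers(S, degree_threshold=2):
    """Pick initial cluster centres (edge indices) from the binary similarity matrix S."""
    n = len(S)
    degree = [sum(row) for row in S]  # class degree: number of edges similar to each edge

    def overlap(a, b):  # bitwise AND of rows a and b, counted
        return sum(x & y for x, y in zip(S[a], S[b]))

    # Steps 82-1/82-2: dissimilar pairs in which neither edge is isolated
    A = [(i, j) for i in range(n) for j in range(i + 1, n)
         if S[i][j] == 0 and degree[i] > degree_threshold and degree[j] > degree_threshold]
    if not A:
        return []
    # Step 82-3: the pair whose similar-edge sets overlap least
    centers = list(min(A, key=lambda p: overlap(*p)))
    # Step 82-4 (simplified): keep adding a non-isolated edge dissimilar to every centre
    while True:
        candidates = [k for k in range(n)
                      if k not in centers and degree[k] > degree_threshold
                      and all(S[k][c] == 0 for c in centers)]
        if not candidates:
            return centers
        centers.append(min(candidates, key=lambda k: sum(overlap(k, c) for c in centers)))


# Two groups of four mutually similar edges: 0-3 and 4-7.
S = [[1 if i // 4 == j // 4 and i != j else 0 for j in range(8)] for i in range(8)]
print(select_initial_centers(S))  # [0, 4]
```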
The core members are found as follows.
Step 91: judge whether a community member is a core member based on node centrality.
The composition and computation of node centrality were described in detail in step 4 (steps 41 to 43) above and are not repeated here. Among its components, degree reflects how active a node is in the network; a node with very high degree is likely to be a server. Betweenness measures the extent to which a node lies between other nodes. Closeness measures how near a node is to the other nodes, reflecting how quickly it can reach all of them. Together, the three describe how central node k is in the network.
Step 92: judge whether a community member is a core member based on communication topic.
This judgement can be implemented in several ways; this embodiment uses the HITS algorithm. HITS is a web-page link-analysis algorithm proposed by Kleinberg of IBM; its basic principle is to determine the authoritative pages for a given broad topic through link analysis. Adapted to the characteristics of communication behaviour, the algorithm finds core members as follows:
Step 92-1: determine the node set on which HITS operates:
Step a): from the query results obtained with the user's demand text, put the top t ranked results into the result set $R_\sigma$ (the root set).
Step b): expand $R_\sigma$ in two ways. First, add to $R_\sigma$ all nodes with which the nodes in $R_\sigma$ actively communicate. Second, among the nodes that initiate communication to each node in $R_\sigma$, add an arbitrary d of them to the original $R_\sigma$. The result is $S_\sigma$ (the base set).
The expanded set $S_\sigma$ should ideally have three properties: $S_\sigma$ is relatively small; $S_\sigma$ contains many relevant nodes; and $S_\sigma$ contains most of the most valuable authoritative nodes.
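Step 92-1 can be sketched as a small set-building routine. The input format (a best-first list of ranked nodes plus out- and in-neighbour dicts) and the choice of the first d in-neighbours in sorted order to realize "an arbitrary d" are assumptions made for reproducibility; `base_set` is a hypothetical name.

```python
def base_set(ranked_nodes, out_links, in_links, t, d):
    """Build the root set R_sigma and base set S_sigma (step 92-1).

    ranked_nodes: nodes ordered by query-result rank (best first);
    out_links / in_links: {node: iterable of nodes it contacts / that contact it}.
    """
    root = list(ranked_nodes[:t])                        # step a): top-t results
    base = set(root)
    for node in root:                                    # step b)
        base.update(out_links.get(node, ()))             # nodes the root node actively contacts
        base.update(sorted(in_links.get(node, ()))[:d])  # at most d nodes that contact it
    return root, base


root, base = base_set(["a", "b", "c"], {"a": ["x"], "b": ["y"]}, {"a": ["r", "p", "q"]}, t=2, d=2)
print(root, sorted(base))  # ['a', 'b'] ['a', 'b', 'p', 'q', 'x', 'y']
```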
Step 92-2: compute the hub weight and authority weight of each node.
The node set $S_\sigma$ with its contact relationships is represented as a directed graph, in which the directed edge (p, q) means that node p communicates with node q. A good hub points to many good authorities, and a good authority is pointed to by many good hubs. For any node p, A(p) denotes its authority weight and H(p) its hub weight, subject to the normalization conditions
$$\sum_{p \in S_\sigma} A(p)^2 = 1, \qquad \sum_{p \in S_\sigma} H(p)^2 = 1$$
Kleinberg propagates node weights in two ways, the I operation and the O operation.
The I operation propagates weight from hubs to authorities:
$$A(p) = \sum_{q:\,(q,p) \in E} H(q)$$
The O operation propagates weight from authorities to hubs:
$$H(p) = \sum_{q:\,(p,q) \in E} A(q)$$
Iterating the two operations yields the final weights of all nodes.
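The I and O operations together with the normalization condition amount to the standard HITS power iteration. This sketch runs a fixed number of iterations instead of testing for convergence, which is an assumption; `hits` is an illustrative name.

```python
import math


def hits(nodes, edges, iterations=50):
    """HITS on the directed contact graph of S_sigma; edge (p, q) means p contacts q."""
    auth = {p: 1.0 for p in nodes}
    hub = {p: 1.0 for p in nodes}
    for _ in range(iterations):
        # I operation: A(p) = sum of H(q) over edges (q, p)
        auth = {p: sum(hub[q] for q, r in edges if r == p) for p in nodes}
        # O operation: H(p) = sum of A(q) over edges (p, q)
        hub = {p: sum(auth[q] for s, q in edges if s == p) for p in nodes}
        # normalize so that the squares of each weight vector sum to 1
        na = math.sqrt(sum(v * v for v in auth.values())) or 1.0
        nh = math.sqrt(sum(v * v for v in hub.values())) or 1.0
        auth = {p: v / na for p, v in auth.items()}
        hub = {p: v / nh for p, v in hub.items()}
    return auth, hub


edges = [("h1", "a"), ("h2", "a"), ("h1", "b")]
auth, hub = hits({"h1", "h2", "a", "b"}, edges)
print(max(auth, key=auth.get), max(hub, key=hub.get))  # a h1
```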
Step 93: once every node has a centrality value and a node weight, the two are considered together and the nodes are ranked in descending order; the top-ranked nodes are the core members.
Once the core members of a community have been found in the previous step, the community's membership is expanded from them. Expansion is done by judging whether a node belongs to the same community as a core member. Nodes in the same community are connected more tightly to each other than to nodes outside it. Likewise, the information held by node i reaches nodes in the same community in greater amounts than nodes outside it. Therefore, the amount of information each node receives during information propagation can be used to judge whether two nodes belong to the same community. Referring to Fig. 3, the detailed process is as follows:
Step 101: take m nodes whose shortest distance to node i is greater than 2 to form the node set $\{v_1, v_2, \ldots, v_m\}$. A variable fnum, initialized to 0, records the number of times a node is judged to belong to the same community as node i.
Step 102: choose an unprocessed subset from the node set produced in step 101 and judge whether the nodes in this subset belong to the same community as node i. This step comprises:
Step 102-1: choose a node j in the subset as the terminal node and node i as the source node, and initialize $M_i = 1$, $M_j = 0$ and $\{M_k = 0\}$, where $M_i$, $M_j$ and $M_k$ denote the useful information amounts of nodes i, j and k respectively, $k \in \{1, 2, \ldots, n\_node\}$ with $k \neq i, j$, and n_node is the number of nodes in the network;
Step 102-2: update the values $M_k$ in ascending order of k; the information value of node k is given by:
where $e_{ij} = 1$ when an edge connects nodes i and j and $e_{ij} = 0$ otherwise, and $w_{ij}$ is the weight of edge $e_{ij}$ (taken as 1 here).
Step 102-3: repeat step 102-2 until the information values of the nodes no longer change noticeably.
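The update formula for $M_k$ in step 102-2 is not reproduced in this text. For illustration only, the sketch below assumes that each non-fixed node takes the weighted mean of its neighbours' values while the source and terminal nodes stay at 1 and 0. It sweeps nodes in ascending order as in step 102-2 and stops when the values stop changing, as in step 102-3; `diffuse` is a hypothetical name.

```python
def diffuse(adj, source, sink, tol=1e-6, max_iter=1000):
    """Iterate information values M with M[source] = 1 and M[sink] = 0 held fixed.

    adj: {node: {neighbour: weight}} for an undirected graph. The update rule (weighted
    mean of the neighbours' values) is an assumption, not the patent's formula.
    """
    M = {k: 0.0 for k in adj}
    M[source] = 1.0
    for _ in range(max_iter):
        change = 0.0
        for k in sorted(adj):  # step 102-2: ascending order of node labels
            if k in (source, sink) or not adj[k]:
                continue
            new = sum(w * M[l] for l, w in adj[k].items()) / sum(adj[k].values())
            change = max(change, abs(new - M[k]))
            M[k] = new
        if change < tol:  # step 102-3: values no longer change noticeably
            break
    return M


chain = {1: {2: 1.0}, 2: {1: 1.0, 3: 1.0}, 3: {2: 1.0, 4: 1.0}, 4: {3: 1.0}}
M = diffuse(chain, source=1, sink=4)
print(round(M[2], 3), round(M[3], 3))  # 0.667 0.333
```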
Step 102-4: select the information threshold for the split according to a redundancy.
Every node has an information value. Cutting at the position near the middle where adjacent information values differ the most divides the whole network into two communities; "near the middle" is made precise by introducing a redundancy. For example, if the network has n nodes and the redundancy is 20%, it suffices to search among the nodes positioned within (0.3n, 0.7n) for the two adjacent nodes whose information values differ the most.
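The largest-gap search of step 102-4 can be sketched as below, assuming the information values are sorted ascending and that the redundancy defines the window of positions searched (0.2 gives roughly 0.3n to 0.7n). The midpoint of the widest gap as the threshold and the name `gap_threshold` are assumptions; at least two values are required.

```python
def gap_threshold(values, redundancy=0.2):
    """Threshold at the widest gap between adjacent sorted values near the middle.

    Only gaps whose lower position lies in [(0.5 - redundancy) n, (0.5 + redundancy) n)
    are considered; the threshold is the midpoint of the widest such gap.
    """
    ordered = sorted(values)
    n = len(ordered)
    lo = max(0, round((0.5 - redundancy) * n))
    hi = min(n - 1, round((0.5 + redundancy) * n))
    if hi <= lo:  # window too small: fall back to the whole range
        lo, hi = 0, n - 1
    p = max(range(lo, hi), key=lambda q: ordered[q + 1] - ordered[q])
    return (ordered[p] + ordered[p + 1]) / 2


values = [0.0, 0.05, 0.1, 0.15, 0.2, 0.8, 0.85, 0.9, 0.95, 1.0]
print(gap_threshold(values))  # 0.5
```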
Step 102-5: once the threshold has been selected, if the information value of node k is greater than the threshold (70% in this embodiment), node k belongs to the same community as the source node and its fnum is incremented by 1; otherwise it does not belong to that community.
Step 103: repeat step 102 and compute the frequency p of each node from its fnum. If p exceeds another threshold (0.6 in this embodiment), the node is considered to belong to the same community as node i; otherwise it is not.
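Steps 102-5 and 103 combine into a simple vote. The sketch assumes that the frequency p of a node is its fnum divided by the number of processed subsets in which it was evaluated, an interpretation the text does not spell out; the defaults 0.7 and 0.6 are the embodiment's thresholds, and `expand_by_frequency` is a hypothetical name.

```python
def expand_by_frequency(runs, value_threshold=0.7, freq_threshold=0.6):
    """Steps 102-5 and 103 for a fixed source node i.

    runs: one {node: information value} dict per processed subset. A node counts as a
    hit (fnum += 1) when its value exceeds value_threshold; it joins i's community when
    its hit frequency p = fnum / (number of runs it appears in) exceeds freq_threshold.
    """
    members = set()
    for k in set().union(*runs):
        observed = [run[k] for run in runs if k in run]
        fnum = sum(1 for m in observed if m > value_threshold)
        if fnum / len(observed) > freq_threshold:
            members.add(k)
    return members


runs = [{"x": 0.9, "y": 0.2}, {"x": 0.8, "y": 0.9}, {"x": 0.75, "y": 0.1}]
print(expand_by_frequency(runs))  # {'x'}
```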
Community division can use either a divisive (splitting) method or an agglomerative (merging) method. This embodiment uses the agglomerative method to illustrate the division process.
This step involves the concept of modularity, which is explained first:
Modularity: suppose the network is divided into k communities, and define a symmetric k × k matrix $E = (e_{ij})$, where $e_{ij}$ is the fraction of all edges in the network that connect a node in community i to a node in community j. Modularity, denoted Q, is computed as:
$$Q = \operatorname{Tr} E - \lVert E^2 \rVert$$
where $\lVert x \rVert$ denotes the sum of all elements of matrix x.
This step also uses three data structures:
(1) The modularity-increment matrix $\Delta Q_{ij}$, each row of which is stored both as a balanced binary tree and as a max-heap.
(2) A max-heap H containing the largest element of each row of $\Delta Q_{ij}$, together with the labels i and j of the two communities that element corresponds to.
(3) An auxiliary vector $a_i$.
With these concepts and data structures in place, agglomerative community division proceeds as follows:
Step 111: divide the communication network into n communities, each node forming a community of its own. The initial modularity is Q = 0, and the initial $a_i$ and the intermediate variable $b_{ij}$ satisfy:
where $e_{ij} = 1$ when nodes i and j are connected by an edge and $e_{ij} = 0$ otherwise, and $w_{ij}$ is the weight of edge $e_{ij}$. Initially, the elements of the modularity-increment matrix satisfy:
Step 112: select the largest $\Delta Q_{ij}$ from the max-heap H, merge the corresponding communities i and j, label the merged community j, and update the modularity increments $\Delta Q_{ij}$, the max-heap H and the auxiliary vector $a_i$:
Step 112-1: update $\Delta Q$: delete row i and column i, and update row j and column j, which gives, for every community k adjacent to i or j,
$$\Delta Q'_{jk} = \begin{cases} \Delta Q_{ik} + \Delta Q_{jk}, & k \text{ connected to both } i \text{ and } j \\ \Delta Q_{ik} - 2 a_j a_k, & k \text{ connected to } i \text{ only} \\ \Delta Q_{jk} - 2 a_i a_k, & k \text{ connected to } j \text{ only} \end{cases}$$
Step 112-2: update the max-heap H: after each update of $\Delta Q_{ij}$, update the largest elements of the corresponding rows and columns in the heap.
Step 112-3: update the auxiliary vector:
$$a'_j = a_i + a_j, \qquad a'_i = 0$$
At the same time, record the modularity after merging, $Q + \Delta Q_{ij}$.
Step 113: repeat step 112 until a merge end condition is satisfied. Several end conditions are possible. In one embodiment, merging ends when all nodes belong to a single community. In another embodiment, since modularity Q has only one peak, merging can stop once the largest element of the modularity-increment matrix turns from positive to negative.
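Steps 111 to 113 follow the pattern of Clauset, Newman and Moore's greedy agglomerative algorithm. The sketch below is a simplified stand-in under stated assumptions: it scans ΔQ linearly instead of maintaining balanced trees and the max-heap H; it initializes $a_i$ as the node's weighted degree divided by 2W and $\Delta Q_{ij} = 2(w_{ij}/2W - a_i a_j)$, since the patent's own initial formulas are not reproduced in this text; and it starts Q at the all-singletons modularity $-\sum_i a_i^2$ rather than 0, which only shifts every recorded value by a constant. It stops when the largest increment is no longer positive, the second end condition of step 113. `greedy_merge` is a hypothetical name.

```python
def greedy_merge(edges):
    """Agglomerative community division; edges: list of (u, v, w), undirected, no self-loops.

    Returns ({node: community label}, modularity of the final partition).
    """
    two_w = 2 * sum(w for _, _, w in edges)
    a, dq = {}, {}
    for u, v, w in edges:
        for x, y in ((u, v), (v, u)):
            a[x] = a.get(x, 0.0) + w / two_w          # a_i: share of edge ends at i
            dq.setdefault(x, {})
            dq[x][y] = dq[x].get(y, 0.0) + w / two_w  # b_ij before conversion
    for x in dq:  # initial increments: dQ_ij = 2 (b_ij - a_i a_j)
        for y in dq[x]:
            dq[x][y] = 2 * (dq[x][y] - a[x] * a[y])
    members = {x: {x} for x in dq}
    q = -sum(v * v for v in a.values())  # modularity of the all-singletons partition
    while True:
        # a linear scan stands in for the max-heap H of step 112
        best = max(((dq[i][j], i, j) for i in dq for j in dq[i]),
                   key=lambda t: t[0], default=None)
        if best is None or best[0] <= 0:  # step 113: stop once the best increment is <= 0
            break
        gain, i, j = best
        for k in set(dq[i]) | set(dq[j]):  # step 112-1: merge row/column i into j
            if k in (i, j):
                continue
            if k in dq[i] and k in dq[j]:
                new = dq[i][k] + dq[j][k]
            elif k in dq[i]:
                new = dq[i][k] - 2 * a[j] * a[k]
            else:
                new = dq[j][k] - 2 * a[i] * a[k]
            dq[j][k] = dq[k][j] = new
            dq[k].pop(i, None)
        dq[j].pop(i, None)
        del dq[i]
        a[j], a[i] = a[i] + a[j], 0.0  # step 112-3
        members[j] |= members.pop(i)
        q += gain
    return {x: c for c, group in members.items() for x in group}, q


edges = [(0, 1, 1.0), (1, 2, 1.0), (0, 2, 1.0), (3, 4, 1.0), (4, 5, 1.0), (3, 5, 1.0), (2, 3, 1.0)]
labels, q = greedy_merge(edges)
print(round(q, 4))  # 0.3571
```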
The invention also provides a community division system for a communication network. Referring to Fig. 4, it comprises: a data preprocessing module, a contact-relationship network construction module, a text vector construction module, a node centrality computation module, an edge attribute computation module, an edge clustering module, a core member search module, a member expansion module and a member division module, wherein:
the data preprocessing module preprocesses the communication data to obtain information about it, including communication data ID, sender information, recipient information, communication time and communication content;
the contact-relationship network construction module creates, from the preprocessing results, a contact-relationship network reflecting the structure of the communication network; this network provides nodes representing the communication senders and receivers and edges representing the communication relationships between them;
the text vector construction module constructs the demand text vector and the communication text vectors from the query words provided by the user;
the node centrality computation module computes the centrality of each node in the contact-relationship network, where node centrality comprises node betweenness, node closeness and node degree;
the edge attribute computation module computes, for nodes that have a contact relationship in the network, the contact-relationship strength between them, the similarity between the edges and the user's satisfaction with each edge;
the edge clustering module performs edge clustering on the contact-relationship network based on the communication content to generate multiple communities;
the core member search module finds the core members of each community according to node centrality and communication topic;
the member expansion module expands the membership of each community starting from its core members;
the member division module divides the expanded members of each community to generate new communities.
With the above method and system, the communication network can be divided into different communities, so that members are classified according to their attributes or features.
Finally, it should be noted that the above embodiments only illustrate, and do not limit, the technical solution of the invention. Although the invention has been described in detail with reference to the embodiments, those of ordinary skill in the art should understand that modifications or equivalent replacements of the technical solution that do not depart from its spirit and scope shall all fall within the scope of the claims of the invention.
Claims (7)
1. A community division method for a communication network, comprising:
Step 1): preprocessing the communication data to obtain information about it, including communication data ID, sender information, recipient information, communication time and communication content;
Step 2): creating, from the preprocessing results of step 1), a contact-relationship network reflecting the structure of the communication network, the network providing nodes representing the communication senders and receivers and edges representing the communication relationships between them;
Step 3): constructing a demand text vector and communication text vectors from the query words provided by the user;
Step 4): computing the centrality of each node in the contact-relationship network, node centrality comprising node betweenness, node closeness and node degree;
Step 5): computing, for nodes that have a contact relationship in the network, the contact-relationship strength between them, the similarity between the edges and the user's satisfaction with each edge;
Step 6): performing edge clustering on the contact-relationship network based on the communication content to generate multiple communities;
Step 7): finding the core members of each community according to node centrality and communication topic;
Step 8): expanding the membership of each community starting from its core members;
Step 9): dividing the expanded members of each community to generate new communities.
2. The community division method for a communication network according to claim 1, characterized in that step 6) comprises:
Step 6-1): determining the number of communities to be generated by edge clustering;
Step 6-2): generating an initial core for each community;
Step 6-3): for every edge in the communication network, computing its similarity to the initial core of each community in turn;
Step 6-4): based on the results of step 6-3), adding each edge to the community whose initial core is most similar to it;
Step 6-5): adjusting the cluster centre of each community;
Step 6-6): repeating steps 6-3) to 6-5) until a stop condition is satisfied.
3. The community division method for a communication network according to claim 2, characterized in that step 6-2) comprises:
Step 6-2-1): based on the similarity between edges, if the similarity $s_{ij} = 0$, storing the pair formed by edges i and j in set A;
Step 6-2-2): for every pair in set A, computing the class degree of edge i and the class degree of edge j, and checking whether both exceed a preset threshold; a pair that fails this check contains an isolated edge and is deleted from set A;
Step 6-2-3): performing a bitwise AND of edges i and j of each pair in set A, and storing the pair giving the minimum result in the cluster centre set center = (i, j);
Step 6-2-4): searching for the edge k with the lowest similarity to all edges in center and taking it as a new cluster centre; if no such k exists, returning the cluster centres found, which are the initial cluster centres; if there are several such k, storing them in center and executing step 6-2-3) again.
4. The community division method for a communication network according to claim 1, characterized in that step 7) comprises:
Step 7-1): computing the node centrality of each member of the community;
Step 7-2): computing the node weight of each member of the community based on communication topic;
Step 7-3): ranking the nodes by node centrality and node weight, and obtaining the core members from the ranking.
5. The community division method for a communication network according to claim 1, characterized in that step 8) comprises:
Step 8-1): taking m nodes whose shortest distance to node i is greater than 2 to form the node set $\{v_1, v_2, \ldots, v_m\}$, and recording with the variable fnum the number of times a node is judged to belong to the same community as node i;
Step 8-2): choosing an unprocessed subset from the node set produced in the previous step, and judging whether the nodes in this subset belong to the same community as node i;
Step 8-3): repeating step 8-2) and computing the frequency p of each node from its fnum; if p exceeds another threshold, the node is considered to belong to the same community as node i, otherwise it is not.
6. The community division method for a communication network according to claim 1, characterized in that step 9) comprises:
Step 9-1): dividing the communication network into n communities, each node forming a community of its own, wherein the initial modularity value is Q = 0, and the initial auxiliary vector $a_i$ and the intermediate variable $b_{ij}$ satisfy:
where $e_{ij} = 1$ when nodes i and j are connected by an edge and $e_{ij} = 0$ otherwise, and $w_{ij}$ is the weight of edge $e_{ij}$; initially, the elements $\Delta Q_{ij}$ of the modularity-increment matrix satisfy:
Step 9-2): selecting the largest $\Delta Q_{ij}$ from the max-heap H, merging the corresponding communities i and j, labelling the merged community j, and updating $\Delta Q_{ij}$, the max-heap H and the auxiliary vector $a_i$; this step comprises:
Step 9-2-1): updating $\Delta Q_{ij}$ by deleting row i and column i and updating row j and column j;
Step 9-2-2): updating the max-heap H: after each update of $\Delta Q_{ij}$, updating the largest elements of the corresponding rows and columns in the heap;
Step 9-2-3): updating the auxiliary vector:
$$a'_j = a_i + a_j, \qquad a'_i = 0$$
and at the same time recording the modularity after merging, $Q + \Delta Q_{ij}$;
Step 9-3): repeating step 9-2) until a merge end condition is satisfied.
7. A community division system for a communication network, characterized in that it comprises: a data preprocessing module, a contact-relationship network construction module, a text vector construction module, a node centrality computation module, an edge attribute computation module, an edge clustering module, a core member search module, a member expansion module and a member division module, wherein:
the data preprocessing module preprocesses the communication data to obtain information about it, including communication data ID, sender information, recipient information, communication time and communication content;
the contact-relationship network construction module creates, from the preprocessing results, a contact-relationship network reflecting the structure of the communication network; this network provides nodes representing the communication senders and receivers and edges representing the communication relationships between them;
the text vector construction module constructs the demand text vector and the communication text vectors from the query words provided by the user;
the node centrality computation module computes the centrality of each node in the contact-relationship network, where node centrality comprises node betweenness, node closeness and node degree;
the edge attribute computation module computes, for nodes that have a contact relationship in the network, the contact-relationship strength between them, the similarity between the edges and the user's satisfaction with each edge;
the edge clustering module performs edge clustering on the contact-relationship network based on the communication content to generate multiple communities;
the core member search module finds the core members of each community according to node centrality and communication topic;
the member expansion module expands the membership of each community starting from its core members;
the member division module divides the expanded members of each community to generate new communities.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201110141970.XA CN102202012B (en) | 2011-05-30 | 2011-05-30 | Group dividing method and system of communication network |
Publications (2)
Publication Number | Publication Date |
---|---|
CN102202012A (en) | 2011-09-28 |
CN102202012B (en) | 2015-01-14 |
Family ID: 44662413
Family Applications (1)
Application Number | Title | Priority Date | Filing Date | Status |
---|---|---|---|---|
CN201110141970.XA, CN102202012B (en) | Group dividing method and system of communication network | 2011-05-30 | 2011-05-30 | Expired - Fee Related |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN102202012B (en) |
Patent Citations (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
WO2008120072A1 (en) * | 2007-04-03 | 2008-10-09 | Fernando Luege Mateos | Method and system of classifying, ranking and relating information based on networks |
CN101321190A (en) * | 2008-07-04 | 2008-12-10 | 清华大学 | Recommend method and recommend system of heterogeneous network |
CN101430708A (en) * | 2008-11-21 | 2009-05-13 | 哈尔滨工业大学深圳研究生院 | Blog hierarchy classification tree construction method based on label clustering |
CN101408901A (en) * | 2008-11-26 | 2009-04-15 | 东北大学 | Probability clustering method of cross-categorical data based on key word |
Cited By (39)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN102509248A (en) * | 2011-11-22 | 2012-06-20 | 昆明理工大学 | Multi-objective categorization method for travel grouping and system adopting same |
CN102509248B (en) * | 2011-11-22 | 2016-03-30 | 昆明理工大学 | A kind ofly be applied to the travel Multi-Target Classification Method of forming a team and system |
CN103226577A (en) * | 2013-04-01 | 2013-07-31 | 儒豹(苏州)科技有限责任公司 | News clustering method |
CN103338460A (en) * | 2013-06-17 | 2013-10-02 | 北京邮电大学 | Method for calculating centrality of nodes of dynamic network environment |
CN103338460B (en) * | 2013-06-17 | 2016-03-30 | 北京邮电大学 | For the computational methods of the node center degree of dynamic network environment |
CN104394202A (en) * | 2014-11-13 | 2015-03-04 | 西安交通大学 | A node vitality quantifying method in a mobile social network |
CN104394202B (en) * | 2014-11-13 | 2018-01-05 | 西安交通大学 | A kind of node liveness quantization method in mobile community network |
CN105844577A (en) * | 2015-01-12 | 2016-08-10 | 阿里巴巴集团控股有限公司 | Relation network recognition method and device |
CN106301868A (en) * | 2015-06-12 | 2017-01-04 | 华为技术有限公司 | The method and apparatus determining the importance of network node |
CN106301868B (en) * | 2015-06-12 | 2019-08-20 | 华为技术有限公司 | The method and apparatus for determining the importance of network node |
CN105740907A (en) * | 2016-02-01 | 2016-07-06 | 石家庄铁道大学 | Local community mining method |
CN105760503B (en) * | 2016-02-23 | 2019-02-05 | 清华大学 | A kind of method of quick calculating node of graph similarity |
CN105760503A (en) * | 2016-02-23 | 2016-07-13 | 清华大学 | Method for quickly calculating graph node similarity |
CN105812280A (en) * | 2016-05-05 | 2016-07-27 | 四川九洲电器集团有限责任公司 | Classification method and electronic equipment |
CN105812280B (en) * | 2016-05-05 | 2019-06-04 | 四川九洲电器集团有限责任公司 | A kind of classification method and electronic equipment |
CN106022936B (en) * | 2016-05-25 | 2020-03-20 | 南京大学 | Community structure-based influence maximization algorithm applicable to thesis cooperative network |
CN106022936A (en) * | 2016-05-25 | 2016-10-12 | 南京大学 | Influence maximization algorithm based on community structure and applicable to paper cooperation network |
CN106789338A (en) * | 2017-01-18 | 2017-05-31 | 北京航空航天大学 | A kind of method that key person is found in the extensive social networks of dynamic |
CN106789338B (en) * | 2017-01-18 | 2020-10-30 | 北京航空航天大学 | Method for discovering key people in dynamic large-scale social network |
CN107545509A (en) * | 2017-07-17 | 2018-01-05 | 西安电子科技大学 | A kind of group dividing method of more relation social networks |
CN110020341A (en) * | 2017-08-30 | 2019-07-16 | 腾讯科技(深圳)有限公司 | Member role determines method, apparatus and storage medium |
CN110020341B (en) * | 2017-08-30 | 2022-09-16 | 腾讯科技(深圳)有限公司 | Member role determination method, device and storage medium |
WO2019042060A1 (en) * | 2017-08-30 | 2019-03-07 | 腾讯科技(深圳)有限公司 | Method and apparatus for determining member role, and storage medium |
CN107623594A (en) * | 2017-09-01 | 2018-01-23 | 电子科技大学 | A kind of three-dimensional level network topology method for visualizing of geographical location information constraint |
CN108989581B (en) * | 2018-09-21 | 2022-03-22 | 中国银行股份有限公司 | User risk identification method, device and system |
CN108989581A (en) * | 2018-09-21 | 2018-12-11 | 中国银行股份有限公司 | A kind of consumer's risk recognition methods, apparatus and system |
WO2020062450A1 (en) * | 2018-09-28 | 2020-04-02 | 苏州达家迎信息技术有限公司 | Method and apparatus for determining central vertex in social network, and device and storage medium |
US11487818B2 (en) | 2018-09-28 | 2022-11-01 | Suzhou Dajiaying Information Technology Co., Ltd | Method, apparatus, device and storage medium for determining a central vertex in a social network |
CN111104722A (en) * | 2018-10-10 | 2020-05-05 | 华北电力大学(保定) | Electric power communication network modeling method considering overlapping communities |
CN109543108A (en) * | 2018-11-26 | 2019-03-29 | 中国人民解放军陆军工程大学 | User role mining system facing network multi-domain information |
CN109978053A (en) * | 2019-03-25 | 2019-07-05 | 北京航空航天大学 | A kind of unmanned plane cooperative control method based on community division |
CN110083780B (en) * | 2019-04-25 | 2023-07-21 | 上海理工大学 | Community based on complex network model partitioned personalized recommendation method |
CN110083780A (en) * | 2019-04-25 | 2019-08-02 | 上海理工大学 | Personalized recommendation method based on community division in complex network model |
CN110213164A (en) * | 2019-05-21 | 2019-09-06 | 南瑞集团有限公司 | A kind of method and device of the identification network key disseminator based on topology information fusion |
CN110213164B (en) * | 2019-05-21 | 2021-06-08 | 南瑞集团有限公司 | Method and device for identifying network key propagator based on topology information fusion |
CN110825935A (en) * | 2019-09-26 | 2020-02-21 | 福建新大陆软件工程有限公司 | Community core character mining method, system, electronic equipment and readable storage medium |
CN112699108A (en) * | 2020-12-25 | 2021-04-23 | 中科恒运股份有限公司 | Data reconstruction method and device for marital registration system and terminal equipment |
CN114118094A (en) * | 2021-11-12 | 2022-03-01 | 国网天津市电力公司 | Semantic community discovery method based on non-negative matrix factorization |
CN114118094B (en) * | 2021-11-12 | 2024-05-24 | 国网天津市电力公司 | Semantic community discovery method based on nonnegative matrix factorization |
Also Published As
Publication number | Publication date |
---|---|
CN102202012B (en) | 2015-01-14 |
Similar Documents
Publication | Title |
---|---|
CN102202012B (en) | Group dividing method and system of communication network |
Hammouda et al. | Hierarchically distributed peer-to-peer document clustering and cluster summarization |
CN104598588B (en) | Microblog users label automatic generating calculation based on double focusing class |
CN105045875B (en) | Personalized search and device |
CN106503148B (en) | A kind of table entity link method based on multiple knowledge base |
Qiao et al. | Top-k nearest keyword search on large graphs |
US20080097994A1 (en) | Method of extracting community and system for the same |
CN106484764A (en) | User's similarity calculating method based on crowd portrayal technology |
CN104995870A (en) | Multi-objective server placement determination |
CN110147421B (en) | Target entity linking method, device, equipment and storage medium |
CN110209808A (en) | A kind of event generation method and relevant apparatus based on text information |
CN102646122B (en) | Automatic building method of academic social network |
CN103020302A (en) | Academic core author excavation and related information extraction method and system based on complex network |
CN105808590A (en) | Search engine realization method as well as search method and apparatus |
CN110502509A (en) | A kind of traffic big data cleaning method and relevant apparatus based on Hadoop Yu Spark frame |
CN104298776A (en) | LDA model-based search engine result optimization system |
CN107153687B (en) | Indexing method for social network text data |
CN105721279A (en) | Relationship circle excavation method and system of telecommunication network users |
CN105404619A (en) | Similarity based semantic Web service clustering labeling method |
CN112084781B (en) | Standard term determining method, device and storage medium |
CN104699817A (en) | Search engine ordering method and search engine ordering system based on improved spectral clusters |
CN111680498B (en) | Entity disambiguation method, device, storage medium and computer equipment |
Langville et al. | The use of linear algebra by web search engines |
Zhang et al. | Co-ranking multiple entities in a heterogeneous network: Integrating temporal factor and users’ bookmarks |
CN116450938A (en) | Work order recommendation realization method and system based on map |
Legal Events
Code | Title | Description |
---|---|---|
C06 | Publication | |
PB01 | Publication | |
C10 | Entry into substantive examination | |
SE01 | Entry into force of request for substantive examination | |
C14 | Grant of patent or utility model | |
GR01 | Patent grant | |
CF01 | Termination of patent right due to non-payment of annual fee | Granted publication date: 2015-01-14; termination date: 2016-05-30 |