CN104598927A - Large-scale graph partitioning method and system - Google Patents


Info

Publication number
CN104598927A
CN104598927A CN201510047749.6A CN201510047749A CN104598927A CN 104598927 A CN104598927 A CN 104598927A CN 201510047749 A CN201510047749 A CN 201510047749A CN 104598927 A CN104598927 A CN 104598927A
Authority
CN
China
Prior art keywords
node
edge
large scale
shortest path
scale graphs
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201510047749.6A
Other languages
Chinese (zh)
Inventor
刘志超
李红娜
宁立
张涌
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Shenzhen Institute of Advanced Technology of CAS
Original Assignee
Shenzhen Institute of Advanced Technology of CAS
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Shenzhen Institute of Advanced Technology of CAS
Priority to CN201510047749.6A
Publication of CN104598927A
Legal status: Pending

Landscapes

  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention belongs to the technical field of networks and provides a large-scale graph partitioning method and system. The method comprises: inputting a large-scale graph; calculating the shortest paths between the nodes of the large-scale graph and setting flag values for the edges between the nodes; randomly sampling the shortest paths; and partitioning the large-scale graph based on the randomly sampled shortest paths, deleting an edge if, after partitioning, the flag value of that edge is greater than a preset parameter value. The method can effectively improve the efficiency of partitioning large-scale graphs.

Description

A large-scale graph partitioning method and system
Technical field
The invention belongs to the technical field of networks, and in particular relates to a large-scale graph partitioning method and system.
Background
Graph partitioning refers to dividing the nodes of a graph into a user-specified number of independent groups so as to optimize a criterion related to the cut edges.
Graph partitioning methods mainly focus on finding the global community structure in complex networks, and an important prerequisite of traditional algorithms is that the topology of the whole graph is known. However, when the size of a graph grows to a large scale, new problems arise, for example: (1) complex networks have become huge networks, and traditional complex-network analysis methods can hardly meet the demand; (2) as graphs grow, computing all paths once becomes impractical; (3) a large-scale graph has a very large number of nodes that change frequently, so determining whether an edge lies on sufficiently many shortest paths is very resource-intensive.
Summary of the invention
In view of this, embodiments of the present invention provide a large-scale graph partitioning method and system to solve the above problems of the prior art.
An embodiment of the present invention is implemented as a large-scale graph partitioning method, the method comprising:
inputting a large-scale graph;
calculating the shortest paths between the nodes of the large-scale graph, and setting flag values for the edges between the nodes;
randomly sampling the shortest paths;
partitioning the large-scale graph based on the randomly sampled shortest paths, and deleting an edge if, after partitioning, the flag value of that edge between nodes is greater than a preset parameter value.
Another object of the embodiments of the present invention is to provide a large-scale graph partitioning system, the system comprising:
a large-scale graph input unit, for inputting a large-scale graph;
a computing unit, for calculating the shortest paths between the nodes of the large-scale graph and setting flag values for the edges between the nodes;
a random sampling unit, for randomly sampling the shortest paths;
a processing unit, for partitioning the large-scale graph based on the randomly sampled shortest paths, and deleting an edge if, after partitioning, the flag value of that edge between nodes is greater than a preset parameter value.
Compared with the prior art, the embodiments of the present invention have the following beneficial effect: by calculating the shortest paths between nodes, randomly sampling those shortest paths, and partitioning the large-scale graph based on the sampled shortest paths, the embodiments effectively address the problem that a large-scale graph has very many, frequently changing nodes, which makes it very resource-intensive to determine whether an edge lies on sufficiently many shortest paths. The embodiments can effectively improve the efficiency of large-scale graph partitioning and offer good ease of use and practicality.
Brief description of the drawings
To illustrate the technical solutions in the embodiments of the present invention more clearly, the drawings needed for describing the embodiments or the prior art are briefly introduced below. Obviously, the drawings described below show only some embodiments of the present invention, and those of ordinary skill in the art can obtain other drawings from them without creative effort.
Fig. 1 is a flowchart of the large-scale graph partitioning method provided by embodiment one of the present invention;
Fig. 2 is a structural diagram of the large-scale graph partitioning system provided by embodiment two of the present invention.
Detailed description
In the following description, for purposes of explanation rather than limitation, specific details such as particular system structures and techniques are set forth in order to provide a thorough understanding of the embodiments of the present invention. However, it will be clear to those skilled in the art that the present invention can also be practiced in other embodiments without these specific details. In other instances, detailed descriptions of well-known systems, devices, circuits, and methods are omitted so that unnecessary detail does not obscure the description of the present invention.
To illustrate the technical solutions of the present invention, specific embodiments are described below.
Embodiment one:
Fig. 1 shows the flow of the large-scale graph partitioning method provided by embodiment one of the present invention. The method proceeds as follows:
In step S101, a large-scale graph is input.
In this embodiment, a large-scale graph is a graph with many nodes that changes frequently, for example a graph with more than 5000 nodes whose node count changes every minute.
In step S102, the shortest paths between the nodes of the large-scale graph are calculated, and flag values are set for the edges between the nodes.
The shortest path between two nodes is calculated as follows:
Let $D_{i,j,k}$ be the length of the shortest path from node i to node j that uses only nodes in the set {1, ..., k} as intermediate nodes;
if the shortest path passes through node k, then $D_{i,j,k} = D_{i,k,k-1} + D_{k,j,k-1}$;
if the shortest path does not pass through node k, then $D_{i,j,k} = D_{i,j,k-1}$;
therefore, $D_{i,j,k} = \min(D_{i,j,k-1},\ D_{i,k,k-1} + D_{k,j,k-1})$, where i, j, and k are integers greater than zero.
In practice, to save space, the iteration can be carried out in place on the original storage, which reduces the storage to two dimensions. The flag value of each edge on a computed shortest path is incremented by 1; if an edge lies on several shortest paths from node i to node j at the same time, its flag value is incremented only once.
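The in-place iteration described above can be sketched as follows. This is a minimal Python illustration under stated assumptions, not the patent's implementation: the function name `floyd_warshall`, the 0-based node labels, the `(u, v, weight)` edge format, and the treatment of the graph as undirected are all hypothetical choices made for the example.

```python
import math


def floyd_warshall(n, edges):
    """All-pairs shortest-path lengths for an undirected weighted graph.

    n     -- number of nodes, labelled 0..n-1 (assumption; the text uses 1..n)
    edges -- iterable of (u, v, weight) tuples (hypothetical input format)
    """
    dist = [[math.inf] * n for _ in range(n)]
    for i in range(n):
        dist[i][i] = 0
    for u, v, w in edges:
        dist[u][v] = min(dist[u][v], w)
        dist[v][u] = min(dist[v][u], w)
    # k is the outer loop and dist is overwritten in place: after round k,
    # dist[i][j] holds D_{i,j,k}, so only a two-dimensional table is stored.
    for k in range(n):
        for i in range(n):
            for j in range(n):
                through_k = dist[i][k] + dist[k][j]
                if through_k < dist[i][j]:
                    dist[i][j] = through_k
    return dist


# Triangle with one long side: the route 0 -> 1 -> 2 beats the direct edge.
dist = floyd_warshall(3, [(0, 1, 1), (1, 2, 1), (0, 2, 5)])
print(dist[0][2])  # 2
```

The table costs O(n^2) memory and the triple loop O(n^3) time, which is one reason the method goes on to sample shortest paths rather than process all of them.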
In step S103, the shortest paths are randomly sampled.
In this embodiment, because the large-scale graph has many nodes, random sampling is used to select a subset of key shortest paths for partitioning, which improves partitioning efficiency.
It should be noted that random sampling in this embodiment has further benefits: 1. Non-determinism: enumeration usually proceeds in a fixed order, and it can hardly avoid cases in which nearly every decision within a large range has to be enumerated; random sampling is not bound to such an order. 2. Flexibility: the number of runs is easy to control. Especially in cases where enumeration would otherwise be forced, repeated randomization can give better results.
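To make the flexibility point concrete, the short Python sketch below contrasts the number of node pairs an exhaustive enumeration would visit with a sampled set whose size is chosen by the caller. The helper names `count_all_pairs` and `sample_node_pairs`, the `runs` and `seed` parameters, and the figure of 100 runs are assumptions for illustration; the 5000-node size is the example threshold given in the text.

```python
import math
import random


def count_all_pairs(num_nodes):
    """Number of unordered node pairs an exhaustive enumeration would visit."""
    return math.comb(num_nodes, 2)


def sample_node_pairs(nodes, runs, seed=None):
    """Draw `runs` random pairs of distinct nodes (hypothetical helper)."""
    rng = random.Random(seed)
    return [tuple(rng.sample(nodes, 2)) for _ in range(runs)]


nodes = list(range(5000))
pairs = sample_node_pairs(nodes, runs=100, seed=42)
print(count_all_pairs(len(nodes)))  # 12497500
print(len(pairs))                   # 100
```

The caller fixes the cost through `runs` alone, independently of the graph size, which is the controllability the text refers to.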
In step S104, the large-scale graph is partitioned based on the randomly sampled shortest paths; if, after partitioning, the flag value of an edge between nodes is greater than a preset parameter value, the edge is deleted.
Specifically, this may proceed as follows:
Step 1: set the parameter value m of the m-OVER algorithm, where m is an integer greater than zero;
Step 2: assign an integer flag value to every edge, and initialize the flag value of each edge to zero;
Step 3: randomly select two nodes from the large-scale graph, and increment by 1 the flag value of every edge on all shortest paths between the two nodes. It should be noted that if an edge lies on several shortest paths from node i to node j at the same time, its flag value is incremented only once.
Step 4: if the flag value of some edge is greater than m, delete that edge;
Step 5: after the edge is deleted, determine whether the large-scale graph is still connected. If it is connected, return to step 3. If deleting the edge produces multiple independent subgraphs, then for each independent subgraph larger than a preset scale (that is, a relatively large one), recursively perform steps 1, 2, 3, 4, and 5 (that is, the m-OVER algorithm); for each independent subgraph no larger than the preset scale (that is, a small one), take the nodes of that subgraph as one subset of the partitioned large-scale graph and output the subset.
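Steps 1 to 5 can be sketched in Python as below. This is an illustrative reading, not the patent's implementation, and it rests on several assumptions: the graph is undirected and unweighted, with no self-loops, and is stored as a dict mapping each node to a set of neighbours; breadth-first search replaces the Floyd computation for finding shortest paths; all edges whose flag exceeds m in one sampling round are deleted before connectivity is checked; and `max_size` (at least 1) stands in for the "preset scale". The names `bfs_dist`, `components`, and `m_over` are hypothetical.

```python
import random
from collections import deque


def bfs_dist(adj, src):
    """Hop distances from src to every node reachable from it."""
    dist = {src: 0}
    queue = deque([src])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist


def components(adj):
    """Connected components (independent subgraphs) as a list of node sets."""
    seen, parts = set(), []
    for start in adj:
        if start not in seen:
            part = set(bfs_dist(adj, start))
            seen |= part
            parts.append(part)
    return parts


def m_over(adj, m, max_size, rng):
    """Partition adj into node subsets of at most max_size nodes."""
    parts = components(adj)
    if len(parts) > 1 or len(adj) <= max_size:
        result = []
        for part in parts:
            if len(part) <= max_size:
                result.append(part)  # small subgraph: output as a subset
            else:                    # large subgraph: recurse (steps 1-5)
                sub = {u: adj[u] & part for u in part}
                result.extend(m_over(sub, m, max_size, rng))
        return result
    adj = {u: set(vs) for u, vs in adj.items()}  # work on a copy
    # Step 2: one integer flag per undirected edge, initialised to zero.
    flags = {frozenset((u, v)): 0 for u in adj for v in adj[u]}
    nodes = list(adj)
    while True:
        s, t = rng.sample(nodes, 2)  # Step 3: pick two random nodes
        ds, dt = bfs_dist(adj, s), bfs_dist(adj, t)
        for edge in list(flags):
            u, v = tuple(edge)
            # The edge lies on some shortest s-t path; counted once per pair.
            if ds[u] + 1 + dt[v] == ds[t] or ds[v] + 1 + dt[u] == ds[t]:
                flags[edge] += 1
                if flags[edge] > m:  # Step 4: delete over-used edges
                    adj[u].discard(v)
                    adj[v].discard(u)
                    del flags[edge]
        if len(components(adj)) > 1:  # Step 5: split happened, so recurse
            return m_over(adj, m, max_size, rng)


# Two triangles joined by the bridge edge 2-3 (hypothetical test graph).
graph = {0: {1, 2}, 1: {0, 2}, 2: {0, 1, 3}, 3: {2, 4, 5}, 4: {3, 5}, 5: {3, 4}}
parts = m_over(graph, m=2, max_size=3, rng=random.Random(0))
print(parts)
```

Because the node pairs are random, the exact subsets can differ between seeds; what the sketch guarantees is only that the output subsets are disjoint, cover every node, and contain at most `max_size` nodes each.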
The embodiment of the present invention adopts a random-walk search strategy and designs the shortest-path distance computation for the large-scale graph on the basis of the Floyd algorithm, implementing the idea of the m-OVER partitioning algorithm in order to partition large-scale graphs quickly.
Embodiment two:
Fig. 2 shows the structure of the large-scale graph partitioning system provided by embodiment two of the present invention. For ease of explanation, only the parts relevant to this embodiment are shown.
The large-scale graph partitioning system may be a software unit, a hardware unit, or a combined software and hardware unit built into a terminal device (such as a personal computer, notebook computer, tablet computer, or smartphone), or it may be integrated as an independent plug-in into the terminal device or into an application system of the terminal device.
The large-scale graph partitioning system comprises:
a large-scale graph input unit 21, for inputting a large-scale graph;
a computing unit 22, for calculating the shortest paths between the nodes of the large-scale graph and setting flag values for the edges between the nodes;
a random sampling unit 23, for randomly sampling the shortest paths;
a processing unit 24, for partitioning the large-scale graph based on the randomly sampled shortest paths, and deleting an edge if, after partitioning, the flag value of that edge between nodes is greater than a preset parameter value.
Further, the processing unit 24 comprises:
a setting module 241, for setting the parameter value m, where m is an integer greater than zero;
an initialization module 242, for initializing the flag value of each edge to zero;
a first processing module 243, for randomly selecting two nodes from the large-scale graph and incrementing by 1 the flag value of every edge on all shortest paths between the two nodes;
a second processing module 244, for deleting an edge if its flag value is greater than m;
a third processing module 245, for determining, after the edge is deleted, whether the large-scale graph is still connected; if it is connected, returning to the first processing module to continue; if deleting the edge produces multiple independent subgraphs, then for each independent subgraph larger than a preset scale, recursively executing the setting module, the initialization module, the first processing module, the second processing module, and the third processing module; and for each independent subgraph no larger than the preset scale, taking the nodes of that subgraph as one subset of the partitioned large-scale graph.
Further, the computing unit 22 is specifically configured as follows:
Let $D_{i,j,k}$ be the length of the shortest path from node i to node j that uses only nodes in the set {1, ..., k} as intermediate nodes;
if the shortest path passes through node k, then $D_{i,j,k} = D_{i,k,k-1} + D_{k,j,k-1}$;
if the shortest path does not pass through node k, then $D_{i,j,k} = D_{i,j,k-1}$;
where i, j, and k are integers greater than zero.
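As a worked illustration of the two cases, take a hypothetical three-node graph (not from the patent) with edges 1-2 and 2-3 of length 1 and edge 1-3 of length 5. Taking the smaller of the two cases at each step gives the shortest path from node 1 to node 3:

```latex
\begin{align*}
D_{1,3,0} &= 5 && \text{(direct edge, no intermediate nodes)} \\
D_{1,3,1} &= \min\bigl(D_{1,3,0},\ D_{1,1,0} + D_{1,3,0}\bigr) = \min(5,\ 0 + 5) = 5 \\
D_{1,3,2} &= \min\bigl(D_{1,3,1},\ D_{1,2,1} + D_{2,3,1}\bigr) = \min(5,\ 1 + 1) = 2 \\
D_{1,3,3} &= \min\bigl(D_{1,3,2},\ D_{1,3,2} + D_{3,3,2}\bigr) = \min(2,\ 2 + 0) = 2
\end{align*}
```

The value drops from 5 to 2 exactly when node 2 becomes available as an intermediate node, which is the "shortest path passes through node k" case with k = 2.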
Those skilled in the art will clearly understand that, for convenience and brevity of description, only the above division into functional units and modules is used as an example. In practical applications, the above functions may be assigned to different functional units or modules as required; that is, the internal structure of the system may be divided into different functional units or modules to complete all or part of the functions described above. The functional units and modules in the embodiments may be integrated in one processing unit, each unit may exist physically on its own, or two or more units may be integrated in one unit. The integrated units and modules may be implemented in the form of hardware or in the form of software functional units. In addition, the specific names of the functional units and modules serve only to distinguish them from one another and do not limit the scope of protection of the present application. For the specific working process of the units in the above system, reference may be made to the corresponding process in the foregoing method embodiment, which is not repeated here.
In summary, by calculating the shortest paths between nodes, randomly sampling those shortest paths, and partitioning the large-scale graph based on the sampled shortest paths, the embodiments of the present invention effectively address the problem that a large-scale graph has very many, frequently changing nodes, which makes it very resource-intensive to determine whether an edge lies on sufficiently many shortest paths. The embodiments can effectively improve the efficiency of large-scale graph partitioning. Moreover, implementing the above process requires no additional hardware, which effectively reduces cost and gives the embodiments good ease of use and practicality.
Those of ordinary skill in the art will recognize that the units and algorithm steps of the examples described in connection with the embodiments disclosed herein can be implemented in electronic hardware or in a combination of computer software and electronic hardware. Whether these functions are performed in hardware or in software depends on the particular application and the design constraints of the technical solution. Skilled practitioners may use different methods to implement the described functions for each particular application, but such implementations should not be considered beyond the scope of the present invention.
In the embodiments provided by the present invention, it should be understood that the disclosed system and method can be implemented in other ways. For example, the system embodiment described above is only schematic: the division into units and modules is only a division by logical function, and other divisions are possible in an actual implementation; for instance, multiple units, modules, or components may be combined or integrated into another system, or some features may be omitted or not performed. In addition, the mutual coupling, direct coupling, or communication connection shown or discussed may be indirect coupling or communication connection through some interfaces, devices, or units, and may be electrical, mechanical, or of another form.
The units described as separate components may or may not be physically separate, and the components shown as units may or may not be physical units; that is, they may be located in one place or distributed over multiple network elements. Some or all of the units may be selected according to actual needs to achieve the purpose of the solution of this embodiment.
In addition, the functional units in the embodiments of the present invention may be integrated in one processing unit, each unit may exist physically on its own, or two or more units may be integrated in one unit. The integrated unit may be implemented in the form of hardware or in the form of a software functional unit.
If the integrated unit is implemented in the form of a software functional unit and sold or used as an independent product, it may be stored in a computer-readable storage medium. Based on this understanding, the technical solution of the embodiments of the present invention in essence, or the part that contributes to the prior art, or all or part of the technical solution, may be embodied in the form of a software product. The computer software product is stored in a storage medium and includes instructions that cause a computer device (which may be a personal computer, a server, a network device, or the like) or a processor to perform all or part of the steps of the methods described in the embodiments of the present invention. The aforementioned storage medium includes any medium that can store program code, such as a USB flash drive, a portable hard drive, read-only memory (ROM, Read-Only Memory), random access memory (RAM, Random Access Memory), a magnetic disk, or an optical disc.
The above embodiments are intended only to illustrate the technical solutions of the present invention, not to limit them. Although the present invention has been described in detail with reference to the foregoing embodiments, those of ordinary skill in the art should understand that they may still modify the technical solutions described in the foregoing embodiments or replace some of their technical features with equivalents, and that such modifications or replacements do not cause the essence of the corresponding technical solutions to depart from the spirit and scope of the technical solutions of the embodiments of the present invention.

Claims (6)

1. A large-scale graph partitioning method, characterized in that the method comprises:
inputting a large-scale graph;
calculating the shortest paths between the nodes of the large-scale graph, and setting flag values for the edges between the nodes;
randomly sampling the shortest paths;
partitioning the large-scale graph based on the randomly sampled shortest paths, and deleting an edge if, after partitioning, the flag value of that edge between nodes is greater than a preset parameter value.
2. The method of claim 1, characterized in that partitioning the large-scale graph based on the randomly sampled shortest paths, and deleting an edge if, after partitioning, the flag value of that edge between nodes is greater than a preset parameter value, comprises:
Step 1: setting a parameter value m, where m is an integer greater than zero;
Step 2: initializing the flag value of each edge to zero;
Step 3: randomly selecting two nodes from the large-scale graph, and incrementing by 1 the flag value of every edge on all shortest paths between the two nodes;
Step 4: if the flag value of some edge is greater than m, deleting that edge;
Step 5: after the edge is deleted, determining whether the large-scale graph is still connected; if it is connected, returning to step 3; if deleting the edge produces multiple independent subgraphs, then for each independent subgraph larger than a preset scale, recursively performing steps 1, 2, 3, 4, and 5; and for each independent subgraph no larger than the preset scale, taking the nodes of that subgraph as one subset of the partitioned large-scale graph and outputting the subset.
3. The method of claim 1, characterized in that the shortest path between two nodes is calculated as follows:
let $D_{i,j,k}$ be the length of the shortest path from node i to node j that uses only nodes in the set {1, ..., k} as intermediate nodes;
if the shortest path passes through node k, then $D_{i,j,k} = D_{i,k,k-1} + D_{k,j,k-1}$;
if the shortest path does not pass through node k, then $D_{i,j,k} = D_{i,j,k-1}$;
where i, j, and k are integers greater than zero.
4. A large-scale graph partitioning system, characterized in that the system comprises:
a large-scale graph input unit, for inputting a large-scale graph;
a computing unit, for calculating the shortest paths between the nodes of the large-scale graph and setting flag values for the edges between the nodes;
a random sampling unit, for randomly sampling the shortest paths;
a processing unit, for partitioning the large-scale graph based on the randomly sampled shortest paths, and deleting an edge if, after partitioning, the flag value of that edge between nodes is greater than a preset parameter value.
5. The system of claim 4, characterized in that the processing unit comprises:
a setting module, for setting a parameter value m, where m is an integer greater than zero;
an initialization module, for initializing the flag value of each edge to zero;
a first processing module, for randomly selecting two nodes from the large-scale graph and incrementing by 1 the flag value of every edge on all shortest paths between the two nodes;
a second processing module, for deleting an edge if its flag value is greater than m;
a third processing module, for determining, after the edge is deleted, whether the large-scale graph is still connected; if it is connected, returning to the first processing module to continue; if deleting the edge produces multiple independent subgraphs, then for each independent subgraph larger than a preset scale, recursively executing the setting module, the initialization module, the first processing module, the second processing module, and the third processing module; and for each independent subgraph no larger than the preset scale, taking the nodes of that subgraph as one subset of the partitioned large-scale graph.
6. The system of claim 4, characterized in that the computing unit is specifically configured as follows:
let $D_{i,j,k}$ be the length of the shortest path from node i to node j that uses only nodes in the set {1, ..., k} as intermediate nodes;
if the shortest path passes through node k, then $D_{i,j,k} = D_{i,k,k-1} + D_{k,j,k-1}$;
if the shortest path does not pass through node k, then $D_{i,j,k} = D_{i,j,k-1}$;
where i, j, and k are integers greater than zero.
CN201510047749.6A 2015-01-29 2015-01-29 Large-scale graph partitioning method and system Pending CN104598927A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201510047749.6A CN104598927A (en) 2015-01-29 2015-01-29 Large-scale graph partitioning method and system

Publications (1)

Publication Number Publication Date
CN104598927A true CN104598927A (en) 2015-05-06

Family

ID=53124699

Country Status (1)

Country Link
CN (1) CN104598927A (en)

Patent Citations (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101383748A (en) * 2008-10-24 2009-03-11 北京航空航天大学 Community division method in complex network
US8497877B2 (en) * 2010-01-26 2013-07-30 Hon Hai Precision Industry Co., Ltd. Electronic device and method of switching display images
CN102073700A (en) * 2010-12-30 2011-05-25 浙江大学 Discovery method of complex network community
US20130103368A1 (en) * 2011-10-25 2013-04-25 John Lyn Farmer Automated Experimental Design For Polymerase Chain Reaction
CN102571431A (en) * 2011-12-02 2012-07-11 北京航空航天大学 Group concept-based improved Fast-Newman clustering method applied to complex network
CN102982180A (en) * 2012-12-18 2013-03-20 华为技术有限公司 Method and device for storing data
CN103699606A (en) * 2013-12-16 2014-04-02 华中科技大学 Large-scale graphical partition method based on vertex cut and community detection
CN103763096A (en) * 2014-01-17 2014-04-30 北京邮电大学 Random secret key allocation method and device

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109765839A (en) * 2019-02-15 2019-05-17 中国科学院上海光学精密机械研究所 The non-intersecting random method of machining path planning of Arbitrary Boundaries optical element uniline
CN109765839B (en) * 2019-02-15 2021-09-07 中国科学院上海光学精密机械研究所 Planning method for single-row non-intersecting random processing path of optical element with any boundary

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
EXSB Decision made by sipo to initiate substantive examination
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication

Application publication date: 20150506