CN105740447A - Local diffusion updating algorithm for graph structure big data

Info

Publication number: CN105740447A
Application number: CN201610075075.5A
Authority: CN
Inventors: 胡自权
Original assignee: Sichuan Medical University
Current assignee: Sichuan Medical University; Southwest Medical University
Priority date: 2016-02-02
Filing date: 2016-02-02
Publication date: 2016-07-06

Abstract

The invention discloses a local diffusion updating algorithm for graph structure big data. The algorithm comprises the steps of decomposing and describing the graph structure big data as graph structure big data entities and entity relationships; storing big data entity identifiers and the entity relationships by adopting a distributed database HBase table; and buffering the adjacent entities by virtue of a queue, controlling an update cycle through an update depth, and updating the attributes of the adjacent entities by taking a to-be-updated starting entity as a center. Compared with a depth-first search and breadth-first search based local updating algorithm, the local diffusion updating algorithm is high in updating speed and small in occupied space, supports local data updating of big data of directed graphs and undirected graphs, and easily adds, deletes and maintains the entity relationships.

Description

Local diffusion update algorithm for graph-structured big data

Technical field

The present invention relates to a data processing method, and in particular to a local diffusion update algorithm for graph-structured big data.

Background art

In this patent, a graph structure refers to the graph of data-structure theory: a node may have multiple predecessor nodes (parent nodes) and multiple successor nodes (child nodes), or the relationships between nodes may be arbitrary. A node consists of a data field, which stores the node's data, together with pointers (addresses) to its predecessor and successor nodes. In applications built on graph structures (for example information push, advertisement placement, and marketing), node data must be updated frequently, so data updating is very important for graph-structured data. To update node data, the graph must be traversed until the nodes to be updated are reached. Traversing to the nodes to be updated is the key technique of local data updating. The methods for traversing a graph are depth-first search and breadth-first search; correspondingly, local updating of a graph can be based on either.

Key points of depth-first search: starting from some vertex in the graph, visit that vertex, then perform a depth-first traversal of the graph from each of its unvisited adjacent vertices in turn, until every vertex connected to the starting vertex has been visited. If unvisited vertices remain in the graph, choose one of them as a new starting point and repeat the process until all vertices in the graph have been visited.

Key points of breadth-first search: starting from some vertex in the graph, visit that vertex, then perform a breadth-first traversal of the graph from each of its unvisited adjacent vertices in turn, until every vertex connected to the starting vertex has been visited. If unvisited vertices remain in the graph, choose one of them as a new starting point and repeat the process until all vertices in the graph have been visited.

Local data updating uses breadth-first search or depth-first search to reach the nodes to be updated (vertices of the graph). Whichever of the two is used, a recursive algorithm is employed, and function return addresses are pushed onto the stack. In big data local update applications (for example information push, advertisement placement, and marketing), this is likely to cause stack overflow, and the local data update may not be achievable at all.
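The stack concern described above can be illustrated with a minimal Python sketch. It is not part of the patent: the path graph, the helper names, and the chain length of 20,000 are assumptions chosen for illustration, and the sketch assumes the interpreter's default recursion limit (about 1,000 frames in CPython). A recursive traversal of a long chain exhausts the call stack, while a traversal that buffers vertices in an explicit queue does not.

```python
from collections import deque


def make_chain(n):
    """Adjacency dict for a path graph 0 - 1 - ... - (n-1) (illustrative data)."""
    return {i: [j for j in (i - 1, i + 1) if 0 <= j < n] for i in range(n)}


def visit_recursive(graph, node, seen):
    """Recursive depth-first visit: one call-stack frame per nested vertex."""
    seen.add(node)
    for nxt in graph[node]:
        if nxt not in seen:
            visit_recursive(graph, nxt, seen)


def visit_with_queue(graph, start):
    """Same reachability computed with an explicit FIFO queue, no recursion."""
    seen = {start}
    pending = deque([start])
    while pending:
        node = pending.popleft()
        for nxt in graph[node]:
            if nxt not in seen:
                seen.add(nxt)
                pending.append(nxt)
    return seen


chain = make_chain(20_000)
try:
    visit_recursive(chain, 0, set())
    recursion_failed = False
except RecursionError:
    # Assumes the default recursion limit is far below the chain length.
    recursion_failed = True

reached = visit_with_queue(chain, 0)
```

Under the stated assumption, the recursive call fails while the queue-based traversal reaches every vertex of the chain.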

Summary of the invention

The present invention aims to provide a local diffusion update algorithm for graph-structured big data. The graph-structured big data is decomposed and described as big data entities and inter-entity relationships; a table of the distributed database HBase stores the entity identifiers and the inter-entity relationships; adjacent entities are buffered in a queue, the update loop is controlled by the update depth, and the attributes of adjacent entities are updated outward from the start entity of the update as the center, thereby completing the data update. In the present invention, "local" means all entities within the update depth range centered on the start entity.

To achieve the above object, the present invention adopts the following technical solution:

The local diffusion update algorithm for graph-structured big data disclosed by the invention comprises the following steps:

Step 1. Decompose the graph-structured big data: decompose the graph-structured big data into big data entities and inter-entity relationships;

Step 2. Store the data: store the big data entities and inter-entity relationships in a table of the distributed database HBase;

Step 3. Initialize the parameters, including the identifier of the start entity to be updated, the data to be updated and its column members in the update column family, the update depth, and the queue; set the current update depth to 0 and the queue to empty;

Step 4. Look up the start entity by its identifier and insert it into the queue;

The update ends when the current update depth reaches the update depth initialized in step 3 and the queue is empty;

Step 5. Update the column members of the update column family in the HBase table row corresponding to the head-of-queue entity;

When the depth of the unvisited adjacent entities of the head-of-queue entity exceeds the update depth, go to step 7;

Step 6. Add the unvisited adjacent entities of the head-of-queue entity to the queue;

Step 7. Record that the head-of-queue entity has been visited, together with its current update depth, and delete the head-of-queue element;

If adjacent entities of the same layer have not all been updated, one of the following two processing methods is chosen: in the first, judge whether the condition "the current update depth is greater than the update depth initialized in step 3 and the queue is empty" holds; if it holds, end the update, otherwise go to step 5; in the second, go directly to step 5;

Step 8. Increase the current update depth by 1;

If the current update depth is greater than the update depth initialized in step 3 and the queue is empty, end the update; otherwise go to step 5.
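Steps 3 to 8 above can be sketched in Python as follows. This is one possible reading of the steps, not the patented implementation: an in-memory dictionary stands in for the HBase table, the column-family names `update` and `relation`, the function name, and the sample data are hypothetical, and entities are marked as visited when they are enqueued rather than when they are removed, which is a simplification.

```python
from collections import deque

UPDATE_FAMILY = "update"      # hypothetical column-family names
RELATION_FAMILY = "relation"


def local_diffusion_update(table, start_id, new_values, max_depth):
    """Update `new_values` on every entity within `max_depth` hops of `start_id`.

    `table` maps row key -> {family -> {column member -> value}} and only
    imitates an HBase table. Returns the updated identifiers in update order.
    """
    if start_id not in table:               # step 4: look up the start entity
        return []
    queue = deque([(start_id, 0)])          # steps 3-4: start entity, depth 0
    visited = {start_id}
    updated = []
    while queue:
        entity_id, depth = queue[0]         # head-of-queue entity
        table[entity_id][UPDATE_FAMILY].update(new_values)    # step 5
        updated.append(entity_id)
        if depth + 1 <= max_depth:          # neighbours still inside the range
            for neighbour in table[entity_id][RELATION_FAMILY]:
                if neighbour not in visited and neighbour in table:
                    visited.add(neighbour)
                    queue.append((neighbour, depth + 1))      # step 6
        queue.popleft()                     # step 7: delete the head element
    return updated


def make_row(neighbours):
    """Build one illustrative row with a relation and an update family."""
    return {RELATION_FAMILY: {n: "" for n in neighbours},
            UPDATE_FAMILY: {"status": "old"}}


table = {
    "A": make_row(["B", "C"]),
    "B": make_row(["A", "D"]),
    "C": make_row(["A"]),
    "D": make_row(["B", "E"]),
    "E": make_row(["D"]),
}
result = local_diffusion_update(table, "A", {"status": "new"}, max_depth=2)
```

With an update depth of 2, the sketch updates A, B, C, and D and leaves E, which is three hops away, untouched. Storing the depth with each queue entry replaces the separate current-depth counter of step 8; both approaches bound the diffusion by the same depth.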

Further, in step 1, entity attributes are used to reflect inter-entity relationships, and entity identifiers are used to distinguish different big data entities.

Further, in step 1, an identifier that distinguishes each big data entity globally is defined as the entity identifier; entity identifiers and big data entities are in one-to-one correspondence.

Further, the entity attributes include relationship attributes and update attributes. The data or data fields to be updated are defined as update attributes, and the inter-entity relationships are defined as relationship attributes; that is, update attributes reflect the data or data fields to be updated, and relationship attributes reflect the inter-entity relationships.

Further, the entity attributes also include other attributes, which are the attributes other than the relationship attributes, the entity identifier, and the update attributes. These definitions make the conversion relationships and the data structure more concise and keep the update attributes and relationship attributes consistent with the column members of the HBase table; they are transitional terms.

Further, in step 2, the entity identifier is converted into the row key of a record; the relationship attributes, update attributes, and other attributes are converted into the relationship column family, update column family, and other column family of the HBase table respectively; the data or data fields to be updated are stored in the corresponding column members of the update column family.
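The mapping from an entity to a table record can be sketched as below. This is a minimal illustration under stated assumptions: the `Entity` class, the `to_row` helper, the family names `relation`, `update`, and `other`, and the sample values are all hypothetical, and no HBase client is involved. Cells are keyed in the `family:member` form that HBase uses to address a column.

```python
from dataclasses import dataclass, field


@dataclass
class Entity:
    entity_id: str                                  # globally unique identifier
    relations: dict = field(default_factory=dict)   # relationship attributes
    updates: dict = field(default_factory=dict)     # update attributes
    others: dict = field(default_factory=dict)      # other attributes


def to_row(entity):
    """Return (row_key, cells) with cells keyed as 'family:member'."""
    cells = {}
    for family, members in (("relation", entity.relations),
                            ("update", entity.updates),
                            ("other", entity.others)):
        for member, value in members.items():
            cells[f"{family}:{member}"] = value
    return entity.entity_id, cells


row_key, cells = to_row(
    Entity("user42", {"user7": "friend"}, {"push_msg": "hello"}, {"name": "Li"})
)
```

The entity identifier becomes the row key, and each attribute group lands in its own column family, so an update touches only the `update:` cells of a row.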

The local diffusion update algorithm for graph-structured big data disclosed by the invention has the following features:

First, the distributed database HBase (the HBigdataEntity table) stores the graph-structured big data entities and the relationships between them. The storage capacity of an HBase database is in theory unlimited, so it supports storing massive graph-structured big data.

Second, the largest possible local range of a local update is a subgraph of the given graph, and different subgraphs belong to different local ranges. The time cost of a local update lies mainly in graph traversal, that is, traversing from the start entity to all entities within its local range. The time complexity of the algorithm of the present invention is O(N), where N is the number of entities (vertices) in the subgraph formed around the start entity. Extending the local range to the whole graph rather than a subgraph when calculating time complexity falls outside the scope of the invention (that would be updating all entities rather than a local update). A local update algorithm based on depth-first search or breadth-first search has time complexity O(N²), namely when the local range to be updated is the subgraph farthest from the vertex where traversal begins. The present invention is therefore faster than local update algorithms based on depth-first search or breadth-first search.

Third, local update algorithms based on depth-first search or breadth-first search use the stack to save return addresses, so in a big data environment they may fail to complete and carry risk. The present invention uses a queue to buffer information such as entity identifiers, and the maximum queue length is the maximum number of adjacent entities at the same update depth (the same layer). The space complexity of the algorithm is therefore much smaller than that of local update algorithms based on depth-first search or breadth-first search, and the algorithm is applicable to local updating of big data.

Fourth, the present invention supports local data updates for both directed and undirected graphs. For an undirected graph, column members allocated in the relationship column family of the HBigdataEntity table store the adjacent entities of each entity. For a directed graph, two classes of column members, arc-head and arc-tail, are allocated in the relationship column family: arc-head column members store the relationships pointing to the entity, and arc-tail column members store the relationships pointing out from it. Because HBase tables offer good column-member extensibility, relationships between big data entities are easy to insert and delete dynamically.
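The two storage layouts for relationships can be sketched as follows. The member naming (`relation:adj:…`, `relation:head:…`, `relation:tail:…`) and the helper functions are assumptions made for illustration; the patent does not prescribe member names, and a plain dictionary again stands in for the table.

```python
def add_undirected(table, a, b):
    """Undirected edge: each entity lists the other as an adjacent entity."""
    table.setdefault(a, {})[f"relation:adj:{b}"] = ""
    table.setdefault(b, {})[f"relation:adj:{a}"] = ""


def add_directed(table, src, dst):
    """Directed edge src -> dst: arc-tail member on src, arc-head member on dst."""
    table.setdefault(src, {})[f"relation:tail:{dst}"] = ""   # points out of src
    table.setdefault(dst, {})[f"relation:head:{src}"] = ""   # points into dst


def remove_directed(table, src, dst):
    """Delete a relationship by dropping its two column members."""
    table.get(src, {}).pop(f"relation:tail:{dst}", None)
    table.get(dst, {}).pop(f"relation:head:{src}", None)


def successors(table, entity_id):
    """Entities that `entity_id` points to, read from its arc-tail members."""
    prefix = "relation:tail:"
    return sorted(k[len(prefix):] for k in table.get(entity_id, {})
                  if k.startswith(prefix))


table = {}
add_directed(table, "A", "B")
add_directed(table, "A", "C")
remove_directed(table, "A", "C")
add_undirected(table, "X", "Y")
```

Because each relationship is its own column member, inserting or deleting one changes only the two rows involved and requires no change to the table schema.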

Fifth, because linear structures and tree structures are special cases of the graph structure, the present invention can also be applied to local data updates where the relationships between big data entities form a linear structure or a tree structure.

Compared with the prior art, the present invention has the following beneficial effects:

1. The graph-structured big data is converted into big data entities and inter-entity relationships, with relationship attributes expressing the mesh-like relationships between entities; the entities and relationships are stored in the HBigdataEntity table. Compared with local update algorithms based on depth-first search or breadth-first search, the algorithm is faster and has a lower space cost.

2. The present invention supports local data updates for big data of directed and undirected graphs, and inter-entity relationships are easy to add and delete, that is, easy to maintain.

Brief description of the drawings

Fig. 1 is the flow chart of Embodiment 1;

Fig. 2 is the flow chart of Embodiment 2.

Detailed description of the embodiments

To make the purpose, technical solution, and advantages of the present invention clearer, the invention is described in further detail below with reference to the accompanying drawings.

Embodiment 1

As shown in Fig. 1, the local diffusion update algorithm for graph-structured big data disclosed in this embodiment comprises the following steps:

Step 1. Decompose the graph-structured big data: decompose the graph-structured big data into big data entities and inter-entity relationships. Entity attributes reflect the inter-entity relationships, and entity identifiers distinguish different big data entities. An identifier that distinguishes each big data entity globally is defined as the entity identifier; entity identifiers and big data entities are in one-to-one correspondence. By function, entity attributes comprise relationship attributes, update attributes, and other attributes. Other attributes describe other properties, features, or characteristics of a big data entity; in this embodiment they are the attributes other than the relationship attributes, the entity identifier, and the update attributes. The data (fields) to be updated are defined (organized) as update attributes, and the relationships between big data entities are defined as relationship attributes. Relationship attributes reflect the mesh-like relationships between entities in the graph. Most are determined when the entity record is created and remain unchanged, because relationships between entities are fixed, but there are also cases where they need to be modified.

Step 1 completes the problem-domain definition of the graph-structured big data, and step 2 completes its storage.

Step 2. Store the data: store the big data entities and inter-entity relationships in a table of the distributed database HBase. Storing the big data entities and the mesh-like relationships between them in an HBase table stores the graph-structured big data. Each big data entity is converted into a record of the HBase table; the entity identifier is converted into the row key of the record; the relationship attributes, update attributes, and other attributes are converted into the relationship column family, update column family, and other column family of the HBase table respectively. The relationship attributes describing the mesh-like relationships between entities are stored in the corresponding column members of the relationship column family: the predecessor entities of an entity are stored in predecessor column members (multiple predecessors are supported), and its successor entities are stored in successor column members (multiple successors are supported). The data (fields) to be updated are stored in the corresponding column members of the update column family (updating multiple data items at once is supported). For ease of description, this patent names the HBase table the HBigdataEntity table. The HBigdataEntity table is built according to the above analysis; it stores the relationship column family, the update column family, and the other column family, and its row key is the entity identifier.

Step 2 completes the storage of the graph-structured big data. Steps 1 and 2 are the basis for applying the local update algorithm of the present invention, which involves the following terms: start entity, update depth, and queue. The start entity is an input parameter and is the first entity from which the local update begins. The update depth is an input parameter giving the maximum depth from the start entity; the depth of the start entity itself is 0, and the depth of entities reached by diffusing to adjacent entities is counted successively as 1, 2, 3, and so on. A queue is a first-in, first-out data structure; the first element in the queue is called the head element and, correspondingly, the last element is called the tail element. Queue operations include deleting the head element and inserting a tail element. The present invention uses the queue to store information about the entities being updated locally, for example the entity identifier, a flag indicating whether the entity has been visited, and the current update depth.
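The queue terminology above can be made concrete with a short Python sketch. The `QueueEntry` record and its field names are hypothetical; they only mirror the three pieces of information the text lists (entity identifier, visited flag, current update depth), and `collections.deque` is used as one convenient first-in, first-out structure.

```python
from collections import deque, namedtuple

# Hypothetical record for the information buffered per entity.
QueueEntry = namedtuple("QueueEntry", ["entity_id", "visited", "depth"])

queue = deque()
queue.append(QueueEntry("A", False, 0))   # insert at tail: start entity, depth 0
queue.append(QueueEntry("B", False, 1))   # adjacent entity, depth 1
queue.append(QueueEntry("C", False, 1))   # adjacent entity, depth 1

head = queue[0]             # head element, inspected without removal
removed = queue.popleft()   # delete the head element
tail = queue[-1]            # tail element
```

After the head element is deleted, the entity that was inserted second becomes the new head, which is what keeps entities of one depth ahead of entities of the next.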

The following steps 3 to 8 carry out the local diffusion update of the graph-structured big data:

Step 3. Initialize the parameters, including the identifier of the start entity to be updated, the data to be updated and its column members in the update column family, the update depth, and the queue; set the current update depth to 0 and the queue to empty.

Step 4. Look up the start entity by its identifier and insert it into the queue.

If the current update depth is greater than the update depth initialized in step 3 and the queue is empty, end the update.

Step 5. Update the column members of the update column family in the HBigdataEntity table row corresponding to the head-of-queue entity, which completes the data update of that entity.

When the depth of the unvisited adjacent entities of the head-of-queue entity reaches the update depth, go to step 7.

Step 6. Judge whether the depth of the unvisited adjacent entities of the head-of-queue entity is greater than the update depth; if not, add the unvisited adjacent entities of the head-of-queue entity to the queue.

Step 7. Record that the head-of-queue entity has been visited, together with its current update depth, and delete the head-of-queue element.

If adjacent entities of the same layer have not all been updated, judge whether the condition "the current update depth is greater than the update depth initialized in step 3 and the queue is empty" holds; if it holds, end the update; if not, go to step 5.

Step 8. Increase the current update depth by 1. If the current update depth is greater than the update depth initialized in step 3 and the queue is empty, end the update; otherwise go to step 5.
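The layer-by-layer control flow of this embodiment, with an explicit current-depth counter that is incremented once per layer, might be sketched as follows. This is an interpretation rather than the flow chart of Fig. 1: the function name, the adjacency dictionary, and the `apply_update` callback are hypothetical, and the loop stops as soon as the queue is empty or the counter exceeds the update depth. The sketch also records the peak queue length to illustrate the bounded buffer.

```python
from collections import deque


def diffuse_by_layer(neighbours, start, max_depth, apply_update):
    """Layer-by-layer diffusion; returns (updated ids, peak queue length)."""
    queue = deque([start])
    visited = {start}
    current_depth = 0                   # step 3: current update depth is 0
    updated, peak = [], 1
    while queue and current_depth <= max_depth:    # termination test
        for _ in range(len(queue)):     # entities of the current layer only
            head = queue[0]
            apply_update(head)          # step 5: update the head entity
            updated.append(head)
            if current_depth < max_depth:           # step 6 guard
                for n in neighbours.get(head, ()):
                    if n not in visited:
                        visited.add(n)
                        queue.append(n)
            peak = max(peak, len(queue))
            queue.popleft()             # step 7: delete the head element
        current_depth += 1              # step 8: move to the next layer
    return updated, peak


# Illustrative star-shaped graph with one entity ("x") two hops from the hub.
neighbours = {"hub": ["a", "b", "c"], "a": ["hub", "x"],
              "b": ["hub"], "c": ["hub"], "x": ["a"]}
touched = []
updated, peak = diffuse_by_layer(neighbours, "hub", 1, touched.append)
```

With an update depth of 1, the hub and its three neighbours are updated and `x` is not; the queue never holds more than the head entity plus one layer of adjacent entities.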

Embodiment 2

This embodiment differs from Embodiment 1 in that, after step 7, if adjacent entities of the same layer have not all been updated, the algorithm goes directly to step 5.

The present invention decomposes and describes graph-structured big data as big data entities, entity identifiers, and relationships between big data entities. The HBigdataEntity table stores the entity identifiers (as row keys), the relationships between big data entities (as members of the relationship column family), the update attributes (as members of the update column family), and the other attributes. Adjacent entities are buffered in a queue, the loop is controlled by the update depth, and the attributes of adjacent entities are updated outward from the start entity as the center. The invention can be applied in fields such as information push, advertisement placement, and marketing, and also to local updating of big data with linear structure and graph structure, so it has a wide range of application.

Of course, the present invention may have various other embodiments. Without departing from the spirit and essence of the invention, those of ordinary skill in the art can make corresponding changes and modifications according to the invention, but all such changes and modifications shall fall within the scope of the appended claims.

Claims

1. A local diffusion update algorithm for graph-structured big data, characterized by comprising the following steps:

Step 1. Decompose the graph-structured big data: decompose and describe the graph-structured big data as big data entities and inter-entity relationships;

Step 3. Initialize the parameters, including the identifier of the start entity to be updated, the data to be updated and its column members in the update column family, the update depth, and the queue; the current update depth is 0 and the queue is empty;

If the current update depth is greater than the update depth initialized in step 3 and the queue is empty, end the update;

Step 5. Update the column members of the update column family in the HBase table row corresponding to the head-of-queue entity;

When the depth of the unvisited adjacent entities of the head-of-queue entity is greater than the update depth, go to step 7;

If adjacent entities of the same layer have not all been updated, one of the following two processing methods is chosen:

In the first, judge whether the condition "the current update depth is greater than the update depth initialized in step 3 and the queue is empty" holds; if it holds, end the update, otherwise go to step 5;

In the second, go directly to step 5;

Step 8. Increase the current update depth by 1;

If the current update depth is greater than the update depth initialized in step 3 and the queue is empty, end the update; otherwise go to step 5.

2. The local diffusion update algorithm for graph-structured big data according to claim 1, characterized in that: in step 1, entity attributes are used to reflect inter-entity relationships, and entity identifiers are used to distinguish different big data entities.

3. The local diffusion update algorithm for graph-structured big data according to claim 2, characterized in that: in step 1, an identifier that distinguishes each big data entity globally is defined as the entity identifier, and entity identifiers and big data entities are in one-to-one correspondence.

4. The local diffusion update algorithm for graph-structured big data according to claim 2, characterized in that: the entity attributes include relationship attributes and update attributes; the data or data fields to be updated are defined as update attributes, and the relationships between big data entities are defined as relationship attributes.

5. The local diffusion update algorithm for graph-structured big data according to claim 4, characterized in that: the entity attributes also include other attributes, which are the attributes other than the relationship attributes, the entity identifier, and the update attributes.

6. The local diffusion update algorithm for graph-structured big data according to claim 5, characterized in that: in step 2, the entity identifier is converted into the row key of a record; the relationship attributes, update attributes, and other attributes are converted into the relationship column family, update column family, and other column family of the HBase table respectively; the data or data fields to be updated are stored in the corresponding column members of the update column family.