CN105740447A - Local diffusion updating algorithm for graph structure big data - Google Patents

Local diffusion updating algorithm for graph structure big data

Info

Publication number
CN105740447A
Authority
CN
China
Prior art keywords
entity
big data
update
attribute
depth
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201610075075.5A
Other languages
Chinese (zh)
Inventor
胡自权
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Sichuan Medical University
Southwest Medical University
Original Assignee
Sichuan Medical University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Sichuan Medical University
Priority to CN201610075075.5A
Publication of CN105740447A
Legal status: Pending

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00: Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20: Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/22: Indexing; Data structures therefor; Storage structures
    • G06F16/2219: Large Object storage; Management thereof
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00: Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20: Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/23: Updating

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Databases & Information Systems (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Software Systems (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention discloses a local diffusion updating algorithm for graph structure big data. The algorithm comprises the steps of decomposing and describing the graph structure big data as graph structure big data entities and entity relationships; storing big data entity identifiers and the entity relationships by adopting a distributed database HBase table; and buffering the adjacent entities by virtue of a queue, controlling an update cycle through an update depth, and updating the attributes of the adjacent entities by taking a to-be-updated starting entity as a center. Compared with a depth-first search and breadth-first search based local updating algorithm, the local diffusion updating algorithm is high in updating speed and small in occupied space, supports local data updating of big data of directed graphs and undirected graphs, and easily adds, deletes and maintains the entity relationships.

Description

Local diffusion update algorithm for graph-structured big data
Technical field
The present invention relates to a data processing method, and in particular to a local diffusion update algorithm for graph-structured big data.
Background art
In this patent, a graph structure refers to the graph of data-structure theory: a node may have multiple predecessor nodes (parent nodes) and multiple successor nodes (child nodes), or the relationships between nodes may be arbitrary. A node consists of a data field, which stores the node's data, and pointers (addresses) to its predecessor and successor nodes. In applications based on graph structures (for example information pushing, advertisement placement, and marketing), node data must be updated frequently, so data updating is very important for graph-structured data. To update node data, the graph must be traversed to reach the nodes to be updated. Traversing to the nodes to be updated is the key technique of local data updating. The methods for traversing a graph are depth-first search and breadth-first search; correspondingly, a local update of a graph can be based on depth-first search or breadth-first search.
Key points of depth-first search: starting from some vertex in the graph, visit that vertex, then perform a depth-first traversal of the graph from each of its unvisited adjacent vertices in turn, until all vertices connected to the starting vertex have been visited. If unvisited vertices remain in the graph, choose one of them as a new starting point and repeat the process until all vertices in the graph have been visited.
Key points of breadth-first search: starting from some vertex in the graph, visit that vertex, then perform a breadth-first traversal of the graph from each of its unvisited adjacent vertices in turn, until all vertices connected to the starting vertex have been visited. If unvisited vertices remain in the graph, choose one of them as a new starting point and repeat the process until all vertices in the graph have been visited.
Local data updating uses breadth-first search or depth-first search to reach the nodes to be updated (vertices of the graph). Whether breadth-first search or depth-first search is used, a recursive algorithm is employed, and function return addresses are pushed onto the stack. In local data update applications on big data (for example information pushing, advertisement placement, and marketing), this is likely to cause a stack overflow, or the local data update may not be possible at all.
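The stack concern raised above can be illustrated with a small Python sketch. It is not taken from the patent: the path-shaped test graph and the function names `make_chain`, `recursive_dfs` and `queue_traversal` are assumptions chosen for illustration. The sketch contrasts a recursive traversal, whose call depth grows with the length of the path, with a traversal driven by an explicit queue.

```python
import sys
from collections import deque


def make_chain(n):
    """Adjacency list of a path graph 0 - 1 - ... - (n-1) (illustrative data)."""
    return {i: [j for j in (i - 1, i + 1) if 0 <= j < n] for i in range(n)}


def recursive_dfs(graph, v, visited):
    """Depth-first traversal; every nested call adds one stack frame."""
    visited.add(v)
    for w in graph[v]:
        if w not in visited:
            recursive_dfs(graph, w, visited)
    return visited


def queue_traversal(graph, start):
    """Traversal driven by an explicit FIFO queue; no recursion is involved."""
    visited = {start}
    queue = deque([start])
    while queue:
        v = queue.popleft()
        for w in graph[v]:
            if w not in visited:
                visited.add(w)
                queue.append(w)
    return visited


if __name__ == "__main__":
    # A path much longer than the interpreter's recursion limit.
    chain = make_chain(10 * sys.getrecursionlimit())
    try:
        recursive_dfs(chain, 0, set())
        print("recursive traversal finished")
    except RecursionError:
        print("recursive traversal exceeded the recursion limit")
    print(len(queue_traversal(chain, 0)), "vertices reached with a queue")
```

In CPython the recursive version is stopped by the interpreter's recursion limit rather than by a native stack overflow, but the underlying issue is the same: its memory use grows with traversal depth, while the queue-based version only needs heap memory for the queue and the visited set.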
Summary of the invention
The present invention aims to provide a local diffusion update algorithm for graph-structured big data. The graph-structured big data is decomposed and described as big data entities and entity relationships; a table of the distributed database HBase stores the entity identifiers and entity relationships; adjacent entities are buffered in a queue, the update loop is controlled by an update depth, and the attributes of adjacent entities are updated outward from the starting entity to be updated, which completes the data update. In this invention, "local" means all entities within the update depth range centered on the starting entity to be updated.
To achieve the above object, the technical solution adopted by the present invention is as follows:
The local diffusion update algorithm for graph-structured big data disclosed by the invention comprises the following steps:
Step 1. Decompose the graph-structured big data: decompose it into big data entities and entity relationships;
Step 2. Store the data: store the big data entities and entity relationships in a table of the distributed database HBase;
Step 3. Initialize the parameters: the identifier of the starting entity to be updated, the data to be updated and its column members in the update column family, the update depth, and the queue; set the current update depth to 0 and the queue to empty;
Step 4. Look up the starting entity to be updated by its identifier and insert it into the queue;
When the current update depth reaches the update depth initialized in step 3 and the queue is empty, the update terminates;
Step 5. Update the column members of the update column family in the HBase table row corresponding to the entity at the head of the queue;
When the depth of the unvisited adjacent entities of the head entity is greater than the update depth, go to step 7;
Step 6. Add the unvisited adjacent entities of the head entity to the queue;
Step 7. Record that the head entity has been visited, together with its current update depth, and delete the head entity element from the queue;
If the adjacent entities on the same layer have not all been updated, one of the following two processing methods is used. First: determine whether the condition "the current update depth is greater than the update depth initialized in step 3 and the queue is empty" holds; if it holds, the update terminates, otherwise go to step 5. Second: go directly to step 5;
Step 8. Increment the current update depth by 1;
When the current update depth is greater than the update depth initialized in step 3 and the queue is empty, the update terminates; otherwise, go to step 5.
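Steps 3 to 8 can be read as a depth-limited, queue-driven traversal. The following Python sketch is one simplified interpretation, not the patent's reference implementation. It assumes an in-memory dictionary in place of the HBase table, hypothetical column-family names `"rel"` and `"upd"`, a hypothetical column member `"adjacent"`, and it stores the depth with each queue element instead of keeping a separate current-depth counter.

```python
from collections import deque


def local_diffusion_update(table, start_id, new_values, max_depth):
    """Write new_values to every entity within max_depth hops of start_id.

    table is a plain dict standing in for the HBase table (an assumption):
    row key -> {"rel": {"adjacent": [ids]}, "upd": {column: value}}.
    Returns the updated entity identifiers in processing order.
    """
    if start_id not in table:                 # step 4: look up the starting entity
        return []
    queue = deque([(start_id, 0)])            # steps 3-4: (identifier, depth) pairs
    visited = {start_id}
    updated = []
    while queue:
        entity_id, depth = queue[0]           # entity at the head of the queue
        row = table[entity_id]
        row["upd"].update(new_values)         # step 5: write the update columns
        updated.append(entity_id)
        if depth < max_depth:                 # deeper neighbours are out of range
            for neighbour in row["rel"]["adjacent"]:      # step 6
                if neighbour not in visited and neighbour in table:
                    visited.add(neighbour)
                    queue.append((neighbour, depth + 1))
        queue.popleft()                       # step 7: delete the head element
    return updated


# Illustrative data: a small undirected graph A-B, A-C, B-D, D-E.
table = {
    "A": {"rel": {"adjacent": ["B", "C"]}, "upd": {}},
    "B": {"rel": {"adjacent": ["A", "D"]}, "upd": {}},
    "C": {"rel": {"adjacent": ["A"]}, "upd": {}},
    "D": {"rel": {"adjacent": ["B", "E"]}, "upd": {}},
    "E": {"rel": {"adjacent": ["D"]}, "upd": {}},
}
result = local_diffusion_update(table, "A", {"flag": 1}, 1)
```

With an update depth of 1, `result` contains `"A"`, `"B"` and `"C"`; entities `"D"` and `"E"` lie outside the local range and are left untouched. Marking an entity as visited when it is enqueued, rather than when it is dequeued, is a design choice of this sketch that keeps the same entity from being queued twice.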
Further, in step 1, entity attributes are used to reflect the entity relationships, and entity identifiers are used to distinguish different big data entities.
Further, in step 1, an identifier that distinguishes each big data entity globally is defined as the entity identifier; entity identifiers and big data entities correspond one to one.
Further, the entity attributes include relationship attributes and update attributes: the data or data fields to be updated are defined as update attributes, and entity relationships are defined as relationship attributes. That is, update attributes reflect the data or data fields to be updated, and relationship attributes reflect the entity relationships.
Further, the entity attributes also include other attributes, which are the attributes other than the relationship attributes, the entity identifier, and the update attributes. This definition makes the conversion relationship and the data structure more concise, and makes the update attributes and relationship attributes consistent with the column members of the HBase table; these are transitional terms.
Further, in step 2, the entity identifier is converted into the row key of a record; the relationship attributes, update attributes, and other attributes are converted into the relationship column family, the update column family, and the other column family of the HBase table, respectively; the data or fields to be updated are stored in the corresponding column members of the update column family.
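The mapping from an entity to a table row described above can be sketched as follows. This is a hedged illustration only: the column-family names `rel`, `upd` and `oth`, the `Entity` class, and the sample attribute names are all hypothetical, and no HBase client is involved. The sketch only builds the row key and a dictionary of cells addressed in the usual `family:member` form.

```python
from dataclasses import dataclass, field

# Hypothetical column-family names; the patent names only their roles.
REL, UPD, OTH = "rel", "upd", "oth"


@dataclass
class Entity:
    identifier: str                                     # globally unique identifier
    relationships: dict = field(default_factory=dict)   # relationship attributes
    updates: dict = field(default_factory=dict)         # update attributes
    others: dict = field(default_factory=dict)          # other attributes


def entity_to_row(entity):
    """Return (row_key, cells), with cells keyed by 'family:member'."""
    cells = {}
    for family, attrs in ((REL, entity.relationships),
                          (UPD, entity.updates),
                          (OTH, entity.others)):
        for member, value in attrs.items():
            cells[f"{family}:{member}"] = value
    return entity.identifier, cells


# Illustrative entity; every value here is made up for the example.
row_key, cells = entity_to_row(
    Entity("user-001",
           relationships={"successor_1": "user-002"},
           updates={"last_push": "2016-02-02"},
           others={"name": "example"}))
```

Keeping the three attribute groups in separate column families means an update only has to touch cells under the update family, which is the separation the text above describes.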
The local diffusion update algorithm for graph-structured big data disclosed by the invention has the following features:
First, the distributed database HBase (the HBigdataEntity table) stores the big data entities of the graph structure and the relationships between them. The storage capacity of an HBase database is unlimited in theory, so the storage of massive graph-structured big data is supported.
Second, the largest possible locality in a local update is a subgraph of the given graph, and different subgraphs belong to different localities. The time cost of a local update lies mainly in graph traversal, that is, in traversing from the starting entity to be updated to all entities within the local range. The time complexity of the algorithm of the present invention is O(N), where N is the number of entities (vertices) in the subgraph formed from the starting entity to be updated. If the locality were extended to the whole graph rather than a subgraph when calculating the time complexity, that case would fall outside the scope of the invention (it would be an update of all entities rather than a local update). The time complexity of a local update algorithm based on depth-first search or breadth-first search is O(N²), namely when the locality to be updated is the subgraph farthest from the vertex at which the traversal begins. The present invention is therefore faster than local update algorithms based on depth-first search and breadth-first search.
Third, in local update algorithms based on depth-first search or breadth-first search, the search uses the stack to save return addresses, so in a big data environment the update may fail to complete, which is a risk. The present invention uses a queue to buffer information such as entity identifiers, and the maximum queue length is the maximum number of adjacent entities at the same update depth (layer). The space complexity of this algorithm is also much smaller than that of local update algorithms based on depth-first search or breadth-first search, so it is applicable to local updates of big data.
Fourth, the invention supports local data updates on both directed and undirected graphs. For an undirected graph, column members are allocated in the relationship column family of the HBigdataEntity table to store the adjacent entities of each entity. For a directed graph, two kinds of column members, arc head and arc tail, are allocated in the relationship column family of the HBigdataEntity table: arc-head column members store the relationships pointing to this entity, and arc-tail column members store the relationships pointing out from it. Because HBase tables have good column-member extensibility, relationships between big data entities can easily be inserted and deleted dynamically.
Fifth, since linear structures and tree structures are special cases of the graph structure, the invention can also be applied to local data updates in fields where the relationships between big data entities form a linear structure or a tree structure.
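The fourth feature, storing adjacency differently for undirected and directed graphs, can be sketched in Python as below. The column-member names `adjacent`, `arc_head` and `arc_tail` are illustrative, nested dictionaries stand in for the relationship column family, and the choice of which arc direction to follow during diffusion is an assumption, since the text does not fix it.

```python
def neighbours(row, directed, follow="arc_tail"):
    """Return the identifiers adjacent to the entity stored in row.

    row maps column-member names of the relationship family to lists of ids.
    For a directed graph, follow selects outgoing ("arc_tail") or
    incoming ("arc_head") relationships.
    """
    if not directed:
        return list(row.get("adjacent", []))
    return list(row.get(follow, []))


def add_arc(table, src, dst):
    """Insert the directed relationship src -> dst by extending two rows."""
    table.setdefault(src, {}).setdefault("arc_tail", []).append(dst)
    table.setdefault(dst, {}).setdefault("arc_head", []).append(src)


def remove_arc(table, src, dst):
    """Delete the directed relationship src -> dst if it exists."""
    if dst in table.get(src, {}).get("arc_tail", []):
        table[src]["arc_tail"].remove(dst)
        table[dst]["arc_head"].remove(src)


# Illustrative use: add two arcs from A, then delete one of them.
relations = {}
add_arc(relations, "A", "B")
add_arc(relations, "A", "C")
remove_arc(relations, "A", "C")
```

Adding or removing a relationship only touches the rows of the two entities involved, which mirrors the claim that relationships are easy to insert and delete dynamically.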
Compared with the prior art, the present invention has the following beneficial effects:
1. Graph-structured big data is converted into big data entities and entity relationships; the relationship attributes express the mesh-like relationships between entities, and the big data entities and entity relationships are stored in the HBigdataEntity table. Compared with local update algorithms based on depth-first search and breadth-first search, the invention is faster and has a lower space cost.
2. The invention supports local data updates for big data in both directed and undirected graphs, and entity relationships are easy to add and delete, that is, easy to maintain.
Brief description of the drawings
Fig. 1 is the flowchart of Embodiment 1;
Fig. 2 is the flowchart of Embodiment 2.
Detailed description of the embodiments
To make the purpose, technical solution, and advantages of the present invention clearer, the invention is described in further detail below with reference to the accompanying drawings.
Embodiment 1
As shown in Fig. 1, the local diffusion update algorithm for graph-structured big data disclosed in this embodiment comprises the following steps:
Step 1. Decompose the graph-structured big data: decompose it into big data entities and entity relationships. Entity attributes are used to reflect the entity relationships, and entity identifiers are used to distinguish different big data entities. An identifier that distinguishes each big data entity globally is defined as the entity identifier; entity identifiers and big data entities correspond one to one. According to their function, entity attributes are divided into relationship attributes, update attributes, and other attributes, where other attributes describe other properties, features, or characteristics of a big data entity; in this embodiment, other attributes are the attributes other than the relationship attributes, the entity identifier, and the update attributes. The data (fields) to be updated are defined (organized) as update attributes; the relationships between big data entities in the big data are defined as relationship attributes. Relationship attributes reflect the mesh-like relationships between entities in the graph. Most of them are determined when the entity record is created and then remain unchanged, because relationships between entities are fixed, but there are also cases where they need to be modified.
Step 1 completes the definition of the problem domain of graph-structured big data, and step 2 completes the storage of the graph-structured big data.
Step 2. Store the data: store the big data entities and entity relationships in a table of the distributed database HBase. Storing the big data entities and the mesh-like relationships between them in an HBase table stores the graph-structured big data. Each big data entity is converted into a record of the HBase table; the entity identifier is converted into the row key of the record; the relationship attributes, update attributes, and other attributes are converted into the relationship column family, the update column family, and the other column family of the HBase table, respectively. The relationship attributes describing the mesh-like relationships between entities are stored in the corresponding column members of the relationship column family: the predecessor entities of an entity are stored in predecessor column members (multiple predecessor entities are supported), and its successor entities are stored in successor column members (multiple successor entities are supported). The data (fields) to be updated are stored in the corresponding column members of the update column family (updating multiple data items at the same time is supported). For ease of description, this patent names the HBase table the HBigdataEntity table. The HBigdataEntity table is set up according to the above analysis; it stores the relationship column family, the update column family, and the other column family, and its row key is the entity identifier.
Step 2 completes the storage of the graph-structured big data. Steps 1 and 2 are the basis for applying the local update algorithm of the invention, which involves the following terms: starting entity to be updated, update depth, and queue. The starting entity to be updated is an input parameter; it is the first entity with which the local update starts. The update depth is an input parameter: the maximum depth measured from the starting entity to be updated. The depth of the starting entity to be updated is 0, and the depth of diffusion to adjacent entities is counted successively as 1, 2, 3, and so on. A queue is a first-in, first-out data structure; the first element in the queue is called the head element and, correspondingly, the last element in the queue is the tail element. Queue operations include deleting the head element and inserting a tail element. In the present invention the queue stores information about the entities being locally updated, for example the entity identifier, a flag indicating whether the entity has been visited, and the current update depth.
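The queue element described above (entity identifier, visited flag, current update depth) can be sketched with Python's `collections.deque`. The `QueueItem` class and its field names are hypothetical; the text only lists the kinds of information kept in the queue.

```python
from collections import deque
from dataclasses import dataclass


@dataclass
class QueueItem:
    entity_id: str         # entity identifier (the row key)
    depth: int             # update depth at which the entity was reached
    visited: bool = False  # set once the entity has been processed


queue = deque()
queue.append(QueueItem("start", 0))       # insert a tail element
queue.append(QueueItem("neighbour", 1))   # insert another tail element
head = queue[0]                           # inspect the head element
head.visited = True                       # record that it has been visited
removed = queue.popleft()                 # delete the head element
```

`deque.append` and `deque.popleft` both run in constant time, so the two queue operations named in the text stay cheap regardless of how many entities are buffered.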
The following steps 3 to 8 complete the local diffusion update of graph-structured big data:
Step 3. Initialize the parameters: the identifier of the starting entity to be updated, the data to be updated and its column members in the update column family, the update depth, and the queue; set the current update depth to 0 and the queue to empty.
Step 4. Look up the starting entity to be updated by its identifier and insert it into the queue.
When the current update depth is greater than the update depth initialized in step 3 and the queue is empty, the update terminates.
Step 5. Update the column members of the update column family in the HBigdataEntity table row corresponding to the entity at the head of the queue, which completes the data update for that entity.
When the depth of the unvisited adjacent entities of the head entity reaches the update depth, go to step 7.
Step 6. Determine whether the depth of the unvisited adjacent entities of the head entity is greater than the update depth; if not, add the unvisited adjacent entities of the head entity to the queue.
Step 7. Record that the head entity has been visited, together with its current update depth, and delete the head entity element from the queue.
If the adjacent entities on the same layer have not all been updated, determine whether the condition "the current update depth is greater than the update depth initialized in step 3 and the queue is empty" holds; if it holds, the update terminates; if not, go to step 5.
Step 8. Increment the current update depth by 1. When the current update depth is greater than the update depth initialized in step 3 and the queue is empty, the update terminates; otherwise, go to step 5.
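The flow of Embodiment 1, with an explicit current update depth that is incremented once per layer, might be sketched in Python as follows. This is an interpretation under stated assumptions, not the patented implementation: `adjacency` is an in-memory stand-in for the relationship column family, the `update` callback stands in for the write to the HBigdataEntity table, and the loop stops as soon as the depth limit is exceeded or the queue is empty.

```python
from collections import deque


def diffuse_by_layer(adjacency, start, max_depth, update):
    """Call update(entity_id, depth) for each entity within max_depth of start.

    adjacency maps an identifier to the list of adjacent identifiers.
    Returns the value of the current update depth when the loop ends.
    """
    current_depth = 0                       # step 3: current update depth is 0
    queue = deque([start])                  # step 4: enqueue the starting entity
    visited = {start}
    while queue and current_depth <= max_depth:
        for _ in range(len(queue)):         # the entities of the current layer
            head = queue[0]
            update(head, current_depth)     # step 5: update the head entity
            if current_depth < max_depth:   # step 6: enqueue in-range neighbours
                for nxt in adjacency.get(head, []):
                    if nxt not in visited:
                        visited.add(nxt)
                        queue.append(nxt)
            queue.popleft()                 # step 7: delete the head element
        current_depth += 1                  # step 8: move to the next layer
    return current_depth


# Illustrative path graph A - B - C - D with an update depth of 2.
log = []
final_depth = diffuse_by_layer(
    {"A": ["B"], "B": ["A", "C"], "C": ["B", "D"], "D": ["C"]},
    "A", 2, lambda entity_id, depth: log.append((entity_id, depth)))
```

In this run `log` records `A` at depth 0, `B` at depth 1 and `C` at depth 2, and `D` is never updated. Processing exactly `len(queue)` elements per pass is what separates one layer from the next without storing a depth in every queue element.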
Embodiment 2
This embodiment differs from Embodiment 1 in that, after step 7, if the adjacent entities on the same layer have not all been updated, the algorithm goes directly to step 5.
The present invention decomposes and describes graph-structured big data as big data entities, entity identifiers, and relationships between big data entities. The HBigdataEntity table stores the big data entity identifiers (as row keys), the relationships between big data entities (as members of the relationship column family), the update attributes (as members of the update column family), and the other attributes. Adjacent entities are buffered in a queue, the loop is controlled by the entity update depth, and the attributes of adjacent entities are updated outward from the starting entity to be updated. The invention can be applied in fields such as information pushing, advertisement placement, and marketing, and also to local updates of big data with linear structures and graph structures; its range of application is wide.
Of course, the present invention may have various other embodiments. Without departing from the spirit and essence of the invention, those of ordinary skill in the art can make corresponding changes and modifications according to the invention, but all such changes and modifications shall fall within the scope of the claims appended to the invention.

Claims (6)

1. A local diffusion update algorithm for graph-structured big data, characterized in that it comprises the following steps:
Step 1. Decompose the graph-structured big data: decompose and describe the graph-structured big data as big data entities and entity relationships;
Step 2. Store the data: store the big data entities and entity relationships in a table of the distributed database HBase;
Step 3. Initialize the parameters: the identifier of the starting entity to be updated, the data to be updated and its column members in the update column family, the update depth, and the queue; the current update depth is 0 and the queue is empty;
Step 4. Look up the starting entity to be updated by its identifier and insert it into the queue;
When the current update depth is greater than the update depth initialized in step 3 and the queue is empty, the update terminates;
Step 5. Update the column members of the update column family in the HBase table row corresponding to the entity at the head of the queue;
When the depth of the unvisited adjacent entities of the head entity is greater than the update depth, go to step 7;
Step 6. Add the unvisited adjacent entities of the head entity to the queue;
Step 7. Record that the head entity has been visited, together with its current update depth, and delete the head entity element from the queue;
If the adjacent entities on the same layer have not all been updated, one of the following two processing methods is used:
First, determine whether the condition "the current update depth is greater than the update depth initialized in step 3 and the queue is empty" holds; if it holds, the update terminates, otherwise go to step 5;
Second, go directly to step 5;
Step 8. Increment the current update depth by 1;
When the current update depth is greater than the update depth initialized in step 3 and the queue is empty, the update terminates; otherwise, go to step 5.
2. The local diffusion update algorithm for graph-structured big data according to claim 1, characterized in that: in step 1, entity attributes are used to reflect the entity relationships, and entity identifiers are used to distinguish different big data entities.
3. The local diffusion update algorithm for graph-structured big data according to claim 2, characterized in that: in step 1, an identifier that distinguishes each big data entity globally is defined as the entity identifier, and entity identifiers and big data entities correspond one to one.
4. The local diffusion update algorithm for graph-structured big data according to claim 2, characterized in that: the entity attributes include relationship attributes and update attributes; the data or data fields to be updated are defined as update attributes, and the relationships between big data entities are defined as relationship attributes.
5. The local diffusion update algorithm for graph-structured big data according to claim 4, characterized in that: the entity attributes also include other attributes, which are the attributes other than the relationship attributes, the entity identifier, and the update attributes.
6. The local diffusion update algorithm for graph-structured big data according to claim 5, characterized in that: in step 2, the entity identifier is converted into the row key of a record; the relationship attributes, update attributes, and other attributes are converted into the relationship column family, the update column family, and the other column family of the HBase table, respectively; the data or data fields to be updated are stored in the corresponding column members of the update column family.
CN201610075075.5A 2016-02-02 2016-02-02 Local diffusion updating algorithm for graph structure big data Pending CN105740447A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201610075075.5A CN105740447A (en) 2016-02-02 2016-02-02 Local diffusion updating algorithm for graph structure big data

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201610075075.5A CN105740447A (en) 2016-02-02 2016-02-02 Local diffusion updating algorithm for graph structure big data

Publications (1)

Publication Number Publication Date
CN105740447A true CN105740447A (en) 2016-07-06

Family

ID=56244859

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201610075075.5A Pending CN105740447A (en) 2016-02-02 2016-02-02 Local diffusion updating algorithm for graph structure big data

Country Status (1)

Country Link
CN (1) CN105740447A (en)

Cited By (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2018161702A1 (en) * 2017-03-06 2018-09-13 京信通信***(中国)有限公司 Method for processing data service in long term evolution (lte) base station and base station
CN111614749A (en) * 2020-05-19 2020-09-01 深圳华锐金融技术股份有限公司 Data transmission method, data transmission device, computer equipment and storage medium
US10824612B2 (en) 2017-08-21 2020-11-03 Western Digital Technologies, Inc. Key ticketing system with lock-free concurrency and versioning
US11055266B2 (en) 2017-08-21 2021-07-06 Western Digital Technologies, Inc. Efficient key data store entry traversal and result generation
US11210212B2 (en) 2017-08-21 2021-12-28 Western Digital Technologies, Inc. Conflict resolution and garbage collection in distributed databases
US11210211B2 (en) 2017-08-21 2021-12-28 Western Digital Technologies, Inc. Key data store garbage collection and multipart object management

Similar Documents

Publication Publication Date Title
CN105740447A (en) Local diffusion updating algorithm for graph structure big data
Malliaros et al. The core decomposition of networks: Theory, algorithms and applications
CN105320719B (en) A kind of crowd based on item label and graphics relationship raises website item recommended method
CN103365992B (en) Method for realizing dictionary search of Trie tree based on one-dimensional linear space
CN103914493A (en) Method and system for discovering and analyzing microblog user group structure
CN104794177B (en) A kind of date storage method and device
CN105159950B (en) Mass data real-time sequencing query method and system
CN113254630B (en) Domain knowledge map recommendation method for global comprehensive observation results
Yun et al. Mining recent high average utility patterns based on sliding window from stream data
CN103365991A (en) Method for realizing dictionary memory management of Trie tree based on one-dimensional linear space
CN111143513B (en) Sensitive word recognition method and device and electronic equipment
CN103123650A (en) Extensible markup language (XML) data bank full-text indexing method based on integer mapping
San Segundo et al. Efficiently enumerating all maximal cliques with bit-parallelism
Hassani Overview of efficient clustering methods for high-dimensional big data streams
Cheng et al. ETKDS: An efficient algorithm of Top-K high utility itemsets mining over data streams under sliding window model
Wong et al. Online skyline analysis with dynamic preferences on nominal attributes
CN103399904A (en) Data processing method and data processing system
CN105843809A (en) Data processing method and device
CN116010664A (en) Data processing method and system based on MPTT and parent searching
WO2023024474A1 (en) Data set determination method and apparatus, and computer device and storage medium
CN115495462A (en) Batch data updating method and device, electronic equipment and readable storage medium
CN102882792B (en) Method for simplifying internet propagation path diagram
CN115328366A (en) Million-level tree node searching and displaying method and system based on full path calculation
CN112800056B (en) Multi-layer index construction method based on multi-granularity space-time data
Park et al. A fast and compact indexing technique for moving objects

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication

Application publication date: 20160706