CN112818179A - Graph traversal access and storage optimization method and system based on Hybrid storage format and electronic equipment - Google Patents

Publication number
CN112818179A
Authority
CN
China
Prior art keywords
matrix
hybrid
graph
offset
edata
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN201911127994.2A
Other languages
Chinese (zh)
Other versions
CN112818179B (en
Inventor
刘树珍
周家秀
孟金涛
魏彦杰
冯圣中
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Shenzhen Institute of Advanced Technology of CAS
Original Assignee
Shenzhen Institute of Advanced Technology of CAS
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Shenzhen Institute of Advanced Technology of CAS
Priority to CN201911127994.2A
Publication of CN112818179A
Application granted
Publication of CN112818179B
Legal status: Active
Anticipated expiration

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90 Details of database functions independent of the retrieved data types
    • G06F16/901 Indexing; Data structures therefor; Storage structures
    • G06F16/9024 Graphs; Linked lists

Landscapes

  • Engineering & Computer Science (AREA)
  • Databases & Information Systems (AREA)
  • Theoretical Computer Science (AREA)
  • Software Systems (AREA)
  • Data Mining & Analysis (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Complex Calculations (AREA)

Abstract

The invention relates to the technical field of graph computing and discloses a graph traversal memory access optimization method and system based on a Hybrid storage format, and an electronic device. The optimization method randomly generates an adjacency matrix AM that stores graph data, performs graph data processing on the adjacency matrix AM to generate a matrix hybrid in a new storage format, and finally performs a BFS search on the generated matrix hybrid to complete the breadth-first search of the graph. By storing the graph data in the matrix hybrid, the invention effectively addresses the excessive memory consumption and array overflow that existing graph storage data structures exhibit in graph traversal algorithms.

Description

Graph traversal memory access optimization method and system based on Hybrid storage format, and electronic device
Technical Field
The invention relates to the technical field of graph computing, and in particular to a graph traversal memory access optimization method and system based on a Hybrid storage format, and an electronic device.
Background
The graph is an important data structure, and many real-world problems can be abstracted into graph computing problems for further processing. In practice, complete graphs are rarely used and dense graphs are uncommon; most real graphs are sparse, and the graphs involved in large-scale graph computing are extremely sparse.
The breadth-first search (BFS) algorithm is a classic graph computing algorithm. It is a blind search method: it does not consider where the result is likely to be, but systematically expands outward from a chosen starting vertex and traverses the vertices of the graph until the result is found. BFS is one of the simplest graph traversal algorithms and is the prototype of many others. Because many real-world problems can be abstracted into graph computing problems, graph computing is widely applied in industries such as communications, transportation, and biomedicine. The numbers of vertices and edges in the graph networks analyzed today have grown into the billions, requiring ever larger memory space.
The breadth-first search algorithm has been studied for nearly sixty years. In 1959, Moore first described breadth-first search (BFS) while studying the problem of finding paths through a maze. In 1961, C. Y. Lee independently discovered the same algorithm in the context of wire routing on circuit boards. Since then, many universities and research institutions have explored efficient BFS algorithms, aiming to make the search faster and its storage consumption smaller by optimizing the algorithm itself or by modifying the data structure that stores the graph, and the BFS algorithm has improved greatly as a result. The classical structures for storing a graph in memory include the adjacency matrix, the adjacency list, and the sparse matrix CSR, but the following problems still exist:
(1) The adjacency matrix is the classic data structure first used to store graphs. It can quickly determine whether an edge exists between two vertices and makes it convenient to compute vertex degrees and read data; edges can be added and deleted flexibly, the computational complexity is low, and the time consumed is O(n). However, it always occupies n × n storage, however few edges the graph has.
(2) The adjacency list stores only the edges that actually exist and is a storage structure combining an array with linked lists. The vertices of the graph are stored in a one-dimensional array, and all adjacent points of any vertex v_i form a linear list; because the number of adjacent points is not fixed, this list is stored as a linked list. Each node in the vertex table has two fields, data and firstedge: data is the data field and stores the vertex information; firstedge is a pointer field that points to the first node of the edge table, i.e., the first adjacent point of the vertex. Each edge table node has two fields, adjvex and next: adjvex is the adjacent point field and stores the subscript, in the vertex table, of an adjacent point of the vertex; next stores a pointer to the next node in the edge table. For a graph with n vertices and e edges, the time complexity of the adjacency list is O(n + e) for a directed graph and O(n + 2e) for an undirected graph. The adjacency list is widely used in practice, but for graphs in which the number of adjacent points differs greatly between vertices, the way the adjacency list lays out its data in memory wastes a large amount of memory space.
(3) CSR is a compressed storage format that consumes the least memory. It compresses a two-dimensional matrix into three one-dimensional arrays that hold the values, the column numbers, and the row offsets, respectively. CSR is not a set of triples but an encoding of the matrix as a whole. It stores the data of every edge of the graph in these three arrays and is a good compression structure, but it is difficult to partition, and data overflow occurs easily when the number of edges is large.
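As a minimal sketch of the CSR layout described in (3), the Python below compresses a small adjacency matrix into the three one-dimensional arrays. The function names to_csr and csr_neighbors and the 4-vertex example graph are illustrative assumptions and do not come from the original disclosure.

```python
def to_csr(am):
    """Compress a dense adjacency matrix into CSR's three 1-D arrays."""
    values, col_idx, row_ptr = [], [], [0]
    for row in am:
        for j, w in enumerate(row):
            if w != 0:
                values.append(w)    # edge weight
                col_idx.append(j)   # column number of the edge
        row_ptr.append(len(values))  # running count of stored entries
    return values, col_idx, row_ptr


def csr_neighbors(col_idx, row_ptr, v):
    """Neighbors of v live in the slice delimited by two row offsets."""
    return col_idx[row_ptr[v]:row_ptr[v + 1]]


# Hypothetical undirected example graph with edges 0-1, 0-2, 2-3.
am = [
    [0, 1, 1, 0],
    [1, 0, 0, 0],
    [1, 0, 0, 1],
    [0, 0, 1, 0],
]
values, col_idx, row_ptr = to_csr(am)
print(col_idx)                              # [1, 2, 0, 0, 3, 2]
print(row_ptr)                              # [0, 2, 3, 5, 6]
print(csr_neighbors(col_idx, row_ptr, 2))   # [0, 3]
```

Every lookup has to read row_ptr[v] and row_ptr[v + 1] before it can touch col_idx, which is the offset position calculation that CSR requires.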
Take an undirected graph G = (V, E) as an example. When the number of edges satisfies the condition given in formula image BDA0002277468320000021, and the chain fields that the adjacency list must add are taken into account, the adjacency matrix is the best storage method. When the number of edges satisfies e < n^2, the sparse matrix CSR is suitable for the case of an extremely sparse graph, i.e. when the number of edges satisfies the condition given in formula image BDA0002277468320000022.
Disclosure of Invention
The invention aims to provide a graph traversal memory access optimization method and system based on a Hybrid storage format, and an electronic device, which address the technical problems in the prior art and effectively solve the excessive memory consumption and array overflow that existing graph storage data structures exhibit in graph traversal algorithms.
To solve the problems set out above, the invention adopts the following technical scheme:
A graph traversal memory access optimization method based on a Hybrid storage format comprises the following specific steps:
Step a: randomly generating an adjacency matrix AM that stores graph data;
Step b: performing graph data processing on the adjacency matrix AM to generate a matrix hybrid in a new storage format;
Step c: performing a BFS search on the generated matrix hybrid to complete the breadth-first search of the graph.
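Step a can be sketched as follows. The text does not say how the random matrix is drawn, so this sketch assumes an undirected, unweighted graph without self-loops in which each possible edge is present with a fixed probability; the function name random_adjacency_matrix and the parameters edge_prob and seed are hypothetical.

```python
import random


def random_adjacency_matrix(n, edge_prob, seed=None):
    """Return an n x n symmetric 0/1 adjacency matrix (assumed model)."""
    rng = random.Random(seed)  # private generator so runs are reproducible
    am = [[0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):        # upper triangle only: no self-loops
            if rng.random() < edge_prob:
                am[i][j] = am[j][i] = 1  # mirror to keep the graph undirected
    return am


am = random_adjacency_matrix(6, 0.3, seed=0)
for row in am:
    print(row)
```

A small edge_prob yields the kind of sparse matrix the method targets; a weighted or directed variant would only change the assignment inside the loop.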
Preferably, the specific processing procedure in step b is as follows:
Step S21: performing zero removal on the adjacency matrix AM row by row, retaining only its non-zero elements, and generating a sparse matrix ELL I; the sparse matrix ELL I comprises the matrices Edata and Offset;
Step S22: selecting a suitable split point and splitting the sparse matrix ELL I; the matrix Edata is split into a matrix Edata-L and a matrix Edata-R, wherein all elements of the matrix Edata-L are non-zero real numbers; the matrix Offset is split into the matrices Offset-L and Offset-R;
Step S23: generating the matrix hybrid in the new storage format from the split sparse matrix ELL I, wherein the matrix hybrid comprises a sparse matrix ELL II and a sparse matrix COO, the sparse matrix ELL II consists of the matrices Edata-L and Offset-L, and the sparse matrix COO consists of the matrices Edata-R and Offset-R.
Preferably, step S21 specifically comprises the following steps:
Step S211: denoting the size of the adjacency matrix AM of a graph with n vertices as n × n, and counting the non-zero elements in the row of the adjacency matrix AM that contains the most non-zero elements and in the row that contains the fewest, recorded as max and min respectively;
Step S212: taking max, the number of non-zero elements in the fullest row of the adjacency matrix AM, and creating two n × max matrices Edata and Offset, wherein the matrix Edata stores the non-zero elements of the adjacency matrix AM and the matrix Offset stores the position coordinates of those non-zero elements in the adjacency matrix AM;
Step S213: generating the sparse matrix ELL I from the matrices Edata and Offset.
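Steps S211 to S213 can be sketched as below, assuming a non-empty adjacency matrix. Two details are assumptions of this sketch: Offset is taken to hold column numbers, and the padding cells are filled with None, a hypothetical stand-in for whatever marker an implementation uses. The names build_ell and PAD are illustrative.

```python
PAD = None  # hypothetical padding marker for unused cells


def build_ell(am):
    """Build ELL I (Edata, Offset) from a dense adjacency matrix."""
    # Per row, keep (column, value) pairs of the non-zero elements only.
    rows = [[(j, w) for j, w in enumerate(row) if w != 0] for row in am]
    counts = [len(r) for r in rows]
    max_nz, min_nz = max(counts), min(counts)  # S211
    # S212: left-align the non-zeros and pad every row to width max_nz.
    edata = [[w for _, w in r] + [PAD] * (max_nz - len(r)) for r in rows]
    offset = [[j for j, _ in r] + [PAD] * (max_nz - len(r)) for r in rows]
    return edata, offset, max_nz, min_nz       # S213: ELL I = (edata, offset)


# Hypothetical weighted undirected graph on 4 vertices.
am = [
    [0, 5, 2, 7],
    [5, 0, 0, 0],
    [2, 0, 0, 3],
    [7, 0, 3, 0],
]
edata, offset, max_nz, min_nz = build_ell(am)
print(max_nz, min_nz)  # 3 1
print(edata)   # [[5, 2, 7], [5, None, None], [2, 3, None], [7, 3, None]]
print(offset)  # [[1, 2, 3], [0, None, None], [0, 3, None], [0, 2, None]]
```

Both matrices are n × max, so a single high-degree row forces padding onto every other row, which is the waste that the later split is meant to remove.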
Preferably, the specific search process in step c is as follows:
Step S31: performing vertex positioning: the starting vertex v0 is looked up as a row number of the matrix hybrid, wherein the adjacent points of the vertex v0 are the values contained in the corresponding row of the matrix Offset-R, i.e. the triples of the sparse matrix COO whose row number is v0; the row numbers of the matrix hybrid are consistent with those of the adjacency matrix;
Step S32: performing a BFS search from the located vertex v0 to complete the breadth-first search of the graph.
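A sketch of the vertex positioning in step S31 follows. It assumes, as a reading of the format rather than a statement from the text, that the full neighbor set of v0 is row v0 of Offset-L (the ELL II part) followed by the column numbers of the COO triples whose row number is v0. The name neighbors and the literal data are hypothetical.

```python
def neighbors(offset_l, coo, v):
    """Collect the adjacent points of v from both parts of the hybrid."""
    from_ell = list(offset_l[v])                         # direct row lookup
    from_coo = [col for row, col, _ in coo if row == v]  # triples of row v
    return from_ell + from_coo


# Hybrid form of a hypothetical 4-vertex weighted graph with min = 1.
offset_l = [[1], [0], [0], [0]]                           # n x min columns
coo = [(0, 2, 2), (0, 3, 7), (2, 3, 3), (3, 2, 3)]        # (row, col, value)

print(neighbors(offset_l, coo, 0))  # [1, 2, 3]
print(neighbors(offset_l, coo, 1))  # [0]
print(neighbors(offset_l, coo, 3))  # [0, 2]
```

The linear scan over coo keeps the sketch short; grouping the triples by row once, before the search starts, avoids rescanning them for every vertex.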
Preferably, the specific process of step S32 is as follows:
Step S321: setting the located vertex v0 as the starting point;
Step S322: searching for all adjacent points of the starting point and judging whether the starting point has any adjacent points; if not, the search ends; if so, the next step is executed;
Step S323: judging whether all adjacent points of the starting point have been accessed; if so, the next step is executed; if not, access continues until all of them have been accessed, and then the next step is executed;
Step S324: selecting one adjacent point of the vertex v0 as the starting point and repeating steps S322 to S323;
Step S325: setting the remaining adjacent points of the vertex v0 as the starting point one by one and repeating steps S322 to S323 until the breadth-first search of the graph is completed.
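The search in steps S321 to S325 can be sketched with the standard queue-based BFS, which is a substitution for the prose procedure rather than a transcription of it. The sketch assumes the unweighted case in which only Offset-L is kept for ELL II, and it groups the COO triples by row once so that each lookup is direct. The name bfs_hybrid and the 5-vertex example are hypothetical.

```python
from collections import defaultdict, deque


def bfs_hybrid(offset_l, coo, start):
    """Breadth-first visit order over a graph stored as ELL II + COO."""
    coo_by_row = defaultdict(list)
    for row, col, _ in coo:             # group COO columns by row number
        coo_by_row[row].append(col)

    visited = [False] * len(offset_l)
    visited[start] = True
    order, queue = [], deque([start])   # S321: start from the located vertex
    while queue:
        v = queue.popleft()
        order.append(v)
        # S322/S323: visit every adjacent point of the current start point.
        for u in list(offset_l[v]) + coo_by_row[v]:
            if not visited[u]:
                visited[u] = True
                queue.append(u)         # S324/S325: it becomes a start point
    return order


# Hypothetical graph with edges 0-1, 0-2, 0-3, 2-3, 3-4 (min degree 1).
offset_l = [[1], [0], [0], [0], [3]]
coo = [(0, 2, 1), (0, 3, 1), (2, 3, 1), (3, 2, 1), (3, 4, 1)]

print(bfs_hybrid(offset_l, coo, 0))  # [0, 1, 2, 3, 4]
print(bfs_hybrid(offset_l, coo, 4))  # [4, 3, 0, 2, 1]
```

The visited list is what guarantees that each vertex is set as a starting point at most once; the step list above leaves that bookkeeping implicit.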
A graph traversal memory access optimization system based on a Hybrid storage format comprises:
A random matrix generation module: used to randomly generate an adjacency matrix AM that stores graph data;
A graph data processing module: used to perform graph data processing on the adjacency matrix AM and generate a matrix hybrid in a new storage format;
A BFS search module: used to perform a BFS search on the generated matrix hybrid to complete the breadth-first search of the graph.
Preferably, the graph data processing module comprises:
A zero-removal processing module: used to perform zero removal on the adjacency matrix AM row by row, retaining only its non-zero elements, and to generate a sparse matrix ELL I; the sparse matrix ELL I comprises the matrices Edata and Offset, wherein the matrix Edata stores the non-zero elements of the adjacency matrix AM and the matrix Offset stores the position coordinates of those non-zero elements in the adjacency matrix AM;
A splitting module: used to select a suitable split point and split the sparse matrix ELL I, splitting the matrix Edata into a matrix Edata-L and a matrix Edata-R and the matrix Offset into a matrix Offset-L and a matrix Offset-R;
A new matrix generation module: used to generate the matrix hybrid in the new storage format from the split sparse matrix ELL I; the matrix hybrid comprises a sparse matrix ELL II and a sparse matrix COO, wherein the sparse matrix ELL II consists of the matrices Edata-L and Offset-L, and the sparse matrix COO consists of the matrices Edata-R and Offset-R, in which the row number is the row number in Edata-R or Offset-R, the column number is the value in Offset-R, and the numerical value is the value in Edata-R.
Preferably, the BFS search module comprises:
A vertex positioning module: used to perform vertex positioning, looking up the starting vertex v0 as a row number of the matrix hybrid; the adjacent points of the vertex v0 are the values contained in the corresponding row of the matrix Offset-R, i.e. the triples of the sparse matrix COO whose row number is v0; the row numbers of the matrix hybrid are consistent with those of the adjacency matrix;
A search execution module: used to set the located vertex v0 as the starting point, search for all adjacent points of the starting point, and set those adjacent points as the starting point one by one to continue the search until the breadth-first search of the graph is completed.
An electronic device, comprising:
at least one processor; and
a memory communicatively coupled to the at least one processor; wherein
the memory stores instructions executable by the at least one processor to enable the at least one processor to perform the following operations of the graph traversal memory access optimization method based on the Hybrid storage format:
Step a: randomly generating an adjacency matrix AM that stores graph data;
Step b: performing graph data processing on the adjacency matrix AM to generate a matrix hybrid in a new storage format;
Step c: performing a BFS search on the generated matrix hybrid to complete the breadth-first search of the graph.
Compared with the prior art, the invention has the following beneficial effects:
The invention stores the graph data in the matrix hybrid, a new storage format, and effectively solves the excessive memory consumption and array overflow that existing graph storage data structures exhibit in graph traversal algorithms.
Compared with the adjacency list and the adjacency matrix, the invention greatly reduces memory consumption: the memory consumed by the matrix hybrid is far less than the n × n of the adjacency matrix, and, unlike the adjacency list, the matrix hybrid needs no additional chain fields. After compression, the number of I/O operations is reduced and the I/O time overhead is shortened.
Compared with CSR, the matrix ELL II in the matrix hybrid is equivalent to the sparse matrix of a full graph, and the matrix COO is addressed directly, so that, unlike the CSR format, no offset position needs to be calculated and the number of calculations is reduced.
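To make the memory comparison concrete, the sketch below counts stored array cells (not bytes) for each format from the per-row non-zero counts alone. The counting model is an assumption of this sketch: ELL keeps two n × max matrices, the hybrid keeps two n × min matrices plus one triple per remaining non-zero, and CSR keeps two arrays of length nnz plus n + 1 row offsets. The name storage_cells and the skewed example are hypothetical.

```python
def storage_cells(row_counts):
    """Count stored cells per format, given non-zeros per row."""
    n = len(row_counts)
    nnz = sum(row_counts)
    max_nz, min_nz = max(row_counts), min(row_counts)
    return {
        "adjacency_matrix": n * n,
        "ell": 2 * n * max_nz,                               # Edata + Offset
        "hybrid": 2 * n * min_nz + 3 * (nnz - n * min_nz),   # ELL II + COO
        "csr": 2 * nnz + (n + 1),
    }


# Hypothetical skewed case: 999 rows with 2 non-zeros, one row with 100.
cells = storage_cells([2] * 999 + [100])
for name, count in cells.items():
    print(name, count)
# adjacency_matrix 1000000
# ell 200000
# hybrid 4294
# csr 5197
```

The single long row inflates plain ELL to 200000 cells, while the hybrid moves that row's surplus into COO triples. The numbers illustrate the counting model only and are not measurements from the patent.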
Drawings
Fig. 1 is a flowchart of the graph traversal memory access optimization method based on the Hybrid storage format according to an embodiment of the present application.
Fig. 2 is a schematic diagram of the graph data processing and new matrix generation in an embodiment of the present application.
Fig. 3 is a flowchart of performing the BFS search in an embodiment of the present application.
Fig. 4 is a schematic diagram of the graph traversal memory access optimization system based on the Hybrid storage format according to an embodiment of the present application.
Fig. 5 is a schematic diagram of the graph data processing module in an embodiment of the present application.
Fig. 6 is a schematic diagram of the BFS search module in an embodiment of the present application.
Fig. 7 is a schematic structural diagram of the hardware device for the graph traversal memory access optimization method based on the Hybrid storage format according to an embodiment of the present application.
Detailed Description
To facilitate an understanding of the invention, the invention will now be described more fully with reference to the accompanying drawings. Preferred embodiments of the present invention are shown in the drawings. This invention may, however, be embodied in many different forms and should not be construed as limited to the embodiments set forth herein. Rather, these embodiments are provided so that this disclosure will be thorough and complete.
Unless defined otherwise, all technical and scientific terms used herein have the same meaning as commonly understood by one of ordinary skill in the art to which this invention belongs. The terminology used in the description of the invention herein is for the purpose of describing particular embodiments only and is not intended to be limiting of the invention.
Referring to fig. 1, the invention provides a graph traversal memory access optimization method based on the Hybrid storage format, which comprises the following specific steps:
Step S1: an adjacency matrix AM that stores graph data is randomly generated.
Step S2: graph data processing is performed on the adjacency matrix AM to generate a matrix hybrid in a new storage format. The specific processing procedure is as follows (see fig. 2):
Step S21: zero removal is performed on the adjacency matrix AM row by row, retaining only its non-zero elements, to generate a sparse matrix ELL I, which specifically comprises the following steps:
Step S211: the size of the adjacency matrix AM of a graph with n vertices is denoted n × n; the numbers of non-zero elements in the row of the adjacency matrix AM with the most non-zero elements and in the row with the fewest are counted and recorded as max and min respectively, where for a sparse graph max < n and min < max.
Step S212: max, the number of non-zero elements in the fullest row of the adjacency matrix AM, is taken, and two n × max matrices Edata and Offset are created, wherein the matrix Edata stores the non-zero elements of the adjacency matrix AM and the matrix Offset stores the position coordinates of those non-zero elements in the adjacency matrix AM.
In step S212, all non-zero elements of the adjacency matrix AM are moved to the left end of their rows and stored in the matrix Edata. The matrix Offset is arranged in the same way as the matrix Edata, except that each non-zero value in Edata is replaced by the column coordinate of that element in the adjacency matrix AM (the 0 elements in the right part of the matrix Edata are replaced by a special symbol, and the same replacement is made in the matrix Offset).
Step S213: the sparse matrix ELL I is generated from the matrices Edata and Offset.
Step S22: a suitable split point is selected and the sparse matrix ELL I is split. Specifically, the matrix Edata is split into an n × min matrix Edata-L (all of whose elements are non-zero real numbers) and an n × (max - min) matrix Edata-R. The same operation is performed on the matrix Offset to obtain the two matrices Offset-L and Offset-R.
Step S23: the matrix hybrid in the new storage format is generated from the split sparse matrix ELL I. The matrix hybrid comprises a sparse matrix ELL II and a sparse matrix COO. The sparse matrix ELL II consists of the matrices Edata-L and Offset-L (for an unweighted graph the matrix Edata-L can be deleted, keeping only the matrix Offset-L to record the positions of the vertices). The sparse matrix COO consists of the split matrices Edata-R and Offset-R, in which the row number is the row number in Edata-R or Offset-R, the column number is the value in Offset-R, and the numerical value is the value in Edata-R.
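Steps S22 and S23 can be sketched as a column split at min followed by flattening the right-hand part into triples. This sketch assumes that padding cells (None here, a hypothetical marker) are dropped when the COO triples are built, which the text does not state explicitly. The name split_to_hybrid and the weighted flag are illustrative.

```python
PAD = None  # hypothetical padding marker used in ELL I


def split_to_hybrid(edata, offset, min_nz, weighted=True):
    """Split ELL I at column min_nz into ELL II and COO triples."""
    # ELL II: the left n x min block, which contains no padding.
    offset_l = [row[:min_nz] for row in offset]
    edata_l = [row[:min_nz] for row in edata] if weighted else None
    # COO: (row, column, value) for every real entry in the right block.
    coo = [
        (i, col, val)
        for i, (erow, orow) in enumerate(zip(edata, offset))
        for val, col in zip(erow[min_nz:], orow[min_nz:])
        if col is not PAD
    ]
    return edata_l, offset_l, coo


# ELL I of a hypothetical 4-vertex weighted graph (max = 3, min = 1).
edata = [[5, 2, 7], [5, PAD, PAD], [2, 3, PAD], [7, 3, PAD]]
offset = [[1, 2, 3], [0, PAD, PAD], [0, 3, PAD], [0, 2, PAD]]

edata_l, offset_l, coo = split_to_hybrid(edata, offset, min_nz=1)
print(edata_l)   # [[5], [5], [2], [7]]
print(offset_l)  # [[1], [0], [0], [0]]
print(coo)       # [(0, 2, 2), (0, 3, 7), (2, 3, 3), (3, 2, 3)]
```

Passing weighted=False returns None in place of Edata-L, matching the remark that an unweighted graph needs only Offset-L.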
Step S3: a BFS search is performed on the generated matrix hybrid to complete the breadth-first search of the graph. The specific search process is as follows (see fig. 3):
Step S31: vertex positioning is performed: the starting vertex v0 is looked up as a row number of the matrix hybrid, wherein the adjacent points of the vertex v0 are the values contained in the corresponding row of the matrix Offset-R, i.e. the triples of the sparse matrix COO whose row number is v0. The row numbers of the matrix hybrid are consistent with those of the adjacency matrix.
Step S32: a BFS search is performed from the located vertex v0 to complete the breadth-first search of the graph. The specific search process is as follows:
Step S321: the located vertex v0 is set as the starting point.
Step S322: all adjacent points of the starting point are searched for, and it is judged whether the starting point has any adjacent points; if not, the search ends; if so, the next step is executed.
Step S323: it is judged whether all adjacent points of the starting point have been accessed; if so, the next step is executed; if not, access continues until all of them have been accessed, and then the next step is executed.
Step S324: one adjacent point of the vertex v0 is selected as the starting point, and steps S322 to S323 are repeated.
Step S325: the remaining adjacent points of the vertex v0 are set as the starting point one by one, and steps S322 to S323 are repeated until the breadth-first search of the graph is completed.
Referring to fig. 4, this embodiment further provides a graph traversal memory access optimization system based on the Hybrid storage format, where the optimization system includes:
A random matrix generation module: used to randomly generate an adjacency matrix AM that stores graph data.
A graph data processing module: used to perform graph data processing on the adjacency matrix AM and generate a matrix hybrid in a new storage format.
A BFS search module: used to perform a BFS search on the generated matrix hybrid to complete the breadth-first search of the graph.
Further, referring to fig. 5, the graph data processing module includes:
A zero-removal processing module: used to perform zero removal on the adjacency matrix AM row by row, retaining only its non-zero elements, and to generate the sparse matrix ELL I. The sparse matrix ELL I comprises the matrices Edata and Offset, wherein the matrix Edata stores the non-zero elements of the adjacency matrix AM and the matrix Offset stores the position coordinates of those non-zero elements in the adjacency matrix AM.
A splitting module: used to select a suitable split point and split the sparse matrix ELL I. Specifically, the matrix Edata is split into an n × min matrix Edata-L (all of whose elements are non-zero real numbers) and an n × (max - min) matrix Edata-R. The same operation is performed on the matrix Offset to obtain the two matrices Offset-L and Offset-R.
A new matrix generation module: used to generate the matrix hybrid in the new storage format from the split sparse matrix ELL I. The matrix hybrid comprises a sparse matrix ELL II and a sparse matrix COO. The sparse matrix ELL II consists of the matrices Edata-L and Offset-L (for an unweighted graph the matrix Edata-L can be deleted, keeping only the matrix Offset-L to record the positions of the vertices). The sparse matrix COO consists of the split matrices Edata-R and Offset-R, in which the row number is the row number in Edata-R or Offset-R, the column number is the value in Offset-R, and the numerical value is the value in Edata-R.
Further, the zero-removal processing module comprises:
A non-zero element counting module: used to denote the size of the adjacency matrix AM of a graph with n vertices as n × n, and to count the non-zero elements in the row of the adjacency matrix AM with the most non-zero elements and in the row with the fewest, recorded as max and min respectively, where for a sparse graph max < n and min < max.
A matrix building module: used to take max, the number of non-zero elements in the fullest row of the adjacency matrix AM, and to create two n × max matrices Edata and Offset.
A sparse matrix generation module: used to generate the sparse matrix ELL I from the matrices Edata and Offset.
Further, referring to fig. 6, the BFS search module includes:
A vertex positioning module: used to perform vertex positioning, looking up the starting vertex v0 as a row number of the matrix hybrid; the adjacent points of the vertex v0 are the values contained in the corresponding row of the matrix Offset-R, i.e. the triples of the sparse matrix COO whose row number is v0. The row numbers of the matrix hybrid are consistent with those of the adjacency matrix.
A search execution module: used to set the located vertex v0 as the starting point, search for all adjacent points of the starting point, and set those adjacent points as the starting point one by one to continue the search until the breadth-first search of the graph is completed.
Fig. 7 is a schematic structural diagram of the hardware device for the graph traversal memory access optimization method based on the Hybrid storage format according to an embodiment of the present application. As shown in fig. 7, the device includes one or more processors and a memory. Taking one processor as an example, the device may further include an input system and an output system.
The processor, memory, input system, and output system may be connected by a bus or in other ways; connection by a bus is taken as the example in fig. 7.
The memory, which is a non-transitory computer readable storage medium, may be used to store non-transitory software programs, non-transitory computer executable programs, and modules. The processor executes various functional applications and data processing of the electronic device, i.e., implements the processing method of the above-described method embodiment, by executing the non-transitory software program, instructions and modules stored in the memory.
The memory may include a storage program area and a storage data area, wherein the storage program area may store an operating system, an application program required for at least one function; the storage data area may store data and the like. Further, the memory may include high speed random access memory, and may also include non-transitory memory, such as at least one disk storage device, flash memory device, or other non-transitory solid state storage device. In some embodiments, the memory optionally includes memory located remotely from the processor, and these remote memories may be connected to the processing system over a network. Examples of such networks include, but are not limited to, the internet, intranets, local area networks, mobile communication networks, and combinations thereof.
The input system may receive input numeric or character information and generate a signal input. The output system may include a display device such as a display screen.
The one or more modules are stored in the memory and, when executed by the one or more processors, perform the following operations of any of the above method embodiments:
Step a: an adjacency matrix AM that stores graph data is randomly generated.
Step b: graph data processing is performed on the adjacency matrix AM to generate a matrix hybrid in a new storage format.
Step c: a BFS search is performed on the generated matrix hybrid to complete the breadth-first search of the graph.
The product can execute the method provided by the embodiments of the present application and has the functional modules and beneficial effects corresponding to that method. For technical details not described in this embodiment, reference may be made to the methods provided in the embodiments of the present application.
Embodiments of the present application provide a non-transitory (non-volatile) computer storage medium storing computer-executable instructions that can perform the following operations:
Step a: an adjacency matrix AM that stores graph data is randomly generated.
Step b: graph data processing is performed on the adjacency matrix AM to generate a matrix hybrid in a new storage format.
Step c: a BFS search is performed on the generated matrix hybrid to complete the breadth-first search of the graph.
Embodiments of the present application provide a computer program product comprising a computer program stored on a non-transitory computer readable storage medium, the computer program comprising program instructions that, when executed by a computer, cause the computer to perform the following operations:
Step a: an adjacency matrix AM that stores graph data is randomly generated.
Step b: graph data processing is performed on the adjacency matrix AM to generate a matrix hybrid in a new storage format.
Step c: a BFS search is performed on the generated matrix hybrid to complete the breadth-first search of the graph.
According to the graph traversal access optimization method, the graph traversal access optimization system and the electronic equipment based on the Hybrid storage format, the adjacency matrix AM for storing graph data is generated randomly, graph data processing is carried out on the adjacency matrix AM, and a matrix Hybrid of a new storage format is generated; and according to the generated matrix hybrid of the new storage format, performing BFS search to finish breadth-first search of the graph. Compared with the prior art, the method has the advantages that the waste of memory space is reduced, the graph with larger storage capacity is stored, and meanwhile, the search efficiency of the BFS algorithm is improved.
In the embodiments of the present application, if the graph is not very large and the difference between the row with the most non-0 elements and the row with the fewest non-0 elements in the adjacency matrix is very small, the ELL format can be used directly, for computational efficiency, without conversion to the hybrid format.
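The check described above can be sketched as follows. The application gives no numeric threshold, so `max_gap` is a purely hypothetical parameter, as are the function names. The graph-size condition is left out of this sketch, and a non-empty matrix is assumed.

```python
def row_nnz_spread(am):
    """Return (max, min): the largest and smallest per-row counts of non-0 elements."""
    counts = [sum(1 for v in row if v != 0) for row in am]
    return max(counts), min(counts)


def ell_padding(am):
    """Number of padded (wasted) slots a plain ELL layout would need."""
    counts = [sum(1 for v in row if v != 0) for row in am]
    return len(am) * max(counts) - sum(counts)


def choose_format(am, max_gap=1):
    """Pick plain ELL when rows are nearly uniform, otherwise the hybrid format."""
    most, fewest = row_nnz_spread(am)
    return "ELL" if most - fewest <= max_gap else "hybrid"


# A 4-vertex ring has two non-0 elements in every row, so ELL wastes nothing.
ring = [[0, 1, 0, 1], [1, 0, 1, 0], [0, 1, 0, 1], [1, 0, 1, 0]]
# A hub-like graph has rows of length 3, 1, 2, 2, so ELL would pad 4 slots.
hub = [[0, 1, 1, 1], [1, 0, 0, 0], [1, 0, 0, 1], [1, 0, 1, 0]]
print(choose_format(ring), ell_padding(ring))  # ELL 0
print(choose_format(hub), ell_padding(hub))    # hybrid 4
```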
The above embodiments are preferred embodiments of the present invention, but the present invention is not limited to the above embodiments, and any other changes, modifications, substitutions, combinations, and simplifications which do not depart from the spirit and principle of the present invention should be construed as equivalents thereof, and all such changes, modifications, substitutions, combinations, and simplifications are intended to be included in the scope of the present invention.

Claims (9)

1. A graph traversal memory access optimization method based on a Hybrid storage format, characterized in that the optimization method comprises the following steps:
step a: randomly generating an adjacency matrix AM for storing graph data;
step b: performing graph data processing on the adjacency matrix AM to generate a matrix hybrid in a new storage format;
step c: performing a BFS on the generated matrix hybrid in the new storage format to complete the breadth-first search of the graph.
2. The graph traversal memory access optimization method based on the Hybrid storage format as claimed in claim 1, wherein the specific processing in step b is as follows:
step S21: removing the 0 elements of the adjacency matrix AM row by row, retaining only its non-0 elements, and generating a sparse matrix ELL I; the sparse matrix ELL I comprises a matrix Edata and a matrix Offset;
step S22: selecting a suitable split point and splitting the sparse matrix ELL I; the matrix Edata is split into a matrix Edata-L and a matrix Edata-R, wherein all elements of the matrix Edata-L are non-zero real numbers; the matrix Offset is split into matrices Offset-L and Offset-R;
step S23: generating a matrix hybrid in a new storage format from the split sparse matrix ELL I, wherein the matrix hybrid comprises a sparse matrix ELL II and a sparse matrix COO, the sparse matrix ELL II consists of the matrix Edata-L and the matrix Offset-L, and the sparse matrix COO consists of the matrix Edata-R and the matrix Offset-R.
3. The graph traversal memory access optimization method based on the Hybrid storage format as claimed in claim 2, wherein: the step S21 specifically includes the following steps:
step S211: denoting the size of the adjacency matrix AM corresponding to a graph with n vertices as n × n, calculating the number of non-0 elements in the row containing the most non-0 elements and in the row containing the fewest non-0 elements, and denoting these numbers as max and min respectively;
step S212: taking the number max of the row with the most non-0 elements, and establishing two n × max matrices Edata and Offset, wherein the matrix Edata stores the non-0 elements of the adjacency matrix AM, and the matrix Offset stores the position coordinates of those non-0 elements in the adjacency matrix AM;
step S213: generating the sparse matrix ELL I from the matrices Edata and Offset.
4. The graph traversal memory access optimization method based on the Hybrid storage format as claimed in claim 2, wherein: the specific search process in step c is as follows:
step S31: performing vertex positioning, and looking up the starting vertex v0 as a row number of the matrix hybrid, wherein the adjacent points of the vertex v0 are the values contained in any row of the matrix Offset-R, and the row number of the sparse matrix COO is a triple of the vertex v0; the row numbers of the matrix hybrid are consistent with those of the adjacency matrix;
step S32: performing a BFS from the located vertex v0 to complete the breadth-first search of the graph.
5. The graph traversal memory access optimization method based on the Hybrid storage format as claimed in claim 4, wherein: the specific process of step S32 is as follows:
step S321: setting the located vertex v0 as a starting point;
step S322: searching for all adjacent points of the starting point and judging whether the starting point has any adjacent points; if not, ending the search; if so, executing the next step;
step S323: judging whether all adjacent points of the starting point have been accessed; if so, executing the next step; if not, continuing the access until it is finished, and then executing the next step;
step S324: selecting one adjacent point of the vertex v0 as a starting point, and repeating the steps S322-S323 to finish the operation;
step S325: the remaining adjacent points of the vertex v0 are set as starting points one by one, and the operations from step S322 to step S323 are repeated until the breadth-first search for the graph is completed.
6. A graph traversal memory access optimization system based on a Hybrid storage format, characterized in that the optimization system comprises:
a random matrix generation module: used for randomly generating an adjacency matrix AM for storing graph data;
a graph data processing module: used for performing graph data processing on the adjacency matrix AM and generating a matrix hybrid in a new storage format;
a BFS search module: used for performing a BFS on the generated matrix hybrid in the new storage format to complete the breadth-first search of the graph.
7. The graph traversal memory access optimization system based on the Hybrid storage format as claimed in claim 6, wherein: the graph data processing module comprises:
a zero-removal processing module: used for removing the 0 elements of the adjacency matrix AM row by row, retaining only its non-0 elements, and generating a sparse matrix ELL I; the sparse matrix ELL I comprises a matrix Edata and a matrix Offset, wherein the matrix Edata stores the non-0 elements of the adjacency matrix AM, and the matrix Offset stores the position coordinates of those non-0 elements in the adjacency matrix AM;
a split selection module: used for selecting a suitable split point and splitting the sparse matrix ELL I; the matrix Edata is split into a matrix Edata-L and a matrix Edata-R, and the matrix Offset is split into a matrix Offset-L and a matrix Offset-R;
a new matrix generation module: used for generating a matrix hybrid in a new storage format from the split sparse matrix ELL I; the matrix hybrid comprises a sparse matrix ELL II and a sparse matrix COO, wherein the sparse matrix ELL II consists of the matrix Edata-L and the matrix Offset-L, and the sparse matrix COO consists of the matrix Edata-R and the matrix Offset-R, in which the row number is the row number of the matrix Edata-R or Offset-R, the column number is the value in Offset-R, and the numerical value is the value in Edata-R.
8. The graph traversal memory access optimization system based on the Hybrid storage format as claimed in claim 7, wherein: the BFS searching module comprises:
a vertex positioning module: used for positioning the vertex and looking up the starting vertex v0 as a row number of the matrix hybrid; the adjacent points of the vertex v0 are the values contained in any row of the matrix Offset-R, and the row number of the sparse matrix COO is a triple of the vertex v0; the row numbers of the matrix hybrid are consistent with those of the adjacency matrix;
a search execution module: used for setting the located vertex v0 as a starting point, searching for all adjacent points of the starting point, and setting those adjacent points one by one as the starting point to execute the search until the breadth-first search of the graph is completed.
9. An electronic device, comprising:
at least one processor; and
a memory communicatively coupled to the at least one processor; wherein,
the memory stores instructions executable by the at least one processor to enable the at least one processor to perform the following operations of the graph traversal memory access optimization method based on the Hybrid storage format according to any one of claims 1 to 5:
step a: randomly generating an adjacency matrix AM for storing graph data;
step b: performing graph data processing on the adjacency matrix AM to generate a matrix hybrid in a new storage format;
step c: performing a BFS on the generated matrix hybrid in the new storage format to complete the breadth-first search of the graph.
CN201911127994.2A 2019-11-18 2019-11-18 Graph traversal access and storage optimization method and system based on Hybrid storage format and electronic equipment Active CN112818179B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201911127994.2A CN112818179B (en) 2019-11-18 2019-11-18 Graph traversal access and storage optimization method and system based on Hybrid storage format and electronic equipment


Publications (2)

Publication Number Publication Date
CN112818179A true CN112818179A (en) 2021-05-18
CN112818179B CN112818179B (en) 2022-06-21

Family

ID=75852513

Country Status (1)

Country Link
CN (1) CN112818179B (en)

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN105068787A (en) * 2015-08-28 2015-11-18 华南理工大学 Heterogeneous parallel computing method for sparse matrix-vector multiplication
CN105225187A (en) * 2015-10-09 2016-01-06 苏州盛景信息科技股份有限公司 Based on the pipe network spacial analytical method of breadth-first search
WO2019001070A1 (en) * 2017-06-28 2019-01-03 浙江大学 Adjacency matrix-based connection information organization system, image feature extraction system, and image classification system and method
US10191998B1 (en) * 2016-09-13 2019-01-29 The United States of America, as represented by the Director, National Security Agency Methods of data reduction for parallel breadth-first search over graphs of connected data elements
CN110175286A (en) * 2019-05-17 2019-08-27 山东师范大学 It is combined into the Products Show method and system to optimization and matrix decomposition

Non-Patent Citations (3)

* Cited by examiner, † Cited by third party
Title
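Ren Yonggong et al.: "Graph mining model applying Markov clustering based on ring network motifs", Pattern Recognition and Artificial Intelligence (《模式识别与人工智能》), no. 09, 15 September 2017 (2017-09-15) *
Li Yin: "Research on a GPU-based hybrid all-pairs shortest path algorithm", Microelectronics & Computer (《微电子学与计算机》), 29 February 2016 (2016-02-29) *
Heng Dongdong et al.: "Design, implementation and test analysis of the BFS algorithm on a parallel prototype ***", Computer Engineering & Science (《计算机工程与科学》), no. 01, 15 January 2017 (2017-01-15) *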

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant