CN113946584A - QRB tree indexing method for massive vector data retrieval - Google Patents



Publication number
CN113946584A
Authority
CN
China
Prior art keywords
tree
hit
grid
retrieval
corner
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202111251118.8A
Other languages
Chinese (zh)
Inventor
余接情
韦祎
梁洁
吴立新
张绍良
吴晓光
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
INNER MONGOLIA AUTONOMOUS REGION LAND SURVEY AND PLANNING INSTITUTE
China University of Mining and Technology CUMT
Original Assignee
INNER MONGOLIA AUTONOMOUS REGION LAND SURVEY AND PLANNING INSTITUTE
China University of Mining and Technology CUMT
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by INNER MONGOLIA AUTONOMOUS REGION LAND SURVEY AND PLANNING INSTITUTE, China University of Mining and Technology CUMT filed Critical INNER MONGOLIA AUTONOMOUS REGION LAND SURVEY AND PLANNING INSTITUTE
Priority to CN202111251118.8A priority Critical patent/CN113946584A/en
Publication of CN113946584A publication Critical patent/CN113946584A/en
Pending legal-status Critical Current

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/22Indexing; Data structures therefor; Storage structures
    • G06F16/2228Indexing structures
    • G06F16/2246Trees, e.g. B+trees
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/22Indexing; Data structures therefor; Storage structures
    • G06F16/2282Tablespace storage structures; Management thereof

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Software Systems (AREA)
  • Data Mining & Analysis (AREA)
  • Databases & Information Systems (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention provides a QRB tree indexing method for massive vector data retrieval, comprising two steps: index construction and accurate retrieval. During index construction, the quadtree grid whose boundary the current vector element does not exceed and whose size is closest to that of the element is selected as the membership grid; the current vector element is inserted into both the R tree and the bucket associated with the membership grid; and the QRB tree index is formed. During accurate retrieval, a rectangular retrieval frame is given and decomposed into a plurality of hit grids; an internal candidate element set, a boundary candidate element set and a corner candidate element set are obtained; elements in the corner candidate element set that do not intersect the rectangular retrieval frame are removed by a fine intersection test to obtain a new corner candidate element set; and the final result set is then obtained. The method accelerates the locating and reading of R trees in the QR tree index and reduces unnecessary R tree retrieval and fine intersection test operations, thereby significantly improving the efficiency of spatial retrieval.

Description

QRB tree indexing method for massive vector data retrieval
Technical Field
The invention relates to a QRB tree indexing method for massive vector data retrieval, and belongs to the technical field of big data retrieval and query.
Background
With the development of spatial information technology, diversification of acquisition means and geometric-level increase of the scale of geospatial data, how to effectively organize and manage massive spatial data and make the massive spatial data exert the maximum benefit becomes a problem which needs to be solved urgently at present. The retrieval performance of the spatial index determines the efficiency of spatial data organization and management, and directly influences the subsequent analysis and use. Vector data is the most common type of spatial data. Vector data-oriented spatial indexes can be broadly classified into tree indexes, grid indexes, and hybrid indexes.
Tree indexes are typified by the R tree and its variants, whose retrieval performance is easily affected by tree height, the degree of overlap between node ranges, and similar factors. Grid indexes are typified by the GeoHash index, whose retrieval performance is sensitive to the grid size. Hybrid indexes combine tree indexes and grid indexes and are typified by the QR tree, whose basic idea is as follows: the index space is divided by a quadtree into grids of different levels (called quadtree grids), and each quadtree grid corresponds to one R tree; when the index is constructed, an element is inserted into the R tree associated with the grid in which the element is located; during retrieval, the retrieval frame is decomposed into a plurality of quadtree grids of different levels, and the R trees associated with these grids are then searched one by one to obtain the final retrieval result. In essence, the QR tree index uses the quadtree grid to decompose one large R tree into many small R trees, which improves retrieval efficiency and combines the advantages of grid indexes and tree indexes. Like the R tree, the QR tree has several variants. For example, Mao et al. and Qiu Jianhua et al. proposed replacing the R tree in the QR tree index with an R+ tree or an R* tree, respectively; Yangzhou et al. proposed the LQR tree index, which replaces the conventional quadtree with a loose quadtree (with expanded grid boundaries); and Wang et al. extended the QR tree to three dimensions and proposed the 3DOR tree index.
Spatial indexes often use the Minimum Bounding Rectangle (MBR) of an element in place of its real extent to improve retrieval performance. Actual retrieval can therefore be divided into two modes, coarse and accurate. Coarse retrieval returns the elements whose MBR (not the element itself) intersects the retrieval frame, so the result contains some error; accurate retrieval further excludes, from the coarse result, the elements that do not actually intersect the retrieval frame. Hybrid indexes have certain advantages over tree indexes and grid indexes, but existing hybrid indexes still perform many redundant operations in both the coarse and the accurate retrieval stages, which degrades their retrieval performance. This problem needs to be considered and solved in massive vector data retrieval.
Disclosure of Invention
The invention aims to provide a QRB tree indexing method for massive vector data retrieval that solves the problem in the prior art that retrieval performance is degraded by redundant operations.
The technical solution of the invention is as follows:
A QRB tree indexing method for massive vector data retrieval comprises the following steps:
S1, constructing the QRB tree index: dividing the space in which the vector data is located into quadtree grids of different levels using a quadtree, and setting an R tree and a bucket for each quadtree grid; given a vector element, selecting as its membership grid the quadtree grid whose boundary the element does not exceed and whose size is closest to the range of the element; inserting the vector element into the R tree and the bucket associated with the membership grid; after all vector elements are inserted, establishing a random access structure to store all non-empty R trees and non-empty buckets, forming the QRB tree index;
S2, performing accurate retrieval using the QRB tree: given a rectangular retrieval frame, decomposing it into a plurality of hit grids; classifying the hit grids into internal hit grids, boundary hit grids and corner hit grids; retrieving from the bucket or the R tree associated with each hit grid according to its type, thereby obtaining an internal candidate element set, a boundary candidate element set and a corner candidate element set; removing, by a fine intersection test, the elements in the corner candidate element set that do not intersect the rectangular retrieval frame to obtain a new corner candidate element set; and combining the internal candidate element set, the boundary candidate element set and the new corner candidate element set to form the final retrieval result set.
Further, in step S1, the quadtree grid whose boundary the element does not exceed and whose size is closest to the range of the current vector element is selected as the membership grid, specifically:
S11, given a vector element, calculating the level l of the quadtree grid corresponding to the MBR of the current element;
S12, determining, from the range and position of the MBR of the current element, the quadtree grid at level l in which the MBR lies, and taking it as the membership grid.
Further, in step S11, the level l of the quadtree grid corresponding to the MBR of the current element is calculated as:
Figure BDA0003322047460000021
where Ω_x and Ω_y denote the sizes of the MBR of the current element along the x axis and the y axis, respectively. If the quadtree grid is a global grid, then L_0 = [...]; otherwise,
Figure BDA0003322047460000031
where ω_x and ω_y denote the sizes, along the x axis and the y axis respectively, of the MBR of the entire data set containing all vector elements, and a is a fixed constant.
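The level formula itself is an image that is not reproduced in this text, so the following is only a minimal sketch of the selection rule stated in S11 and S12, not the patent's closed-form formula. It assumes a square index space of side `extent` with lower-left corner `origin`, split into 2^l x 2^l equal cells at level l, and it descends level by level while the MBR still fits inside a single cell. The function name `membership_grid` and all parameter names are hypothetical.

```python
from typing import Tuple

MBR = Tuple[float, float, float, float]  # (xmin, ymin, xmax, ymax)


def membership_grid(mbr: MBR, origin: Tuple[float, float], extent: float,
                    max_level: int) -> Tuple[int, int, int]:
    """Return (level, col, row) of the deepest cell that fully contains mbr.

    Assumption: level l splits a square of side `extent` into 2**l x 2**l
    equal cells. This iterative search stands in for the patent's formula.
    """
    xmin, ymin, xmax, ymax = mbr
    ox, oy = origin
    level, col, row = 0, 0, 0  # the level-0 cell covers the whole space
    for l in range(1, max_level + 1):
        size = extent / (1 << l)
        i0, i1 = int((xmin - ox) // size), int((xmax - ox) // size)
        j0, j1 = int((ymin - oy) // size), int((ymax - oy) // size)
        if i0 != i1 or j0 != j1:  # the MBR would cross a cell boundary
            break
        level, col, row = l, i0, j0
    return level, col, row
```

For a 16 x 16 space, an MBR of (1, 1, 2, 2) fits a single 4 x 4 cell at level 2 but straddles two 2 x 2 cells at level 3, so level 2 is chosen; an MBR that straddles the centre of the space stays at level 0.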
Further, in step S1, the vector element is inserted into the R tree and the bucket associated with the membership grid, specifically:
S13, inserting the current element into the R tree associated with the membership grid according to the insertion algorithm of the R tree index;
S14, appending the ID or the pointer of the current element to the array corresponding to the bucket associated with the membership grid.
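Steps S13 and S14 can be sketched as follows. This is an illustration under stated assumptions: a plain list of (MBR, ID) pairs stands in for a real R tree, and the names `CellIndex` and `insert_element` are hypothetical. The point is only that each element is registered twice in its membership grid, once in the R tree and once in the bucket.

```python
from collections import defaultdict
from dataclasses import dataclass, field
from typing import Dict, List, Tuple

MBR = Tuple[float, float, float, float]  # (xmin, ymin, xmax, ymax)
CellKey = Tuple[int, int, int]           # (level, col, row)


@dataclass
class CellIndex:
    # Stand-in for an R tree: a real implementation would use an R tree here.
    rtree_entries: List[Tuple[MBR, int]] = field(default_factory=list)
    # Bucket: a flat array of element IDs (or pointers).
    bucket: List[int] = field(default_factory=list)


def insert_element(cells: Dict[CellKey, CellIndex], key: CellKey,
                   element_id: int, mbr: MBR) -> None:
    cell = cells[key]
    cell.rtree_entries.append((mbr, element_id))  # S13: R tree insertion
    cell.bucket.append(element_id)                # S14: bucket append


cells: Dict[CellKey, CellIndex] = defaultdict(CellIndex)
insert_element(cells, (2, 0, 0), 17, (1.0, 1.0, 2.0, 2.0))
insert_element(cells, (2, 0, 0), 18, (0.5, 0.5, 1.5, 3.0))
```

Keeping the bucket as a flat ID array is what later allows an internal hit grid to be answered by a sequential read instead of an R tree search.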
Further, in step S1, a random access structure is established to store all non-empty R trees and non-empty buckets, specifically:
S15, initializing the R tree address array and the bucket address array with null addresses as initial values, the two arrays having the same size
Figure BDA0003322047460000032
where N_k is the total number of k-th level quadtree grids and L_0 is the maximum allowed level;
S16, concatenating all non-empty R trees in the order of their quadtree grid codes, storing the result in an R tree set file, and recording the head address of each non-empty R tree within the R tree set file in the c-th element of the R tree address array, where c is the code of the quadtree grid associated with that non-empty R tree;
S17, concatenating all non-empty buckets in the order of their quadtree grid codes, storing the result in a bucket set file, and recording the head address of each non-empty bucket within the bucket set file in the c-th element of the bucket address array, where c is the code of the quadtree grid associated with that non-empty bucket;
S18, writing the R tree address array and the bucket address array into their respective files, wherein the R tree address array file, the bucket address array file, the R tree set file and the bucket set file together form the random access structure, namely the QRB tree index.
Further, in steps S16 and S17, the code of a quadtree grid is calculated as follows:
Figure BDA0003322047460000033
where N_k is the total number of k-th level quadtree grids, M is the number of columns of the l-th level quadtree grid, and i and j are the column and row numbers of the current grid cell in the l-th level quadtree grid.
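The coding and array-size formulas are images that are not reproduced in this text. The sketch below shows one coding that is consistent with the symbol definitions given (N_k, M, i, j, L_0), under explicitly assumed conventions: N_k = 4^k cells at level k, M = 2^l columns at level l, levels numbered from 0, and row-major order within a level. The patent's actual formula may differ in index origin or ordering; the function names are hypothetical.

```python
def cells_at_level(k: int) -> int:
    """N_k under the assumption that level k holds 4**k cells."""
    return 4 ** k


def grid_code(level: int, i: int, j: int) -> int:
    """Code of cell (column i, row j) at `level`, levels stored back to back.

    Assumed form: c = sum(N_k for k < level) + j * M + i, with M = 2**level.
    """
    m = 2 ** level
    if not (0 <= i < m and 0 <= j < m):
        raise ValueError("cell lies outside the level grid")
    offset = sum(cells_at_level(k) for k in range(level))
    return offset + j * m + i


def address_array_size(max_level: int) -> int:
    """Assumed array size: total number of cells over levels 0..max_level."""
    return sum(cells_at_level(k) for k in range(max_level + 1))
```

Under these assumptions every cell of every level receives a unique code, so the code can serve directly as the element index into both address arrays.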
Further, in step S2, the rectangular retrieval frame is decomposed into a plurality of hit grids; specifically, the current rectangular retrieval frame is divided using the quadtree grid division method, and every quadtree grid whose level is less than L_0 and which intersects the current rectangular retrieval frame is generated as a hit grid.
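A minimal sketch of this decomposition, assuming the same regular grid as in the construction step (a square space of side `extent`, 2^l x 2^l cells at level l). The name `hit_grids` and its parameters are hypothetical, and `num_levels` simply bounds the levels enumerated; the exact level bound used by the patent is taken from the sentence above.

```python
from typing import Iterator, Tuple

Rect = Tuple[float, float, float, float]  # (xmin, ymin, xmax, ymax)


def hit_grids(query: Rect, origin: Tuple[float, float], extent: float,
              num_levels: int) -> Iterator[Tuple[int, int, int]]:
    """Yield (level, col, row) for every cell that intersects `query`."""
    qx0, qy0, qx1, qy1 = query
    ox, oy = origin
    for level in range(num_levels):
        m = 1 << level            # cells per axis at this level
        size = extent / m
        # Clamp the covered column/row range to the grid.
        i0 = max(0, int((qx0 - ox) // size))
        i1 = min(m - 1, int((qx1 - ox) // size))
        j0 = max(0, int((qy0 - oy) // size))
        j1 = min(m - 1, int((qy1 - oy) // size))
        for j in range(j0, j1 + 1):
            for i in range(i0, i1 + 1):
                yield level, i, j
```

Because each level is a regular grid, the hit cells of a level form a contiguous block of columns and rows, which is why no tree traversal is needed to enumerate them.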
Further, in step S2, the hit grids are classified into internal hit grids, boundary hit grids and corner hit grids; retrieval is performed from the associated bucket or R tree according to the type of each hit grid, thereby obtaining an internal candidate element set, a boundary candidate element set and a corner candidate element set; and the elements in the corner candidate element set that do not intersect the rectangular retrieval frame are removed by a fine intersection test to obtain a new corner candidate element set, specifically:
S21, loading the R tree address array file and the bucket address array file of the QRB tree index into memory and keeping them resident;
S22, classifying the hit grids into internal hit grids, boundary hit grids and corner hit grids according to the positional relation between each hit grid and the rectangular retrieval frame; calculating the code of each hit grid according to formula (2) to obtain the internal hit grid code set C_I, the boundary hit grid code set C_B and the corner hit grid code set C_C;
S23, obtaining hit elements using the internal hit grid code set C_I to obtain the internal candidate element set Π_I;
S24, obtaining hit elements using the boundary hit grid code set C_B to obtain the boundary candidate element set Π_B;
S25, obtaining hit elements using the corner hit grid code set C_C to obtain the corner candidate element set Π_C, and removing the elements that do not intersect the rectangular retrieval frame by a fine intersection test to obtain the new corner candidate element set Π'_C.
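The classification in S22 can be sketched as below. The text only says the classification follows the positional relation between a hit grid and the retrieval frame, so the concrete rules here are an assumption drawn from the names of the three classes: a cell lying entirely inside the frame is internal, a cell containing a corner point of the frame is a corner hit, and any other hit cell is a boundary hit. The function name `classify_cell` is hypothetical, and the cell is assumed to be a hit grid already.

```python
from typing import Tuple

Rect = Tuple[float, float, float, float]  # (xmin, ymin, xmax, ymax)


def classify_cell(cell: Rect, query: Rect) -> str:
    """Classify a hit cell as 'internal', 'corner' or 'boundary'."""
    cx0, cy0, cx1, cy1 = cell
    qx0, qy0, qx1, qy1 = query
    # Internal: the whole cell lies inside the retrieval frame.
    if qx0 <= cx0 and cx1 <= qx1 and qy0 <= cy0 and cy1 <= qy1:
        return "internal"
    # Corner: the cell contains at least one corner point of the frame.
    corners = [(qx0, qy0), (qx1, qy0), (qx0, qy1), (qx1, qy1)]
    if any(cx0 <= x <= cx1 and cy0 <= y <= cy1 for x, y in corners):
        return "corner"
    # Boundary: crossed by an edge of the frame but holding no corner.
    return "boundary"
```

The three classes map onto three retrieval strategies: buckets for internal cells, R tree range search for boundary cells, and R tree range search followed by a fine intersection test for corner cells.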
Further, in step S23, hit elements are obtained using the internal hit grid code set C_I to obtain the internal candidate element set Π_I, specifically:
The internal hit grid code set C_I is partitioned according to the continuity of the codes. For each partition, the first and last addresses of the hit buckets are obtained from the bucket address array using the first and last grid codes of the continuous interval, which are the element indices of the bucket address array; the hit bucket information is loaded from the bucket set file according to these first and last addresses; and the hit element information is read according to the loaded bucket information and added to the internal candidate element set Π_I.
Further, in step S24, hit elements are obtained using the boundary hit grid code set C_B to obtain the boundary candidate element set Π_B, specifically:
The boundary hit grid code set C_B is partitioned according to the continuity of the codes. For each partition, the first and last addresses of the hit R trees are obtained from the R tree address array using the first and last grid codes of the continuous interval, which are the element indices of the R tree address array, and all index data of the hit R trees is read from the R tree set file in a single read according to these first and last addresses; the index data is decomposed in memory into the index data of mutually independent hit R trees; each hit R tree is then searched with the current rectangular retrieval frame, and the results are added to the boundary candidate element set Π_B.
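Both S23 and S24 begin by partitioning a code set (C_I or C_B) into intervals of consecutive codes, so that each interval can be served by one lookup of its first and last entries in the address array and one contiguous file read. A minimal sketch of that partitioning step follows; the function name `consecutive_runs` is hypothetical.

```python
from typing import Iterable, List, Tuple


def consecutive_runs(codes: Iterable[int]) -> List[Tuple[int, int]]:
    """Group grid codes into maximal (first, last) runs of consecutive values."""
    runs: List[Tuple[int, int]] = []
    for c in sorted(set(codes)):
        if runs and c == runs[-1][1] + 1:
            runs[-1] = (runs[-1][0], c)   # extend the current run
        else:
            runs.append((c, c))           # start a new run
    return runs
```

Since buckets and R trees are stored in code order in their set files, the entries of one run lie next to each other on disk, which is what makes the single read per run possible.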
Further, in step S25, hit elements are obtained using the corner hit grid code set C_C to obtain the corner candidate element set Π_C, and the elements that do not intersect the rectangular retrieval frame are removed by a fine intersection test to obtain the new corner candidate element set Π'_C, specifically:
S251, for each grid code in the corner hit grid code set C_C, first obtaining the head address of the R tree of the current grid code from the R tree address array;
S252, reading the index data of the corresponding hit R tree from the R tree set file according to the head address, searching the hit R tree with the current rectangular retrieval frame, and adding the result to the corner candidate element set Π_C;
S253, repeating steps S251 and S252 until every grid code in the corner hit grid code set C_C has been processed, obtaining the corner candidate element set Π_C;
S254, dividing the corner candidate element set Π_C obtained in S253, according to the four corner points of the rectangular retrieval frame, into Π_C1, Π_C2, Π_C3 and Π_C4 in turn;
S255, for the i-th corner point, searching with that corner point the R tree of the hit grid corresponding to the corner point to obtain a result set
Figure BDA0003322047460000051
and then testing one by one the elements of
Figure BDA0003322047460000052
and, if an element does not intersect the current rectangular retrieval frame, removing it from the set Π_Ci;
S256, repeating step S255 until all four corner points have been processed, and merging the sets Π_C1, Π_C2, Π_C3 and Π_C4 into the new corner candidate element set Π'_C.
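The refinement in S255 can be sketched as follows for a single corner point. This is a simplified illustration, not the patent's implementation: a linear scan over MBRs stands in for the corner-point search of the R tree, elements are modelled as point sets so that the fine intersection test is trivial, and the names `refine_corner_candidates`, `mbr_of`, `contains` and `any_point_inside` are hypothetical. The fine test is passed in as a callable, so a real geometry test could be substituted.

```python
from typing import Callable, Dict, List, Sequence, Tuple

Point = Tuple[float, float]
Rect = Tuple[float, float, float, float]  # (xmin, ymin, xmax, ymax)
Geometry = Sequence[Point]


def mbr_of(points: Geometry) -> Rect:
    xs = [p[0] for p in points]
    ys = [p[1] for p in points]
    return min(xs), min(ys), max(xs), max(ys)


def contains(rect: Rect, p: Point) -> bool:
    return rect[0] <= p[0] <= rect[2] and rect[1] <= p[1] <= rect[3]


def refine_corner_candidates(candidates: Dict[int, Geometry], corner: Point,
                             query: Rect,
                             intersects: Callable[[Geometry, Rect], bool]
                             ) -> List[int]:
    """Keep candidates; fine-test only those whose MBR contains the corner."""
    kept: List[int] = []
    for eid, geom in candidates.items():
        # Corner hit element: its MBR contains the corner point, so the MBR
        # may overlap the frame while the element itself does not.
        if contains(mbr_of(geom), corner) and not intersects(geom, query):
            continue
        kept.append(eid)
    return kept


def any_point_inside(geom: Geometry, query: Rect) -> bool:
    """Simplified fine test for point-set elements (an assumption)."""
    return any(contains(query, p) for p in geom)


query = (3.0, 3.0, 13.0, 13.0)
candidates = {
    1: [(2.0, 2.0), (3.5, 3.5)],   # corner hit, really intersects
    7: [(2.0, 3.5), (3.5, 2.0)],   # corner hit, MBR overlaps only
    2: [(3.5, 2.5), (3.8, 3.2)],   # MBR misses the corner: no fine test
}
result = refine_corner_candidates(candidates, (3.0, 3.0), query, any_point_inside)
```

In this example elements 1 and 7 both have an MBR containing the corner (3, 3) and are fine-tested; element 7 is discarded, while element 2 is accepted without any fine test because the short-circuit `and` never calls `intersects` for it.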
The beneficial effects of the invention are as follows:
The QRB tree indexing method for massive vector data retrieval accelerates the locating and reading of R tree indexes and reduces unnecessary R tree retrieval operations and fine intersection test operations, thereby improving element retrieval performance and significantly improving the efficiency of spatial retrieval.
The method avoids unnecessary R tree retrieval operations by adding a bucket index to each quadtree grid; it accelerates the locating and reading of hit R trees and hit buckets through a dedicated random access data structure; and, by restricting the fine intersection test to the corner hit elements among the candidate elements, it significantly improves the efficiency of accurate retrieval.
Experimental verification shows that, compared with current mainstream spatial index technologies such as the GeoHash index or the index in PostGIS, the QRB tree indexing method for massive vector data retrieval has a considerable performance advantage in accurate retrieval.
Drawings
Fig. 1 is a schematic structural diagram of a QRB tree indexing method for massive vector data retrieval according to an embodiment of the present invention.
FIG. 2 is a diagram illustrating an example of an embodiment of a quadtree grid with additional bucket indices.
Fig. 3 is an explanatory diagram of selection of a suitable quadtree grid for vector elements in the embodiment.
Fig. 4 is an explanatory diagram of a random access structure in the embodiment.
FIG. 5 is an explanatory diagram of all the quadtree grids covered by the rectangular search box in the fine search in the embodiment.
FIG. 6 is a comparison of the time consumption of fine search of the same data of the QR tree and a GeoHash index in the prior art.
Detailed Description
Preferred embodiments of the present invention will be described in detail below with reference to the accompanying drawings.
Examples
A QRB tree indexing method for massive vector data retrieval, as shown in FIG. 1, comprises the following steps:
S1, constructing the QRB tree index: dividing the space in which the vector data is located into quadtree grids of different levels using a quadtree, and setting an R tree and a bucket for each quadtree grid, as shown in FIG. 2; given a vector element, selecting as its membership grid the quadtree grid whose boundary the element does not exceed and whose size is closest to the range of the element, as shown in FIG. 3; inserting the vector element into the R tree and the bucket associated with the membership grid; after all vector elements are inserted, establishing a random access structure to store all non-empty R trees and non-empty buckets, forming the QRB tree index, as shown in FIG. 4;
In step S1, the quadtree grid whose boundary the element does not exceed and whose size is closest to the range of the current vector element is selected as the membership grid, specifically:
S11, given a vector element, calculating the level l of the quadtree grid corresponding to the MBR of the current vector element, specifically:
Figure BDA0003322047460000071
where Ω_x and Ω_y denote the sizes of the MBR of the current element along the x axis and the y axis, respectively. If the quadtree grid is a global grid, then L_0 = [...]; otherwise,
Figure BDA0003322047460000072
where ω_x and ω_y denote the sizes, along the x axis and the y axis respectively, of the MBR of the entire data set containing all vector elements, and a is a fixed constant, usually 7 to 9.
S12, determining, from the range and position of the MBR of the current element, the quadtree grid at level l in which the MBR lies, and taking it as the membership grid.
In step S1, the vector element is inserted into the R tree and the bucket associated with the membership grid, specifically:
S13, inserting the current element into the R tree associated with the membership grid according to the insertion algorithm of the R tree index;
S14, appending the ID or the pointer of the current element to the array corresponding to the bucket associated with the membership grid.
In step S1, a random access structure is established to store all non-empty R trees and non-empty buckets, specifically:
S15, initializing the R tree address array and the bucket address array with null addresses as initial values, the two arrays having the same size
Figure BDA0003322047460000073
where N_k is the total number of k-th level quadtree grids and L_0 is the maximum allowed level;
S16, concatenating all non-empty R trees in the order of their quadtree grid codes, storing the result in an R tree set file, and recording the head address of each non-empty R tree within the R tree set file in the c-th element of the R tree address array, where c is the code of the quadtree grid associated with that non-empty R tree;
S17, concatenating all non-empty buckets in the order of their quadtree grid codes, storing the result in a bucket set file, and recording the head address of each non-empty bucket within the bucket set file in the c-th element of the bucket address array, where c is the code of the quadtree grid associated with that non-empty bucket;
In steps S16 and S17, the code of a quadtree grid is calculated as follows:
Figure BDA0003322047460000074
where N_k is the total number of k-th level quadtree grids, M is the number of columns of the l-th level quadtree grid, and i and j are the column and row numbers of the current grid cell in the l-th level quadtree grid.
S18, writing the R tree address array and the bucket address array into their respective files, wherein the R tree address array file, the bucket address array file, the R tree set file and the bucket set file together form the random access structure, namely the QRB tree index.
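Steps S15 to S18 can be illustrated for the bucket half of the structure as follows. This is a sketch under stated assumptions rather than the patent's file format: the "bucket set file" is an in-memory byte string, each bucket is serialized as a little-endian uint32 count followed by uint32 element IDs, and -1 serves as the null address. The names `build_bucket_store`, `read_bucket` and `NULL_ADDR` are hypothetical. The R tree half would follow the same pattern with serialized R trees in place of ID arrays.

```python
import struct
from typing import Dict, List, Tuple

NULL_ADDR = -1  # assumed encoding of the "null address"


def build_bucket_store(buckets: Dict[int, List[int]], array_size: int
                       ) -> Tuple[List[int], bytes]:
    """Return (bucket address array, concatenated bucket data).

    Assumed layout per bucket: uint32 count, then `count` uint32 IDs.
    """
    addresses = [NULL_ADDR] * array_size          # S15: all null at first
    blob = bytearray()
    for code in sorted(buckets):                  # S17: grid-code order
        ids = buckets[code]
        if not ids:                               # only non-empty buckets
            continue
        addresses[code] = len(blob)               # head address at index c
        blob += struct.pack(f"<I{len(ids)}I", len(ids), *ids)
    return addresses, bytes(blob)


def read_bucket(addresses: List[int], blob: bytes, code: int) -> List[int]:
    """Random access: jump straight to the bucket of grid code `code`."""
    addr = addresses[code]
    if addr == NULL_ADDR:
        return []
    (count,) = struct.unpack_from("<I", blob, addr)
    return list(struct.unpack_from(f"<{count}I", blob, addr + 4))


addresses, blob = build_bucket_store({5: [7, 9], 1: [3], 8: []}, 21)
```

Because the address array is indexed directly by grid code, locating a bucket costs one array lookup, and an empty grid is recognised from the null address without touching the data file.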
S2, performing accurate retrieval using the QRB tree: given a rectangular retrieval frame, decomposing it into a plurality of hit grids and classifying the hit grids into internal hit grids, boundary hit grids and corner hit grids, as shown in FIG. 5; retrieving from the associated bucket or R tree according to the type of each hit grid to obtain the internal candidate element set Π_I, the boundary candidate element set Π_B and the corner candidate element set Π_C; removing, by a fine intersection test, the elements of the corner candidate element set Π_C that do not intersect the rectangular retrieval frame to obtain the new corner candidate element set Π'_C; and combining the internal candidate element set Π_I, the boundary candidate element set Π_B and the new corner candidate element set Π'_C to form the final retrieval result set.
In step S2, the rectangular retrieval frame is decomposed into a plurality of hit grids; specifically, the current rectangular retrieval frame is divided using the quadtree grid division method, and every quadtree grid whose level is less than L_0 and which intersects the current rectangular retrieval frame is generated as a hit grid.
In step S2, the hit grids are classified into internal hit grids, boundary hit grids and corner hit grids; retrieval is performed from the associated bucket or R tree according to the type of each hit grid, thereby obtaining the internal candidate element set, the boundary candidate element set and the corner candidate element set; and the elements of the corner candidate element set Π_C that do not intersect the rectangular retrieval frame are removed by a fine intersection test to obtain the new corner candidate element set, specifically:
S21, loading the R tree address array file and the bucket address array file of the QRB tree index into memory and keeping them resident;
S22, classifying the hit grids into internal hit grids, boundary hit grids and corner hit grids according to the positional relation between each hit grid and the rectangular retrieval frame; calculating the code of each hit grid according to formula (2) to obtain the internal hit grid code set C_I, the boundary hit grid code set C_B and the corner hit grid code set C_C;
S23, obtaining hit elements using the internal hit grid code set C_I to obtain the internal candidate element set Π_I. Specifically,
the internal hit grid code set C_I is partitioned according to the continuity of the codes. For each partition, the first and last addresses of the hit buckets are obtained from the bucket address array using the first and last grid codes of the continuous interval, which are the element indices of the bucket address array; the hit bucket information is loaded from the bucket set file according to these first and last addresses; and the hit element information is read according to the loaded bucket information and added to the internal candidate element set Π_I;
S24, obtaining hit elements using the boundary hit grid code set C_B to obtain the boundary candidate element set Π_B. Specifically,
the boundary hit grid code set C_B is partitioned according to the continuity of the codes. For each partition, the first and last addresses of the hit R trees are obtained from the R tree address array using the first and last grid codes of the continuous interval, which are the element indices of the R tree address array, and all index data of the hit R trees is read from the R tree set file in a single read according to these first and last addresses; the index data is decomposed in memory into the index data of mutually independent hit R trees; each hit R tree is then searched with the current rectangular retrieval frame, and the results are added to the boundary candidate element set Π_B;
S25, obtaining hit elements using the corner hit grid code set C_C to obtain the corner candidate element set Π_C, and removing the elements that do not intersect the rectangular retrieval frame by a fine intersection test to obtain the new corner candidate element set Π'_C. Specifically,
S251, for each grid code in the corner hit grid code set C_C, first obtaining the head address of the R tree of the current grid code from the R tree address array;
S252, reading the index data of the corresponding hit R tree from the R tree set file according to the head address, searching the hit R tree with the current rectangular retrieval frame, and adding the result to the corner candidate element set Π_C;
S253, repeating steps S251 and S252 until every grid code in the corner hit grid code set C_C has been processed, obtaining the corner candidate element set Π_C;
S254, dividing the corner candidate element set Π_C obtained in S253, according to the four corner points of the rectangular retrieval frame, into Π_C1, Π_C2, Π_C3 and Π_C4 in turn;
S255, for the i-th corner point, searching with that corner point the R tree of the hit grid corresponding to the corner point to obtain a result set
Figure BDA0003322047460000091
and then testing one by one the elements of
Figure BDA0003322047460000092
and, if an element does not intersect the current rectangular retrieval frame, removing it from the set Π_Ci;
S256, repeating step S255 until all four corner points have been processed, and merging the sets Π_C1, Π_C2, Π_C3 and Π_C4 into the new corner candidate element set Π'_C.
The QRB tree indexing method for massive vector data retrieval comprises two steps of index construction and accurate retrieval. When the index is constructed, selecting a quadtree grid which does not exceed the quadtree grid boundary and is closest to the current vector element size as a subordinate grid; respectively inserting the current vector elements into the R tree and the barrel associated with the membership grid; establishing a random access structure to store all non-empty R trees and non-empty buckets to form QRB tree indexes; when in accurate retrieval, a given rectangular retrieval frame is decomposed into a plurality of hit grids; different hit grids adopt different strategies to search QRB tree indexes to obtain a plurality of candidate sets; eliminating elements which are not intersected with the rectangular retrieval frame in the corner candidate set through a fine intersection test; and merging all candidate sets to obtain a final result set. The method can accelerate the positioning and reading steps of the R tree in the QR tree index, and reduce unnecessary R tree retrieval and accurate intersection test operation, thereby obviously improving the efficiency of space retrieval.
The QRB tree index method facing massive vector data retrieval avoids unnecessary R tree retrieval operation by adding a bucket index to each quadtree grid. In the method of the embodiment, vector elements inserted into the R tree index are all limited to be not beyond the boundary of the quadtree grid corresponding to the R tree, so that all elements in the internal hit grid are inevitably hit by the current retrieval frame, and are not required to be further obtained through time-consuming R tree retrieval, thereby effectively improving the retrieval performance, and avoiding the problem that in the existing QR tree index, in the hit grid, whether the hit grid is the internal hit grid or the non-internal hit grid, the final retrieval result can be obtained by retrieving the corresponding hit R tree, and the retrieval performance is not ideal.
The QRB tree indexing method for massive vector data retrieval restricts the fine intersection test to the corner hit elements among the candidate elements, which significantly improves the efficiency of accurate retrieval. The method of the embodiment classifies hit grids into three categories: corner hit grids, boundary hit grids and internal hit grids. Only candidates obtained from the corner hit grid code set need to undergo fine intersection tests one by one; candidates obtained from the internal hit grid code set and the boundary hit grid code set need no further fine intersection test. This improves efficiency and avoids the inefficiency of existing indexing methods, which run a fine intersection test on every candidate element one by one.
In the QRB tree indexing method for massive vector data retrieval, as shown in Fig. 5, the hit grids are divided into corner hit grids, boundary hit grids and internal hit grids according to their position relative to the retrieval frame, and the candidate elements are correspondingly divided into a corner candidate element set, a boundary candidate element set and an internal candidate element set.
As shown in Fig. 5, when the retrieval frame is rectangular, every internal candidate element necessarily intersects the rectangular retrieval frame. Boundary candidate elements are obtained by a range search of the corresponding R tree and also necessarily intersect the rectangular retrieval frame: an element in a boundary hit grid that does not intersect the rectangular retrieval frame is not returned when the R tree is searched with the frame. This is why the lower two elements in the second-row boundary grid in Fig. 5 do not appear among the boundary candidate elements.
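The three-way classification of hit grids can be sketched as follows. The source does not give exact geometric definitions, so this sketch assumes that an internal hit grid lies entirely inside the retrieval frame, a corner hit grid contains at least one corner point of the frame, and every other intersecting grid is a boundary hit grid. The function name `classify_hit_grid` is hypothetical.

```python
from typing import Tuple

Rect = Tuple[float, float, float, float]  # (xmin, ymin, xmax, ymax)


def classify_hit_grid(cell: Rect, frame: Rect) -> str:
    """Classify a quadtree cell against a rectangular retrieval frame.

    Returns 'none', 'internal', 'corner' or 'boundary' (definitions assumed, see lead-in).
    """
    # No overlap at all: the cell is not a hit grid.
    if cell[2] < frame[0] or cell[0] > frame[2] or cell[3] < frame[1] or cell[1] > frame[3]:
        return "none"
    # Cell entirely inside the frame: its bucket can be read without any R tree search.
    if frame[0] <= cell[0] and cell[2] <= frame[2] and frame[1] <= cell[1] and cell[3] <= frame[3]:
        return "internal"
    # Cell containing a frame corner: its candidates may need a fine intersection test.
    corners = [(frame[0], frame[1]), (frame[2], frame[1]),
               (frame[2], frame[3]), (frame[0], frame[3])]
    if any(cell[0] <= x <= cell[2] and cell[1] <= y <= cell[3] for x, y in corners):
        return "corner"
    # Otherwise the cell is crossed by a frame edge only.
    return "boundary"
```

With a frame of `(1.5, 1.5, 4.5, 4.5)` over unit cells, the cell `(2, 2, 3, 3)` is internal, `(1, 1, 2, 2)` is a corner hit grid and `(2, 1, 3, 2)` is a boundary hit grid.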
As shown in Fig. 5, the corner candidate elements are retrieved from the R tree index associated with each corner hit grid, and a search that uses the corresponding corner point of the rectangular retrieval frame in place of the frame divides them into corner hit elements, such as elements 1, 3, 6 and 7 in Fig. 5, and non-corner hit elements, such as elements 2, 4, 5 and 8 in Fig. 5. When the retrieval frame is rectangular, the non-corner hit elements among the corner candidate elements (elements 2, 4, 5 and 8 in Fig. 5) necessarily intersect the retrieval frame. The corner hit elements are tested against the rectangular retrieval frame one by one to find those that do not intersect it, such as element 7 in Fig. 5. These are removed from the corner candidate elements to give the final corner candidate elements (elements 1, 2, 3, 4, 5, 6 and 8 in Fig. 5), which are included in the accurate retrieval result set together with the boundary candidate elements and the internal candidate elements.
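The corner refinement described above can be sketched in Python under several assumptions: each candidate is a polyline given as a list of points, a "corner hit element" is taken to be one whose MBR contains a corner point of the retrieval frame, and the fine intersection test is a Liang-Barsky segment clipping test. All function names are hypothetical, and the candidates are assumed to have already passed an R tree range search with the frame.

```python
from typing import List, Sequence, Tuple

Point = Tuple[float, float]
Rect = Tuple[float, float, float, float]  # (xmin, ymin, xmax, ymax)


def mbr_of(points: Sequence[Point]) -> Rect:
    xs = [p[0] for p in points]
    ys = [p[1] for p in points]
    return (min(xs), min(ys), max(xs), max(ys))


def contains_point(r: Rect, p: Point) -> bool:
    return r[0] <= p[0] <= r[2] and r[1] <= p[1] <= r[3]


def segment_hits_rect(a: Point, b: Point, r: Rect) -> bool:
    """Liang-Barsky test: True if segment a-b shares at least one point with rectangle r."""
    t0, t1 = 0.0, 1.0
    dx, dy = b[0] - a[0], b[1] - a[1]
    for p, q in ((-dx, a[0] - r[0]), (dx, r[2] - a[0]),
                 (-dy, a[1] - r[1]), (dy, r[3] - a[1])):
        if p == 0:
            if q < 0:          # parallel to this edge and outside it
                return False
        else:
            t = q / p
            if p < 0:
                t0 = max(t0, t)  # entering the half-plane
            else:
                t1 = min(t1, t)  # leaving the half-plane
            if t0 > t1:
                return False
    return True


def polyline_hits_rect(points: Sequence[Point], r: Rect) -> bool:
    if len(points) == 1:
        return contains_point(r, points[0])
    return any(segment_hits_rect(points[k], points[k + 1], r)
               for k in range(len(points) - 1))


def refine_corner_candidates(candidates: List[Tuple[str, List[Point]]],
                             frame: Rect) -> List[str]:
    """Keep non-corner hit elements as they are; fine-test only corner hit elements."""
    corners = [(frame[0], frame[1]), (frame[2], frame[1]),
               (frame[2], frame[3]), (frame[0], frame[3])]
    kept = []
    for elem_id, points in candidates:
        box = mbr_of(points)
        if any(contains_point(box, c) for c in corners):
            # Corner hit element: the MBR overlap alone does not prove intersection.
            if polyline_hits_rect(points, frame):
                kept.append(elem_id)
        else:
            kept.append(elem_id)  # accepted without a fine test
    return kept
```

In this sketch a diagonal segment that cuts across the outside of a frame corner is rejected even though its MBR overlaps the frame, which mirrors the role of element 7 in Fig. 5.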
In the QRB tree indexing method for massive vector data retrieval, a random access data structure accelerates the locating and reading of hit R trees and hit buckets. As shown in Fig. 4, the random access data structure consists of an R tree address array, a bucket address array, an R tree set file and a bucket set file. In actual retrieval, each hit R tree or hit bucket must be located and read before it can be searched. In the prior art, each R tree is stored on disk as a file named after the code of its associated grid, and the file system is used to locate the R tree file and read its contents. When the R tree files are very small and very numerous, locating and reading hit R trees performs poorly. With the R tree address array and the R tree set file, the R tree associated with a specific grid can be located and read by random access; likewise, the bucket associated with a specific grid can be located and read by random access through the bucket address array and the bucket set file.
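A minimal sketch of the address array plus set file idea follows, using an in-memory byte string in place of a file on disk. The on-disk layout is not specified in the source, so several details are assumptions: buckets are serialized as little-endian 32-bit element IDs, `-1` plays the role of the null address, and the end of a blob is taken to be the next non-null address (or the end of the set file). The function names are hypothetical.

```python
import io
import struct
from typing import Dict, List, Tuple

NULL = -1  # stand-in for the "null address" of an empty grid (assumption)


def pack_ids(ids: List[int]) -> bytes:
    """Serialize a bucket as little-endian uint32 element IDs (format is an assumption)."""
    return struct.pack("<%dI" % len(ids), *ids)


def unpack_ids(blob: bytes) -> List[int]:
    return list(struct.unpack("<%dI" % (len(blob) // 4), blob))


def build_set_file(blobs_by_code: Dict[int, bytes], total_cells: int) -> Tuple[List[int], bytes]:
    """Concatenate non-empty blobs in grid-code order and record each head address."""
    addresses = [NULL] * total_cells
    buf = io.BytesIO()
    for code in sorted(blobs_by_code):
        blob = blobs_by_code[code]
        if not blob:
            continue  # empty buckets keep the null address
        addresses[code] = buf.tell()
        buf.write(blob)
    return addresses, buf.getvalue()


def read_blob(addresses: List[int], set_file: bytes, code: int) -> bytes:
    """Random access: jump straight to the blob of one grid code."""
    start = addresses[code]
    if start == NULL:
        return b""
    # The blob ends where the next non-empty grid's blob begins, or at the end of the file.
    end = next((a for a in addresses[code + 1:] if a != NULL), len(set_file))
    return set_file[start:end]
```

Because consecutive grid codes are stored contiguously, a run of consecutive hit codes can be fetched with one read from the first head address to the end of the last blob, which is the access pattern the claims describe for internal and boundary hit grids.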
By constructing the QRB tree index (Q: quadtree, R: R tree, B: bucket; a hybrid spatial index based on a quadtree, R trees and buckets), the QRB tree indexing method for massive vector data retrieval reduces unnecessary R tree retrieval operations and fine intersection test operations and accelerates the locating and reading of R tree indexes, thereby significantly improving spatial retrieval efficiency; the advantage is particularly pronounced when the retrieval frame is large.
Experimental verification, shown in Fig. 6, indicates that accurate retrieval with the QRB tree indexing method for massive vector data retrieval takes substantially less time than with the existing QR tree and GeoHash indexes; the accurate retrieval efficiency of the embodiment method is therefore better than that of the QR tree and GeoHash indexes.
The above is only a specific application example of the present invention and does not limit the protection scope of the present invention in any way. All technical solutions formed by equivalent transformation or equivalent replacement (for example, replacing the R tree of the QRB tree with an R tree variant) fall within the protection scope of the present invention.

Claims (10)

1. A QRB tree indexing method for massive vector data retrieval, characterized by comprising the following steps:
S1, constructing the QRB tree index: dividing the space in which the vector data lies into quadtree grids of different levels using a quadtree, and setting an R tree and a bucket for each quadtree grid; for a given vector element, selecting as its membership grid the quadtree grid whose boundary the element does not exceed and whose size is closest to the extent of the current vector element; inserting the vector element into the R tree and the bucket associated with the membership grid; after all vector elements are inserted, establishing a random access structure to store all non-empty R trees and non-empty buckets, forming the QRB tree index;
S2, performing accurate retrieval using the QRB tree index: given a rectangular retrieval frame, decomposing the rectangular retrieval frame into a plurality of hit grids, and dividing the hit grids into internal hit grids, boundary hit grids and corner hit grids; retrieving from the associated bucket or R tree according to the type of each hit grid to obtain an internal candidate element set, a boundary candidate element set and a corner candidate element set; eliminating by a fine intersection test the elements in the corner candidate element set that do not intersect the rectangular retrieval frame, to obtain a new corner candidate element set; and merging the internal candidate element set, the boundary candidate element set and the new corner candidate element set to form the final retrieval result set.
2. The QRB tree indexing method for massive vector data retrieval as claimed in claim 1, wherein: in step S1, selecting as the membership grid the quadtree grid whose boundary the element does not exceed and whose size is closest to the extent of the current vector element is performed specifically as follows:
S11, for a given vector element, calculating the level l of the quadtree grid corresponding to the MBR of the current vector element;
S12, according to the extent and position of the current element MBR, calculating the quadtree grid at level l in which the current element MBR lies, and taking it as the membership grid.
3. The QRB tree indexing method for massive vector data retrieval as claimed in claim 2, wherein: in step S11, the level l of the quadtree grid corresponding to the current element MBR is calculated as follows:
Figure FDA0003322047450000011
wherein Ω_x and Ω_y respectively denote the size of the current element MBR along the x axis and along the y axis; if the quadtree grid is a global grid, then L_0 = […]; otherwise,
Figure FDA0003322047450000012
wherein ω_x and ω_y are respectively the size along the x axis and along the y axis of the MBR of the entire data set containing all vector elements, and a is a fixed constant.
4. The QRB tree indexing method for massive vector data retrieval as claimed in claim 1, wherein: in step S1, inserting the vector element into the R tree and the bucket associated with the membership grid is performed specifically as follows:
S13, inserting the current element into the R tree associated with the membership grid according to the insertion algorithm of the R tree index;
S14, appending the ID or pointer of the current element to the array corresponding to the bucket associated with the membership grid.
5. The QRB tree indexing method for massive vector data retrieval according to any one of claims 1-4, wherein: in step S1, establishing a random access structure to store all non-empty R trees and non-empty buckets is performed specifically as follows:
S15, initializing the R tree address array and the bucket address array with every entry set to a null address, the R tree address array and the bucket address array having the same size
Figure FDA0003322047450000021
where N_k is the total number of k-th level quadtree grids and L_0 is the maximum allowable level;
S16, concatenating all non-empty R trees in order of quadtree grid code and storing the result in the R tree set file, and registering the head address of each non-empty R tree within the R tree set file in the c-th element of the R tree address array, where c is the code of the quadtree grid associated with the current non-empty R tree;
S17, concatenating all non-empty buckets in order of quadtree grid code and storing the result in the bucket set file, and registering the head address of each non-empty bucket within the bucket set file in the c-th element of the bucket address array, where c is the code of the quadtree grid associated with the current non-empty bucket;
S18, writing the R tree address array and the bucket address array to their respective files; the R tree address array file, the bucket address array file, the R tree set file and the bucket set file together constitute the random access structure, namely the QRB tree index.
6. The QRB tree indexing method for massive vector data retrieval as claimed in claim 5, wherein: in step S16 and step S17, the code of a quadtree grid is calculated as follows:
Figure FDA0003322047450000022
wherein N_k is the total number of k-th level quadtree grids, M is the number of columns of the l-th level quadtree grid in which the current grid cell lies, and i and j are the column and row indices of the current grid cell in the l-th level quadtree grid.
In step S2, decomposing the rectangular retrieval frame into a plurality of hit grids specifically means that the current rectangular retrieval frame is subdivided using the quadtree grid division method, and every quadtree grid at a level smaller than L_0 that intersects the current rectangular retrieval frame is a hit grid.
7. The QRB tree indexing method for massive vector data retrieval according to any one of claims 1-4, wherein: in step S2, dividing the hit grids into internal hit grids, boundary hit grids and corner hit grids, retrieving from the associated bucket or R tree according to the type of each hit grid to obtain the internal candidate element set, the boundary candidate element set and the corner candidate element set, and eliminating by a fine intersection test the elements in the corner candidate element set that do not intersect the rectangular retrieval frame to obtain the new corner candidate element set, are performed specifically as follows:
S21, loading the R tree address array file and the bucket address array file from the QRB tree index into memory and keeping them resident;
S22, dividing the hit grids into internal hit grids, boundary hit grids and corner hit grids according to the positional relationship between each hit grid and the rectangular retrieval frame; calculating the code of each hit grid according to formula (2) to obtain the internal hit grid code set C_I, the boundary hit grid code set C_B and the corner hit grid code set C_C;
S23, obtaining hit elements using the internal hit grid code set C_I to obtain the internal candidate element set Π_I;
S24, obtaining hit elements using the boundary hit grid code set C_B to obtain the boundary candidate element set Π_B;
S25, obtaining hit elements using the corner hit grid code set C_C to obtain the corner candidate element set Π_C, and eliminating by a fine intersection test the elements that do not intersect the rectangular retrieval frame to obtain the new corner candidate element set Π'_C.
8. The QRB tree indexing method for massive vector data retrieval as claimed in claim 7, wherein: in step S23, obtaining hit elements using the internal hit grid code set C_I to obtain the internal candidate element set Π_I is performed specifically as follows:
partitioning the internal hit grid code set C_I into runs of consecutive codes; for each run, using the first and last grid codes of the run, which are element indices into the bucket address array, to obtain the head and tail addresses of the hit buckets from the bucket address array; loading the hit bucket data from the bucket set file according to those head and tail addresses; reading the hit element information from the loaded hit bucket data; and adding it to the internal candidate element set Π_I.
9. The QRB tree indexing method for massive vector data retrieval as claimed in claim 7, wherein: in step S24, obtaining hit elements using the boundary hit grid code set C_B to obtain the boundary candidate element set Π_B is performed specifically as follows:
partitioning the boundary hit grid code set C_B into runs of consecutive codes; for each run, using the first and last grid codes of the run, which are element indices into the R tree address array, to obtain the head and tail addresses of the hit R trees from the R tree address array, and reading all index data of the hit R trees from the R tree set file in a single read according to those head and tail addresses; decomposing the index data in memory into the index data of the individual, mutually independent hit R trees; then searching each hit R tree with the current rectangular retrieval frame and adding the results to the boundary candidate element set Π_B.
10. The QRB tree indexing method for massive vector data retrieval as claimed in claim 7, wherein: in step S25, obtaining hit elements using the corner hit grid code set C_C to obtain the corner candidate element set Π_C, and eliminating by a fine intersection test the elements that do not intersect the rectangular retrieval frame to obtain the new corner candidate element set Π'_C, are performed specifically as follows:
S251, for each grid code in the corner hit grid code set C_C, first obtaining from the R tree address array the R tree head address corresponding to the current grid code;
S252, reading the index data of the corresponding hit R tree from the R tree set file according to the head address, searching the hit R tree with the current rectangular retrieval frame, and adding the result to the corner candidate element set Π_C;
S253, repeating steps S251 and S252 until every grid code in the corner hit grid code set C_C has been processed, obtaining the corner candidate element set Π_C;
S254, dividing the corner candidate element set Π_C obtained in S253 into Π_C1, Π_C2, Π_C3 and Π_C4 according to the four corner points of the rectangular retrieval frame;
S255, for the i-th corner point, searching with that corner point the R tree of the hit grid corresponding to the corner point to obtain a result set
Figure FDA0003322047450000041
and then testing one by one the elements of
Figure FDA0003322047450000042
any element that does not intersect the current rectangular retrieval frame being removed from the set Π_Ci;
S256, repeating step S255 until all four corner points have been processed, and merging the sets Π_C1, Π_C2, Π_C3 and Π_C4 into the new corner candidate element set Π'_C.
CN202111251118.8A 2021-10-26 2021-10-26 QRB tree indexing method for massive vector data retrieval Pending CN113946584A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202111251118.8A CN113946584A (en) 2021-10-26 2021-10-26 QRB tree indexing method for massive vector data retrieval

Publications (1)

Publication Number Publication Date
CN113946584A true CN113946584A (en) 2022-01-18

Family

ID=79332640

Country Status (1)

Country Link
CN (1) CN113946584A (en)

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination