CN114722139A - Space-time multi-attribute index method capable of self-adaptive dynamic expansion and retrieval method thereof - Google Patents
Space-time multi-attribute index method capable of self-adaptive dynamic expansion and retrieval method thereof
- Publication number
- CN114722139A
- Authority
- CN
- China
- Prior art keywords
- time
- attribute
- sub
- node
- index
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/30—Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
- G06F16/31—Indexing; Data structures therefor; Storage structures
- G06F16/316—Indexing structures
- G06F16/322—Trees
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/30—Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
- G06F16/33—Querying
- G06F16/3331—Query processing
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N20/00—Machine learning
-
- Y—GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
- Y02—TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
- Y02D—CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
- Y02D10/00—Energy efficient computing, e.g. low power processors, power management or thermal management
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Software Systems (AREA)
- Data Mining & Analysis (AREA)
- Physics & Mathematics (AREA)
- General Engineering & Computer Science (AREA)
- General Physics & Mathematics (AREA)
- Databases & Information Systems (AREA)
- Computational Linguistics (AREA)
- Artificial Intelligence (AREA)
- Computer Vision & Pattern Recognition (AREA)
- Evolutionary Computation (AREA)
- Medical Informatics (AREA)
- Computing Systems (AREA)
- Mathematical Physics (AREA)
- Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
Abstract
The application relates to an adaptively and dynamically scalable space-time multi-attribute index method and a retrieval method thereof. The method comprises the following steps: constructing a document set to be indexed, and constructing a tree data structure for the document set to be indexed, wherein the tree data structure comprises a root node and leaf nodes; the root node expands downward into multiple levels of time multi-attribute nodes, and the time multi-attribute nodes expand downward into multiple levels of space multi-attribute nodes; the root node is represented by a root node linked list, the time multi-attribute nodes by time multi-attribute node linked lists, the space multi-attribute nodes by space multi-attribute node linked lists, and the leaf nodes by element structures; and each document in the document set to be indexed is stored in the tree data structure. The method yields a multi-layer index structure and thereby provides a mechanism for subsequent adaptive adjustment of the index.
Description
Technical Field
The present application relates to the field of data processing technologies, and in particular, to a time-space multi-attribute index method capable of adaptive dynamic scaling and a retrieval method thereof.
Background
The space-time multi-attribute index is an index that covers time information, space information and other attribute information at the same time; that is, only one index is finally established, rather than one index per dimension. After data containing time, space and multiple attributes has been indexed with a space-time multi-attribute index structure, a user needs to query only this one index to search on time, space and the other attributes simultaneously. In contrast, the traditional approach builds a separate index for time, for space and for each attribute; at retrieval time the system must submit the query conditions to every index and then merge and filter the results. The space-time multi-attribute index is therefore the better technique in terms of both storage space and search efficiency.
However, because a space-time multi-attribute index structure covers more data items (time, space and numerous other attributes), it generally occupies more storage than a single-dimension index. The related techniques above do not consider how the storage overhead of the space-time multi-attribute index can be adaptively adjusted in resource-limited (e.g., storage-limited) environments. In addition, construction efficiency is one of the performance measures of an index. To pursue higher query efficiency, the search space is often partitioned finely, so that a large amount of pruning can be performed during retrieval and the search is accelerated; but such an index takes a long time to construct. How to balance construction efficiency and query efficiency is therefore one of the problems a space-time multi-attribute index must address. The above techniques do not consider this problem and cannot scale index construction efficiency dynamically and adaptively.
Disclosure of Invention
In view of the above, it is necessary to provide a spatio-temporal multiattribute index method capable of adaptive dynamic scaling and a search method thereof.
An adaptive dynamically scalable spatio-temporal multiattribute indexing method, the method comprising:
constructing a document set to be indexed; each document in the document set to be indexed comprises: temporal information, spatial information, and word lists;
constructing a tree data structure for the document set to be indexed; the tree data structure includes a root node and leaf nodes; the root node expands downward into multiple levels of time multi-attribute nodes, and the time multi-attribute nodes expand downward into multiple levels of space multi-attribute nodes; the root node is represented by a root node linked list, the time multi-attribute nodes by time multi-attribute node linked lists, and the space multi-attribute nodes by space multi-attribute node linked lists; the root node linked list comprises: time level information, a time value, a bitmap index, a pointer to the next linked list element, and a pointer to the next-level node; the time multi-attribute node linked list comprises: a time level, a time value, a bitmap index, and a pointer to the next-level node; the space multi-attribute node linked list comprises: a minimum bounding rectangle of the R-tree, a bitmap index, and a pointer to the next-level node; the leaf nodes are represented by an element structure comprising: spatial information, time information, a word list, and a URL address;
and storing each document in the document set to be indexed to the tree-shaped data structure.
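The four element types listed above can be pictured as plain records. The following is a minimal sketch only: the class names, the use of an integer as the bitmap, the `(x, y)` point for spatial information, the date-string time values and the example URL are all assumptions made for illustration, not details given in the application.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class Leaf:
    """Leaf element structure: one indexed document."""
    s: Tuple[float, float]      # spatial information (assumed: an (x, y) point)
    t: str                      # time information (assumed: ISO date string)
    lw: List[str]               # word list
    url: str                    # URL address of the document


@dataclass
class SpaceElem:
    """Element of a space multi-attribute node linked list."""
    mbr: Tuple[float, float, float, float]  # minimum bounding rectangle (xmin, ymin, xmax, ymax)
    bmi: int = 0                            # bitmap index (assumed: int used as a bitmask)
    child: Optional[object] = None          # pointer to the next-level node (space node or leaf)


@dataclass
class TimeElem:
    """Element of a time multi-attribute node linked list."""
    level: int                      # time level, e.g. 2 = month (assumed numbering)
    value: str                      # time value, e.g. "2022-03"
    bmi: int = 0
    child: Optional[object] = None  # pointer to the next-level node


@dataclass
class RootElem:
    """Element of the root node linked list."""
    level: int
    value: str
    bmi: int = 0
    next: Optional["RootElem"] = None   # pointer to the next linked list element
    child: Optional[object] = None      # pointer to the next-level node


# Hypothetical one-document tree: root (year) -> time (month) -> space -> leaf.
doc = Leaf(s=(116.4, 39.9), t="2022-03-15", lw=["flood", "river"],
           url="http://example.com/doc1")
root = RootElem(level=1, value="2022",
                child=TimeElem(level=2, value="2022-03",
                               child=SpaceElem(mbr=(116.4, 39.9, 116.4, 39.9), child=doc)))
```

Keeping the bitmap index on every non-leaf element is what allows keyword pruning at each level of the descent.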
In one embodiment, the method further comprises the following steps: extracting the time information, the space information and the word list of each document in the document set to be indexed;
mapping the word list through the bitmap index to obtain a bitmap element;
querying the root node linked list according to the time information to obtain the element whose time value contains the time information;
when a time multi-attribute node linked list exists, querying it for the element whose time value contains the time information, and repeating this level by level until there is no next-level time multi-attribute node linked list;
and inserting each document, according to its space information and using the R-tree insertion algorithm, into the space multi-attribute node linked lists below the last time multi-attribute node linked list, until the document is inserted into a leaf node.
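The word-list-to-bitmap mapping in the steps above can be sketched as follows. The application does not say how bit positions are assigned, so this sketch assumes a growing vocabulary dictionary (`vocab`, a hypothetical name) that gives each new word the next free bit, and it stores the bitmap in a Python integer.

```python
from typing import Dict, Iterable


def word_bitmap(words: Iterable[str], vocab: Dict[str, int]) -> int:
    """Map a word list to a bitmap element with one bit per distinct word."""
    bits = 0
    for w in words:
        # A word seen for the first time receives the next free bit position.
        pos = vocab.setdefault(w, len(vocab))
        bits |= 1 << pos
    return bits


vocab: Dict[str, int] = {}
blw = word_bitmap(["flood", "river", "flood"], vocab)   # bits 0 and 1 set
bqw = word_bitmap(["river"], vocab)                      # bit 1 set
```

With this representation, merging a document's bitmap into a node is a single bitwise OR, and testing a query against a node is a single bitwise AND.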
In one embodiment, the method further comprises the following steps: according to the time information D.t, querying the root node linked list, determining an element rln of which the time value contains D.t, and constructing a hit relation as follows:
rln.bmi=rln.bmi|blw
wherein bmi is a bitmap index, blw is a bitmap element;
if no such element rln is found, a root node linked list element rln is created and inserted into the root node, and rln.bmi = rln.bmi | blw is set.
In one embodiment, the method further comprises the following steps: when a time multi-attribute node linked list exists, querying, in the next-level time multi-attribute node of the root node linked list element rln, the element rln1 whose value contains D.t, and constructing a hit relation as follows:
rln1.bmi=rln1.bmi|blw
if no such rln1 is found, a time multi-attribute node element rln1 is created and inserted into the parent node, and rln1.bmi = rln1.bmi | blw is set; this is repeated until element rln1 has no next-level time multi-attribute node linked list.
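The hit relation at the root level and at each following time level can be sketched as a find-or-create walk over linked lists. This is an illustration under stated assumptions: time values are modelled as prefixes of a `YYYY-MM-DD` string (year, month, day), "contains" is modelled as prefix equality, and the names `TimeElem`, `hit` and `insert_time` are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass
class TimeElem:
    level: int
    value: str                             # assumed: date prefix such as "2022" or "2022-03"
    bmi: int = 0
    next: Optional["TimeElem"] = None      # next element in the same linked list
    child: Optional["TimeElem"] = None     # head of the next-level linked list


def hit(head: Optional[TimeElem], level: int, value: str,
        blw: int) -> Tuple[TimeElem, TimeElem]:
    """Find the element with the given value (create it if absent), then OR in blw.

    Returns (head of the list, element that was hit)."""
    e = head
    while e is not None and e.value != value:
        e = e.next
    if e is None:
        e = TimeElem(level, value, next=head)   # insert the new element at the list head
        head = e
    e.bmi |= blw                                 # hit relation: bmi = bmi | blw
    return head, e


def insert_time(root: Optional[TimeElem], t: str, blw: int,
                levels: int) -> Tuple[TimeElem, TimeElem]:
    """Descend `levels` time levels (1 to 3) for a date string 'YYYY-MM-DD'."""
    prefixes = [t[:4], t[:7], t[:10]][:levels]
    root, elem = hit(root, 1, prefixes[0], blw)
    for lvl, p in enumerate(prefixes[1:], start=2):
        head, nxt = hit(elem.child, lvl, p, blw)
        elem.child = head
        elem = nxt
    return root, elem


root = None
root, day_a = insert_time(root, "2022-03-15", 0b01, levels=3)
root, day_b = insert_time(root, "2022-04-01", 0b10, levels=3)
```

After both insertions the year element carries the OR of both documents' bitmaps, while each month element carries only the bits of the documents beneath it.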
In one embodiment, the method further comprises the following steps: starting from a set initial proportion M%, with step δ of l%, constructing subsets of different sizes of the document set DS to be indexed according to the scale proportion (M% + δ); the collection of subsets is denoted DS_sub, and each element of DS_sub is a document set;
for each document set DS_sub_i in DS_sub, extracting the spatial information s and the time information t of each document, and extracting all words in the document with a word segmentation component to form a word list lw;
for each document set DS_sub_i in DS_sub, constructing the tree data structure with different numbers of time levels: the number of levels of the time multi-attribute nodes is increased from 1 to m, so that m different tree data structure indexes are constructed for DS_sub_i;
when |DS_sub| = n, n × m tree data structure indexes are obtained, and the storage amount stor of each of the n × m indexes is recorded;
for each DS_sub_i, constructing a vector v_sub_i from all of its spatial information s, time information t and word lists lw;
denoting by stor_{i,j} the storage amount obtained when the index of DS_sub_i is constructed with j time levels;
establishing the mapping <v_sub_i, stor_{i,j}> → j;
and training on all the mappings with an autoregressive model to obtain a machine learning model stor_m of the space-time multi-attribute index storage mechanism.
In one embodiment, the method further comprises the following steps: reading the given storage space stor;
for a document set DS to be indexed, extracting spatial information s and time information t for each document D, and extracting all words in the documents by using a word segmentation component to form a word list lw;
constructing a vector v according to all the spatial information s, the time information t and the lw;
calculating <v, stor> → j according to the machine learning model stor_m;
and performing a space-time multi-attribute index construction step by taking j as a hierarchy parameter of the time multi-attribute node.
An adaptive dynamically scalable spatiotemporal multiattribute search method, said method comprising:
acquiring a retrieval condition; the retrieval conditions comprise: a space query range, a time query range and a query keyword list;
mapping the list of query keywords to bqw;
performing an intersection operation between the time query range and the time value of each element of the root node linked list of the tree data structure built by the adaptively and dynamically scalable space-time multi-attribute index method described above, to obtain an element set r_set;
performing an intersection (bitwise AND) operation between bqw and the bitmap index bmi of each element in r_set to obtain an element set r_set';
for each element in r_set', intersecting the value of each of its child elements with the time query range and intersecting bqw with the child's bmi, recursively, until the space multi-attribute nodes are reached;
for each element of a space multi-attribute node, intersecting the space query range with the element's minimum bounding rectangle (MBR) and intersecting bqw with the element's bmi, recursively, until the leaf nodes are reached;
and outputting a retrieval result when the spatial information, the time information and the word list in the leaf node meet the retrieval condition.
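The pruned descent described above can be sketched with a small recursive function. Several points are assumptions made for illustration: nodes are plain dictionaries rather than linked lists, times are ISO date strings compared lexicographically, a node passes the keyword test only if it contains every query keyword (`bmi & bqw == bqw`; requiring a non-zero AND would be the disjunctive alternative), and the identifiers `doc-a` and `doc-b` are hypothetical.

```python
from typing import List, Tuple

Rect = Tuple[float, float, float, float]   # (xmin, ymin, xmax, ymax)


def overlaps(a: Rect, b: Rect) -> bool:
    """True if two axis-aligned rectangles intersect."""
    return a[0] <= b[2] and b[0] <= a[2] and a[1] <= b[3] and b[1] <= a[3]


def search(node: dict, q_rect: Rect, q_time: Tuple[str, str],
           bqw: int, out: List[str]) -> None:
    """Descend the tree, pruning on keywords, then on time or space."""
    if node["bmi"] & bqw != bqw:
        return                                  # keyword pruning (assumed: all keywords required)
    kind = node["kind"]
    if kind == "time":
        lo, hi = node["span"]
        if hi < q_time[0] or lo > q_time[1]:
            return                              # time range does not intersect
    elif kind == "space":
        if not overlaps(node["mbr"], q_rect):
            return                              # MBR does not intersect the query range
    else:                                       # leaf: final check on the document itself
        x, y = node["s"]
        if q_time[0] <= node["t"] <= q_time[1] and overlaps((x, y, x, y), q_rect):
            out.append(node["url"])
        return
    for child in node["children"]:
        search(child, q_rect, q_time, bqw, out)


leaf_a = {"kind": "leaf", "bmi": 0b011, "s": (1.0, 1.0), "t": "2022-03-15", "url": "doc-a"}
leaf_b = {"kind": "leaf", "bmi": 0b100, "s": (9.0, 9.0), "t": "2022-03-20", "url": "doc-b"}
space = {"kind": "space", "bmi": 0b111, "mbr": (0.0, 0.0, 10.0, 10.0),
         "children": [leaf_a, leaf_b]}
root = {"kind": "time", "bmi": 0b111, "span": ("2022-01-01", "2022-12-31"),
        "children": [space]}

hits: List[str] = []
search(root, (0.0, 0.0, 5.0, 5.0), ("2022-03-01", "2022-03-31"), 0b001, hits)
```

In this example `leaf_b` is rejected by the bitmap test alone, before its time or position is examined, which is the keyword pruning the method relies on.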
In one embodiment, the method further comprises the following steps: giving a document set DS to be indexed;
starting from N%, with step δ of l%, constructing subsets of different sizes of the document set DS to be indexed according to the scale proportion (N% + δ); the collection of subsets is denoted DS_sub, and each element of DS_sub is a document set;
for each document set DS_sub_i in DS_sub, extracting the spatial information s and the time information t of each document, and extracting the words in the document with a word segmentation component to form a word list lw;
for each document set DS_sub_i in DS_sub, constructing tree data structure indexes with different numbers of time levels: the number of levels of the time multi-attribute nodes is increased from 1 to m, so that m different tree data structure indexes are constructed for DS_sub_i;
recording the construction time of each tree data structure index to obtain a construction time set, whose (i, j) element is the construction time of the j-th tree data structure index for document set DS_sub_i;
for each tree data structure index of document set DS_sub_i, forming query conditions from a randomly generated space range and time range and several randomly selected query keywords, performing retrieval, and calculating the average retrieval response time, to form a retrieval time set whose (i, j) element is the statistical average retrieval time of the j-th tree data structure index for document set DS_sub_i;
for each DS_sub_i, calculating a value from the construction time and the average retrieval time of each of its indexes, and obtaining the time level p_i of the tree data structure index for which this value is smallest;
for each DS_sub_i, constructing a vector v_sub_i from all of its spatial information s, time information t and word lists lw;
establishing the mapping v_sub_i → p_i;
and training on all the mappings with an autoregressive model to obtain a machine learning model brbal_m of the space-time multi-attribute index construction and retrieval efficiency balance mechanism.
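The selection of the balancing time level p_i can be illustrated with a short sketch. The exact quantity that is minimised is not recoverable from the text, so the sketch assumes a weighted sum of construction time and average retrieval time; the function name `best_level` and the `weight` parameter are hypothetical.

```python
from typing import List


def best_level(build: List[float], retrieve: List[float], weight: float = 1.0) -> int:
    """Return the 1-based time level j that minimises build[j-1] + weight * retrieve[j-1].

    Assumption: the balance criterion is a weighted sum of the two times."""
    costs = [b + weight * r for b, r in zip(build, retrieve)]
    return costs.index(min(costs)) + 1


# Made-up timings for one subset DS_sub_i with m = 3 time levels:
# construction gets slower and retrieval gets faster as levels are added.
build_times = [1.0, 2.0, 4.0]
retrieval_times = [5.0, 2.5, 2.0]
p_i = best_level(build_times, retrieval_times)
```

Raising `weight` favours retrieval speed and pushes the chosen level upward, which is one way to express how much construction time a deployment is willing to trade for faster queries.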
In one embodiment, the method further comprises the following steps: for a given document set DS to be indexed, extracting spatial information s and time information t from each document D, and extracting all words in the documents by using a word segmentation component to form a word list lw;
constructing a vector v according to the spatial information s, the time information t and the lw;
calculating v → p according to the machine learning model brbal _ m;
and performing a construction process of the space-time multi-attribute index by taking p as a hierarchy parameter of the time multi-attribute node.
The above adaptively and dynamically scalable space-time multi-attribute index method and retrieval method provide a tree data structure for the space-time multi-attribute index. The tree data structure includes a root node and leaf nodes; the root node expands downward into multiple levels of time multi-attribute nodes, and the time multi-attribute nodes expand downward into multiple levels of space multi-attribute nodes; the root node is represented by a root node linked list, the time multi-attribute nodes by time multi-attribute node linked lists, and the space multi-attribute nodes by space multi-attribute node linked lists. The root node linked list includes: time level information, a time value, a bitmap index, a pointer to the next linked list element, and a pointer to the next-level node. The time multi-attribute node linked list includes: a time level, a time value, a bitmap index, and a pointer to the next-level node. The space multi-attribute node linked list includes: a minimum bounding rectangle of the R-tree, a bitmap index, and a pointer to the next-level node. The leaf nodes are represented by an element structure comprising: spatial information, time information, a word list, and a URL address. The structure mainly consists of a time multi-attribute structure and a space multi-attribute structure, both of which support keyword filtering, so that keywords can be used for quick pruning during retrieval.
In addition, the multi-level time multi-attribute index structure exploits the inherent hierarchy of time (such as year-month-day), which provides a mechanism for subsequent adaptive adjustment of the index. The bitmap index also speeds up construction: merging bitmap indexes is a bit operation and is therefore fast, and combined with the tree search used during insertion it accelerates the overall construction of the index.
Drawings
FIG. 1 is a flow diagram of a method for adaptive dynamic scaling spatiotemporal multiattribute indexing in one embodiment;
FIG. 2 is a diagram of a tree data structure in one embodiment;
FIG. 3 is a flow chart of a spatio-temporal multiattribute retrieval method with adaptive dynamic scaling in another embodiment.
Detailed Description
In order to make the objects, technical solutions and advantages of the present application more apparent, the present application is described in further detail below with reference to the accompanying drawings and embodiments. It should be understood that the specific embodiments described herein merely illustrate the application and do not limit it.
In one embodiment, as shown in fig. 1, there is provided an adaptive dynamically scalable spatiotemporal multiattribute indexing method, comprising the steps of:
Step 102, constructing a document set to be indexed.
Each document in the document set to be indexed comprises: time information, space information, and a word list.
Step 104, constructing a tree data structure for the document set to be indexed.
The tree data structure includes a root node and leaf nodes; the root node expands downward into multiple levels of time multi-attribute nodes, and the time multi-attribute nodes expand downward into multiple levels of space multi-attribute nodes; the root node is represented by a root node linked list, the time multi-attribute nodes by time multi-attribute node linked lists, and the space multi-attribute nodes by space multi-attribute node linked lists. The root node linked list includes: time level information, a time value, a bitmap index, a pointer to the next linked list element, and a pointer to the next-level node. The time multi-attribute node linked list includes: a time level, a time value, a bitmap index, and a pointer to the next-level node. The space multi-attribute node linked list includes: a minimum bounding rectangle of the R-tree, a bitmap index, and a pointer to the next-level node. The leaf nodes are represented by an element structure comprising: spatial information, time information, a word list, and a URL address.
Step 106, storing each document in the document set to be indexed in the tree data structure.
The established tree data structure is shown in fig. 2.
The above adaptively and dynamically scalable space-time multi-attribute index method proposes a tree data structure for the space-time multi-attribute index. The tree data structure includes a root node and leaf nodes; the root node expands downward into multiple levels of time multi-attribute nodes, and the time multi-attribute nodes expand downward into multiple levels of space multi-attribute nodes; the root node is represented by a root node linked list, the time multi-attribute nodes by time multi-attribute node linked lists, and the space multi-attribute nodes by space multi-attribute node linked lists. The root node linked list includes: time level information, a time value, a bitmap index, a pointer to the next linked list element, and a pointer to the next-level node. The time multi-attribute node linked list includes: a time level, a time value, a bitmap index, and a pointer to the next-level node. The space multi-attribute node linked list includes: a minimum bounding rectangle of the R-tree, a bitmap index, and a pointer to the next-level node. The leaf nodes are represented by an element structure comprising: spatial information, time information, a word list, and a URL address. The structure mainly consists of a time multi-attribute structure and a space multi-attribute structure, both of which support keyword filtering, so that keywords can be used for quick pruning during retrieval.
In addition, the multi-level time multi-attribute index structure exploits the inherent hierarchy of time (such as year-month-day), which provides a mechanism for subsequent adaptive adjustment of the index. The bitmap index also speeds up construction: merging bitmap indexes is a bit operation and is therefore fast, and combined with the tree search used during insertion it accelerates the overall construction of the index.
In one embodiment, the time information, the space information and the word list of each document in the document set to be indexed are extracted; the word list is mapped through the bitmap index to obtain a bitmap element; the root node linked list is queried according to the time information to obtain the element whose time value contains the time information; when a time multi-attribute node linked list exists, it is queried for the element whose time value contains the time information, level by level, until there is no next-level time multi-attribute node linked list; and each document is inserted, according to its space information and using the R-tree insertion algorithm, into the space multi-attribute node linked lists below the last time multi-attribute node linked list, until the document is inserted into a leaf node.
In one embodiment, the root node linked list is queried according to the time information D.t to determine the element rln whose time value contains D.t, and the hit relation is constructed as follows:
rln.bmi=rln.bmi|blw
wherein bmi is a bitmap index, and blw is a bitmap element;
if no such element rln is found, a root node linked list element rln is created and inserted into the root node, and rln.bmi = rln.bmi | blw is set.
In one embodiment, when a time multi-attribute node linked list exists, the element rln1 whose value contains D.t is queried in the next-level time multi-attribute node of the root node linked list element rln, and the hit relation is constructed as follows:
rln1.bmi=rln1.bmi|blw
if no such rln1 is found, a time multi-attribute node element rln1 is created and inserted into the parent node, and rln1.bmi = rln1.bmi | blw is set; this is repeated until element rln1 has no next-level time multi-attribute node linked list.
In one embodiment, the problem addressed is how many levels of the time index structure are optimal for a given amount of data to be indexed and a given storage space limit. Time, space, keywords and storage space are extracted as mapping features for training, which captures the key characteristics of the optimization and reduces the amount of training data.
Specifically, starting from a set initial proportion M%, with step δ of l%, subsets of different sizes of the document set DS to be indexed are constructed according to the scale proportion (M% + δ); the collection of subsets is denoted DS_sub, and each element of DS_sub is a document set; M may be 10.
For each document set DS_sub_i in DS_sub, the spatial information s and the time information t of each document are extracted, and all words in the document are extracted with a word segmentation component to form a word list lw;
for each document set DS_sub_i in DS_sub, tree data structure indexes are constructed with different numbers of time levels: the number of levels of the time multi-attribute nodes is increased from 1 to m, so that m different tree data structure indexes are constructed for DS_sub_i;
when |DS_sub| = n, n × m tree data structure indexes are obtained, and the storage amount stor of each of the n × m indexes is recorded;
for each DS_sub_i, a vector v_sub_i is constructed from all of its spatial information s, time information t and word lists lw;
the storage amount obtained when the index of DS_sub_i is constructed with j time levels is denoted stor_{i,j};
the mapping <v_sub_i, stor_{i,j}> → j is established;
and all the mappings are used to train an autoregressive model, yielding a machine learning model stor_m of the space-time multi-attribute index storage mechanism.
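The construction of the training samples in the steps above can be sketched as follows. This is an illustration under assumptions: subsets are drawn by random sampling at proportions M%, M% + δ, M% + 2δ and so on up to 100%, and the feature extractor `features` and the storage measurement `storage_of` are hypothetical callables standing in for the vector construction and the index build, neither of which is specified in detail in the text. Model training itself is not shown.

```python
import random
from typing import Callable, List, Sequence, Tuple


def build_subsets(ds: Sequence[str], m_pct: int, step_pct: int,
                  seed: int = 0) -> List[List[str]]:
    """Build subsets of DS at proportions m_pct, m_pct + step_pct, ... (at most 100%)."""
    rng = random.Random(seed)
    subsets = []
    for pct in range(m_pct, 101, step_pct):
        k = len(ds) * pct // 100
        subsets.append(rng.sample(list(ds), k))   # assumed: uniform random sampling
    return subsets


def training_pairs(subsets: List[List[str]], m_levels: int,
                   features: Callable[[List[str]], object],
                   storage_of: Callable[[List[str], int], float]
                   ) -> List[Tuple[Tuple[object, float], int]]:
    """Return the n x m samples (<v_sub_i, stor_{i,j}>, j) for j = 1..m_levels."""
    return [((features(sub), storage_of(sub, j)), j)
            for sub in subsets
            for j in range(1, m_levels + 1)]


docs = [f"doc{i}" for i in range(20)]
subsets = build_subsets(docs, m_pct=10, step_pct=30)          # 10%, 40%, 70%, 100%
pairs = training_pairs(subsets, m_levels=3,
                       features=len,                           # stand-in feature: subset size
                       storage_of=lambda sub, j: len(sub) * j)  # stand-in storage measure
```

Each sample pairs what is known before construction (the subset's features and a storage amount) with the number of time levels that produced that storage amount, which is the label a model such as stor_m would be trained to predict.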
In one embodiment, the given storage space stor is read;
for a document set DS to be indexed, extracting spatial information s and time information t for each document D, and extracting all words in the documents by using a word segmentation component to form a word list lw;
constructing a vector v according to all the spatial information s, the time information t and lw;
calculating < v, stor > → j according to a machine learning model stor _ m;
and performing a space-time multi-attribute index construction step by taking j as a hierarchy parameter of the time multi-attribute node.
With this method, the trained storage optimization model makes it possible to construct indexes with different numbers of time levels according to the storage space preset for the index. The index is thus elastic in practice, which makes the method particularly suitable for transparent adjustment and use in the cloud.
In one embodiment, as shown in fig. 3, an adaptive dynamic scaling spatiotemporal multiattribute search method is provided, which includes:
The search conditions include: a spatial query scope, a temporal query scope, and a query keyword list.
Step 306, performing an intersection operation between the time query range and the time value of each element of the root node linked list of the tree data structure built by the adaptively and dynamically scalable space-time multi-attribute index method, to obtain an element set r_set.
Step 310, for each element in r_set', intersecting the value of each of its child elements with the time query range and intersecting bqw with the child's bmi, recursively, until the space multi-attribute nodes are reached.
Step 314, outputting a retrieval result when the space information, the time information and the word list in a leaf node satisfy the retrieval conditions.
In the space-time multi-attribute retrieval method capable of self-adapting dynamic expansion, the time multi-attribute structure and the space multi-attribute structure are fully utilized to quickly filter the time-key words and the space-key words so as to realize high-efficiency retrieval.
In one embodiment, a machine learning training process for optimizing search efficiency balance is provided, which includes:
1. giving a document set DS to be indexed;
2. starting from N%, with step δ of l%, constructing subsets of different sizes of the document set DS to be indexed according to the scale proportion (N% + δ); the collection of subsets is denoted DS_sub, and each element of DS_sub is a document set;
3. for each document set DS_sub_i in DS_sub, extracting the spatial information s and the time information t of each document, and extracting the words in the document with a word segmentation component to form a word list lw;
4. for each document set DS_sub_i in DS_sub, constructing tree data structure indexes with different numbers of time levels: the number of levels of the time multi-attribute nodes is increased from 1 to m, so that m different tree data structure indexes are constructed for DS_sub_i;
5. recording the construction time of each tree data structure index to obtain a construction time set, whose (i, j) element is the construction time of the j-th tree data structure index for document set DS_sub_i;
6. for each tree data structure index of document set DS_sub_i, forming query conditions from a randomly generated space range and time range and several randomly selected query keywords, performing retrieval, and calculating the average retrieval response time, to form a retrieval time set whose (i, j) element is the statistical average retrieval time of the j-th tree data structure index for document set DS_sub_i;
8. for each DS_sub_i, calculating a value from the construction time and the average retrieval time of each of its indexes, and obtaining the time level p_i of the tree data structure index for which this value is smallest;
9. for each DS_sub_i, constructing a vector v_sub_i from all of its spatial information s, time information t and word lists lw;
10. establishing the mapping v_sub_i → p_i;
11. training on all the mappings with an autoregressive model to obtain a machine learning model brbal_m of the space-time multi-attribute index construction and retrieval efficiency balance mechanism.
In this embodiment, an index built with many time levels improves query efficiency but reduces construction efficiency, whereas an index built with few time levels is constructed quickly but is queried less efficiently, so a balance point must be found. This embodiment mainly addresses the question of how many levels of the time structure balance query efficiency and construction efficiency for a given volume of data to be indexed.
In one embodiment, for a given document set DS to be indexed, spatial information s and time information t are extracted from each document D, and all words in the document are extracted by using a word segmentation component to form a word list lw;
constructing a vector v according to the spatial information s, the time information t, and the word list lw;
calculating v → p according to the machine learning model brbal_m;
and performing the construction process of the space-time multi-attribute index by taking p as the level parameter of the time multi-attribute nodes.
In this embodiment, the number of time levels can be adjusted automatically according to the data volume to be indexed when the index is constructed, so that construction efficiency and query efficiency are balanced.
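The prediction step v → p can be sketched as follows. The text does not specify how the vector v is composed or how the model is evaluated, so the four summary features and the 1-nearest-neighbour lookup used here in place of the trained model brbal_m are assumptions; the names `feature_vector` and `predict_level` are hypothetical.

```python
import math
from typing import List, Sequence, Tuple

# A document reduced to (spatial point s, integer time t, word list lw).
Doc = Tuple[Tuple[float, float], int, List[str]]


def feature_vector(docs: Sequence[Doc]) -> Tuple[float, float, float, float]:
    """Summarise a document set as (count, bounding-box area, time span, vocabulary size).

    These four features are an assumed composition of the vector v.
    """
    if not docs:
        raise ValueError("document set must not be empty")
    xs = [d[0][0] for d in docs]
    ys = [d[0][1] for d in docs]
    ts = [d[1] for d in docs]
    vocab = {w for d in docs for w in d[2]}
    area = (max(xs) - min(xs)) * (max(ys) - min(ys))
    return (float(len(docs)), area, float(max(ts) - min(ts)), float(len(vocab)))


def predict_level(v: Sequence[float],
                  training: Sequence[Tuple[Sequence[float], int]]) -> int:
    """Return the level p of the closest recorded mapping v_sub_i -> p_i.

    A nearest-neighbour stand-in for the trained model, not the model itself.
    """
    return min(training, key=lambda pair: math.dist(v, pair[0]))[1]


docs = [((0.0, 0.0), 1, ["a", "b"]), ((2.0, 3.0), 5, ["b", "c"])]
v = feature_vector(docs)
mappings = [((2.0, 6.0, 4.0, 3.0), 1), ((1000.0, 50.0, 365.0, 800.0), 3)]
p = predict_level(v, mappings)
```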
It will be understood by those skilled in the art that all or part of the processes of the methods of the embodiments described above can be implemented by a computer program instructing the relevant hardware. The computer program can be stored in a non-volatile computer-readable storage medium and, when executed, can include the processes of the embodiments of the methods described above. Any reference to memory, storage, database, or other medium used in the embodiments provided herein may include non-volatile and/or volatile memory. Non-volatile memory can include read-only memory (ROM), programmable ROM (PROM), electrically programmable ROM (EPROM), electrically erasable programmable ROM (EEPROM), or flash memory. Volatile memory can include random access memory (RAM) or external cache memory. By way of illustration and not limitation, RAM is available in a variety of forms, such as static RAM (SRAM), dynamic RAM (DRAM), synchronous DRAM (SDRAM), double data rate SDRAM (DDR SDRAM), enhanced SDRAM (ESDRAM), Synchlink DRAM (SLDRAM), Rambus direct RAM (RDRAM), direct Rambus dynamic RAM (DRDRAM), and Rambus dynamic RAM (RDRAM).
The technical features of the above embodiments can be combined arbitrarily. For the sake of brevity, not all possible combinations of these technical features are described, but any combination should be considered within the scope of this specification as long as the combined features do not contradict one another.
The above embodiments express only several implementations of the present application, and although their description is specific and detailed, they should not be construed as limiting the scope of the invention. It should be noted that a person skilled in the art can make several variations and modifications without departing from the concept of the present application, and these all fall within the scope of protection of the present application. Therefore, the scope of protection of this patent shall be subject to the appended claims.
Claims (9)
1. An adaptive dynamic scalable spatio-temporal multiattribute indexing method, characterized in that the method comprises:
constructing a document set to be indexed; each document in the document set to be indexed comprises: temporal information, spatial information, and word lists;
constructing a tree data structure of the document set to be indexed; the tree data structure includes: a root node and leaf nodes; multiple levels of time multi-attribute nodes extend downward from the root node, and multiple levels of spatial multi-attribute nodes extend downward from the time multi-attribute nodes; the root node is represented by a root node linked list, the time multi-attribute nodes are represented by a time multi-attribute node linked list, and the spatial multi-attribute nodes are represented by a spatial multi-attribute node linked list; the root node linked list comprises: time level information, a time value, a bitmap index, a pointer to the next linked list element, and a pointer to the next-level node; the time multi-attribute node linked list comprises: a time level, a time value, a bitmap index, and a pointer to the next-level node; the spatial multi-attribute node linked list comprises: a minimum bounding rectangle of the R-tree, a bitmap index, and a pointer to the next-level node; the leaf nodes are represented by an element structure comprising: spatial information, time information, a word list, and a URL address;
and storing each document in the document set to be indexed to the tree-shaped data structure.
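The node layouts recited in claim 1 can be pictured with the following minimal sketch. The field types (a point for spatial information, integer timestamps, an integer used as a bitmap, Python lists in place of next-level pointers) and all class names are assumptions chosen for illustration; the claim fixes only which fields each structure holds.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple, Union


@dataclass
class LeafElement:
    spatial: Tuple[float, float]   # spatial information (assumed: a point)
    time: int                      # time information (assumed: YYYYMMDD integer)
    words: List[str]               # word list lw
    url: str                       # URL address of the document


@dataclass
class SpatialNode:
    mbr: Tuple[float, float, float, float]  # R-tree minimum bounding rectangle
    bmi: int = 0                            # bitmap index
    children: List[Union["SpatialNode", LeafElement]] = field(default_factory=list)


@dataclass
class TimeNode:
    level: int                     # time level
    value: Tuple[int, int]         # time value (assumed: closed interval)
    bmi: int = 0                   # bitmap index
    children: List[Union["TimeNode", SpatialNode]] = field(default_factory=list)


@dataclass
class RootElement:
    level: int                     # time level information
    value: Tuple[int, int]         # time value
    bmi: int = 0                   # bitmap index
    next: Optional["RootElement"] = None   # next element of the root linked list
    children: List[Union[TimeNode, SpatialNode]] = field(default_factory=list)


# A one-document index with two time levels (hypothetical values).
leaf = LeafElement((116.4, 39.9), 20220311, ["index", "tree"], "https://example.com/doc1")
space = SpatialNode(mbr=(116.0, 39.0, 117.0, 40.0), bmi=0b11, children=[leaf])
month = TimeNode(level=2, value=(20220301, 20220331), bmi=0b11, children=[space])
root = RootElement(level=1, value=(20220101, 20221231), bmi=0b11, children=[month])
```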
2. The method of claim 1, wherein storing each document in the set of documents to be indexed to the tree data structure comprises:
extracting time information, space information and a word list of each document in the document set to be indexed;
mapping the word list by using a bitmap index to obtain a bitmap element;
querying the root node linked list according to the time information, so that the time information is contained in the time value of an element of the root node linked list, thereby obtaining a time value element;
when a time multi-attribute node linked list is included, querying the time multi-attribute node linked list so that the time information is contained in the time value of an element of the time multi-attribute node linked list, until there is no next-level time multi-attribute node linked list;
and inserting each document, according to the spatial information of each document in the document set to be indexed, into the spatial multi-attribute node linked list at the next level below the time multi-attribute node linked list by using an R-tree insertion algorithm, until the leaf node is inserted.
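The mapping of a word list to a bitmap element in claim 2 can be sketched as below. The claim does not say how words are assigned to bit positions, so the fixed 64-bit width and the use of CRC32 modulo the width are assumptions; `words_to_bitmap` is a hypothetical helper name.

```python
import zlib
from typing import Iterable

BITMAP_BITS = 64  # assumed bitmap width


def words_to_bitmap(words: Iterable[str], bits: int = BITMAP_BITS) -> int:
    """Set one bit per word, chosen by a stable hash of the word (assumed scheme)."""
    bitmap = 0
    for word in words:
        bitmap |= 1 << (zlib.crc32(word.encode("utf-8")) % bits)
    return bitmap


# Bitmap element blw for one document's word list.
blw = words_to_bitmap(["spatial", "temporal", "index"])

# Merging blw into a node's bitmap index, as in rln.bmi = rln.bmi | blw.
bmi = 0
bmi |= blw
```

Because distinct words can hash to the same bit under this assumed scheme, a set bit indicates only that a word may be present below the node, so the word list stored in the leaf still has to be checked.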
3. The method of claim 2, wherein querying the root node linked list according to the time information so that the time information is included in a time value of the root node linked list to obtain a time value element comprises:
querying the root node linked list according to the time information D.t, determining the element rln whose time value contains D.t, and constructing the hit relation as follows:
rln.bmi = rln.bmi | blw
wherein bmi is the bitmap index and blw is the bitmap element;
if no such element rln is found, creating a root node linked list element rln, inserting it into the root node, and setting rln.bmi = rln.bmi | blw.
4. The method of claim 3, wherein, when a time multi-attribute node linked list is included, querying the time multi-attribute node linked list so that the time information is contained in the time value of an element of the time multi-attribute node linked list, until there is no next-level time multi-attribute node linked list, comprises:
when a time multi-attribute node linked list is included, querying the next-level time multi-attribute nodes of the root node linked list element rln for an element rln1 whose value contains D.t, and constructing the hit relation as follows:
rln1.bmi = rln1.bmi | blw
if no such element rln1 is found, creating a time multi-attribute node, inserting it into the parent node, associating it with element rln1, and setting rln1.bmi = rln1.bmi | blw, until element rln1 has no next-level time multi-attribute node linked list.
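The descent through the time levels in claims 3 and 4 can be sketched as follows. As simplifying assumptions, the time levels are taken to be year, month, and day, and a dictionary keyed by the time value stands in for each linked list; `TimeElement`, `time_key`, and `insert_time_path` are hypothetical names.

```python
from dataclasses import dataclass, field
from datetime import date
from typing import Dict, Tuple

Key = Tuple[int, ...]  # (year,), (year, month) or (year, month, day)


@dataclass
class TimeElement:
    level: int
    value: Key
    bmi: int = 0
    children: Dict[Key, "TimeElement"] = field(default_factory=dict)


def time_key(t: date, level: int) -> Key:
    """Time value that contains t at the given level (assumed granularity)."""
    return (t.year, t.month, t.day)[:level]


def insert_time_path(root: Dict[Key, TimeElement], t: date,
                     blw: int, levels: int) -> TimeElement:
    """Find or create the element containing t at each level and OR in blw."""
    if not 1 <= levels <= 3:
        raise ValueError("this sketch supports 1 to 3 time levels")
    siblings = root
    node = None
    for level in range(1, levels + 1):
        key = time_key(t, level)
        node = siblings.get(key)
        if node is None:               # element not found: create and insert it
            node = TimeElement(level, key)
            siblings[key] = node
        node.bmi |= blw                # hit relation: bmi = bmi | blw
        siblings = node.children
    return node  # deepest time element; R-tree insertion would continue below it


root: Dict[Key, TimeElement] = {}
insert_time_path(root, date(2022, 3, 11), 0b0101, levels=2)
insert_time_path(root, date(2022, 7, 8), 0b0010, levels=2)
```

After the two insertions, the year element carries the union of both bitmaps, while each month element carries only the bitmap of the document stored beneath it.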
5. The method according to any one of claims 1 to 4, further comprising:
starting from a set initial proportion M%, constructing subsets of different sizes of the document set DS to be indexed at scale proportions of (M% + delta), where the step size delta is l%; wherein the set of subsets of the document set to be indexed is DS_sub, and each element of DS_sub is a document set;
for each document set DS_sub_i in DS_sub, extracting the spatial information s and the time information t of each document, and extracting all words in each document by using a word segmentation component to form a word list lw;
for each document set DS_sub_i in DS_sub, varying the number of time levels to construct indexes of the tree data structure, wherein, for DS_sub_i, the number of levels of the time multi-attribute nodes is increased from 1 to m and m different tree-structured indexes are constructed;
when |DS_sub| = n, obtaining n × m indexes of the tree data structure, and recording the storage amount stor of each of the n × m indexes;
for each DS_sub_i, constructing a vector v_sub_i from all of its spatial information s, time information t, and word lists lw;
denoting by stor_{i,j} the storage amount obtained when DS_sub_i is indexed by the index construction algorithm with j time levels;
establishing the mapping <v_sub_i, stor_{i,j}> → j;
and training an autoregressive model on all the mappings to obtain a machine learning model stor_m of the space-time multi-attribute index storage mechanism.
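The construction of the subsets DS_sub in claim 5 can be sketched as follows. The claim does not say how documents are selected for each subset or where the scaling stops, so random sampling with a fixed seed and an upper bound of 100% are assumptions; `build_subsets` is a hypothetical name.

```python
import random
from typing import List, Sequence, TypeVar

T = TypeVar("T")


def build_subsets(ds: Sequence[T], start_pct: int, step_pct: int,
                  seed: int = 0) -> List[List[T]]:
    """Return subsets of ds at start_pct%, start_pct + step_pct%, ... up to 100%."""
    if not ds:
        raise ValueError("document set must not be empty")
    if not 0 < start_pct <= 100 or step_pct <= 0:
        raise ValueError("percentages must be positive and start_pct at most 100")
    rng = random.Random(seed)          # fixed seed keeps the sketch reproducible
    population = list(ds)
    subsets = []
    for pct in range(start_pct, 101, step_pct):
        size = max(1, len(population) * pct // 100)
        subsets.append(rng.sample(population, size))
    return subsets


# Hypothetical document set of 200 documents, M = 20 and l = 20.
ds = [f"doc{i}" for i in range(200)]
DS_sub = build_subsets(ds, start_pct=20, step_pct=20)
n = len(DS_sub)  # with m time levels per subset, n * m indexes would be built
```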
6. The method of claim 5, further comprising:
reading and scanning the given storage space stor;
for a document set DS to be indexed, extracting spatial information s and time information t from each document D, and extracting all words in the document by using a word segmentation component to form a word list lw;
constructing a vector v according to all the spatial information s, the time information t, and the word lists lw;
calculating <v, stor> → j according to the machine learning model stor_m;
and performing the space-time multi-attribute index construction step by taking j as the level parameter of the time multi-attribute nodes.
7. An adaptive dynamic scalable spatiotemporal multiattribute retrieval method, characterized in that the method comprises:
acquiring a retrieval condition; the retrieval conditions comprise: a space query range, a time query range and a query keyword list;
mapping the list of query keywords to bqw;
performing an intersection operation between the time query range and the time value of each element of the root node linked list of the tree data structure in the adaptive dynamic scalable spatio-temporal multiattribute indexing method of any one of claims 1 to 6, to obtain an element set r_set;
performing an AND operation on bqw and the bitmap index bmi of each element in r_set to obtain an element set r_set';
for each element in r_set', intersecting the time query range with the value of each child element and performing an AND operation on bqw and the bmi of the child element, recursively, until the spatial multi-attribute nodes are reached;
for each element of the spatial multi-attribute nodes, intersecting the spatial query range with the minimum bounding rectangle (MBR) of the element and performing an AND operation on bqw and the bmi of the element, recursively, until the leaf nodes are reached;
and outputting a retrieval result when the spatial information, the time information and the word list in the leaf node meet the retrieval condition.
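The pruning order of claim 7 can be sketched as follows. Nodes are plain dictionaries here, the keyword bitmap uses an assumed CRC32 hashing scheme, and a node is kept only if every bit of bqw is set in its bmi; the claim does not say whether all or any of the query keywords must match, so the "all keywords" rule is an assumption, as are the names `bitmap`, `search`, and `retrieve`.

```python
import zlib
from typing import Iterable, List, Sequence, Tuple

Rect = Tuple[float, float, float, float]  # (min_x, min_y, max_x, max_y)
Span = Tuple[int, int]                    # closed time interval [start, end]


def bitmap(words: Iterable[str], bits: int = 64) -> int:
    """Assumed keyword-to-bit mapping: CRC32 of the word modulo the width."""
    result = 0
    for word in words:
        result |= 1 << (zlib.crc32(word.encode("utf-8")) % bits)
    return result


def spans_overlap(a: Span, b: Span) -> bool:
    return a[0] <= b[1] and b[0] <= a[1]


def rects_intersect(a: Rect, b: Rect) -> bool:
    return a[0] <= b[2] and b[0] <= a[2] and a[1] <= b[3] and b[1] <= a[3]


def search(node: dict, q_rect: Rect, q_span: Span,
           q_words: Sequence[str], bqw: int, hits: List[str]) -> None:
    if node["kind"] == "leaf":
        # Exact check on the stored spatial, time, and word information.
        x, y = node["s"]
        if (q_rect[0] <= x <= q_rect[2] and q_rect[1] <= y <= q_rect[3]
                and q_span[0] <= node["t"] <= q_span[1]
                and set(q_words) <= set(node["lw"])):
            hits.append(node["url"])
        return
    if (node["bmi"] & bqw) != bqw:            # bitmap pruning
        return
    if node["kind"] == "time" and not spans_overlap(node["value"], q_span):
        return
    if node["kind"] == "space" and not rects_intersect(node["mbr"], q_rect):
        return
    for child in node["children"]:
        search(child, q_rect, q_span, q_words, bqw, hits)


def retrieve(roots: Sequence[dict], q_rect: Rect, q_span: Span,
             q_words: Sequence[str]) -> List[str]:
    hits: List[str] = []
    bqw = bitmap(q_words)
    for element in roots:
        search(element, q_rect, q_span, q_words, bqw, hits)
    return hits


# Hypothetical two-document index with one time level and one spatial node.
doc_a = {"kind": "leaf", "s": (1.0, 1.0), "t": 20220311,
         "lw": ["tree", "index"], "url": "doc-a"}
doc_b = {"kind": "leaf", "s": (8.0, 8.0), "t": 20220320,
         "lw": ["tree", "query"], "url": "doc-b"}
space = {"kind": "space", "mbr": (0.0, 0.0, 10.0, 10.0),
         "bmi": bitmap(doc_a["lw"] + doc_b["lw"]), "children": [doc_a, doc_b]}
roots = [{"kind": "time", "value": (20220101, 20221231),
          "bmi": space["bmi"], "children": [space]}]
march = (20220301, 20220331)
hits = retrieve(roots, (0.0, 0.0, 5.0, 5.0), march, ["tree"])
```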
8. The retrieval method of claim 7, wherein the method further comprises:
giving a document set DS to be indexed;
starting from N%, constructing subsets of different sizes of the document set DS to be indexed at scale proportions of (N% + delta), where the step size delta is l%; wherein the set of subsets of the document set to be indexed is DS_sub, and each element of DS_sub is a document set;
for each document set DS_sub_i in DS_sub, extracting the spatial information s and the time information t of each document, and extracting the words in each document by using a word segmentation component to form a word list lw;
for each document set DS_sub_i in DS_sub, varying the number of time levels to construct indexes of the tree data structure; wherein, for DS_sub_i, the number of levels of the time multi-attribute nodes is increased from 1 to m and m different tree-structured indexes are constructed;
recording the construction time of each tree-structured index to obtain a construction time set, in which each element represents the construction time of the j-th tree-structured index for document set DS_sub_i;
for each tree-structured index of document set DS_sub_i, forming query conditions from a randomly generated spatial range, a randomly generated time range, and several randomly selected query keywords, performing retrieval, and calculating the average retrieval response time to form a retrieval time set, in which each element represents the statistical average retrieval time of the j-th tree-structured index for document set DS_sub_i;
for each DS_sub_i, constructing a vector v_sub_i from all of its spatial information s, time information t, and word lists lw;
establishing the mapping v_sub_i → p_i;
and training an autoregressive model on all the mappings to obtain a machine learning model brbal_m for the mechanism that balances the construction efficiency and the retrieval efficiency of the space-time multi-attribute index.
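The measurement of average retrieval time with randomly generated query conditions in claim 8 can be sketched as follows. The query representation, the number of keywords per query, and the names `random_query` and `average_retrieval_time` are assumptions; the `search` argument stands for whatever retrieval function the index under test provides.

```python
import random
import time
from typing import Callable, List, Sequence, Tuple

Rect = Tuple[float, float, float, float]  # (min_x, min_y, max_x, max_y)
Span = Tuple[int, int]                    # closed time interval [start, end]
Query = Tuple[Rect, Span, List[str]]


def random_query(rng: random.Random, extent: Rect, span: Span,
                 vocabulary: Sequence[str], n_words: int = 2) -> Query:
    """Draw a random spatial range, time range, and keyword list."""
    x1, x2 = sorted(rng.uniform(extent[0], extent[2]) for _ in range(2))
    y1, y2 = sorted(rng.uniform(extent[1], extent[3]) for _ in range(2))
    t1, t2 = sorted(rng.randint(span[0], span[1]) for _ in range(2))
    words = rng.sample(list(vocabulary), min(n_words, len(vocabulary)))
    return (x1, y1, x2, y2), (t1, t2), words


def average_retrieval_time(search: Callable[[Rect, Span, List[str]], object],
                           queries: Sequence[Query]) -> float:
    """Run every query once and return the mean wall-clock time per query."""
    if not queries:
        raise ValueError("at least one query is required")
    start = time.perf_counter()
    for rect, span, words in queries:
        search(rect, span, words)
    return (time.perf_counter() - start) / len(queries)


# Usage with a placeholder search function that only records its calls.
rng = random.Random(0)
queries = [random_query(rng, (0.0, 0.0, 10.0, 10.0), (0, 365), ["a", "b", "c"])
           for _ in range(5)]
calls = []
avg = average_retrieval_time(lambda r, s, w: calls.append((r, s, w)), queries)
```

Repeating this measurement for each of the m indexes of a subset yields one average retrieval time per time level.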
9. The retrieval method of claim 8, wherein the method further comprises:
for a given document set DS to be indexed, extracting spatial information s and time information t from each document D, and extracting all words in the documents by using a word segmentation component to form a word list lw;
constructing a vector v according to the spatial information s, the time information t and the lw;
calculating v → p according to the machine learning model brbal_m;
and performing a construction process of the space-time multi-attribute index by taking p as a hierarchy parameter of the time multi-attribute node.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202210241696.1A CN114722139A (en) | 2022-03-11 | 2022-03-11 | Space-time multi-attribute index method capable of self-adaptive dynamic expansion and retrieval method thereof |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202210241696.1A CN114722139A (en) | 2022-03-11 | 2022-03-11 | Space-time multi-attribute index method capable of self-adaptive dynamic expansion and retrieval method thereof |
Publications (1)
Publication Number | Publication Date |
---|---|
CN114722139A (en) | 2022-07-08 |
Family
ID=82238124
Family Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202210241696.1A Pending CN114722139A (en) | 2022-03-11 | 2022-03-11 | Space-time multi-attribute index method capable of self-adaptive dynamic expansion and retrieval method thereof |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN114722139A (en) |
Cited By (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN115809360A (en) * | 2023-02-08 | 2023-03-17 | 深圳大学 | Large-scale space-time stream data real-time space connection query method and related equipment |
CN115809360B (en) * | 2023-02-08 | 2023-05-05 | 深圳大学 | Real-time space connection query method for large-scale space-time data and related equipment |
CN117389954A (en) * | 2023-12-13 | 2024-01-12 | 湖南汇智兴创科技有限公司 | Online multi-version document content positioning method, device, equipment and medium |
CN117389954B (en) * | 2023-12-13 | 2024-03-29 | 湖南汇智兴创科技有限公司 | Online multi-version document content positioning method, device, equipment and medium |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
US20220261427A1 (en) | Methods and system for semantic search in large databases | |
JP5858432B2 (en) | Method, system, and computer program product for providing a distributed associative memory base | |
US11347741B2 (en) | Efficient use of TRIE data structure in databases | |
CN114722139A (en) | Space-time multi-attribute index method capable of self-adaptive dynamic expansion and retrieval method thereof | |
CN111868710B (en) | Random extraction forest index structure for searching large-scale unstructured data | |
US20100106713A1 (en) | Method for performing efficient similarity search | |
CN111581215B (en) | Array tree data storage method, fast search method and readable storage medium | |
CN108399213B (en) | User-oriented personal file clustering method and system | |
CN108304409B (en) | Carry-based data frequency estimation method of Sketch data structure | |
CN106557777A (en) | It is a kind of to be based on the improved Kmeans clustering methods of SimHash | |
Skopal et al. | Nearest Neighbours Search using the PM-tree | |
CN116738988A (en) | Text detection method, computer device, and storage medium | |
Günnemann et al. | Subspace clustering for indexing high dimensional data: a main memory index based on local reductions and individual multi-representations | |
CN113722274A (en) | Efficient R-tree index remote sensing data storage model | |
CN113297266B (en) | Data processing method, device, equipment and computer storage medium | |
WO2023246849A1 (en) | Feedback data graph generation method and refrigerator | |
CN113688702B (en) | Street view image processing method and system based on fusion of multiple features | |
CN110955827B (en) | By using AI 3 Method and system for solving SKQwyy-not problem | |
Terry et al. | Indexing method for multidimensional vector data | |
CN116680367B (en) | Data matching method, data matching device and computer readable storage medium | |
JP2002073390A (en) | Recording medium in which multi-dimensional spatial data structure is recorded, method of updating multi- dimension spatial data, method of searching multi- dimensional spatial data, and recording medium in which program for performing the methods are recorded | |
Terry et al. | Variable granularity space filling curve for indexing multidimensional data | |
CN116910337A (en) | Entity object circle selection method, query method, device, server and medium | |
CN118133044A (en) | Problem extension method, device, computer equipment, storage medium and product | |
CN116136958A (en) | Document processing method, apparatus, computer program product, and readable storage medium |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||