CN114722139A - Space-time multi-attribute index method capable of self-adaptive dynamic expansion and retrieval method thereof - Google Patents


Info

Publication number
CN114722139A
Authority
CN
China
Prior art keywords
time
attribute
sub
node
index
Legal status
Pending
Application number
CN202210241696.1A
Other languages
Chinese (zh)
Inventor
张翀
葛斌
赵翔
何春辉
肖卫东
Current Assignee
National University of Defense Technology
Original Assignee
National University of Defense Technology
Application filed by National University of Defense Technology
Priority to CN202210241696.1A
Publication of CN114722139A


Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/31 Indexing; Data structures therefor; Storage structures
    • G06F16/316 Indexing structures
    • G06F16/322 Trees
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/33 Querying
    • G06F16/3331 Query processing
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N20/00 Machine learning
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D10/00 Energy efficient computing, e.g. low power processors, power management or thermal management

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Software Systems (AREA)
  • Data Mining & Analysis (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Databases & Information Systems (AREA)
  • Computational Linguistics (AREA)
  • Artificial Intelligence (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Evolutionary Computation (AREA)
  • Medical Informatics (AREA)
  • Computing Systems (AREA)
  • Mathematical Physics (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The application relates to a spatio-temporal multi-attribute indexing method capable of adaptive dynamic scaling, and to a retrieval method based on it. The method comprises constructing a document set to be indexed and then constructing a tree data structure for that document set. The tree data structure comprises a root node and leaf nodes. Below the root node are multiple levels of temporal multi-attribute nodes, and below the temporal multi-attribute nodes are multiple levels of spatial multi-attribute nodes. The root node is represented by a root node linked list, the temporal multi-attribute nodes by temporal multi-attribute node linked lists, the spatial multi-attribute nodes by spatial multi-attribute node linked lists, and the leaf nodes by element structures. Each document in the document set to be indexed is stored in the tree data structure. The method yields a multi-level index structure, which provides the mechanism for subsequent adaptive adjustment of the index.

Description

Space-time multi-attribute index method capable of self-adaptive dynamic expansion and retrieval method thereof
Technical Field
The present application relates to the field of data processing technologies, and in particular to a spatio-temporal multi-attribute indexing method capable of adaptive dynamic scaling and a retrieval method based on it.
Background
A spatio-temporal multi-attribute index is an index that covers temporal information, spatial information and other attribute information at the same time. Only one index is built in the end, rather than one index per dimension. Once data containing time, space and multiple attributes has been indexed with such a structure, a user can query time, space and the other attributes simultaneously against that single index. By contrast, the conventional approach builds a separate index for time, for space and for each attribute. At query time the system must dispatch the query conditions to each index, then merge and filter the results. The spatio-temporal multi-attribute index is therefore the better technique in terms of storage space and retrieval efficiency.
However, a spatio-temporal multi-attribute index covers more data items (time, space and many other attributes), so it generally occupies more storage than an index over a single dimension. Existing techniques do not consider how to adaptively adjust the storage overhead of a spatio-temporal multi-attribute index in resource-constrained (for example, storage-constrained) environments. Construction efficiency is also one of the performance measures of an index. To achieve higher query efficiency, the search space is often partitioned finely. This allows extensive pruning at query time and thus faster retrieval, but it makes index construction very time-consuming. Balancing construction efficiency against query efficiency is therefore an issue that a spatio-temporal multi-attribute index must address. Existing techniques do not consider this issue either, and cannot scale the construction effort dynamically and adaptively.
Disclosure of Invention
In view of the above, it is necessary to provide a spatio-temporal multi-attribute indexing method capable of adaptive dynamic scaling, and a retrieval method based on it.
A spatio-temporal multi-attribute indexing method capable of adaptive dynamic scaling, the method comprising:
constructing a document set to be indexed, where each document in the set comprises temporal information, spatial information and a word list;
constructing a tree data structure for the document set to be indexed. The tree data structure comprises a root node and leaf nodes. Below the root node are multiple levels of temporal multi-attribute nodes, and below the temporal multi-attribute nodes are multiple levels of spatial multi-attribute nodes. The root node is represented by a root node linked list, the temporal multi-attribute nodes by temporal multi-attribute node linked lists, and the spatial multi-attribute nodes by spatial multi-attribute node linked lists. Each root node linked list element comprises a time level, a time value, a bitmap index, a pointer to the next linked list element and a pointer to the next-level node. Each temporal multi-attribute node linked list element comprises a time level, a time value, a bitmap index and a pointer to the next-level node. Each spatial multi-attribute node linked list element comprises the minimum bounding rectangle (MBR) of the R-tree, a bitmap index and a pointer to the next-level node. The leaf nodes are represented by an element structure comprising spatial information, temporal information, a word list and a URL address;
storing each document in the document set to be indexed in the tree data structure.
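The node layout described in this claim can be sketched as Python dataclasses. This is a minimal illustration under stated assumptions, not the patented implementation. The class and field names are hypothetical, bitmap indexes are held as Python integers, and a point location stands in for the spatial information. Python lists stand in for the node linked lists below the root.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import List, Optional, Tuple, Union

# Axis-aligned rectangle (min_x, min_y, max_x, max_y); this layout is an assumption.
Rect = Tuple[float, float, float, float]


@dataclass
class LeafElement:
    """Leaf node element structure: one indexed document."""
    spatial: Tuple[float, float]      # spatial information (a point, assumed)
    temporal: str                     # temporal information, e.g. "2022-03-11" (assumed)
    words: List[str]                  # word list
    url: str                          # URL address


@dataclass
class SpatialEntry:
    """Element of a spatial multi-attribute node linked list."""
    mbr: Rect                         # minimum bounding rectangle of the R-tree
    bmi: int = 0                      # bitmap index, held as a Python int
    child: Union[List[SpatialEntry], LeafElement, None] = None


@dataclass
class TemporalEntry:
    """Element of a temporal multi-attribute node linked list."""
    time_level: int                   # e.g. 2 = month (hypothetical numbering)
    time_value: str                   # e.g. "2022-03"
    bmi: int = 0
    child: Union[List[TemporalEntry], List[SpatialEntry], None] = None


@dataclass
class RootListElement:
    """Element of the root node linked list."""
    time_level: int
    time_value: str
    bmi: int = 0
    next: Optional[RootListElement] = None        # next linked list element
    child: Optional[List[TemporalEntry]] = None   # next-level node


# Usage: one document reached through year -> month -> spatial entry -> leaf.
leaf = LeafElement((112.9, 28.2), "2022-03-11", ["flood", "river"], "http://example.com/doc1")
spatial = [SpatialEntry(mbr=(112.9, 28.2, 112.9, 28.2), bmi=0b11, child=leaf)]
month = [TemporalEntry(time_level=2, time_value="2022-03", bmi=0b11, child=spatial)]
root = RootListElement(time_level=1, time_value="2022", bmi=0b11, child=month)
```

Keeping `next` only on the root element mirrors the claim, which lists a next-element pointer for the root node linked list alone.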
In one embodiment, the method further comprises: extracting the temporal information, spatial information and word list of each document in the document set to be indexed;
mapping the word list through the bitmap index to obtain a bitmap element;
querying the root node linked list with the temporal information to find the element whose time value contains that temporal information, which yields a time value element;
where temporal multi-attribute node linked lists exist below it, querying them level by level for the element whose time value contains the temporal information, until no next-level temporal multi-attribute node linked list exists;
inserting each document, according to its spatial information, into the spatial multi-attribute node linked list below the last temporal multi-attribute node linked list using the R-tree insertion algorithm, until it reaches a leaf node.
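The source does not detail how a word list is mapped to a bitmap element. A minimal sketch follows, assuming each distinct word is assigned its own bit position the first time it is seen. The `WordBitmap` helper is hypothetical. The sketch shows why merging bitmaps is cheap: combining word lists is a bitwise OR, and testing for shared words is a bitwise AND.

```python
from typing import Dict, Iterable


class WordBitmap:
    """Hypothetical word-to-bit assignment used to build bitmap elements."""

    def __init__(self) -> None:
        self.positions: Dict[str, int] = {}

    def bit_of(self, word: str) -> int:
        # Give each new word the next free bit position.
        if word not in self.positions:
            self.positions[word] = len(self.positions)
        return 1 << self.positions[word]

    def encode(self, words: Iterable[str]) -> int:
        # blw: the bitwise OR of the bits of all words in the word list.
        blw = 0
        for w in words:
            blw |= self.bit_of(w)
        return blw


wb = WordBitmap()
blw1 = wb.encode(["flood", "river"])   # flood -> bit 0, river -> bit 1
blw2 = wb.encode(["river", "storm"])   # storm -> bit 2
merged = blw1 | blw2                   # bitmap of a node holding both documents
```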
In one embodiment, the method further comprises: querying the root node linked list according to the temporal information D.t, determining the element rln whose time value contains D.t, and applying the hit relation:
rln.bmi = rln.bmi | blw
where bmi is the bitmap index, blw is the bitmap element, and | denotes bitwise OR;
if no such element rln is found, a new root node linked list element rln is created and inserted into the root node, and rln.bmi = rln.bmi | blw is applied.
In one embodiment, the method further comprises: where temporal multi-attribute node linked lists exist, querying the next-level temporal multi-attribute node of the root node linked list element rln for the element rln1 whose value contains D.t, and applying the hit relation:
rln1.bmi = rln1.bmi | blw
if rln1 is not found, a temporal multi-attribute node element rln1 is created, inserted into its parent node and linked to it, and rln1.bmi = rln1.bmi | blw is applied. This is repeated until element rln1 has no next-level temporal multi-attribute node linked list.
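The level-by-level descent through the temporal multi-attribute nodes can be sketched as below. It rests on three simplifying assumptions. The temporal levels are taken to be the year, month and day prefixes of an ISO date string. Each node's elements are kept in a Python dict keyed by time value rather than in a linked list. The spatial level is replaced by a plain list of document identifiers. The `levels` argument plays the role of the temporal level parameter that the later embodiments select, and `insert_temporal` is a hypothetical name.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional

# Assumed temporal levels: prefixes of an ISO date "YYYY-MM-DD" (year, month, day).
LEVEL_PREFIX = [4, 7, 10]


@dataclass
class TemporalEntry:
    time_value: str
    bmi: int = 0
    children: Dict[str, "TemporalEntry"] = field(default_factory=dict)
    docs: List[str] = field(default_factory=list)   # placeholder for the spatial level


def insert_temporal(roots: Dict[str, TemporalEntry], t: str, blw: int,
                    doc_id: str, levels: int) -> TemporalEntry:
    """Descend `levels` temporal levels for date t, applying bmi = bmi | blw at each hit."""
    if not 1 <= levels <= len(LEVEL_PREFIX):
        raise ValueError("levels must be between 1 and 3")
    table = roots
    entry: Optional[TemporalEntry] = None
    for level in range(levels):
        key = t[:LEVEL_PREFIX[level]]
        entry = table.get(key)
        if entry is None:                 # not found: create it and attach it to the parent
            entry = TemporalEntry(time_value=key)
            table[key] = entry
        entry.bmi |= blw                  # hit relation rln1.bmi = rln1.bmi | blw
        table = entry.children
    assert entry is not None
    entry.docs.append(doc_id)             # lowest temporal level: spatial insertion would follow
    return entry


roots: Dict[str, TemporalEntry] = {}
insert_temporal(roots, "2022-03-11", 0b01, "d1", levels=3)
insert_temporal(roots, "2022-04-02", 0b10, "d2", levels=3)
```

With `levels=1` the same document stops at the year node. This is how a smaller level parameter trades query-time pruning for a smaller, faster-to-build index.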
In one embodiment, the method further comprises: starting from a set initial proportion M% and increasing the proportion in steps of δ = l%, constructing subsets of different sizes of the document set DS to be indexed. The collection of subsets is denoted DS_sub, and each element DS_sub_i of DS_sub is a document set;
extracting the spatial information s and temporal information t of each document in each document set DS_sub_i of DS_sub, and extracting all words in the document with a word segmentation component to form a word list lw;
for each document set DS_sub_i in DS_sub, varying the number of temporal levels used to construct the tree data structure: for DS_sub_i, the number of temporal multi-attribute node levels is increased from 1 to m, giving m indexes with different tree data structures;
when |DS_sub| = n, this yields n × m tree-structured indexes, and the storage size stor of each of the n × m indexes is recorded;
constructing, for each DS_sub_i, a vector v_sub_i from all of its spatial information s, temporal information t and word lists lw;
denoting by stor_{i,j} the storage size obtained when DS_sub_i is indexed by the index construction algorithm with j temporal levels;
establishing the mapping ⟨v_sub_i, stor_{i,j}⟩ → j;
training on all such mappings with an autoregressive model to obtain the machine learning model stor_m of the spatio-temporal multi-attribute index storage mechanism.
In one embodiment, the method further comprises: reading the given storage space stor;
for the document set DS to be indexed, extracting the spatial information s and temporal information t of each document D, and extracting all words in the document with a word segmentation component to form a word list lw;
constructing a vector v from all the spatial information s, temporal information t and word lists lw;
computing ⟨v, stor⟩ → j with the machine learning model stor_m;
performing the spatio-temporal multi-attribute index construction steps with j as the level parameter of the temporal multi-attribute nodes.
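The storage mechanism maps ⟨v, stor⟩ to a temporal level j. The source says only that the vector v is built from s, t and lw, and it trains an autoregressive model. The sketch below is therefore purely illustrative. `feature_vector` uses four hypothetical summary features. A 1-nearest-neighbour lookup stands in for the autoregressive model so that the training and prediction flow fits in a few lines; it is a plainly different technique.

```python
import math
from datetime import date
from typing import List, Sequence, Tuple

# (s, t, lw): point location, date, word list. This document layout is assumed.
Doc = Tuple[Tuple[float, float], date, List[str]]


def feature_vector(docs: Sequence[Doc]) -> List[float]:
    """Hypothetical vector v summarising all s, t and lw of a document set."""
    xs = [d[0][0] for d in docs]
    ys = [d[0][1] for d in docs]
    days = [d[1].toordinal() for d in docs]
    vocab = {w for d in docs for w in d[2]}
    return [
        float(len(docs)),                            # number of documents
        (max(xs) - min(xs)) * (max(ys) - min(ys)),   # spatial extent (bounding-box area)
        float(max(days) - min(days)),                # time span in days
        float(len(vocab)),                           # vocabulary size
    ]


class NearestLevelModel:
    """Illustrative stand-in for stor_m: learns <v, stor> -> j by 1-nearest neighbour.

    The source trains an autoregressive model; this substitution only keeps the
    sketch short. Features are left unnormalised for brevity.
    """

    def __init__(self) -> None:
        self.samples: List[Tuple[List[float], int]] = []

    def add(self, v: Sequence[float], stor: float, j: int) -> None:
        # One training mapping <v_sub_i, stor_{i,j}> -> j.
        self.samples.append((list(v) + [float(stor)], j))

    def predict(self, v: Sequence[float], stor: float) -> int:
        # Return the level j of the closest recorded <v, stor> pair.
        query = list(v) + [float(stor)]
        _, j = min(self.samples, key=lambda s: math.dist(s[0], query))
        return j
```

Any distance-based model would in practice need its features rescaled first, because raw counts and areas live on very different scales.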
A spatio-temporal multi-attribute retrieval method capable of adaptive dynamic scaling, the method comprising:
acquiring a retrieval condition, which comprises a spatial query range, a temporal query range and a query keyword list;
mapping the query keyword list to a query bitmap bqw;
intersecting the temporal query condition with the time value of each element of the root node linked list of the tree data structure built by the spatio-temporal multi-attribute indexing method capable of adaptive dynamic scaling of any one of claims 1 to 6, to obtain an element set r_set;
intersecting (bitwise AND) bqw with the bitmap index bmi of each element in r_set to obtain an element set r_set';
for each element in r_set', intersecting the value of each of its child elements with the temporal query range and intersecting bqw with the child's bmi, recursing until the spatial multi-attribute nodes are reached;
for each element of a spatial multi-attribute node, intersecting the element's minimum bounding rectangle (MBR) with the spatial query condition and intersecting its bmi with bqw, recursing until the leaf nodes are reached;
outputting a retrieval result when the spatial information, temporal information and word list in a leaf node satisfy the retrieval condition.
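The recursive, pruned traversal of the retrieval method can be sketched as follows. The source does not fix the following, so they are assumptions. Times are encoded as integer day ordinals. Every entry's `bmi` is the OR of all word bits beneath it. A document matches only if it contains all query keywords, so an entry is pruned when `bmi & bqw != bqw`. Under that conjunctive reading the pruning is safe: an entry whose bitmap lacks a query bit cannot have any matching document below it. The `Entry`, `Leaf` and `search` names are hypothetical.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Set, Tuple

Rect = Tuple[float, float, float, float]   # (min_x, min_y, max_x, max_y)
Span = Tuple[int, int]                     # inclusive [start, end] day ordinals (assumed)


@dataclass
class Leaf:
    s: Tuple[float, float]
    t: int
    words: Set[str]
    url: str


@dataclass
class Entry:
    bmi: int                               # OR of all word bits below this entry
    t_range: Optional[Span] = None         # set on temporal multi-attribute entries
    mbr: Optional[Rect] = None             # set on spatial multi-attribute entries
    children: List["Entry"] = field(default_factory=list)
    leaf: Optional[Leaf] = None            # set on leaf entries


def spans_meet(a: Span, b: Span) -> bool:
    return a[0] <= b[1] and b[0] <= a[1]


def rects_meet(a: Rect, b: Rect) -> bool:
    return a[0] <= b[2] and b[0] <= a[2] and a[1] <= b[3] and b[1] <= a[3]


def search(entries: List[Entry], q_rect: Rect, q_t: Span, bqw: int,
           q_words: Set[str], out: List[str]) -> List[str]:
    for e in entries:
        if (e.bmi & bqw) != bqw:           # keyword pruning via the bitmap index
            continue
        if e.t_range is not None and not spans_meet(e.t_range, q_t):
            continue                       # temporal pruning
        if e.mbr is not None and not rects_meet(e.mbr, q_rect):
            continue                       # spatial (MBR) pruning
        if e.leaf is not None:             # exact check of s, t and lw at the leaf
            x, y = e.leaf.s
            if (q_rect[0] <= x <= q_rect[2] and q_rect[1] <= y <= q_rect[3]
                    and q_t[0] <= e.leaf.t <= q_t[1] and q_words <= e.leaf.words):
                out.append(e.leaf.url)
        search(e.children, q_rect, q_t, bqw, q_words, out)
    return out
```

For example, with "flood" mapped to bit 0, a call such as `search(root_entries, (0, 0, 5, 5), (5, 15), 0b001, {"flood"}, [])` visits only the subtrees whose time range, MBR and bitmap all admit the query.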
In one embodiment, the method further comprises: given a document set DS to be indexed,
constructing subsets of different sizes of DS, starting from N% and increasing the proportion in steps of δ = l%. The collection of subsets is denoted DS_sub, and each element of DS_sub is a document set;
for each document set DS_sub_i in DS_sub, extracting the spatial information s and temporal information t of each document, and extracting the words in the document with a word segmentation component to form a word list lw;
for each document set DS_sub_i in DS_sub, varying the number of temporal levels used to construct the tree-structured index: for DS_sub_i, the number of levels of the multi-level temporal multi-attribute nodes is increased from 1 to m, giving m indexes with different tree data structures;
recording the construction time of each tree-structured index to obtain, for DS_sub_i, a construction time set whose j-th element is the construction time of the j-th tree-structured index for DS_sub_i [formula images not reproduced];
for each tree-structured index of document set DS_sub_i, forming query conditions from a randomly generated spatial range, a randomly generated time range and several randomly selected query keywords, performing retrieval, and computing the average retrieval response time. This gives a retrieval time set whose j-th element is the statistical average retrieval time of the j-th tree-structured index for DS_sub_i [formula image not reproduced];
computing the arithmetic mean of the construction time set and the arithmetic mean of the retrieval time set [formula images not reproduced];
computing a combined measure for each level [formula image not reproduced], and taking as p_i the temporal level of the tree-structured index for which this measure is smallest;
constructing, for each DS_sub_i, a vector v_sub_i from all of its spatial information s, temporal information t and word lists lw;
establishing the mapping v_sub_i → p_i;
training on all such mappings with an autoregressive model to obtain the machine learning model brbal_m of the spatio-temporal multi-attribute index construction/retrieval efficiency balance mechanism.
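The exact balance criterion in the two preceding computation steps appears in the source only as formula images. The sketch below therefore substitutes an assumed criterion for illustration. Each level's construction time and retrieval time are divided by their respective arithmetic means, the two ratios are summed, and p_i is the level with the smallest sum. Normalizing by the means makes the two times comparable despite their different scales. The real formula may differ, and `balanced_level` is a hypothetical name.

```python
from statistics import mean
from typing import Sequence


def balanced_level(build_times: Sequence[float], query_times: Sequence[float]) -> int:
    """Return p_i: the 1-based temporal level j minimising an ASSUMED cost,
    build_time / mean(build_times) + query_time / mean(query_times)."""
    if len(build_times) != len(query_times) or not build_times:
        raise ValueError("need one build time and one query time per level")
    tc_bar = mean(build_times)     # arithmetic mean of construction times
    tr_bar = mean(query_times)     # arithmetic mean of retrieval times
    costs = [tc / tc_bar + tr / tr_bar for tc, tr in zip(build_times, query_times)]
    # Ties resolve to the lowest level, i.e. the cheaper index to build.
    return 1 + min(range(len(costs)), key=costs.__getitem__)
```

With construction times [2, 4, 6] and retrieval times [9, 3, 3] for levels 1 to 3, the assumed costs are 2.3, 1.6 and 2.1, so level 2 is selected.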
In one embodiment, the method further comprises: for a given document set DS to be indexed, extracting the spatial information s and temporal information t of each document D, and extracting all words in the document with a word segmentation component to form a word list lw;
constructing a vector v from the spatial information s, temporal information t and word lists lw;
computing v → p with the machine learning model brbal_m;
performing the spatio-temporal multi-attribute index construction process with p as the level parameter of the temporal multi-attribute nodes.
The indexing method and retrieval method described above provide a tree data structure for a spatio-temporal multi-attribute index. The tree comprises a root node and leaf nodes. Below the root node are multiple levels of temporal multi-attribute nodes, and below those are multiple levels of spatial multi-attribute nodes. The root node, temporal multi-attribute nodes and spatial multi-attribute nodes are represented by a root node linked list, temporal multi-attribute node linked lists and spatial multi-attribute node linked lists respectively. Each root node linked list element holds a time level, a time value, a bitmap index, a pointer to the next linked list element and a pointer to the next-level node. Each temporal multi-attribute node linked list element holds a time level, a time value, a bitmap index and a pointer to the next-level node. Each spatial multi-attribute node linked list element holds the minimum bounding rectangle of the R-tree, a bitmap index and a pointer to the next-level node. Each leaf node is an element structure holding spatial information, temporal information, a word list and a URL address. The structure thus consists mainly of a temporal multi-attribute part and a spatial multi-attribute part. Both parts carry keyword-filtering bitmaps, so keywords can be used for fast pruning during retrieval.
In addition, the multi-level temporal multi-attribute structure exploits the natural hierarchy of time (for example, year-month-day) and provides the mechanism for subsequent adaptive adjustment of the index. The bitmap indexes also speed up construction: merging bitmaps is a bitwise operation and therefore fast. Combined with the tree-based search performed during insertion, this accelerates index construction overall.
Drawings
FIG. 1 is a flow diagram of a spatio-temporal multi-attribute indexing method capable of adaptive dynamic scaling in one embodiment;
FIG. 2 is a diagram of the tree data structure in one embodiment;
FIG. 3 is a flow diagram of a spatio-temporal multi-attribute retrieval method capable of adaptive dynamic scaling in another embodiment.
Detailed Description
In order to make the objects, technical solutions and advantages of the present application clearer, the present application is described in further detail below with reference to the accompanying drawings and embodiments. It should be understood that the specific embodiments described herein are merely intended to explain the present application and are not intended to limit it.
In one embodiment, as shown in fig. 1, a spatio-temporal multi-attribute indexing method capable of adaptive dynamic scaling is provided, comprising the following steps:
Step 102, construct the document set to be indexed.
Each document in the document set to be indexed comprises temporal information, spatial information and a word list.
Step 104, construct the tree data structure of the document set to be indexed.
The tree data structure comprises a root node and leaf nodes. Below the root node are multiple levels of temporal multi-attribute nodes, and below the temporal multi-attribute nodes are multiple levels of spatial multi-attribute nodes. The root node is represented by a root node linked list, the temporal multi-attribute nodes by temporal multi-attribute node linked lists, and the spatial multi-attribute nodes by spatial multi-attribute node linked lists. Each root node linked list element comprises a time level, a time value, a bitmap index, a pointer to the next linked list element and a pointer to the next-level node. Each temporal multi-attribute node linked list element comprises a time level, a time value, a bitmap index and a pointer to the next-level node. Each spatial multi-attribute node linked list element comprises the minimum bounding rectangle of the R-tree, a bitmap index and a pointer to the next-level node. The leaf nodes are represented by an element structure comprising spatial information, temporal information, a word list and a URL address.
Step 106, store each document in the document set to be indexed in the tree data structure.
The established tree data structure is shown in fig. 2.
In this indexing method, the spatio-temporal multi-attribute index is organized as the tree data structure described above, consisting mainly of a temporal multi-attribute part and a spatial multi-attribute part. Both parts carry bitmap indexes for keyword filtering, so keywords can be used for fast pruning during retrieval.
The multi-level temporal multi-attribute structure exploits the natural hierarchy of time (for example, year-month-day) and provides the mechanism for subsequent adaptive adjustment of the index. The bitmap indexes also speed up construction: merging bitmaps is a fast bitwise operation. Combined with the tree-based search performed during insertion, this accelerates index construction overall.
In one embodiment, the temporal information, spatial information and word list of each document in the document set to be indexed are extracted, and the word list is mapped through the bitmap index to obtain a bitmap element. The root node linked list is queried with the temporal information to find the element whose time value contains it, giving a time value element. Where temporal multi-attribute node linked lists exist below that element, they are queried level by level for the element whose time value contains the temporal information, until no next-level temporal multi-attribute node linked list exists. Each document is then inserted, according to its spatial information, into the spatial multi-attribute node linked list below the last temporal multi-attribute node linked list using the R-tree insertion algorithm, until it reaches a leaf node.
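The source names the R-tree insertion algorithm without detailing it. One standard ingredient is Guttman's ChooseLeaf rule: descend into the child whose minimum bounding rectangle needs the least area enlargement to cover the new entry, breaking ties by smaller area. The sketch below shows only that rule, plus the MBR and bitmap update a spatial multi-attribute node entry would need on the way down. Node splitting is omitted, and the function names are hypothetical.

```python
from typing import List, Tuple

Rect = Tuple[float, float, float, float]   # (min_x, min_y, max_x, max_y)


def area(r: Rect) -> float:
    return (r[2] - r[0]) * (r[3] - r[1])


def union(a: Rect, b: Rect) -> Rect:
    return (min(a[0], b[0]), min(a[1], b[1]), max(a[2], b[2]), max(a[3], b[3]))


def choose_subtree(mbrs: List[Rect], p: Tuple[float, float]) -> int:
    """ChooseLeaf rule: least area enlargement, ties broken by smaller area."""
    pt: Rect = (p[0], p[1], p[0], p[1])
    return min(range(len(mbrs)),
               key=lambda i: (area(union(mbrs[i], pt)) - area(mbrs[i]), area(mbrs[i])))


def descend(entries: List[Tuple[Rect, int]], p: Tuple[float, float], blw: int) -> int:
    """Pick the child entry for point p, enlarge its MBR and OR blw into its bitmap."""
    i = choose_subtree([mbr for mbr, _ in entries], p)
    mbr, bmi = entries[i]
    entries[i] = (union(mbr, (p[0], p[1], p[0], p[1])), bmi | blw)
    return i


entries = [((0, 0, 2, 2), 0b001), ((5, 5, 6, 6), 0b010)]
first = descend(entries, (1, 1), 0b100)      # already inside the first MBR
second = descend(entries, (6.5, 6.5), 0b100)  # cheaper to stretch the second MBR
```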
In one embodiment, the root node linked list is queried according to the temporal information D.t to determine the element rln whose time value contains D.t, and the hit relation is applied:
rln.bmi = rln.bmi | blw
where bmi is the bitmap index and blw is the bitmap element;
if no such element rln is found, a new root node linked list element rln is created and inserted into the root node, and rln.bmi = rln.bmi | blw is applied.
In one embodiment, where temporal multi-attribute node linked lists exist, the next-level temporal multi-attribute node of root node linked list element rln is queried for the element rln1 whose value contains D.t, and the hit relation is applied:
rln1.bmi = rln1.bmi | blw
if rln1 is not found, a temporal multi-attribute node element rln1 is created, inserted into its parent node and linked to it, and rln1.bmi = rln1.bmi | blw is applied. This is repeated until element rln1 has no next-level temporal multi-attribute node linked list.
In one embodiment, the problem addressed is how many levels of the temporal index structure are optimal for a given amount of data to be indexed and a given storage limit. Time, space, keywords and storage space are extracted as the features of the mapping used for training. This captures the key characteristics being optimized and reduces the amount of training data required.
Specifically, starting from a set initial proportion M% and increasing the proportion in steps of δ = l%, subsets of different sizes of the document set DS to be indexed are constructed. The collection of subsets is denoted DS_sub, and each element of DS_sub is a document set. M may be 10.
For each document set DS_sub_i in DS_sub, the spatial information s and temporal information t of each document are extracted, and all words in the documents are extracted with a word segmentation component to form a word list lw;
for each document set DS_sub_i in DS_sub, the number of temporal levels is varied to construct tree-structured indexes: for DS_sub_i, the number of temporal multi-attribute node levels is increased from 1 to m, giving m indexes with different tree data structures;
when |DS_sub| = n, this yields n × m tree-structured indexes, and the storage size stor of each of the n × m indexes is recorded;
for each DS_sub_i, a vector v_sub_i is constructed from all of its spatial information s, temporal information t and word lists lw;
the storage size obtained when DS_sub_i is indexed by the index construction algorithm with j levels is denoted stor_{i,j};
the mapping ⟨v_sub_i, stor_{i,j}⟩ → j is established;
all such mappings are used to train an autoregressive model, giving the machine learning model stor_m of the spatio-temporal multi-attribute index storage mechanism.
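The first step of this procedure, building subsets of DS whose proportions grow from M% in steps of δ, can be sketched as follows. The source does not state how the subsets are drawn. This sketch assumes independent uniform random samples with a fixed seed, and the function names are hypothetical. The values M = 10 and δ = 10 in the usage are illustrative only.

```python
import random
from typing import List, Sequence, TypeVar

T = TypeVar("T")


def subset_percentages(m: int, step: int) -> List[int]:
    """Proportions M%, (M + δ)%, (M + 2δ)%, ... up to 100%, in whole percent."""
    if not (0 < m <= 100 and step > 0):
        raise ValueError("need 0 < M <= 100 and δ > 0")
    return list(range(m, 101, step))


def build_subsets(ds: Sequence[T], m: int, step: int, seed: int = 0) -> List[List[T]]:
    """Build DS_sub: one subset DS_sub_i of DS per proportion.

    Assumption: each subset is an independent uniform random sample of DS.
    """
    rng = random.Random(seed)
    items = list(ds)
    return [rng.sample(items, len(items) * pct // 100)
            for pct in subset_percentages(m, step)]


subs = build_subsets(range(200), m=10, step=10)
sizes = [len(s) for s in subs]   # 20, 40, ..., 200 documents
```

Each of these subsets would then be indexed m times, once per temporal level count, to produce the n × m storage measurements.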
In one embodiment, the given storage space stor is read;
for the document set DS to be indexed, the spatial information s and temporal information t of each document D are extracted, and all words in the documents are extracted with a word segmentation component to form a word list lw;
a vector v is constructed from all the spatial information s, temporal information t and word lists lw;
⟨v, stor⟩ → j is computed with the machine learning model stor_m;
the spatio-temporal multi-attribute index construction steps are performed with j as the level parameter of the temporal multi-attribute nodes.
With the trained storage optimization model, indexes with different numbers of temporal levels can be built according to the preset storage space available for the index. This gives the index practical flexibility, and the method is particularly suitable for transparent adjustment in cloud deployments.
In one embodiment, as shown in fig. 3, a spatio-temporal multi-attribute retrieval method capable of adaptive dynamic scaling is provided, comprising:
Step 302, obtain the retrieval condition.
The retrieval condition comprises a spatial query range, a temporal query range and a query keyword list.
Step 304, map the query keyword list to bqw.
Step 306, intersect the temporal query condition with the time value of each element of the root node linked list of the tree data structure built by the spatio-temporal multi-attribute indexing method capable of adaptive dynamic scaling, obtaining an element set r_set.
Step 308, intersect (bitwise AND) bqw with the bitmap index bmi of each element in r_set to obtain an element set r_set'.
Step 310, for each element in r_set', intersect the value of each child element with the temporal query range and intersect bqw with the child's bmi, recursing until the spatial multi-attribute nodes are reached.
Step 312, for each element of a spatial multi-attribute node, intersect the element's minimum bounding rectangle (MBR) with the spatial query condition and intersect its bmi with bqw, recursing until the leaf nodes are reached.
Step 314, output a retrieval result when the spatial information, temporal information and word list in a leaf node satisfy the retrieval condition.
In this retrieval method, the temporal multi-attribute structure and the spatial multi-attribute structure are fully exploited to filter quickly on time plus keywords and on space plus keywords, which makes retrieval efficient.
In one embodiment, a machine learning training process for optimizing search efficiency balance is provided, which includes:
1. giving a document set DS to be indexed;
2. starting from N%, constructing subsets of different sizes of a document set DS to be indexed according to the scale proportion of (N% + delta) with the step length delta as l%; the method comprises the following steps that a subset of a document set to be indexed is DS _ sub, and each element in the DS _ sub is a document set;
3. extracting spatial information s and time information t of each document in each document set DS _ subi in DS _ sub, and extracting words in the documents by using a word segmentation component to form a word list lw;
4. for each document set DS _ sub in each DS _ subiTransforming different time-layer series to construct an index of a tree-shaped data structure; wherein, for DS _ subiIncreasing the time hierarchy of the multi-level time multi-attribute node from 1 to m levels, and constructing indexes of m different tree-like data structures;
5. Record the construction time of each tree-structured index to obtain the construction-time set
$T^{c}_{i} = \{\, t^{c}_{i,1}, t^{c}_{i,2}, \ldots, t^{c}_{i,m} \,\}$,
where $t^{c}_{i,j}$ denotes the construction time of the j-th tree-structured index for document set DS_sub_i;
6. For each tree-structured index of document set DS_sub_i, form query conditions from a randomly generated spatial range, a randomly generated time range and several randomly selected query keywords, run the retrievals, and compute the average retrieval response time, giving the retrieval-time set
$T^{q}_{i} = \{\, t^{q}_{i,1}, t^{q}_{i,2}, \ldots, t^{q}_{i,m} \,\}$,
where $t^{q}_{i,j}$ denotes the statistical average retrieval time of the j-th tree-structured index for DS_sub_i;
7. Compute $\bar{t}^{c}_{i}$, the arithmetic mean of $T^{c}_{i}$, and $\bar{t}^{q}_{i}$, the arithmetic mean of $T^{q}_{i}$:
$\bar{t}^{c}_{i} = \frac{1}{m}\sum_{j=1}^{m} t^{c}_{i,j}, \qquad \bar{t}^{q}_{i} = \frac{1}{m}\sum_{j=1}^{m} t^{q}_{i,j}$;
8. Compute the balance measure given by formula image BDA0003542396000000118, and take p_i as the time level of the tree-structured index that minimizes it;
9. For each DS_sub_i, construct a vector v_sub_i from all of its spatial information s, temporal information t and lw;
10. Establish the mapping v_sub_i → p_i;
11. Train on all the mappings with an autoregressive model to obtain brbal_m, a machine learning model of the balance mechanism between spatio-temporal multi-attribute index construction efficiency and retrieval efficiency.
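Steps 5 to 8 can be sketched as follows for one subset DS_sub_i. Because the balance formula survives only as an image, the measure used here (the sum of the construction and retrieval times, each divided by its arithmetic mean) is a hypothetical stand-in, passed in as a parameter so it can be swapped out; only the two arithmetic means and the arg-min over levels 1 to m follow the text.

```python
from typing import Callable, Sequence

Balance = Callable[[float, float, float, float], float]


def normalised_sum(tc: float, tq: float, mean_tc: float, mean_tq: float) -> float:
    """Hypothetical balance measure: mean-normalised construction plus retrieval time."""
    return tc / mean_tc + tq / mean_tq


def choose_level(build_times: Sequence[float], query_times: Sequence[float],
                 balance: Balance = normalised_sum) -> int:
    """Return p_i, the 1-based time level whose index minimises the balance measure.

    build_times[j-1] is the construction time t^c_{i,j} and query_times[j-1] the
    average retrieval time t^q_{i,j} of the index built with j time levels.
    """
    if not build_times or len(build_times) != len(query_times):
        raise ValueError("need one construction time and one retrieval time per level")
    m = len(build_times)
    mean_tc = sum(build_times) / m           # arithmetic mean of T^c_i
    mean_tq = sum(query_times) / m           # arithmetic mean of T^q_i
    scores = [balance(tc, tq, mean_tc, mean_tq)
              for tc, tq in zip(build_times, query_times)]
    return min(range(m), key=scores.__getitem__) + 1
```

With construction times `[1, 2, 5, 9]` rising and retrieval times `[9, 3, 2, 2]` falling as levels are added, this stand-in measure picks level 2, where neither cost dominates.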
In this embodiment, building the index with more time levels improves query efficiency but lowers construction efficiency; with fewer levels, construction is fast but queries are slow, so a balance point must be found. The method therefore addresses how to choose the number of levels of the time structure so that, when the volume of data to be indexed is large, query efficiency and construction efficiency are balanced.
In one embodiment, for a given document set DS to be indexed, the spatial information s and time information t are extracted from each document D, and all words in the document are extracted with a word segmentation component to form a word list lw;
a vector v is constructed from the spatial information s, the time information t and lw;
v → p is computed with the machine learning model brbal_m;
and the spatio-temporal multi-attribute index is constructed with p as the level parameter of the time multi-attribute nodes.
In this embodiment, the number of levels of the time structure is adjusted automatically to the volume of data being indexed when the index is constructed, balancing construction efficiency against query efficiency.
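Steps 9 to 11 and the prediction steps above can be illustrated with the sketch below, which deliberately simplifies the method: the feature vector v is collapsed to a single hypothetical feature (the base-10 logarithm of the document count), and an ordinary least-squares line replaces the autoregressive model named in the text. The predicted level is rounded and clamped to the valid range 1 to m before it is used as the time-level parameter.

```python
import math
from typing import Sequence, Tuple

Model = Tuple[float, float]                  # (intercept a, slope b)


def feature(doc_count: int) -> float:
    """Hypothetical one-number summary of v: log10 of the document count."""
    return math.log10(doc_count)


def fit(xs: Sequence[float], ps: Sequence[int]) -> Model:
    """Ordinary least squares p = a + b*x (a stand-in for the autoregressive model)."""
    n = len(xs)
    mx, mp = sum(xs) / n, sum(ps) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    if sxx == 0:                             # all subsets look alike: predict the mean
        return mp, 0.0
    b = sum((x - mx) * (p - mp) for x, p in zip(xs, ps)) / sxx
    return mp - b * mx, b


def predict_level(model: Model, doc_count: int, m: int) -> int:
    """Map a new document set to a time-level parameter p in [1, m]."""
    a, b = model
    p = round(a + b * feature(doc_count))
    return max(1, min(m, p))
```

Training on subsets of 10^3, 10^4 and 10^5 documents whose best levels were 2, 3 and 4 yields the line p = x - 1, so a new set of 10^4 documents is indexed with 3 time levels. Clamping keeps an extrapolated prediction inside the range of levels actually built.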
It will be understood by those skilled in the art that all or part of the processes of the methods in the above embodiments can be implemented by a computer program instructing the relevant hardware; the program can be stored in a non-volatile computer-readable storage medium and, when executed, can include the processes of the method embodiments above. Any reference to memory, storage, database, or other medium used in the embodiments provided herein may include non-volatile and/or volatile memory. Non-volatile memory can include read-only memory (ROM), Programmable ROM (PROM), Electrically Programmable ROM (EPROM), Electrically Erasable Programmable ROM (EEPROM), or flash memory. Volatile memory can include Random Access Memory (RAM) or external cache memory. By way of illustration and not limitation, RAM is available in a variety of forms such as Static RAM (SRAM), Dynamic RAM (DRAM), Synchronous DRAM (SDRAM), Double Data Rate SDRAM (DDRSDRAM), Enhanced SDRAM (ESDRAM), Synchronous Link DRAM (SLDRAM), Rambus Direct RAM (RDRAM), direct memory bus dynamic RAM (DRDRAM), and memory bus dynamic RAM (RDRAM).
The technical features of the above embodiments can be combined arbitrarily. For brevity, not every possible combination of these features is described; however, any combination that involves no contradiction should be considered within the scope of this specification.
The above embodiments express only several implementations of the present application. Although they are described specifically and in detail, they shall not be construed as limiting the scope of the invention. A person skilled in the art can make several variations and modifications without departing from the concept of the present application, and these all fall within its scope of protection. Therefore, the protection scope of this patent shall be subject to the appended claims.

Claims (9)

1. An adaptive, dynamically scalable spatio-temporal multi-attribute indexing method, characterized in that the method comprises:
constructing a document set to be indexed, each document of which comprises temporal information, spatial information and a word list;
constructing a tree data structure for the document set to be indexed, the tree data structure comprising a root node and leaf nodes; below the root node are multiple levels of time multi-attribute nodes, and below the time multi-attribute nodes are multiple levels of spatial multi-attribute nodes; the root node is represented by a root-node linked list, the time multi-attribute nodes by time multi-attribute node linked lists, and the spatial multi-attribute nodes by spatial multi-attribute node linked lists; the root-node linked list comprises time level information, a time value, a bitmap index, a pointer to the next linked-list element and a pointer to the next-level node; the time multi-attribute node linked list comprises a time level, a time value, a bitmap index and a pointer to the next-level node; the spatial multi-attribute node linked list comprises an R-tree minimum bounding rectangle, a bitmap index and a pointer to the next-level node; and the leaf nodes are represented by an element structure comprising spatial information, temporal information, a word list and a URL address;
and storing each document in the document set to be indexed in the tree data structure.
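The node layout of claim 1 can be written down as Python dataclasses, as a rough sketch. Field names are hypothetical; the root-node list and the time multi-attribute lists share one class because their fields nearly coincide, and the sibling `next` pointers on time, spatial and leaf elements are assumed (the claim lists an explicit next-element pointer only for the root-node list).

```python
from dataclasses import dataclass
from typing import Iterator, Optional, Tuple, Union

Rect = Tuple[float, float, float, float]     # R-tree MBR: (min_x, min_y, max_x, max_y)


@dataclass
class LeafElement:                           # spatial info, temporal info, word list, URL
    spatial: Tuple[float, float]
    temporal: float
    words: Tuple[str, ...]
    url: str
    next: Optional["LeafElement"] = None     # assumed sibling pointer


@dataclass
class SpatialListElement:                    # MBR, bitmap index, pointer to next level
    mbr: Rect
    bmi: int = 0
    down: Optional[Union["SpatialListElement", LeafElement]] = None
    next: Optional["SpatialListElement"] = None


@dataclass
class TimeListElement:                       # used for the root list and the time lists
    level: int                               # time level information
    value: Tuple[float, float]               # time value (range)
    bmi: int = 0                             # bitmap index
    down: Optional[Union["TimeListElement", SpatialListElement]] = None
    next: Optional["TimeListElement"] = None  # pointer to next linked-list element


Node = Union[TimeListElement, SpatialListElement, LeafElement]


def siblings(head: Optional[Node]) -> Iterator[Node]:
    """Walk one linked list via its next pointers."""
    while head is not None:
        yield head
        head = head.next


def leaf_urls(head: Optional[Node]) -> Iterator[str]:
    """Yield every leaf URL below a list head, following next and down pointers."""
    for elem in siblings(head):
        if isinstance(elem, LeafElement):
            yield elem.url
        else:
            yield from leaf_urls(elem.down)
```

Each `down` pointer refers to the head of the next-level linked list, so a full traversal alternates between walking across a list and descending one level.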
2. The method of claim 1, wherein storing each document in the document set to be indexed in the tree data structure comprises:
extracting the time information, spatial information and word list of each document in the document set to be indexed;
mapping the word list with the bitmap index to obtain a bitmap element;
querying the root-node linked list according to the time information to find the element whose time value contains the time information, thereby obtaining a time value element;
when time multi-attribute node linked lists are present, querying them level by level so that the time information is contained in the time value of the matched element, until there is no next-level time multi-attribute node linked list;
and inserting each document, according to its spatial information, into the next-level spatial multi-attribute node linked list of the last matched time multi-attribute node linked list using the R-tree insertion algorithm, until it is inserted as a leaf node.
3. The method of claim 2, wherein querying the root-node linked list according to the time information so that the time information is contained in a time value of the root-node linked list, to obtain a time value element, comprises:
querying the root-node linked list according to the time information D.t, determining the element rln whose time value contains D.t, and updating it on a hit as follows:
rln.bmi = rln.bmi | blw
where bmi is the bitmap index and blw is the bitmap element;
if no element rln is found, creating a root-node linked-list element rln, inserting it into the root node, and setting rln.bmi = rln.bmi | blw.
4. The method of claim 3, wherein, when time multi-attribute node linked lists are present, querying them so that the time information is contained in their time values until there is no next-level time multi-attribute node linked list comprises:
when time multi-attribute node linked lists are present, querying, in the next-level time multi-attribute node of the root-node linked-list element rln, the element rln1 whose value contains D.t, and updating it on a hit as follows:
rln1.bmi = rln1.bmi | blw
if rln1 is not found, creating a time multi-attribute node element rln1, inserting it into the parent node, and setting rln1.bmi = rln1.bmi | blw; repeating until element rln1 has no next-level time multi-attribute node linked list.
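The insertion path of claims 2 to 4 (map the words to blw, descend the time levels, OR blw into each element's bmi on a hit, create the element on a miss) is sketched below. The integer timestamps, the per-level bucketing functions and the on-the-fly vocabulary are hypothetical, and the final R-tree insertion into the spatial multi-attribute lists is replaced by appending to a plain list.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional, Tuple

Span = Tuple[int, int]                       # closed integer time range


@dataclass
class TElem:
    value: Span                              # time value covered by this element
    bmi: int = 0                             # OR of the word bitmaps stored below
    children: List["TElem"] = field(default_factory=list)  # next-level time list
    docs: List[dict] = field(default_factory=list)         # stand-in for the spatial R-tree


def word_bitmap(words, vocab: Dict[str, int]) -> int:
    """blw: one bit per word; unseen words get the next free bit (assumed policy)."""
    blw = 0
    for w in words:
        blw |= 1 << vocab.setdefault(w, len(vocab))
    return blw


def insert(root: List[TElem], doc: dict,
           levels: List[Callable[[int], Span]], vocab: Dict[str, int]) -> TElem:
    """Insert doc (keys 't' and 'words') and return the deepest time element."""
    if not levels:
        raise ValueError("at least one time level is required")
    blw = word_bitmap(doc["words"], vocab)
    current: List[TElem] = root
    elem: Optional[TElem] = None
    for bucket in levels:                    # one bucketing function per time level
        t = doc["t"]
        elem = next((e for e in current if e.value[0] <= t <= e.value[1]), None)
        if elem is None:                     # miss: create the element and link it in
            elem = TElem(bucket(t))
            current.append(elem)
        elem.bmi |= blw                      # hit relation: bmi = bmi | blw
        current = elem.children
    elem.docs.append(doc)                    # R-tree insertion omitted here
    return elem
```

With two levels that bucket timestamps into hundreds and then tens, documents at t = 123 and t = 127 share the path (100, 199) → (120, 129), and the upper element's bmi accumulates both documents' word bits.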
5. The method according to any one of claims 1 to 4, further comprising:
starting from a set initial proportion M%, constructing subsets of the document set DS to be indexed of different sizes with proportions (M% + δ), the step size δ being l%; the collection of these subsets is denoted DS_sub, and each element of DS_sub is a document set;
for each document set DS_sub_i in DS_sub, extracting the spatial information s and time information t of each document, and extracting all words in the documents with a word segmentation component to form a word list lw;
for each document set DS_sub_i in DS_sub, varying the number of time levels to construct tree-structured indexes, namely increasing the number of levels of the time multi-attribute nodes for DS_sub_i from 1 to m and constructing m different tree-structured indexes;
when |DS_sub| = n, obtaining n × m tree-structured indexes and recording the storage amount stor of each of the n × m indexes;
for each DS_sub_i, constructing a vector v_sub_i from all of its spatial information s, temporal information t and lw;
letting stor_{i,j} be the storage amount obtained when DS_sub_i is indexed by the index construction algorithm with j time levels;
establishing the mapping <v_sub_i, stor_{i,j}> → j;
and training on all the mappings with an autoregressive model to obtain stor_m, a machine learning model of the spatio-temporal multi-attribute index storage mechanism.
6. The method of claim 5, further comprising:
reading and scanning the given storage space stor;
for a document set DS to be indexed, extracting the spatial information s and time information t of each document D, and extracting all words in the documents with a word segmentation component to form a word list lw;
constructing a vector v from all the spatial information s, time information t and lw;
computing <v, stor> → j with the machine learning model stor_m;
and constructing the spatio-temporal multi-attribute index with j as the level parameter of the time multi-attribute nodes.
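Claims 5 and 6 can be illustrated by building the training pairs <v_sub_i, stor_{i,j}> → j and querying them with a storage budget. In this sketch v is an arbitrary numeric tuple, and a 1-nearest-neighbour lookup stands in for the autoregressive model stor_m; a real system would also need to scale the features, since raw storage sizes would otherwise dominate the distance.

```python
import math
from typing import List, Sequence, Tuple

Sample = Tuple[Tuple[float, ...], int]       # (<v_sub_i, stor_ij> flattened, j)


def build_samples(vectors: Sequence[Sequence[float]],
                  stor: Sequence[Sequence[float]]) -> List[Sample]:
    """stor[i][j-1] is the storage of DS_sub_i indexed with j time levels."""
    samples: List[Sample] = []
    for v, row in zip(vectors, stor):
        for j, s in enumerate(row, start=1):
            samples.append((tuple(v) + (s,), j))
    return samples


def predict_j(samples: List[Sample], v: Sequence[float], budget: float) -> int:
    """1-nearest-neighbour stand-in for stor_m: j of the closest <v, stor> sample."""
    query = tuple(v) + (budget,)
    _, j = min(samples, key=lambda smp: math.dist(smp[0], query))
    return j
```

Two subsets with three level counts each give n × m = 6 training samples; a new set resembling the first subset with a budget of about 21 storage units maps to j = 2.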
7. An adaptive, dynamically scalable spatio-temporal multi-attribute retrieval method, characterized in that the method comprises:
acquiring a retrieval condition comprising a spatial query range, a time query range and a list of query keywords;
mapping the list of query keywords to a bitmap bqw;
intersecting the time query condition with the time value of each element of the root-node linked list of the tree data structure of the adaptive, dynamically scalable spatio-temporal multi-attribute indexing method of any one of claims 1 to 6, to obtain an element set r_set;
performing an intersection operation between bqw and the bitmap index bmi of each element in r_set to obtain an element set r_set';
for each element in r_set', intersecting the value of each child element with the time query range and intersecting bqw with bmi, recursing until the spatial multi-attribute nodes are reached;
for each element of a spatial multi-attribute node, intersecting the element's minimum bounding rectangle (MBR) with the spatial query condition and intersecting bqw with bmi, recursing until a leaf node is reached;
and outputting a retrieval result when the spatial information, time information and word list in the leaf node satisfy the retrieval condition.
8. The retrieval method of claim 7, wherein the method further comprises:
given a document set DS to be indexed;
starting from N%, constructing subsets of the document set DS of different sizes with proportions (N% + δ), the step size δ being l%; the collection of these subsets is denoted DS_sub, and each element of DS_sub is a document set;
for each document set DS_sub_i in DS_sub, extracting the spatial information s and time information t of each document, and extracting the words in the documents with a word segmentation component to form a word list lw;
for each document set DS_sub_i in DS_sub, varying the number of time levels to construct tree-structured indexes, namely increasing the number of levels of the time multi-attribute nodes for DS_sub_i from 1 to m and constructing m different tree-structured indexes;
recording the construction time of each tree-structured index to obtain the construction-time set
$T^{c}_{i} = \{\, t^{c}_{i,1}, t^{c}_{i,2}, \ldots, t^{c}_{i,m} \,\}$,
where $t^{c}_{i,j}$ denotes the construction time of the j-th tree-structured index for document set DS_sub_i;
for each tree-structured index of document set DS_sub_i, forming query conditions from a randomly generated spatial range, a randomly generated time range and several randomly selected query keywords, performing retrieval and computing the average retrieval response time, to form the retrieval-time set
$T^{q}_{i} = \{\, t^{q}_{i,1}, t^{q}_{i,2}, \ldots, t^{q}_{i,m} \,\}$,
where $t^{q}_{i,j}$ denotes the statistical average retrieval time of the j-th tree-structured index for DS_sub_i;
computing $\bar{t}^{c}_{i}$, the arithmetic mean of $T^{c}_{i}$, and $\bar{t}^{q}_{i}$, the arithmetic mean of $T^{q}_{i}$:
$\bar{t}^{c}_{i} = \frac{1}{m}\sum_{j=1}^{m} t^{c}_{i,j}, \qquad \bar{t}^{q}_{i} = \frac{1}{m}\sum_{j=1}^{m} t^{q}_{i,j}$;
computing the balance measure given by formula image FDA00035423959900000410, and obtaining the time level p_i of the tree-structured index that minimizes it;
for each DS_sub_i, constructing a vector v_sub_i from all of its spatial information s, temporal information t and lw;
establishing the mapping v_sub_i → p_i;
and training on all the mappings with an autoregressive model to obtain brbal_m, a machine learning model of the balance mechanism between spatio-temporal multi-attribute index construction efficiency and retrieval efficiency.
9. The retrieval method of claim 8, wherein the method further comprises:
for a given document set DS to be indexed, extracting spatial information s and time information t from each document D, and extracting all words in the documents by using a word segmentation component to form a word list lw;
constructing a vector v according to the spatial information s, the time information t and the lw;
calculating v → p according to the machine learning model brbal_m;
and performing a construction process of the space-time multi-attribute index by taking p as a hierarchy parameter of the time multi-attribute node.
CN202210241696.1A 2022-03-11 2022-03-11 Space-time multi-attribute index method capable of self-adaptive dynamic expansion and retrieval method thereof Pending CN114722139A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210241696.1A CN114722139A (en) 2022-03-11 2022-03-11 Space-time multi-attribute index method capable of self-adaptive dynamic expansion and retrieval method thereof

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202210241696.1A CN114722139A (en) 2022-03-11 2022-03-11 Space-time multi-attribute index method capable of self-adaptive dynamic expansion and retrieval method thereof

Publications (1)

Publication Number Publication Date
CN114722139A (en) 2022-07-08

Family

ID=82238124

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210241696.1A Pending CN114722139A (en) 2022-03-11 2022-03-11 Space-time multi-attribute index method capable of self-adaptive dynamic expansion and retrieval method thereof

Country Status (1)

Country Link
CN (1) CN114722139A (en)

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115809360A (en) * 2023-02-08 2023-03-17 深圳大学 Large-scale space-time stream data real-time space connection query method and related equipment
CN115809360B (en) * 2023-02-08 2023-05-05 深圳大学 Real-time space connection query method for large-scale space-time data and related equipment
CN117389954A (en) * 2023-12-13 2024-01-12 湖南汇智兴创科技有限公司 Online multi-version document content positioning method, device, equipment and medium
CN117389954B (en) * 2023-12-13 2024-03-29 湖南汇智兴创科技有限公司 Online multi-version document content positioning method, device, equipment and medium

Similar Documents

Publication Publication Date Title
US20220261427A1 (en) Methods and system for semantic search in large databases
JP5858432B2 (en) Method, system, and computer program product for providing a distributed associative memory base
US11347741B2 (en) Efficient use of TRIE data structure in databases
CN114722139A (en) Space-time multi-attribute index method capable of self-adaptive dynamic expansion and retrieval method thereof
CN111868710B (en) Random extraction forest index structure for searching large-scale unstructured data
US20100106713A1 (en) Method for performing efficient similarity search
CN111581215B (en) Array tree data storage method, fast search method and readable storage medium
CN108399213B (en) User-oriented personal file clustering method and system
CN108304409B (en) Carry-based data frequency estimation method of Sketch data structure
CN106557777A (en) It is a kind of to be based on the improved Kmeans clustering methods of SimHash
Skopal et al. Nearest Neighbours Search using the PM-tree
CN116738988A (en) Text detection method, computer device, and storage medium
Günnemann et al. Subspace clustering for indexing high dimensional data: a main memory index based on local reductions and individual multi-representations
CN113722274A (en) Efficient R-tree index remote sensing data storage model
CN113297266B (en) Data processing method, device, equipment and computer storage medium
WO2023246849A1 (en) Feedback data graph generation method and refrigerator
CN113688702B (en) Street view image processing method and system based on fusion of multiple features
CN110955827B (en) By using AI 3 Method and system for solving SKQwyy-not problem
Terry et al. Indexing method for multidimensional vector data
CN116680367B (en) Data matching method, data matching device and computer readable storage medium
JP2002073390A (en) Recording medium in which multi-dimensional spatial data structure is recorded, method of updating multi- dimension spatial data, method of searching multi- dimensional spatial data, and recording medium in which program for performing the methods are recorded
Terry et al. Variable granularity space filling curve for indexing multidimensional data
CN116910337A (en) Entity object circle selection method, query method, device, server and medium
CN118133044A (en) Problem extension method, device, computer equipment, storage medium and product
CN116136958A (en) Document processing method, apparatus, computer program product, and readable storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination