CN112800065A - Efficient data retrieval method based on improved block storage structure - Google Patents
- Publication number
- CN112800065A (application number CN202110182161.7A)
- Authority
- CN
- China
- Prior art keywords
- node
- data
- tree
- query
- layer
- Prior art date
- Legal status (as listed by Google Patents; not a legal conclusion)
- Pending
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/20—Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
- G06F16/22—Indexing; Data structures therefor; Storage structures
- G06F16/2228—Indexing structures
- G06F16/2246—Trees, e.g. B+trees
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/20—Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
- G06F16/24—Querying
- G06F16/245—Query processing
- G06F16/2455—Query execution
- G06F16/24552—Database cache management
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/20—Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
- G06F16/24—Querying
- G06F16/245—Query processing
- G06F16/2458—Special types of queries, e.g. statistical queries, fuzzy queries or distributed queries
- G06F16/2471—Distributed queries
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/20—Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
- G06F16/27—Replication, distribution or synchronisation of data between databases or within a distributed database system; Distributed database system architectures therefor
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Databases & Information Systems (AREA)
- Physics & Mathematics (AREA)
- General Physics & Mathematics (AREA)
- General Engineering & Computer Science (AREA)
- Data Mining & Analysis (AREA)
- Computational Linguistics (AREA)
- Software Systems (AREA)
- Computing Systems (AREA)
- Fuzzy Systems (AREA)
- Mathematical Physics (AREA)
- Probability & Statistics with Applications (AREA)
- Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
Abstract
The invention discloses an efficient data retrieval method based on an improved blockchain storage structure. To address the efficiency problems of current blockchain storage and retrieval, it starts from the block structure and first proposes a three-layer retrieval model for blockchain systems. Second, drawing on the stable search performance and short query paths of the B+ tree together with the strong verification performance of the Merkle tree, it improves the storage structure of the Merkle tree by adding index data to the nodes. This improves data retrieval while preserving verification efficiency and supports range queries; corresponding construction and query algorithms are designed. Finally, experiments test the effectiveness and usability of the proposed tree structure, which shows good query performance and stability.
Description
Technical Field
The invention relates to a data retrieval method for blockchains. It mainly concerns the design of the data structure of blocks in a blockchain and designs a blockchain-based data retrieval model, and it belongs to the field of blockchain technology.
Background
Blockchain technology is decentralized and tamper-resistant; it can safely and effectively reduce the cost of new tasks and store data securely, and it is a current research hotspot in computer science. A blockchain is publicly transparent, verifiable, traceable and tamper-proof. It is a decentralized distributed ledger formed by linking blocks in timestamp order, and block data are synchronized mainly through a consensus mechanism.
Blockchain technology originates from the paper "Bitcoin: A Peer-to-Peer Electronic Cash System" published by Satoshi Nakamoto in 2008, which laid the foundation of cryptocurrency; since then, Bitcoin and many other cryptocurrencies have become buzzwords in industry and academia. Bitcoin is one of the most successful cryptocurrencies, and its market capitalization reached $1000 million in 2016. As the underlying technology of Bitcoin, blockchain is open, transparent and tamper-proof, and its security and reliability are guaranteed mainly by cryptography. For security, it relies on digital signatures, public/private key pairs and a timestamp mechanism; for reliability, nodes and data are verified through a consensus mechanism. Distributed consensus mechanisms such as proof of work and proof of stake address the Byzantine Generals Problem, solving the traditional trust problem and the double-spending problem and lowering the economic cost of establishing trust. For data storage, the Merkle tree is mainly used. Androulaki et al. introduced the Hyperledger Fabric architecture: an extensible blockchain system that supports modular consensus protocols, identity authentication and more, allowing the system to be customized for specific trust models and application scenarios. It also runs smart contracts written in standard general-purpose programming languages, supports contract development and deployment in several high-level languages, and executes smart contract rules inside containers.
McConaghy et al. proposed BigchainDB, a large-scale distributed database that brings blockchain properties such as decentralized control and tamper resistance to a traditional database. It improves data security, proposes a consistency strategy for blockchain databases, and addresses the limited storage capacity of blockchains so that larger volumes of data can be stored. Wilkinson et al. proposed a peer-to-peer blockchain storage network in which users transfer and share data without relying on third-party data providers, eliminating central control and checking file availability and integrity through periodic cryptographic checks. Dinh et al. built a framework for benchmarking blockchain systems that comprehensively evaluates data models, consensus algorithms and Hyperledger; it separates the storage, execution engine and consensus layers so that each can be optimized and extended independently, and it tested the data query capability of UStore, helping developers identify system bottlenecks and improve their platforms accordingly. Tsai et al. analyzed the consistency, scalability and database requirements of blockchain application systems and, with scalability as the goal, developed the "Beihang Chain" using a dual-chain architecture. Li et al. proposed an efficient blockchain retrieval method that uses MongoDB as the repository and supports range queries and Top-K queries with good flexibility. However, Li et al. achieve efficient retrieval by combining on-chain and off-chain databases: the blockchain data are stored off-chain in MongoDB as key-value pairs and queried using the properties of that database. This does not solve the query efficiency problem of the blockchain itself and increases communication traffic.
A Merkle tree is a binary tree consisting of a set of leaf nodes, a set of intermediate nodes and a root node; it is mainly used for fast verification. The B+ tree, a variant of the B tree, is widely used, mainly for database index queries and file systems. A B+ tree has three main types of nodes: leaf nodes, the root node and internal nodes. It is a multi-way search tree with good search characteristics and supports range queries.
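To make the verification role of the Merkle tree concrete, the Python sketch below computes a binary Merkle root over a list of leaf payloads. It is a minimal illustration, not the structure claimed here: SHA-256 and the Bitcoin-style rule of duplicating the last hash on an odd-sized level are assumptions, and the helper names `sha256` and `merkle_root` are hypothetical.

```python
from __future__ import annotations

import hashlib


def sha256(data: bytes) -> bytes:
    """Return the SHA-256 digest of data (hash function assumed, not specified)."""
    return hashlib.sha256(data).digest()


def merkle_root(leaves: list[bytes]) -> bytes:
    """Hash the leaves, then hash pairs level by level until one root remains."""
    if not leaves:
        raise ValueError("at least one leaf is required")
    level = [sha256(leaf) for leaf in leaves]
    while len(level) > 1:
        if len(level) % 2 == 1:
            level.append(level[-1])  # assumption: pair an odd last hash with itself
        level = [sha256(level[i] + level[i + 1]) for i in range(0, len(level), 2)]
    return level[0]


# Verification: any change to a leaf changes the root stored in the block header.
original = merkle_root([b"tx1", b"tx2", b"tx3"])
tampered = merkle_root([b"tx1", b"tx2-modified", b"tx3"])
print(original != tampered)  # True
```

Locating one record, by contrast, still means scanning the leaves, because the hashes carry no ordering information about the data.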
Disclosure of Invention
To address the efficiency problems of current blockchain storage and retrieval, this scheme starts from the block structure and first proposes a three-layer retrieval model for blockchain systems. Second, drawing on the stable search performance and short query paths of the B+ tree together with the strong verification performance of the Merkle tree, it improves the storage structure of the Merkle tree by adding index data to the nodes, improving data retrieval while preserving verification efficiency and supporting range queries; corresponding construction and query algorithms are designed. Finally, experiments test the effectiveness and usability of the proposed tree structure, which shows good query performance and stability. The data retrieval model is divided into three layers: a user layer, a query layer and a storage layer, as shown in Fig. 1.
When a user initiates a data query request, the user layer is accessed first and the data are looked up in its cache. If matching data are found in the cache, the search is complete and the result is returned; otherwise, the query layer is accessed next. The query layer divides the work among nodes according to their stability and computing power, assigning query services and ordinary node verification services to different nodes, and returns the query data to the user layer. The storage layer sits at the bottom; it is mainly responsible for storing blockchain data and responding to query requests from the query layer. For the block storage structure, a structure based on an improved Merkle tree is proposed.
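The three-layer flow can be sketched in Python as follows. This is a simplified model under stated assumptions: the class names `UserLayer`, `QueryLayer` and `StorageLayer` are hypothetical, the storage layer is reduced to a dictionary instead of an M_H+ tree, node selection by stability and computing power is omitted, and caching a result on its way back to the user layer is an assumption the text does not state.

```python
from __future__ import annotations

from typing import Dict, Optional


class StorageLayer:
    """Bottom layer: stores block data and answers query-layer requests."""

    def __init__(self, records: Dict[int, str]) -> None:
        self.records = records  # stand-in for the M_H+ tree storage
        self.reads = 0          # counts storage accesses, for illustration

    def lookup(self, index: int) -> Optional[str]:
        self.reads += 1
        return self.records.get(index)


class QueryLayer:
    """Middle layer: forwards requests to the storage layer (node roles omitted)."""

    def __init__(self, storage: StorageLayer) -> None:
        self.storage = storage

    def query(self, index: int) -> Optional[str]:
        return self.storage.lookup(index)


class UserLayer:
    """Top layer: answers from its cache when possible."""

    def __init__(self, query_layer: QueryLayer) -> None:
        self.query_layer = query_layer
        self.cache: Dict[int, str] = {}

    def get(self, index: int) -> Optional[str]:
        if index in self.cache:                 # cache hit: search is complete
            return self.cache[index]
        result = self.query_layer.query(index)  # cache miss: go one layer down
        if result is not None:
            self.cache[index] = result          # assumption: cache on the way back
        return result


storage = StorageLayer({69: "record-69"})
user = UserLayer(QueryLayer(storage))
print(user.get(69), user.get(69), storage.reads)  # record-69 record-69 1
```

The second request for index 69 never reaches the storage layer, which is the point of placing a cache in the user layer.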
The blockchain storage structure based on the M_H+ tree combines the efficient verification of the Merkle tree with the fast retrieval of the B+ tree; its data structure is shown in Fig. 2. The node design mainly improves the storage structure of the Merkle tree: each node consists of a hash value, index addresses and endpoint index values (the minimum and maximum indexes of its subtree, minval and maxval).
Two algorithms are designed: an insertion (construction) algorithm and a search algorithm.
The M_H+ tree construction algorithm comprises the following steps:
Step 1: determine the structure of the B+ tree;
Step 2: according to the result of step 1, take out the leaf data N of the B+ tree;
Step 3: determine whether N is a leaf node of the M_H+ tree;
Step 4: insert node N;
Step 5: repeat from step 2.
Algorithm: M_H+ tree construction
Input: a constructed B+ tree structure;
Output: the M_H+ tree storage model;
confirm the constructed B+ tree structure;
process the data starting from the bottom-level nodes, following the structure of the B+ tree;
for (B+ tree node N) {
  if (N is a leaf node) {
    hash the data;
  }
  else {
    obtain the minimum index;
    obtain the maximum index;
    record the node's minval and maxval;
    hash the data;
  }
  N = N - 1;
}
store the resulting root in the block header;
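The construction algorithm can be sketched in Python as below. It is a hypothetical rendering, not the patent's code: the input format (a leaf is a sorted list of `(index, payload)` pairs, an internal node is a dict with a `"children"` list), SHA-256 and the exact text that is hashed are all assumptions. A post-order recursion stands in for the `N = N - 1` loop; like the loop, it processes every child before its parent, i.e. from the bottom level up. Leaves also record minval and maxval here so that their parents can read them.

```python
from __future__ import annotations

import hashlib
from dataclasses import dataclass, field
from typing import List, Tuple

Record = Tuple[int, str]  # (index, payload)


@dataclass
class MHNode:
    """M_H+ node: hash value, child pointers (index addresses), minval/maxval."""
    digest: str
    minval: int
    maxval: int
    children: List[MHNode] = field(default_factory=list)
    records: List[Record] = field(default_factory=list)  # leaves only


@dataclass
class BlockHeader:
    root_hash: str
    minval: int
    maxval: int


def h(text: str) -> str:
    return hashlib.sha256(text.encode()).hexdigest()


def build(bplus) -> MHNode:
    """Convert a B+ tree (hypothetical format) into M_H+ nodes, children first."""
    if isinstance(bplus, dict):  # internal node
        kids = [build(child) for child in bplus["children"]]
        lo = min(k.minval for k in kids)  # minimum index of the subtree
        hi = max(k.maxval for k in kids)  # maximum index of the subtree
        digest = h("".join(k.digest for k in kids) + f"|{lo}|{hi}")
        return MHNode(digest, lo, hi, children=kids)
    records = list(bplus)  # leaf node: hash its data
    digest = h("".join(f"{i}:{p};" for i, p in records))
    return MHNode(digest, records[0][0], records[-1][0], records=records)


def make_header(bplus) -> Tuple[BlockHeader, MHNode]:
    """Store the root hash and its index range in the block header."""
    root = build(bplus)
    return BlockHeader(root.digest, root.minval, root.maxval), root


tree = {"children": [[(25, "a"), (30, "b")], [(50, "c"), (69, "d")]]}
header, root = make_header(tree)
print(header.minval, header.maxval)  # 25 69
```

Because minval and maxval are folded into each internal hash, tampering with the index ranges is detected the same way as tampering with the data.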
The query algorithm comprises the following steps:
Step 1: input the block data query index;
Step 2: determine the block header;
Step 3: access the block header data to obtain minval and maxval;
Step 4: determine whether the index lies in the interval minval-maxval and, if so, enter the next layer;
Step 5: return the data, or report that no data were found.
Algorithm: M_H+ tree query
Input: blockchain data, query condition Index;
Output: the transaction result.
if (the node does not satisfy the stability condition) {
  find a neighboring super node;
}
for (each block, traversed in order) {
  access minval and maxval of the tree root in the block header;
  while (minval ≤ Index ≤ maxval) {
    if (leaf node) {
      for (each entry of the leaf node) {
        if (Index is found) {
          return the data;
        }
      }
    }
    acquire the information of the next node;
    update minval and maxval;
  }
  access the next block;
}
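A simplified Python rendering of the query algorithm follows, assuming each node already carries minval and maxval as produced by the construction step. The names `Node`, `Block`, `search_block` and `query_chain` are hypothetical; hashes are omitted because the query path does not need them, and the stability check and super-node lookup are left out.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import List, Optional, Tuple


@dataclass
class Node:
    minval: int
    maxval: int
    children: List[Node] = field(default_factory=list)
    records: List[Tuple[int, str]] = field(default_factory=list)  # leaves only


@dataclass
class Block:
    minval: int  # copied from the tree root into the block header
    maxval: int
    root: Node


def search_block(node: Node, index: int) -> Optional[str]:
    """Descend while the index lies inside the current node's [minval, maxval]."""
    while node.minval <= index <= node.maxval:
        if not node.children:  # leaf node: traverse its records
            for key, payload in node.records:
                if key == index:
                    return payload
            return None
        # acquire the next node and, with it, the updated minval/maxval
        nxt = next((c for c in node.children if c.minval <= index <= c.maxval), None)
        if nxt is None:
            return None
        node = nxt
    return None


def query_chain(chain: List[Block], index: int) -> Optional[str]:
    for block in chain:  # traverse blocks
        if block.minval <= index <= block.maxval:  # header check prunes blocks
            result = search_block(block.root, index)
            if result is not None:
                return result
    return None  # no data found


leaf_a = Node(25, 30, records=[(25, "a"), (30, "b")])
leaf_b = Node(50, 69, records=[(50, "c"), (69, "d")])
chain = [
    Block(1, 20, Node(1, 20, records=[(1, "x")])),
    Block(25, 69, Node(25, 69, children=[leaf_a, leaf_b])),
]
print(query_chain(chain, 69), query_chain(chain, 40))  # d None
```

The header comparison is what lets whole blocks be skipped without opening their trees; the first block above is never descended into for index 69.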
In current blockchain storage structures, the block body is stored mainly as a Merkle tree. A Merkle tree can be verified quickly, but querying it is inefficient because all data must be traversed. Building on the block storage structure, the invention designs an improved storage structure for the Merkle tree that draws on three advantages of the B+ tree: stable search performance, short query paths and support for range queries. Retrieval speed is improved while the original fast verification is preserved. Stable search performance: retrieval time stays close to the average, without large differences between the retrieval times of different data items. Short query paths: owing to the properties of the B+ tree, a three-level B+ tree can store thousands of records, whereas a Merkle tree over the same data needs more than ten levels. Range queries: the bottom level of the designed tree consists of ordered data connected by a linked list, so an entire range of data can be retrieved by its bounds.
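The tree-height claim can be checked with a little arithmetic. The sketch below assumes a balanced binary Merkle tree and a B+ tree in which every node holds up to `fanout` entries; the fanout of 20 is a hypothetical choice, since the order m is not fixed in the text, and the function names are illustrative.

```python
def merkle_levels(n: int) -> int:
    """Node levels of a binary Merkle tree over n leaves: ceil(log2 n) + 1."""
    k = 0
    while 2 ** k < n:  # k hash levels above the leaves cover 2**k leaves
        k += 1
    return k + 1       # include the leaf level


def bplus_levels(n: int, fanout: int) -> int:
    """Node levels of a B+ tree in which every node holds up to `fanout` entries."""
    levels = 1
    while fanout ** levels < n:  # L levels address up to fanout**L records
        levels += 1
    return levels


print(merkle_levels(5000), bplus_levels(5000, 20))  # 14 3
```

For 5,000 records the binary tree needs 14 levels against 3 for the B+ tree, in line with the "more than ten levels" stated above.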
Drawings
Fig. 1: Data retrieval model.
Fig. 2: Data structure.
Fig. 3: New tree structure.
Detailed Description
The invention is described in detail below with reference to the drawings and embodiments; the embodiments mainly comprise the tree-building process and the data retrieval process. The tree is built as follows, where key is the key to be inserted and m is the order of the B+ tree:
1) If the tree is empty, create a new leaf node and insert the record into it; this leaf node is also the root node, and the insertion ends.
2) Leaf-type nodes: find the leaf node according to the key and insert the record into it. After insertion, if the current node has no more than m-1 keys, the insertion ends. Otherwise, split the leaf node into a left and a right leaf node: the left leaf node contains the first m/2 records and the right leaf node contains the remaining records. The key of the (m/2+1)-th record is carried into the parent node (which is necessarily an index-type node); the left child pointer of this key points to the left node and its right child pointer points to the right node. Point the current-node pointer to the parent node and proceed to the next step.
3) Index-type nodes: if the current node has no more than m-1 keys, the insertion ends. Otherwise, split the index-type node into two index nodes: the left index node contains the first (m-1)/2 keys and the right node contains m-(m-1)/2 keys. The (m/2)-th key is carried into the parent node; its left child points to the left node and its right child points to the right node. Point the current-node pointer to the parent node and repeat this step.
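Steps 1) to 3) are standard B+ tree insertion with node splitting. The Python sketch below follows them for an order-`m` tree (at most m-1 keys per node). It stores keys only, without records or hashes, and the class names are hypothetical. For index nodes it keeps the first (m-1)/2 keys on the left, moves the next key up to the parent and puts the remaining keys on the right, which is one common convention for step 3). Leaves are chained in a linked list, which is what enables range queries.

```python
from __future__ import annotations

from bisect import bisect_right, insort
from typing import List, Optional, Tuple, Union


class Leaf:
    def __init__(self, keys: Optional[List[int]] = None) -> None:
        self.keys: List[int] = keys if keys is not None else []
        self.next: Optional[Leaf] = None  # leaves form a linked list


class Index:
    def __init__(self, keys: List[int], children: list) -> None:
        self.keys = keys          # separator keys
        self.children = children  # len(children) == len(keys) + 1


Split = Optional[Tuple[int, Union[Leaf, Index]]]


class BPlusTree:
    def __init__(self, m: int = 4) -> None:
        self.m = m  # order: a node may hold at most m - 1 keys
        self.root: Union[Leaf, Index, None] = None

    def insert(self, key: int) -> None:
        if self.root is None:  # 1) empty tree: the new leaf is also the root
            self.root = Leaf([key])
            return
        split = self._insert(self.root, key)
        if split is not None:  # the root itself split: add a new root above it
            sep, right = split
            self.root = Index([sep], [self.root, right])

    def _insert(self, node: Union[Leaf, Index], key: int) -> Split:
        """Insert below node; return (key for parent, new right node) on a split."""
        m = self.m
        if isinstance(node, Leaf):  # 2) leaf-type node
            insort(node.keys, key)
            if len(node.keys) <= m - 1:
                return None
            half = m // 2  # left leaf keeps the first m/2 records
            right = Leaf(node.keys[half:])
            node.keys = node.keys[:half]
            right.next, node.next = node.next, right
            return right.keys[0], right  # key of record m/2+1 goes up
        # 3) index-type node: descend into the child whose range holds key
        pos = bisect_right(node.keys, key)
        split = self._insert(node.children[pos], key)
        if split is None:
            return None
        sep, new_child = split
        node.keys.insert(pos, sep)
        node.children.insert(pos + 1, new_child)
        if len(node.keys) <= m - 1:
            return None
        left_n = (m - 1) // 2  # left index node keeps (m-1)/2 keys
        up = node.keys[left_n]  # this key moves up to the parent
        right = Index(node.keys[left_n + 1:], node.children[left_n + 1:])
        node.keys = node.keys[:left_n]
        node.children = node.children[:left_n + 1]
        return up, right

    def leaves_in_order(self) -> List[int]:
        """Walk the leaf linked list from the leftmost leaf (a full range scan)."""
        node = self.root
        while isinstance(node, Index):
            node = node.children[0]
        keys: List[int] = []
        while node is not None:
            keys.extend(node.keys)
            node = node.next
        return keys


tree = BPlusTree(m=4)
for k in [25, 50, 67, 69, 89, 40, 45, 62]:
    tree.insert(k)
print(tree.leaves_in_order())  # [25, 40, 45, 50, 62, 67, 69, 89]
print(tree.root.keys)          # [45, 67]
```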
Fig. 3 shows the B+ tree into which the keys 40, 45 and 62 are inserted in sequence.
A search starts at the user (application) layer, which checks whether the requested data are in its cache; if so, the data are returned directly. Otherwise, the query layer locates a super node, through which the storage layer is queried. The storage layer uses the improved Merkle tree structure and is searched with the query algorithm.
In the M_H+ tree shown in Fig. 3, to search for the data whose index address is 69, the root node is examined first: its index range is 25-99, which contains 69. Based on the root node's keys (25, 89), the first child node is selected; the index range of this subtree is 25-69, which also contains 69. Based on this node's keys (25, 50, 67), the third child node is selected, the leaf is traversed to find the result, and the data are returned to the querying node.
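The walk-through above can be reproduced with the Python sketch below. Only the root range 25-99, the keys (25, 89) and (25, 50, 67) and a leaf holding 69 come from the example; every other key of this Fig. 3-like tree is invented for illustration. Child selection assumes each key is the smallest index of the corresponding child, so the search follows the last key that is less than or equal to the target.

```python
from __future__ import annotations

from bisect import bisect_right
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Node:
    keys: List[int]  # smallest index in each child, or the indexes in a leaf
    children: List[Node] = field(default_factory=list)
    minval: int = 0
    maxval: int = 0


def search(root: Node, index: int, trace: List[str]) -> Optional[int]:
    """Descend from the root, recording each decision in trace."""
    if not (root.minval <= index <= root.maxval):
        trace.append(f"{index} outside {root.minval}-{root.maxval}")
        return None
    trace.append(f"root range {root.minval}-{root.maxval} contains {index}")
    node = root
    while node.children:
        pos = bisect_right(node.keys, index) - 1  # last key <= index
        if pos < 0:
            return None
        trace.append(f"keys {tuple(node.keys)} -> child {pos + 1}")
        node = node.children[pos]
    trace.append(f"leaf {tuple(node.keys)}")
    return index if index in node.keys else None


# Hypothetical leaves; only the inner keys and the leaf containing 69 follow the example.
left = Node([25, 50, 67], [Node([25, 30]), Node([50, 55]), Node([67, 69])], 25, 69)
right = Node([89], [Node([89, 99])], 89, 99)
root = Node([25, 89], [left, right], 25, 99)

steps: List[str] = []
print(search(root, 69, steps))  # 69
print("\n".join(steps))
```

The printed trace reads: root range 25-99 contains 69; keys (25, 89) -> child 1; keys (25, 50, 67) -> child 3; leaf (67, 69).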
Claims (3)
1. An efficient data retrieval method based on an improved block storage structure, wherein a data retrieval model implementing the method is divided into three layers: a user layer, a query layer and a storage layer;
characterized in that: when a user initiates a data query request, the user layer is accessed first and the data are looked up in its cache; if matching data are found in the cache, the search is complete and the result is returned; otherwise, the query layer is accessed; the query layer divides query services and ordinary node verification services among nodes according to their stability and computing power, and returns the query data to the user layer; the storage layer is located at the bottom and is responsible for storing blockchain data and responding to query requests from the query layer.
2. The method according to claim 1, further characterized in that: the blockchain storage structure based on the M_H+ tree combines the efficient verification of the Merkle tree with the fast retrieval of the B+ tree; the node improves the storage structure of the Merkle tree and consists of a hash value, index addresses and endpoint index values;
the M_H+ tree is established as follows:
step 1: determine the structure of the B+ tree;
step 2: according to the result of step 1, take out the leaf data N of the B+ tree;
step 3: determine whether N is a leaf node of the M_H+ tree;
step 4: insert node N;
step 5: repeat from step 2;
the query steps are as follows:
step 1: input the block data query index;
step 2: determine the block header;
step 3: access the block header data to obtain minval and maxval;
step 4: determine whether the index lies in the interval minval-maxval and, if so, enter the next layer;
step 5: return the data, or report that no data were found.
3. The method according to claim 1, further characterized in that the method comprises building a tree and retrieving data;
the tree is built as follows:
let key be the key to be inserted;
1) if the tree is empty, create a new leaf node and insert the record into it; this leaf node is also the root node, and the insertion ends;
2) for leaf-type nodes: find the leaf node according to the key and insert the record into it; after insertion, if the current node has no more than m-1 keys, the insertion ends; otherwise, split the leaf node into a left and a right leaf node, the left leaf node containing the first m/2 records and the right leaf node containing the remaining records; carry the key of the (m/2+1)-th record into the parent node, with the left child pointer of this key pointing to the left node and its right child pointer pointing to the right node; point the current-node pointer to the parent node;
3) for index-type nodes: if the current node has no more than m-1 keys, the insertion ends; otherwise, split the index-type node into two index nodes, the left index node containing the first (m-1)/2 keys and the right node containing m-(m-1)/2 keys; carry the (m/2)-th key into the parent node, with its left child pointing to the left node and its right child pointing to the right node; point the current-node pointer to the parent node.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202110182161.7A CN112800065A (en) | 2021-02-09 | 2021-02-09 | Efficient data retrieval method based on improved block storage structure |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202110182161.7A CN112800065A (en) | 2021-02-09 | 2021-02-09 | Efficient data retrieval method based on improved block storage structure |
Publications (1)
Publication Number | Publication Date |
---|---|
CN112800065A | 2021-05-14 |
Family ID: 75815070
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202110182161.7A Pending CN112800065A (en) | 2021-02-09 | 2021-02-09 | Efficient data retrieval method based on improved block storage structure |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN112800065A (en) |
Citations (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN109165224A | 2018-08-24 | 2019-01-08 | Northeastern University (东北大学) | Indexing method for key words on block chain database |
CN112035491A | 2020-09-30 | 2020-12-04 | Sun Yat-sen University (中山大学) | Data storage method based on block chain, electronic integral processing method and system |
2021
- 2021-02-09: application CN202110182161.7A filed in CN; publication CN112800065A; status: active, pending
Non-Patent Citations (1)
Title |
---|
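Zheng Haohan (郑浩瀚), Shen Derong (申德荣), Nie Tiezheng (聂铁铮), Kou Yue (寇月): "Queryability optimization of blockchain systems for hybrid indexes" (面向混合索引的区块链系统的可查询性优化), Computer Science (计算机科学), no. 10, pp. 309-316 |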
Cited By (9)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN113626432A | 2021-08-03 | 2021-11-09 | Inspur Cloud Information Technology Co., Ltd. (浪潮云信息技术股份公司) | Improvement method of self-adaptive radix tree supporting any Key value |
CN113626432B | 2021-08-03 | 2023-10-13 | Shanghai Yunxi Technology Co., Ltd. (上海沄熹科技有限公司) | Improved method of self-adaptive radix tree supporting arbitrary Key value |
CN113779025A | 2021-08-06 | 2021-12-10 | Xidian University (西安电子科技大学) | Optimization method, system and application of classified data retrieval efficiency in block chain |
CN113779025B | 2021-08-06 | 2024-01-26 | Xidian University (西安电子科技大学) | Optimization method, system and application of classified data retrieval efficiency in block chain |
CN113901131A | 2021-09-02 | 2022-01-07 | Beijing University of Posts and Telecommunications (北京邮电大学) | Index-based on-chain data query method and device |
CN113901131B | 2021-09-02 | 2024-06-07 | Beijing University of Posts and Telecommunications (北京邮电大学) | Index-based on-chain data query method and device |
CN114756603A | 2022-05-23 | 2022-07-15 | Tianjin University (天津大学) | High-efficiency verifiable query method for lightweight block chain |
CN116303586A | 2022-12-09 | 2023-06-23 | China Electronics Cloud Digital Intelligence Technology Co., Ltd. (中电云数智科技有限公司) | Metadata cache elimination method based on multi-level b+tree |
CN116303586B | 2022-12-09 | 2024-01-30 | China Electronics Cloud Computing Technology Co., Ltd. (中电云计算技术有限公司) | Metadata cache elimination method based on multi-level b+tree |
Similar Documents
Publication | Title |
---|---|
CN112800065A | Efficient data retrieval method based on improved block storage structure |
CN111339106B | Block chain data indexing method |
CN109165224B | Indexing method for key words on block chain database |
CN102122285B | Data cache system and data inquiry method |
US20230109969A1 | Data processing method and apparatus based on node internal memory, device and medium |
KR102232641B1 | Method for searching using data structure supporting multiple search in blockchain based IoT environment, and apparatus thereof |
CN102945249B | Policing rule matching inquiry tree generation method, matching process and device |
US8229916B2 | Method for massively parallel multi-core text indexing |
CN104794123A | Method and device for establishing NoSQL database index for semi-structured data |
CN113821564B | Heterogeneous parallel blockchain and method for coordinating on-chain data and under-chain contracts thereof |
CN101771537A | Processing method and certificating method for distribution type certificating system and certificates of certification thereof |
WO2021190179A1 | Synchronous processing method and related apparatus |
CN110109874A | Non-stop layer distributed document retrieval method based on block chain |
CN101277252A | Method for traversing multi-branch Trie tree |
CN115052008B | Block chain data under-chain storage method based on cloud storage |
Liu et al. | Finding smallest k-compact tree set for keyword queries on graphs using mapreduce |
CN114791788B | Data storage method and device based on block chain |
CN116701452A | Data processing method, related device, storage medium and program product |
Zegour | Scalable distributed compact trie hashing (CTH*) |
CN114095373A | Knowledge graph-based alliance chain management method, system, equipment and storage medium |
CN112463890B | Cross-system data sharing method based on block chain and machine learning |
CN113495982B | Transaction node management method and device, computer equipment and storage medium |
CN112035485B | Method and system for realizing efficient query of credit information data based on distributed architecture |
CN117056342B | Data processing method based on block chain and related equipment |
WO2023160040A1 | Data processing method and apparatus based on blockchain, and device and readable storage medium |
Legal Events
Code | Title |
---|---|
PB01 | Publication |
SE01 | Entry into force of request for substantive examination |