CN112800065A

CN112800065A - Efficient data retrieval method based on improved block storage structure

Info

Publication number: CN112800065A
Application number: CN202110182161.7A
Authority: CN
Inventors: Liang Baochen (梁保陈); Zhang Xinglan (张兴兰)
Original assignee: Beijing University of Technology
Current assignee: Beijing University of Technology
Priority date: 2021-02-09
Filing date: 2021-02-09
Publication date: 2021-05-14

Abstract

The invention discloses an efficient data retrieval method based on an improved blockchain storage structure. It addresses the efficiency problems of current blockchain storage and retrieval, starting from the block structure. First, a three-layer retrieval model for blockchain systems is proposed. Second, drawing on the stable search performance and short query paths of the B+ tree together with the efficient verification of the Merkle tree, the storage structure of the Merkle tree is improved by adding index data to its nodes. This strengthens data retrieval while preserving verification efficiency and supports range queries; corresponding construction and query algorithms are designed. Finally, experiments test the effectiveness and usability of the proposed tree structure, and the method shows good query performance and stability.

Description

Efficient data retrieval method based on improved block storage structure

Technical Field

The invention relates to data retrieval in blockchains. It is mainly directed at the design of the data structure of blocks in a blockchain, provides a blockchain-based data retrieval model, and belongs to the field of blockchain technology.

Background

Blockchain technology is decentralized and tamper-resistant; it can effectively reduce the cost of trust and store data securely, and it is a current research hotspot in computer science. A blockchain is publicly transparent, verifiable, traceable and unforgeable. It is a decentralized distributed ledger formed by linking timestamped blocks, and block data is synchronized among nodes mainly through a consensus mechanism.

Blockchain technology originates from the paper "Bitcoin: A Peer-to-Peer Electronic Cash System", published by Satoshi Nakamoto in 2008, which laid the foundation of cryptocurrency; since then Bitcoin and many other cryptocurrencies have become buzzwords in industry and academia. Bitcoin has been one of the most successful cryptocurrencies, and its market capitalization reached US$1,000 million in 2016. As the underlying technology of Bitcoin, blockchain is open, transparent and tamper-resistant, and relies mainly on cryptography for its security and reliability. Security rests mainly on digital signatures, public/private key pairs and a timestamp mechanism; reliability, i.e. the verification of nodes and data, is achieved mainly through consensus mechanisms. Distributed consensus mechanisms such as Proof of Work and Proof of Stake address the Byzantine Generals Problem, which solves the traditional trust problem and the double-spending attack and lowers the economic cost of establishing trust. For data storage, the Merkle tree is mainly used. Androulaki et al. introduced the Hyperledger Fabric architecture: an extensible blockchain system that supports modular consensus protocols, identity authentication and the like, so that the system can be customized for specific trust models and application scenarios. It also supports smart contract development and deployment in standard general-purpose, high-level programming languages, and runs smart contracts inside containers.
McConaghy et al. proposed BigchainDB, a large-scale distributed database that brings blockchain properties such as decentralized control and tamper resistance into a traditional database to improve data security. They also proposed a consistency strategy for the blockchain database and, addressing the limited storage capacity of blockchains, increased capacity so that larger volumes of data can be stored. Wilkinson et al. proposed a peer-to-peer blockchain storage network in which users transfer and share data without relying on third-party data providers, eliminating central control; file availability and integrity are checked with periodic cryptographic audits. Dinh et al. built a framework for benchmarking blockchain systems that comprehensively evaluates data models, consensus algorithms and Hyperledger. It separates the storage, execution-engine and consensus layers so that each can be optimized and extended independently, and it tested the data query capability of UStore, helping developers identify system bottlenecks and improve their platforms accordingly. Tsai et al. analyzed the requirements of blockchain application systems in terms of consistency, scalability and databases, and developed the Beihang Chain with a dual-chain architecture aimed at scalability. Li et al. proposed an efficient blockchain retrieval method that uses MongoDB as the repository and supports range queries and Top-K queries with good flexibility. However, their approach achieves efficient retrieval by combining the chain with an off-chain database: blockchain data is stored off-chain in MongoDB as key-value pairs and queried through the database's own features. This does not solve the query efficiency problem of the blockchain itself, and it increases communication traffic.

The Merkle tree is a binary tree consisting of a set of leaf nodes, a set of intermediate nodes and a root node, and is mainly used for fast verification. The B+ tree, a variant of the B-tree, is widely used, mainly for database indexes and file systems. A B+ tree has three kinds of nodes: leaf nodes, internal nodes and the root node. It is a multi-way search tree with good search characteristics and supports range queries.
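To make the verification role of the Merkle tree concrete, the following is a minimal sketch of computing a Merkle root in Python. It assumes single SHA-256 hashing and the Bitcoin-style convention of pairing an odd node with itself; neither choice is specified by this document. Changing any leaf changes the root, which is why storing only the root lets a block's contents be checked quickly.

```python
import hashlib
from typing import List


def sha256(data: bytes) -> bytes:
    return hashlib.sha256(data).digest()


def merkle_root(leaves: List[bytes]) -> bytes:
    """Compute a binary Merkle root over the given leaf data.

    Assumption: single SHA-256, and an odd node at any level is paired with
    itself (Bitcoin-style); other systems may use different conventions.
    """
    if not leaves:
        raise ValueError("at least one leaf is required")
    level = [sha256(leaf) for leaf in leaves]      # leaf nodes: hashes of the data
    while len(level) > 1:                          # intermediate nodes, level by level
        if len(level) % 2 == 1:
            level.append(level[-1])                # duplicate the last hash if odd
        level = [sha256(level[i] + level[i + 1]) for i in range(0, len(level), 2)]
    return level[0]                                # root node


print(merkle_root([b"tx1", b"tx2", b"tx3"]).hex())
```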

Disclosure of Invention

Based on the efficiency problems of current blockchain storage and retrieval, and starting from the block structure, this scheme first proposes a three-layer retrieval model for blockchain systems. Second, drawing on the stable search performance and short query paths of the B+ tree together with the efficient verification of the Merkle tree, it improves the storage structure of the Merkle tree by adding index data to its nodes; this strengthens data retrieval while preserving verification efficiency and supports range queries, and corresponding construction and query algorithms are designed. Finally, experiments test the effectiveness and usability of the proposed tree structure, and the method shows good query performance and stability. The data retrieval model is divided into three layers, a user layer, a query layer and a storage layer, as shown in Fig. 1.

When a user initiates a data query request, the user layer is accessed first and the data is looked up in the user layer's cache. If matching data is found in the cache, the search is complete and the result is returned; otherwise, the query layer is accessed. The query layer divides labor among nodes according to their stability and computing power, assigning them either to query services or to ordinary node verification, and returns the queried data to the user layer. The storage layer sits at the bottom; it stores the blockchain data and responds to query requests from the query layer. For the storage structure of blocks, a structure based on an improved Merkle tree is proposed.
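The request flow through the three layers can be sketched in Python as follows. This is an illustrative model only: the class names, the boolean `stable` flag (standing in for the stability and computing-power criteria), the choice of the first stable peer as the query node, and the caching of results after a miss are all assumptions, and the storage layer is reduced to a plain dictionary.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class StorageLayer:
    """Bottom layer: block data, simplified here to an index -> record map."""
    records: Dict[int, str]

    def lookup(self, index: int) -> Optional[str]:
        return self.records.get(index)


@dataclass
class PeerNode:
    name: str
    stable: bool  # hypothetical stand-in for "stability and computing power"


@dataclass
class QueryLayer:
    peers: List[PeerNode]
    storage: StorageLayer
    last_server: Optional[str] = None  # which peer answered the last query

    def query(self, index: int) -> Optional[str]:
        # Division of labor: stable peers act as query (super) nodes,
        # the remaining peers are left to ordinary verification work.
        servers = [p for p in self.peers if p.stable]
        if not servers:
            raise RuntimeError("no stable peer available to serve queries")
        self.last_server = servers[0].name
        return self.storage.lookup(index)


@dataclass
class UserLayer:
    query_layer: QueryLayer
    cache: Dict[int, str] = field(default_factory=dict)

    def get(self, index: int) -> Optional[str]:
        if index in self.cache:                  # cache hit: the search ends here
            return self.cache[index]
        result = self.query_layer.query(index)   # cache miss: ask the query layer
        if result is not None:
            self.cache[index] = result           # assumption: results are cached
        return result


storage = StorageLayer({69: "tx-69"})
qlayer = QueryLayer([PeerNode("light-1", False), PeerNode("super-1", True)], storage)
user = UserLayer(qlayer)
print(user.get(69))  # tx-69, served by super-1 and then cached
print(user.get(69))  # tx-69, answered from the user-layer cache
```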

The blockchain storage structure based on the M_H+ tree combines the efficient verification of the Merkle tree with the fast retrieval of the B+ tree; its data structure is shown in Fig. 2. The improvement lies mainly in the node layout of the Merkle tree: each node consists of a hash value, an index address and endpoint index values.
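A possible in-memory layout for such a node is sketched below in Python. The field names, the use of child references as the "index address", and the hash layout (SHA-256 over the two endpoint values followed by the payload or the child digests) are assumptions made for illustration; the document does not fix an exact encoding. Including the endpoints in the hash means that tampering with either the data or the index range changes the digest.

```python
import hashlib
from dataclasses import dataclass, field
from typing import List


@dataclass
class MHNode:
    minval: int                 # lower endpoint index value of the subtree
    maxval: int                 # upper endpoint index value of the subtree
    children: List["MHNode"] = field(default_factory=list)  # index addresses (child refs)
    payload: bytes = b""        # record data; only leaves carry it in this sketch
    digest: bytes = b""         # hash value, filled in by seal()

    def covers(self, index: int) -> bool:
        """True if `index` falls inside this node's [minval, maxval] range."""
        return self.minval <= index <= self.maxval

    def seal(self) -> bytes:
        """Recompute the hash from the endpoints plus the payload or child digests."""
        h = hashlib.sha256()
        h.update(self.minval.to_bytes(8, "big", signed=True))
        h.update(self.maxval.to_bytes(8, "big", signed=True))
        if self.children:
            for child in self.children:
                h.update(child.seal())
        else:
            h.update(self.payload)
        self.digest = h.digest()
        return self.digest


leaf_a = MHNode(25, 25, payload=b"r25")
leaf_b = MHNode(69, 69, payload=b"r69")
parent = MHNode(25, 69, children=[leaf_a, leaf_b])
print(parent.seal().hex(), parent.covers(50))
```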

Two algorithms are designed: an insertion (construction) algorithm and a search algorithm.

The M_H+ tree construction algorithm comprises the following steps:

Step 1: determine the structure of the B+ tree;

Step 2: take out the leaf data N of the B+ tree according to the result of step 1;

Step 3: determine whether N is a leaf node of the M_H+ tree;

Step 4: insert node N;

Step 5: repeat from step 2.

Algorithm: M_H+ tree construction

Input: a constructed B+ tree structure;

Output: the M_H+ tree storage model;

confirm the constructed B+ tree structure;
process the data starting from the bottommost nodes, following the B+ tree structure;
for (each B+ tree node N) {
    if (N is a leaf node) {
        hash the data;
    }
    else {
        get the minimum index;
        get the maximum index;
        record minval and maxval of the node;
        hash the data;
    }
    N = N - 1;
}
store the resulting root in the block header;
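The construction pseudocode above can be sketched in Python as a bottom-up pass. Assumptions: the B+ tree shape is obtained by bulk-loading sorted `(index, record)` pairs with a hypothetical `fanout` parameter rather than taken from an existing B+ tree, leaves hash their records, internal nodes record `minval`/`maxval` and hash them together with their children's digests, and the hash encoding is illustrative only. The resulting root digest and range are what would be written to the block header.

```python
import hashlib
from dataclasses import dataclass, field
from typing import List, Sequence, Tuple

Record = Tuple[int, bytes]  # (index key, record bytes)


@dataclass
class MHNode:
    digest: bytes
    minval: int
    maxval: int
    children: List["MHNode"] = field(default_factory=list)
    records: List[Record] = field(default_factory=list)


def _int(v: int) -> bytes:
    return v.to_bytes(8, "big", signed=True)


def hash_leaf(records: Sequence[Record]) -> MHNode:
    # Leaf branch of the pseudocode: hash the data.
    h = hashlib.sha256()
    for key, value in records:
        h.update(_int(key) + hashlib.sha256(value).digest())
    return MHNode(h.digest(), records[0][0], records[-1][0], records=list(records))


def hash_internal(children: Sequence[MHNode]) -> MHNode:
    # Internal branch: get min/max index, record them, then hash.
    minval, maxval = children[0].minval, children[-1].maxval
    h = hashlib.sha256(_int(minval) + _int(maxval))
    for child in children:
        h.update(child.digest)
    return MHNode(h.digest(), minval, maxval, children=list(children))


def build_mh_tree(records: Sequence[Record], fanout: int = 3) -> MHNode:
    if not records:
        raise ValueError("a block needs at least one record")
    if fanout < 2:
        raise ValueError("fanout must be at least 2")
    ordered = sorted(records)
    # Bottommost level first, then move up one level at a time until one root is left.
    level = [hash_leaf(ordered[i:i + fanout]) for i in range(0, len(ordered), fanout)]
    while len(level) > 1:
        level = [hash_internal(level[i:i + fanout]) for i in range(0, len(level), fanout)]
    return level[0]


records = [(k, f"tx-{k}".encode()) for k in (25, 30, 41, 50, 55, 62, 67, 69, 89, 93)]
root = build_mh_tree(records)
block_header = {"mh_root": root.digest.hex(), "minval": root.minval, "maxval": root.maxval}
print(block_header["minval"], block_header["maxval"])  # 25 93
```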

The query algorithm comprises the following steps:

Step 1: input the block data query index;

Step 2: locate the block header;

Step 3: access the block header data to obtain minval and maxval;

Step 4: determine whether the index lies in the interval [minval, maxval] and, if so, enter the next layer;

Step 5: return the data, or report that no data was found.

Algorithm: M_H+ tree query

Input: blockchain data, query condition Index;

Output: the transaction result.

if (node does not satisfy the stability condition) {
    find a neighboring super node;
}
for (each block, in traversal order) {
    access minval and maxval of the tree root in the block header;
    while (minval ≤ Index ≤ maxval) {
        if (leaf node) {
            for (each entry in the leaf node) {
                if (Index found) {
                    return data;
                }
            }
        }
        get the information of the next node;
        update minval and maxval;
    }
    access the next block;
}
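The block-by-block search in the query pseudocode above can be sketched in Python as follows. Assumptions: each block is a dictionary whose hypothetical `"root"` entry stands for the header's tree root, hashes are omitted because they do not affect the search, and the stability check with the super-node hand-off is left to the query layer and not modeled. A block whose root range excludes the index is skipped without descending into it.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Tuple


@dataclass
class Node:
    minval: int
    maxval: int
    children: List["Node"] = field(default_factory=list)
    entries: List[Tuple[int, str]] = field(default_factory=list)

    @property
    def is_leaf(self) -> bool:
        return not self.children


def leaf(*entries: Tuple[int, str]) -> Node:
    return Node(entries[0][0], entries[-1][0], entries=list(entries))


def internal(*children: Node) -> Node:
    return Node(children[0].minval, children[-1].maxval, children=list(children))


def query_chain(blocks: List[Dict[str, Node]], index: int) -> Optional[str]:
    for block in blocks:                          # traverse blocks
        node: Optional[Node] = block["root"]      # header root carries minval/maxval
        while node is not None and node.minval <= index <= node.maxval:
            if node.is_leaf:
                for key, value in node.entries:   # traverse the leaf node
                    if key == index:
                        return value
                break                             # in range but absent: next block
            # Get the next node: the child whose [minval, maxval] covers the index.
            node = next((c for c in node.children if c.minval <= index <= c.maxval), None)
        # Fall through: access the next block.
    return None


blocks = [
    {"root": internal(leaf((1, "a"), (3, "b")), leaf((5, "c"), (7, "d")))},
    {"root": internal(leaf((10, "e"), (12, "f")), leaf((15, "g"),))},
]
print(query_chain(blocks, 15))  # g
```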

In current blockchain storage structures, the block body is stored mainly in a Merkle tree. The Merkle tree verifies quickly, but querying it is inefficient because all data must be traversed. Starting from the block storage structure, this invention designs an improved storage layout for the Merkle tree that draws on three advantages of the B+ tree: stable search performance, short query paths and support for range queries. Retrieval speed is improved without giving up the original fast verification. Stable search performance: retrieval time stays close to the average, so retrieval times for different data do not differ greatly. Short query path: owing to the properties of the B+ tree, a three-level B+ tree can store thousands of records, whereas a Merkle tree holding the same data needs more than ten levels. Range query: the bottom level of the designed tree is ordered data connected by a linked list, so a whole contiguous range of data can be queried by its key range.
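The short-query-path claim can be checked with a small calculation. The Python sketch below assumes an idealized B+ tree in which every node, leaves included, holds a full `fanout` of entries (the fanout of 20 is a hypothetical value, not taken from this document), and compares it with a binary Merkle tree that pairs an odd node with itself. With a fanout of 20, three levels hold 20 × 20 × 20 = 8000 records, while a binary Merkle tree over 8000 leaves has 14 levels from leaf hashes to root.

```python
def bplus_levels(n_records: int, fanout: int) -> int:
    """Smallest number of levels L with fanout**L >= n_records.

    Idealized: every node (leaves included) is assumed to be completely full.
    """
    levels, capacity = 1, fanout
    while capacity < n_records:
        levels += 1
        capacity *= fanout
    return levels


def merkle_levels(n_leaves: int) -> int:
    """Number of node levels, from leaf hashes up to the root, in a binary Merkle tree."""
    levels, width = 1, n_leaves
    while width > 1:
        width = (width + 1) // 2  # an odd node is paired with itself
        levels += 1
    return levels


for n in (1_000, 8_000):
    print(n, bplus_levels(n, fanout=20), merkle_levels(n))
# 1000 3 11
# 8000 3 14
```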

Drawings

Fig. 1 Data retrieval model.

Fig. 2 Data structure.

Fig. 3 New tree structure.

Detailed Description

The present invention is described in detail below with reference to the accompanying drawings and embodiments. The embodiments mainly comprise the tree building process and the data retrieval process. The tree is built as follows:

Let key denote the key to be inserted.

1) If the tree is empty, create a new leaf node and insert the record into it; this leaf node is also the root node, and the insertion ends.

2) For a leaf node: find the leaf node according to the key value and insert the record into it, where m is the order of the B+ tree. After insertion, if the current node has no more than m-1 keys, the insertion ends. Otherwise, split the leaf node into a left leaf node and a right leaf node: the left leaf node holds the first m/2 records and the right node holds the remaining records. The key of the (m/2+1)-th record is carried up into the parent node (which is necessarily an index node); the left child pointer of this key points to the left node and its right child pointer points to the right node. Then point the current-node pointer at the parent node and proceed to the next step.

3) For an index node: if the current node has no more than m-1 keys, the insertion ends. Otherwise, split the index node into two index nodes: the left index node holds the first (m-1)/2 keys, the right node holds m-(m-1)/2 keys, and the (m/2)-th key is carried up into the parent node, with its left child pointing to the left node and its right child pointing to the right node. Point the current-node pointer at the parent node and repeat this step.
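The three insertion rules above correspond to a standard B+ tree insertion, sketched below in Python for integer keys without duplicates. Following steps 1 and 2, an empty tree starts as a single leaf that is also the root, and an overflowing leaf keeps its first m//2 keys while the first key of the right half is copied into the parent. For index nodes, the sketch uses the usual split in which the left node keeps the first (m-1)//2 keys, the next key moves up, and the right node keeps the rest; this is a standard formulation and may differ in off-by-one detail from the wording of step 3. The class and method names are hypothetical.

```python
from bisect import bisect_right, insort
from typing import List, Optional


class BPlusNode:
    def __init__(self, leaf: bool):
        self.leaf = leaf
        self.keys: List[int] = []
        self.children: List["BPlusNode"] = []     # index nodes only
        self.next: Optional["BPlusNode"] = None   # leaf chain for range scans
        self.parent: Optional["BPlusNode"] = None


class BPlusTree:
    def __init__(self, m: int = 3):
        if m < 3:
            raise ValueError("order m must be at least 3")
        self.m = m
        self.root = BPlusNode(leaf=True)  # step 1: the first leaf is also the root

    def _find_leaf(self, key: int) -> BPlusNode:
        node = self.root
        while not node.leaf:
            node = node.children[bisect_right(node.keys, key)]
        return node

    def insert(self, key: int) -> None:
        leaf = self._find_leaf(key)
        insort(leaf.keys, key)
        if len(leaf.keys) <= self.m - 1:
            return
        # Step 2: split the leaf; the first key of the right half is copied up.
        half = self.m // 2
        right = BPlusNode(leaf=True)
        right.keys, leaf.keys = leaf.keys[half:], leaf.keys[:half]
        right.next, leaf.next = leaf.next, right
        self._insert_into_parent(leaf, right.keys[0], right)

    def _insert_into_parent(self, left: BPlusNode, key: int, right: BPlusNode) -> None:
        parent = left.parent
        if parent is None:  # splitting the root: grow the tree by one level
            parent = BPlusNode(leaf=False)
            parent.keys, parent.children = [key], [left, right]
            left.parent = right.parent = parent
            self.root = parent
            return
        pos = parent.children.index(left)
        parent.keys.insert(pos, key)
        parent.children.insert(pos + 1, right)
        right.parent = parent
        if len(parent.keys) <= self.m - 1:
            return
        # Step 3: split the index node; the middle key moves up (not copied).
        mid = (self.m - 1) // 2
        up = parent.keys[mid]
        sibling = BPlusNode(leaf=False)
        sibling.keys, sibling.children = parent.keys[mid + 1:], parent.children[mid + 1:]
        parent.keys, parent.children = parent.keys[:mid], parent.children[:mid + 1]
        for child in sibling.children:
            child.parent = sibling
        self._insert_into_parent(parent, up, sibling)

    def leaves_in_order(self) -> List[int]:
        """Walk the linked leaf level from left to right."""
        node = self.root
        while not node.leaf:
            node = node.children[0]
        out: List[int] = []
        while node is not None:
            out.extend(node.keys)
            node = node.next
        return out


tree = BPlusTree(m=3)
for k in (25, 50, 67, 69, 89, 40, 45, 62):
    tree.insert(k)
print(tree.leaves_in_order())
```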

As shown in Fig. 3, the keys 40, 45 and 62 are inserted into the B+ tree in sequence.

A search starts at the user (application) layer, which checks whether the requested data is in the cache. If it is, the data is returned directly; otherwise, the query layer finds a super node, and the super node queries the storage layer. The storage layer uses the improved Merkle tree structure and is searched with the query algorithm.

In the M_H+ tree shown in Fig. 3, to search for the data with index address 69, the root node is examined first: its index range is 25-99, which contains 69. From the root node's keys (25, 89), the first child node is selected; its subtree covers the range 25-69, which also contains 69. From this node's keys (25, 50, 67), the third child node is selected, its entries are traversed, and the data found is returned to the querying node.
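The descent in this example can be traced with the Python sketch below. Since Fig. 3 is not reproduced here, the tree is hypothetical: only the root range 25-99, the root keys (25, 89), the child range 25-69 and the child keys (25, 50, 67) come from the text, while the remaining leaf contents and the `record-…` payloads are invented for illustration. At each internal node the sketch picks the child with the largest separator not exceeding the index and continues only while the index stays inside that node's [minval, maxval] range.

```python
from bisect import bisect_right
from dataclasses import dataclass, field
from typing import List, Optional, Tuple


@dataclass
class Node:
    minval: int
    maxval: int
    children: List["Node"] = field(default_factory=list)
    records: List[Tuple[int, str]] = field(default_factory=list)

    @property
    def separators(self) -> List[int]:
        # Smallest key of each child, e.g. (25, 89) at the root in the example.
        return [c.minval for c in self.children]


def leaf(*keys: int) -> Node:
    return Node(keys[0], keys[-1], records=[(k, f"record-{k}") for k in keys])


def internal(*children: Node) -> Node:
    return Node(children[0].minval, children[-1].maxval, children=list(children))


def search(root: Node, index: int) -> Tuple[Optional[str], List[str]]:
    trace: List[str] = []
    node = root
    while node.minval <= index <= node.maxval:
        if not node.children:
            trace.append(f"leaf {node.minval}-{node.maxval}")
            return dict(node.records).get(index), trace
        # slot >= 0 is guaranteed because index >= node.minval == separators[0].
        slot = bisect_right(node.separators, index) - 1
        trace.append(f"range {node.minval}-{node.maxval}, "
                     f"keys {tuple(node.separators)} -> child {slot + 1}")
        node = node.children[slot]
    return None, trace


# Hypothetical tree consistent with the ranges and keys quoted in the example.
root = internal(
    internal(leaf(25, 30, 41), leaf(50, 55, 62), leaf(67, 69)),
    internal(leaf(89, 93), leaf(95, 99)),
)
value, path = search(root, 69)
print(value)       # record-69
for step in path:
    print(step)
```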

Claims

1. An efficient data retrieval method based on an improved block storage structure, wherein a data retrieval model implementing the method is divided into three layers: a user layer, a query layer and a storage layer;

characterized in that: a user initiates a data query request and the user layer is accessed first, where the data is looked up in the user layer's cache; if matching data is found in the cache, the search is complete and the result is returned; otherwise, the query layer is accessed; the query layer divides query services and ordinary node verification services among nodes according to their stability and computing power, and returns the queried data to the user layer; the storage layer sits at the bottom and is responsible for storing blockchain data and responding to query requests from the query layer.

2. The method of claim 1, further characterized in that: the blockchain storage structure based on the M_H+ tree combines the efficient verification of the Merkle tree with the fast retrieval of the B+ tree; the nodes improve the storage structure of the Merkle tree, each node consisting of a hash value, an index address and endpoint index values;

M_H+ tree construction:

Step 1: determine the structure of the B+ tree;

Step 2: take out the leaf data N of the B+ tree according to the result of step 1;

Step 3: determine whether N is a leaf node of the M_H+ tree;

Step 4: insert node N;

Step 5: repeat from step 2;

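the query steps are as follows:

Step 1: input the block data query index;

Step 2: locate the block header;

Step 3: access the block header data to obtain minval and maxval;

Step 4: determine whether the index lies in the interval [minval, maxval] and, if so, enter the next layer;

Step 5: return the data, or report that no data was found.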

3. The method of claim 1, further characterized in that the method comprises tree building and data retrieval;

the tree building process comprises the following steps:

let key denote the key to be inserted;

1) if the tree is empty, create a new leaf node and insert the record into it; this leaf node is also the root node, and the insertion ends;

2) for a leaf node: find the leaf node according to the key value and insert the record into it; after insertion, if the current node has no more than m-1 keys, the insertion ends; otherwise, split the leaf node into a left leaf node and a right leaf node, the left leaf node holding the first m/2 records and the right node holding the remaining records; carry the key of the (m/2+1)-th record up into the parent node, with the left child pointer of this key pointing to the left node and its right child pointer pointing to the right node; point the current-node pointer at the parent node;

3) for an index node: if the current node has no more than m-1 keys, the insertion ends; otherwise, split the index node into two index nodes, the left index node holding the first (m-1)/2 keys and the right node holding m-(m-1)/2 keys; carry the (m/2)-th key up into the parent node, with its left child pointing to the left node and its right child pointing to the right node; and point the current-node pointer at the parent node.