CN112800065A - Efficient data retrieval method based on improved block storage structure - Google Patents

Efficient data retrieval method based on improved block storage structure

Info

Publication number
CN112800065A
Authority
CN
China
Prior art keywords
node
data
tree
query
layer
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202110182161.7A
Other languages
Chinese (zh)
Inventor
梁保陈
张兴兰
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing University of Technology
Original Assignee
Beijing University of Technology
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing University of Technology
Priority to CN202110182161.7A
Publication of CN112800065A
Legal status: Pending

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/22 Indexing; Data structures therefor; Storage structures
    • G06F16/2228 Indexing structures
    • G06F16/2246 Trees, e.g. B+trees
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/24 Querying
    • G06F16/245 Query processing
    • G06F16/2455 Query execution
    • G06F16/24552 Database cache management
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/24 Querying
    • G06F16/245 Query processing
    • G06F16/2458 Special types of queries, e.g. statistical queries, fuzzy queries or distributed queries
    • G06F16/2471 Distributed queries
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/27 Replication, distribution or synchronisation of data between databases or within a distributed database system; Distributed database system architectures therefor

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Databases & Information Systems (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Computational Linguistics (AREA)
  • Software Systems (AREA)
  • Computing Systems (AREA)
  • Fuzzy Systems (AREA)
  • Mathematical Physics (AREA)
  • Probability & Statistics with Applications (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention discloses an efficient data retrieval method based on an improved blockchain storage structure. To address the efficiency problems of current blockchain storage and retrieval, the method starts from the block structure. First, a three-layer retrieval model based on a blockchain system is proposed. Second, the storage structure of the Merkle tree is improved by combining the stable search performance and short query path of the B+ tree with the strong verification performance of the Merkle tree: corresponding index data are added to the nodes, which improves data retrieval capability while preserving verification efficiency and supports range queries, and corresponding construction and query algorithms are designed. Finally, experiments test the effectiveness and usability of the proposed tree structure and show that the method has good query performance and stability.

Description

Efficient data retrieval method based on improved block storage structure
Technical Field
The invention relates to a data retrieval method for blockchains. It mainly concerns the design of the data structure of a block in a blockchain and proposes a blockchain-based data retrieval model. It belongs to the field of blockchain technology.
Background
Blockchain technology is decentralized and tamper-resistant. It can safely and effectively reduce the cost of new tasks and can store data securely, and it is a current hotspot of computer science research. A blockchain is also open and transparent, verifiable, and traceable. It is a decentralized distributed ledger formed by linking blocks that carry timestamp information, and block data are synchronized mainly through a consensus mechanism.
Blockchain technology originates from the paper "Bitcoin: A Peer-to-Peer Electronic Cash System", published by Satoshi Nakamoto in 2008, which laid the foundation of cryptocurrency. Since then, Bitcoin and many other cryptocurrencies have become buzzwords in industry and academia. Bitcoin is one of the most successful cryptocurrencies, and its market capitalization reached $1,000 million in 2016. As the underlying technology of Bitcoin, blockchain is open, transparent, and tamper-resistant, and its security and reliability are ensured mainly by cryptographic techniques. For security, it relies mainly on digital signatures, public/private key cryptography, and a timestamp mechanism; for reliability, nodes and data are verified mainly through a consensus mechanism. Distributed consensus mechanisms such as proof of work and proof of stake address the Byzantine generals problem. They also address the traditional trust problem and the double-spending attack, and they reduce the economic cost of traditional trust. For data storage, the Merkle tree is mainly used. Androulaki et al. introduced the Hyperledger Fabric architecture: an extensible blockchain system that supports modular consensus protocols, identity authentication, and the like, and allows the system to be customized for specific trust models and application scenarios. It is also a blockchain system that runs standard general-purpose programming languages, supports contract development and deployment in more high-level programming languages, and runs smart contract rules inside containers.
McConaghy et al. proposed BigChainDB, a large-scale distributed database that brings blockchain characteristics such as decentralized control and tamper resistance to a traditional database. It improves data security, proposes a consistency strategy for blockchain databases, and addresses the storage capacity problem of blockchains by increasing capacity to support larger data. Wilkinson et al. proposed a peer-to-peer blockchain storage network in which users transfer and share data without relying on third-party data providers, eliminating central control and checking file availability and integrity through periodic cryptographic checks. Dinh et al. built a framework for testing blockchain systems and comprehensively evaluated the data model, consensus algorithm, and Hyperledger. They separated the storage, execution engine, and consensus layers from one another, optimized and extended each independently, and tested the data query capability of UStore, helping developers identify system bottlenecks and improve the platform accordingly. Tsai et al. analyzed the requirements of blockchain application systems in terms of consistency, scalability, and databases, and developed the Beihang Chain, a double-chain architecture designed for scalability. Li et al. proposed an efficient blockchain retrieval method that uses MongoDB as the repository and supports range queries and Top-K queries with good flexibility. However, their efficient retrieval relies on combining on-chain and off-chain databases: blockchain data are stored off-chain in MongoDB as key-value (K-V) pairs and queried using the features of the database. This does not solve the query efficiency problem of the blockchain itself, and it increases communication traffic.
The Merkle tree is a binary tree composed of a set of leaf nodes, a set of intermediate nodes, and a root node, and it is mainly used for fast verification. The B+ tree, a widely used variant of the B tree, is mainly used for database index queries and file systems. A B+ tree has three main node types: leaf nodes, the root node, and internal nodes. It is a multiway search tree with good search characteristics and supports range queries.
Disclosure of Invention
To address the efficiency problems of current blockchain storage and retrieval, the scheme starts from the block structure. First, a three-layer retrieval model based on a blockchain system is proposed. Second, the storage structure of the Merkle tree is improved by combining the stable search performance and short query path of the B+ tree with the strong verification performance of the Merkle tree: corresponding index data are added to the nodes, which improves data retrieval capability while preserving verification efficiency and supports range queries, and corresponding construction and query algorithms are designed. Finally, experiments test the effectiveness and usability of the proposed tree structure and show that the method has good query performance and stability. The data retrieval model is divided into three layers: a user layer, a query layer, and a storage layer, as shown in FIG. 1.
A user initiates a data query request, which first reaches the user layer, where the data are looked up in the user layer's cache. If matching data are found in the cache, the search is complete and the result is returned; otherwise, the request proceeds to the query layer. The query layer divides the work between query services and ordinary node verification services according to the stability and computing power of each node, and returns the query data to the user layer. The storage layer is the bottom layer and is mainly responsible for storing blockchain data and responding to query requests from the query layer. For the storage structure of the block, a storage structure based on an improved Merkle tree is proposed.
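The cache-first lookup described above can be sketched in Python as follows. This is a minimal illustration, not the patent's implementation: the class names `UserLayer`, `QueryLayer`, and `StorageLayer` are hypothetical, the storage layer is a plain dictionary standing in for block data, and the selection of nodes by stability and computing power is not modeled.

```python
from typing import Dict, Optional


class StorageLayer:
    """Bottom layer: holds block data and answers query-layer requests."""

    def __init__(self, records: Dict[int, str]) -> None:
        self._records = dict(records)

    def lookup(self, index: int) -> Optional[str]:
        return self._records.get(index)


class QueryLayer:
    """Middle layer: forwards a query to the storage layer."""

    def __init__(self, storage: StorageLayer) -> None:
        self._storage = storage
        self.calls = 0  # counts how often the lower layers were reached

    def query(self, index: int) -> Optional[str]:
        self.calls += 1
        return self._storage.lookup(index)


class UserLayer:
    """Top layer: serves from its cache, otherwise asks the query layer."""

    def __init__(self, query_layer: QueryLayer) -> None:
        self._query_layer = query_layer
        self._cache: Dict[int, str] = {}

    def get(self, index: int) -> Optional[str]:
        if index in self._cache:
            return self._cache[index]  # cache hit: search ends here
        result = self._query_layer.query(index)
        if result is not None:
            self._cache[index] = result  # remember for later requests
        return result
```

A second request for the same index is answered from the user-layer cache, so the query and storage layers are reached only once for that index.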
The blockchain storage structure based on the M_H+ tree combines the efficient verification of the Merkle tree with the fast retrieval of the B+ tree; its data structure is shown in FIG. 2. Each node improves on the storage structure of the Merkle tree and mainly consists of a hash value, an index address, and endpoint index values.
The design mainly covers the insertion (construction) algorithm and the search algorithm.
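One possible in-memory shape for such a node is sketched below in Python. The field names are assumptions chosen for illustration: `children` stands in for the index address, `minval` and `maxval` are the endpoint index values, and SHA-256 is an assumed hash function, since the source does not name one.

```python
import hashlib
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class MHNode:
    """Hypothetical M_H+ tree node: hash plus index range plus child links."""

    hash_value: str                      # Merkle-style digest of the node
    minval: int                          # smallest index stored below this node
    maxval: int                          # largest index stored below this node
    children: List["MHNode"] = field(default_factory=list)
    payload: Optional[str] = None        # record data, present on leaves only

    @property
    def is_leaf(self) -> bool:
        return not self.children

    def covers(self, index: int) -> bool:
        """True when index lies in the closed interval [minval, maxval]."""
        return self.minval <= index <= self.maxval


def make_leaf(index: int, payload: str) -> MHNode:
    """Build a leaf whose range is the single index it stores."""
    digest = hashlib.sha256(payload.encode("utf-8")).hexdigest()
    return MHNode(hash_value=digest, minval=index, maxval=index, payload=payload)
```

Keeping the range on every node is what lets a query skip a subtree without reading its leaves, while the hash still supports Merkle-style verification.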
The M_H+ tree construction algorithm comprises the following steps:
Step 1: determine the structure of the B+ tree;
Step 2: take out leaf data N of the B+ tree according to the result of Step 1;
Step 3: judge whether N is an M_H+ tree leaf node;
Step 4: insert node N;
Step 5: repeat Step 2.
The algorithm is as follows: M_H+ tree construction algorithm
Input: a constructed B+ tree structure
Output: the M_H+ tree storage model
confirm the constructed B+ tree structure;
process data starting from the bottommost nodes according to the structure of the B+ tree;
for (B+ tree node N) {
    if (N is a leaf node) {
        hash the data;
    }
    else {
        get the minimum index;
        get the maximum index;
        record the node's minval and maxval;
        hash the data;
    }
    N = N - 1;
}
store the final root in the block header as the result;
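The bottom-up construction can be sketched in Python as follows. This is a simplified illustration under stated assumptions, not the patent's algorithm: a sorted list of `(index, data)` pairs stands in for the leaves of the constructed B+ tree, nodes are grouped by a fixed hypothetical `fanout`, SHA-256 is an assumed hash function, and the function name `build_mh_tree` is invented for this example.

```python
import hashlib
from typing import Dict, List, Tuple

Node = Dict[str, object]


def sha256_hex(text: str) -> str:
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


def build_mh_tree(records: List[Tuple[int, str]], fanout: int = 3) -> Node:
    """Build the tree level by level, starting from the leaves."""
    if not records:
        raise ValueError("at least one record is required")
    if fanout < 2:
        raise ValueError("fanout must be at least 2")

    # Leaf level: hash the data; the range is the single stored index.
    level: List[Node] = [
        {"hash": sha256_hex(data), "minval": idx, "maxval": idx,
         "data": data, "children": []}
        for idx, data in sorted(records)
    ]

    # Upper levels: record minval/maxval and hash the children's hashes.
    while len(level) > 1:
        parents: List[Node] = []
        for start in range(0, len(level), fanout):
            group = level[start:start + fanout]
            parents.append({
                "hash": sha256_hex("".join(str(c["hash"]) for c in group)),
                "minval": group[0]["minval"],
                "maxval": group[-1]["maxval"],
                "data": None,
                "children": group,
            })
        level = parents

    return level[0]  # the root, which would be stored in the block header
```

Because each parent hash is derived from its children's hashes, changing any record changes the root hash, which is the property Merkle-style verification relies on.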
The query algorithm comprises the following steps:
Step 1: input the block data query index;
Step 2: determine the block header;
Step 3: access the block header data to obtain minval and maxval;
Step 4: judge whether the index lies in the interval [minval, maxval], and if so, enter the next layer;
Step 5: return the data, or report that no data were found.
The algorithm is as follows: M_H+ tree query algorithm
Input: blockchain data, query condition Index
Output: the transaction result
if (node does not satisfy the stability condition) {
    find a neighboring super node;
}
for (each block traversed) {
    access the minval and maxval values of the tree root in the block header;
    while (minval ≤ Index ≤ maxval) {
        if (leaf node) {
            for (each entry of the leaf node) {
                if (index found) {
                    return data;
                }
            }
        }
        get the new node information;
        update minval and maxval;
    }
    access the next block;
}
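The range-pruned lookup can be sketched in Python as follows. This is an illustrative sketch only: the `Node` class and the helper names are hypothetical, hashes are left out to keep the example short, and the super-node stability check from the pseudocode is not modeled.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Node:
    minval: int
    maxval: int
    children: List["Node"] = field(default_factory=list)
    data: Optional[str] = None


def leaf(index: int, data: str) -> Node:
    return Node(index, index, data=data)


def inner(children: List[Node]) -> Node:
    # Children are assumed to be ordered by index.
    return Node(children[0].minval, children[-1].maxval, children=children)


def query_block(root: Node, index: int) -> Optional[str]:
    """Descend while the index lies inside the current node's range."""
    node = root
    while node.minval <= index <= node.maxval:
        if not node.children:
            return node.data  # leaf range is a single index, so it matches
        nxt = next((c for c in node.children
                    if c.minval <= index <= c.maxval), None)
        if nxt is None:
            return None  # index falls in a gap between children
        node = nxt
    return None


def query_chain(block_roots: List[Node], index: int) -> Optional[str]:
    """Visit blocks in order; skip any block whose root range excludes index."""
    for root in block_roots:
        result = query_block(root, index)
        if result is not None:
            return result
    return None
```

A block whose root range excludes the index is rejected after a single comparison, which is where the saving over a full traversal of a plain Merkle tree comes from.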
In the storage structure of current blockchains, the block body is stored mainly as a Merkle tree. The Merkle tree verifies quickly, but queries are inefficient because all data must be traversed. Based on the block storage structure, the invention designs an improved Merkle tree storage structure that draws on the advantages of the B+ tree: stable search performance, a short query path, and support for range queries. Retrieval speed is improved without changing the original fast verification. Search performance: retrieval time stays close to the average for all data, so retrieval times do not differ greatly between different data items. Short query path: by the nature of the B+ tree, a three-level B+ tree can store thousands of records, whereas a Merkle tree needs a height of more than ten levels. Range query: the bottom layer of the designed tree is ordered data connected by a linked list, so a whole segment of data can be queried by data range.
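The height comparison can be checked with a few lines of Python. The fanout of 16 used in the usage note is an assumption chosen for illustration; the source does not state the order of the tree, and `levels_needed` is a hypothetical helper name.

```python
def levels_needed(num_records: int, fanout: int) -> int:
    """Smallest number of levels such that fanout ** levels >= num_records."""
    if num_records < 1:
        raise ValueError("num_records must be positive")
    if fanout < 2:
        raise ValueError("fanout must be at least 2")
    levels = 0
    capacity = 1
    while capacity < num_records:
        capacity *= fanout  # each extra level multiplies capacity by fanout
        levels += 1
    return levels
```

For 4096 records, `levels_needed(4096, 16)` gives 3 and `levels_needed(4096, 2)` gives 12, which matches the claim that a binary tree needs more than ten levels where a wide multiway tree needs three.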
Drawings
FIG. 1 shows the data retrieval model.
FIG. 2 shows the data structure.
FIG. 3 shows the new tree structure.
Detailed Description
The invention is described in detail below with reference to the drawings and embodiments. The embodiments mainly comprise the tree construction and data retrieval processes. The tree is built as follows:
Let key denote the key to be inserted, and let m denote the order of the B+ tree.
1) If the tree is empty, create a new leaf node and insert the record into it. This leaf node is also the root node, and the insertion ends.
2) For leaf nodes: find the leaf node according to the key value and insert the record into it. After the insertion, if the number of keys in the current node is no more than m-1, the insertion ends. Otherwise, split the leaf node into a left leaf node and a right leaf node: the left leaf node contains the first m/2 records and the right leaf node contains the remaining records. Carry the key of the (m/2+1)-th record up into the parent node (which is necessarily an index node); the left child pointer of the carried key points to the left node and its right child pointer points to the right node. Point the current-node pointer at the parent node and perform the next step.
3) For index nodes: if the number of keys in the current node is no more than m-1, the insertion ends. Otherwise, split the index node into two index nodes: the left index node contains the first (m-1)/2 keys and the right index node contains m-(m-1)/2 keys. Carry the (m/2)-th key up into the parent node; the left child pointer of the carried key points to the left node and its right child pointer points to the right node. Point the current-node pointer at the parent node and repeat this step.
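The leaf-level part of the steps above (step 2) can be sketched in Python as follows. This is a minimal illustration under stated assumptions: keys are plain integers with no attached records, m/2 is taken as integer (floor) division, and the function name `insert_into_leaf` is hypothetical. Updating the parent index node (step 3) is not shown.

```python
import bisect
from typing import List, Optional, Tuple


def insert_into_leaf(
    keys: List[int], key: int, m: int
) -> Tuple[List[int], Optional[List[int]], Optional[int]]:
    """Insert key into a sorted leaf of an order-m tree.

    Returns (left, right, promoted). When no split is needed, right and
    promoted are None and left is the updated leaf.
    """
    new_keys = list(keys)
    bisect.insort(new_keys, key)  # keep the leaf sorted

    if len(new_keys) <= m - 1:
        return new_keys, None, None  # leaf still fits: insertion ends

    half = m // 2                 # assumed floor division for m/2
    left = new_keys[:half]        # first m/2 records stay in the left leaf
    right = new_keys[half:]       # remaining records go to the right leaf
    promoted = right[0]           # key of the (m/2 + 1)-th record
    return left, right, promoted
```

With an assumed order m = 5, inserting 69 into a leaf holding 25, 40, 45, 62 overflows the leaf, so it splits into [25, 40] and [45, 62, 69] and key 45 is carried up to the parent.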
As shown in FIG. 3, keys 40, 45, and 62 are inserted into the B+ tree in sequence.
A search first queries the application (user) layer and checks whether the requested data are in the cache. If so, the data are returned directly. Otherwise, the query layer locates a super node, and the storage layer is queried through that super node. The storage layer is stored as an improved Merkle tree and is queried with the query algorithm.
In the M_H+ tree shown in FIG. 3, to search for the data with index address 69, the root node first gives an index range of 25-99, which contains 69. According to the root node's contents (25, 89), the first child node is selected; the index range of that subtree is 25-69, which again contains 69. According to that node's contents (25, 50, 67), the third child node is selected, its entries are traversed, and the data are returned to the querying node.

Claims (3)

1. An efficient data retrieval method based on an improved block storage structure, wherein the data retrieval model implementing the method is divided into three layers: a user layer, a query layer, and a storage layer;
characterized in that: a user initiates a data query request, which first reaches the user layer, where the data are looked up in the user layer's cache; if matching data are found in the cache, the search is complete and the result is returned; otherwise, the request proceeds to the query layer; the query layer divides the work between query services and ordinary node verification services according to the stability and computing power of each node, and returns the query data to the user layer; the storage layer is the bottom layer and is responsible for storing blockchain data and responding to query requests from the query layer.
2. The method of claim 1, wherein: the blockchain storage structure based on the M_H+ tree combines the efficient verification of the Merkle tree with the fast retrieval of the B+ tree; each node improves on the storage structure of the Merkle tree and consists of a hash value, an index address, and endpoint index values;
the M_H+ tree is constructed as follows:
Step 1: determine the structure of the B+ tree;
Step 2: take out leaf data N of the B+ tree according to the result of Step 1;
Step 3: judge whether N is an M_H+ tree leaf node;
Step 4: insert node N;
Step 5: repeat Step 2;
the query steps are as follows:
Step 1: input the block data query index;
Step 2: determine the block header;
Step 3: access the block header data to obtain minval and maxval;
Step 4: judge whether the index lies in the interval [minval, maxval], and if so, enter the next layer;
Step 5: return the data, or report that no data were found.
3. The method of claim 1, wherein: the method comprises a tree construction process and a data retrieval process;
the tree construction process comprises the following steps:
the key to be inserted is denoted key;
1) if the tree is empty, a new leaf node is created and the record is inserted into it; this leaf node is also the root node, and the insertion ends;
2) for leaf nodes: find the leaf node according to the key value and insert the record into it; after the insertion, if the number of keys in the current node is no more than m-1, the insertion ends; otherwise, split the leaf node into a left leaf node and a right leaf node, wherein the left leaf node contains the first m/2 records and the right leaf node contains the remaining records; carry the key of the (m/2+1)-th record up into the parent node, wherein the left child pointer of the carried key points to the left node and its right child pointer points to the right node; point the current-node pointer at the parent node;
3) for index nodes: if the number of keys in the current node is no more than m-1, the insertion ends; otherwise, split the index node into two index nodes, wherein the left index node contains the first (m-1)/2 keys and the right index node contains m-(m-1)/2 keys; carry the (m/2)-th key up into the parent node, wherein the left child pointer of the carried key points to the left node and its right child pointer points to the right node; point the current-node pointer at the parent node.
CN202110182161.7A 2021-02-09 2021-02-09 Efficient data retrieval method based on improved block storage structure Pending CN112800065A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110182161.7A CN112800065A (en) 2021-02-09 2021-02-09 Efficient data retrieval method based on improved block storage structure

Publications (1)

Publication Number Publication Date
CN112800065A (en) 2021-05-14

Family

ID=75815070

Country Status (1)

Country Link
CN (1) CN112800065A (en)

Cited By (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113626432A (en) * 2021-08-03 2021-11-09 浪潮云信息技术股份公司 Improvement method of self-adaptive radix tree supporting any Key value
CN113779025A (en) * 2021-08-06 2021-12-10 西安电子科技大学 Optimization method, system and application of classified data retrieval efficiency in block chain
CN113901131A (en) * 2021-09-02 2022-01-07 北京邮电大学 Index-based on-chain data query method and device
CN114756603A (en) * 2022-05-23 2022-07-15 天津大学 High-efficiency verifiable query method for lightweight block chain
CN116303586A (en) * 2022-12-09 2023-06-23 中电云数智科技有限公司 Metadata cache elimination method based on multi-level b+tree

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109165224A (en) * 2018-08-24 2019-01-08 东北大学 A kind of indexing means being directed to keyword key on block chain database
CN112035491A (en) * 2020-09-30 2020-12-04 中山大学 Data storage method based on block chain, electronic integral processing method and system

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
郑浩瀚; 申德荣; 聂铁铮; 寇月: "Queryability optimization of blockchain *** for hybrid indexes" (面向混合索引的区块链***的可查询性优化), 计算机科学 (Computer Science), no. 10, pages 309-316 *

Cited By (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113626432A (en) * 2021-08-03 2021-11-09 浪潮云信息技术股份公司 Improvement method of self-adaptive radix tree supporting any Key value
CN113626432B (en) * 2021-08-03 2023-10-13 上海沄熹科技有限公司 Improved method of self-adaptive radix tree supporting arbitrary Key value
CN113779025A (en) * 2021-08-06 2021-12-10 西安电子科技大学 Optimization method, system and application of classified data retrieval efficiency in block chain
CN113779025B (en) * 2021-08-06 2024-01-26 西安电子科技大学 Optimization method, system and application of classified data retrieval efficiency in block chain
CN113901131A (en) * 2021-09-02 2022-01-07 北京邮电大学 Index-based on-chain data query method and device
CN113901131B (en) * 2021-09-02 2024-06-07 北京邮电大学 Index-based on-chain data query method and device
CN114756603A (en) * 2022-05-23 2022-07-15 天津大学 High-efficiency verifiable query method for lightweight block chain
CN116303586A (en) * 2022-12-09 2023-06-23 中电云数智科技有限公司 Metadata cache elimination method based on multi-level b+tree
CN116303586B (en) * 2022-12-09 2024-01-30 中电云计算技术有限公司 Metadata cache elimination method based on multi-level b+tree

Similar Documents

Publication Publication Date Title
CN112800065A (en) Efficient data retrieval method based on improved block storage structure
CN111339106B (en) Block chain data indexing method
CN109165224B (en) Indexing method for key words on block chain database
CN102122285B (en) Data cache system and data inquiry method
US20230109969A1 (en) Data processing method and apparatus based on node internal memory, device and medium
KR102232641B1 (en) Method for searching using data structure supporting multiple search in blockchain based IoT environment, and apparatus thereof
CN102945249B (en) A kind of policing rule matching inquiry tree generation method, matching process and device
US8229916B2 (en) Method for massively parallel multi-core text indexing
CN104794123A (en) Method and device for establishing NoSQL database index for semi-structured data
CN113821564B (en) Heterogeneous parallel blockchain and method for coordinating on-chain data and under-chain contracts thereof
CN101771537A (en) Processing method and certificating method for distribution type certificating system and certificates of certification thereof
WO2021190179A1 (en) Synchronous processing method and related apparatus
CN110109874A (en) A kind of non-stop layer distributed document retrieval method based on block chain
CN101277252A (en) Method for traversing multi-branch Trie tree
CN115052008B (en) Block chain data under-chain storage method based on cloud storage
Liu et al. Finding smallest k-compact tree set for keyword queries on graphs using mapreduce
CN114791788B (en) Data storage method and device based on block chain
CN116701452A (en) Data processing method, related device, storage medium and program product
Zegour Scalable distributed compact trie hashing (CTH*)
CN114095373A (en) Knowledge graph-based alliance chain management method, system, equipment and storage medium
CN112463890B (en) Cross-system data sharing method based on block chain and machine learning
CN113495982B (en) Transaction node management method and device, computer equipment and storage medium
CN112035485B (en) Method and system for realizing efficient query of credit information data based on distributed architecture
CN117056342B (en) Data processing method based on block chain and related equipment
WO2023160040A1 (en) Data processing method and apparatus based on blockchain, and device and readable storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination