CN112835977A

CN112835977A - Database management method and system based on block chain

Info

Publication number: CN112835977A
Application number: CN202110073325.2A
Authority: CN
Inventors: 王明生; ***; 亓彬; 范雄; 马杰; 王晓蓓; 罗明; 邓智洪
Original assignee: Institute of Information Engineering of CAS
Current assignee: Institute of Information Engineering of CAS
Priority date: 2021-01-20
Filing date: 2021-01-20
Publication date: 2021-05-25
Anticipated expiration: 2041-01-20
Also published as: CN112835977B

Abstract

The invention discloses a blockchain-based database management method and system. The method comprises the following steps: a monitoring server splits the data in a database into m sub-tables, makes k backups of each sub-table, and stores each backup in a database node P_i; a user terminal sends a data operation request to the database server where the data is located; the database server pre-executes transaction B_j and sends the pre-execution result T_j to a sequencing server; the sequencing server orders the transactions and packages them, sends each block containing Q transactions to the database nodes P_i, and synchronously sends the block-output information to the monitoring server; each database node P_i executes the transactions B_j in the block and records the execution results in a local database. The invention enables the system to cope with large-scale, highly concurrent production environments, improves data storage efficiency and security, and makes the system more flexible and versatile; it also addresses data storage security, access control, and user privacy.

Description

Database management method and system based on block chain

Technical Field

The invention belongs to the technical field of computer application, and particularly relates to a database management method and system based on a block chain.

Background

Blockchain technology was first applied in the Bitcoin system. As the distributed ledger platform behind Bitcoin, it supports long-term stable operation of a system without centralized management. A decentralized system based on a blockchain can effectively address the high cost, low efficiency, and insecure data storage that are common in centralized systems.

The core advantages of blockchains include decentralization, time-series data, collective maintenance, programmability, security, and trust. In a blockchain system, data verification, accounting, storage, maintenance, and transmission are all based on a distributed consensus mechanism, and the nodes reach a game-theoretic equilibrium through cryptographic and mathematical methods so as to maintain the consistency and security of the data. The blockchain stores data in a timestamped chain of blocks, which adds a time dimension to the data and gives it strong verifiability and traceability (time-series data). Collective maintenance means that the system uses a specific consensus mechanism to ensure that every participating node agrees on the block data and the ledger, so that all nodes jointly maintain the consistency, security, and traceability of the data. Programmability means that blockchain technology provides a flexible smart contract mechanism that can offer users customized business logic, which improves the operability of the system. However, blockchains have performance problems: the block structure is not suited to continuously storing large volumes of data, which limits scalability.

A database is a "warehouse that organizes, stores, and manages data according to a data structure": an organized, sharable, uniformly managed collection of large amounts of data stored in a computer over a long period. As technology has developed, databases have evolved into the various data management modes that users require, ranging from the simplest tables for storing data to large database systems capable of storing massive amounts of data. Meanwhile, to meet the storage requirements of complex scenarios, distributed databases have attracted growing attention. They have advanced considerably in terms of flexible architecture, suitability for distributed storage scenarios, cost-effectiveness, reliability, availability, response speed, and scalability, and have become the "infrastructure" of government and enterprise information processing.

However, existing distributed database systems suffer from poor data security and confidentiality, complex access structures, and similar problems. Moreover, because business data logic and the division of responsibilities differ among government units and enterprises, data cannot be shared effectively. How to build trusted data exchange among the relevant departments, and thereby improve the stability and efficiency of social operations, remains an urgent problem in every sector. The invention uses the decentralization, security, and transparency of the blockchain while retaining the flexibility, high availability, economy, and efficiency of the distributed database, and ensures data consistency across various database systems (such as MySQL and SQL Server).

Disclosure of Invention

To solve the above problems, the invention provides a blockchain-based database management method and system. It builds a secure, reliable, and efficient chained database system by combining the decentralization, transparency, and security of the blockchain with the efficiency, scalability, economy, applicability, and fast response of the distributed database.

The technical scheme of the invention is as follows:

A blockchain-based database management method, applicable to a blockchain consisting of a monitoring server cluster, a sequencing server cluster, n database nodes, and a plurality of user terminals, comprises the following steps:

1) the monitoring server splits the data in the database into m sub-tables, makes k backups of each sub-table, and stores each backup in a database node P_i, where 2 ≤ k ≤ n ≤ m·k and 1 ≤ i ≤ n;

2) when a user terminal needs to acquire data, it obtains the address of the database server where the data is located by querying the monitoring server and/or a local mapping table of data to database server addresses, and sends a data operation request to the database server at that address;

3) the database server at that address pre-executes transaction B_j and sends the pre-execution result T_j to the sequencing server;

4) the sequencing server receives the set of pre-execution results {T_j}, orders the transactions and packages them into blocks, sends each block containing Q transactions to the database nodes P_i, and synchronizes the block-output information to the monitoring server, where 1 ≤ j ≤ Q;

5) each database node P_i executes each transaction B_j in turn according to the block containing Q transactions and records the execution results in a local database.
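As an illustration of steps 3) to 5), the following minimal Python sketch models pre-execution, ordering, block packaging, and block execution in memory. All class and function names (`PreExecResult`, `DatabaseNode`, `SequencingServer`, `run_demo`) are hypothetical, and treating arrival order as the transaction order is an assumption made for brevity; the sketch is not the patented implementation.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass(frozen=True)
class PreExecResult:
    """Hypothetical stand-in for the pre-execution result T_j of transaction B_j."""
    tx_id: int
    key: str
    new_value: str


@dataclass
class DatabaseNode:
    name: str
    db: Dict[str, str] = field(default_factory=dict)

    def pre_execute(self, tx_id: int, key: str, value: str) -> PreExecResult:
        # Step 3: simulate the write without touching the local database.
        return PreExecResult(tx_id, key, value)

    def apply_block(self, block: List[PreExecResult]) -> None:
        # Step 5: execute the block's transactions strictly in block order.
        for result in block:
            self.db[result.key] = result.new_value


class SequencingServer:
    def __init__(self, q: int) -> None:
        self.q = q                      # transactions per block (Q)
        self.pending: List[PreExecResult] = []

    def submit(self, result: PreExecResult) -> None:
        self.pending.append(result)     # assumption: arrival order is the ordering

    def cut_blocks(self) -> List[List[PreExecResult]]:
        # Step 4: package every Q ordered results into one block.
        blocks = []
        while len(self.pending) >= self.q:
            blocks.append(self.pending[:self.q])
            self.pending = self.pending[self.q:]
        return blocks


def run_demo():
    nodes = [DatabaseNode(f"P{i}") for i in range(1, 4)]
    sequencer = SequencingServer(q=2)
    requests = [(1, "a", "x"), (2, "b", "y"), (3, "a", "z"), (4, "c", "w")]
    for tx_id, key, value in requests:
        sequencer.submit(nodes[0].pre_execute(tx_id, key, value))
    blocks = sequencer.cut_blocks()
    for block in blocks:
        for node in nodes:
            node.apply_block(block)
    return nodes, blocks
```

Because every node applies the same blocks in the same order, all local databases end in the same state, which is the property the ordering step is meant to provide.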

Further, private data is stored in the database by the following steps:

1) the data owner generates an encryption key ek and a decryption key dk;

2) the data owner encrypts the plaintext D of the private data with the encryption key ek to obtain the ciphertext [D];

3) using a proactively updated threshold secret sharing scheme, the data owner splits the decryption key dk into multiple shares dk_i and encrypts each share dk_i with the public key of node P_i to obtain the share ciphertext [dk_i], where 1 ≤ i ≤ n;

4) according to the chosen access policy, the data owner computes the hash value h = H(D, [D], policy), creates the transaction T_W = <h, policy>, and attaches its signature σ_W;

5) the data owner sends the transaction T_W, the signature σ_W, the ciphertext [D], and the share ciphertexts [dk_i] to a set P consisting of all or some of the database nodes P_i;

6) each database node P_i in the set P verifies the signature σ_W, decrypts and verifies the share ciphertext [dk_i], stores the transaction T_W on the blockchain, and stores the ciphertext [D] on some or all of the database nodes P_i in the set P.

Further, private data in the database is accessed by the following steps:

1) the data consumer creates a transaction T_R = <attr, pk_B>, requests access from the set P, and attaches the data consumer's signature, where attr is the attribute of the data consumer and pk_B is the identity of the data consumer;

2) if the attribute attr satisfies the access policy, the set P sends the shares encrypted under the identity pk_B, together with the ciphertext [D], to the data consumer.

Further, the set P periodically runs the update algorithm of the proactively updated threshold secret sharing scheme to refresh all shares, thereby updating the global state.
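The text does not name a concrete secret sharing scheme. As a minimal sketch, assuming Shamir secret sharing over a prime field, the code below splits a key dk into shares, refreshes the shares by adding shares of a random polynomial whose constant term is zero, and recovers dk from any `threshold` shares. The field size `PRIME`, the function names, and the single-dealer refresh are illustrative assumptions; a deployed proactive scheme would have every node contribute its own zero-polynomial and would verify the shares.

```python
import secrets

PRIME = 2**127 - 1  # assumed field modulus (a Mersenne prime)


def _eval_poly(coeffs, x):
    # Horner evaluation of coeffs[0] + coeffs[1]*x + ... modulo PRIME.
    acc = 0
    for c in reversed(coeffs):
        acc = (acc * x + c) % PRIME
    return acc


def split_secret(secret, threshold, n):
    """Return shares {i: f(i)} of a random degree-(threshold-1) polynomial with f(0) = secret."""
    coeffs = [secret % PRIME] + [secrets.randbelow(PRIME) for _ in range(threshold - 1)]
    return {i: _eval_poly(coeffs, i) for i in range(1, n + 1)}


def refresh_shares(shares, threshold):
    """Proactive update: add shares of a random polynomial g with g(0) = 0.

    The shared secret is unchanged, but shares from before the refresh
    no longer combine with shares from after it.
    """
    coeffs = [0] + [secrets.randbelow(PRIME) for _ in range(threshold - 1)]
    return {i: (s + _eval_poly(coeffs, i)) % PRIME for i, s in shares.items()}


def recover_secret(shares):
    """Lagrange interpolation at x = 0 over the supplied shares."""
    secret = 0
    for i, y_i in shares.items():
        num, den = 1, 1
        for j in shares:
            if j != i:
                num = (num * -j) % PRIME
                den = (den * (i - j)) % PRIME
        secret = (secret + y_i * num * pow(den, -1, PRIME)) % PRIME
    return secret
```

In the setting above, each share dk_i would additionally be encrypted for node P_i before distribution; that step is omitted here.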

Further, the user terminal obtains the address of the database server where the data is located through one of the following strategies:

1) querying the monitoring server to obtain the address of the database server where the data is located, comprising:

a) sending a request F⁰ to the monitoring server to obtain data node information H⁰;

b) requesting the data from the k database nodes indicated by the data node information H⁰;

c) if all k database nodes return the data correctly, taking the k database nodes as the database server address where the data is located; otherwise, sending a request F¹ to the monitoring server, so that the monitoring server notifies each database node that failed to return the data correctly to obtain the data from a database node that returned it correctly, and then taking the k database nodes as the database server address where the data is located;

2) querying the local mapping table of data to database server addresses to obtain the address of the database server where the data is located, comprising:

a) obtaining data node information H¹ by querying the local mapping table of data to database server addresses;

b) requesting the data from the k database nodes indicated by the data node information H¹;

c) if the database nodes return the data correctly, taking the k database nodes as the database server address where the data is located;

3) querying both the monitoring server and the local mapping table of data to database server addresses to obtain the address of the database server where the data is located, comprising:

a) obtaining data node information H² by querying the local mapping table;

b) requesting the data from the k database nodes indicated by the data node information H²;

c) if not all k database nodes return the data correctly, sending a request F² to the monitoring server, so that the monitoring server checks whether the local mapping table is consistent with the mapping table held by the monitoring server;

if the tables are consistent, the monitoring server notifies each database node that failed to return the data correctly to obtain the data from a database node that returned it correctly, and the k database nodes are taken as the database server address where the data is located;

if the tables are inconsistent, the local mapping table is updated through the monitoring server to obtain data node information H³, and the data is requested from the k database nodes indicated by H³;

if all k database nodes return the data correctly, the k database nodes are taken as the database server address where the data is located;

if not all k database nodes return the data correctly, a request F³ is sent to the monitoring server, so that the monitoring server notifies each database node that failed to return the data correctly to obtain the data from a database node that returned it correctly, and the k database nodes are taken as the database server address where the data is located.

Further, the sequencing server comprises a conflicting-transaction preprocessing module, a Kafka server, and a ZooKeeper cluster; the Kafka server comprises a plurality of producers, a plurality of broker nodes, and a plurality of consumers. A block containing Q transactions is generated by the following steps:

1) the conflicting-transaction preprocessing module receives the set of pre-execution results {T_j} and preprocesses conflicting transactions to obtain the preprocessed transactions;

2) the preprocessed transactions are sent to the broker nodes, and the order in which the transactions arrive at the consumed topic partition is taken as the transaction order, wherein the ZooKeeper cluster manages the broker nodes, and the management includes:

a) recording the message topics partitioned by the Kafka server and the corresponding broker nodes;

b) recording the message consumption progress (offset);

c) when some transactions have been packaged and sent to the blockchain network, ZooKeeper promptly moves the offset pointer to the first record that has not yet been consumed;

3) when the packaging condition is satisfied, the pre-execution results T_j are packaged to generate a block containing Q transactions.

Further, the preprocessed transactions are obtained by the following steps:

1) the sequencing server establishes a conflict graph from the received pre-execution results T_j;

2) according to the conflict graph, the transactions whose operations conflict with one another are identified, and a directed graph is established;

3) the strongly connected subgraphs of the directed graph are identified with Tarjan's algorithm, all cycles of the strongly connected subgraphs are enumerated with Johnson's algorithm, and the number of times each transaction occurs in the cycles of the directed graph is recorded;

4) based on the cycles of the directed graph, the strongly connected subgraphs, and the occurrence count of each transaction, for subgraphs containing overlapping cycles, transactions participating in cycles are deleted iteratively, starting with the transaction that occurs in the most cycles, until all cycles in the directed graph are removed and a directed acyclic graph is obtained;

5) a transaction sequence is constructed from the directed acyclic graph to obtain the preprocessed transactions.
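The cycle-breaking idea in steps 3) to 5) can be sketched as follows. This is a simplified illustration, not the patented procedure: it enumerates elementary cycles with a plain depth-first search instead of Johnson's algorithm, skips the Tarjan pre-pass that would restrict the search to strongly connected subgraphs, and breaks ties by the smallest transaction id. The edge direction (an edge u → v means u must be ordered before v) and all function names are assumptions.

```python
from collections import Counter
from graphlib import TopologicalSorter  # Python 3.9+


def simple_cycles(graph):
    """Enumerate elementary cycles with a plain DFS (stand-in for Johnson's algorithm)."""
    cycles = []
    for start in sorted(graph):
        # Only walk through nodes larger than `start`, so each cycle is
        # reported exactly once, rooted at its smallest node.
        stack = [(start, [start])]
        while stack:
            node, path = stack.pop()
            for nxt in graph.get(node, ()):
                if nxt == start:
                    cycles.append(path)
                elif nxt > start and nxt not in path:
                    stack.append((nxt, path + [nxt]))
    return cycles


def break_cycles(graph):
    """Greedily drop the transaction that occurs in the most cycles until none remain."""
    graph = {n: set(succ) for n, succ in graph.items()}
    aborted = []
    while True:
        cycles = simple_cycles(graph)
        if not cycles:
            return graph, aborted
        counts = Counter(tx for cycle in cycles for tx in cycle)
        victim = min(counts, key=lambda tx: (-counts[tx], tx))
        aborted.append(victim)
        graph.pop(victim)
        for succ in graph.values():
            succ.discard(victim)


def order_transactions(graph):
    """Return (transaction order, aborted transactions) for a successor-map graph."""
    dag, aborted = break_cycles(graph)
    # graphlib expects node -> predecessors, so invert the successor map.
    preds = {n: set() for n in dag}
    for n, succs in dag.items():
        for s in succs:
            preds[s].add(n)
    return list(TopologicalSorter(preds).static_order()), aborted
```

For example, in the graph `{1: {2}, 2: {3}, 3: {1, 4}, 4: {3}, 5: {1}}` transaction 3 lies on both cycles, so removing it alone leaves an acyclic graph that can be ordered topologically.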

Further, the monitoring server provides a user message subscription and publication service to the user terminal, the user messages including the current block height, the data node index, and address updates; the monitoring server also provides a node message subscription and publication service to the database nodes, the node messages including the current block height, node operating status, and data storage information.

Further, newly added data is sharded across databases and tables through one of the following strategies:

1) random allocation by the monitoring server, comprising:

a) the monitoring server obtains the number of real addresses in each database node;

b) according to the number of real addresses in each database node, a real address is selected at random to store the newly added data, and an index from the ID of the newly added data to the real address is established;

2) ID allocation, comprising:

a) according to its ID, the newly added data is added to a set G_i, wherein the set G_i is stored in a database node and is obtained by dividing the original data in the database according to the IDs of the original data;

b) an index from the ID of the newly added data to the real address is established;

3) modular-hash sequential allocation, comprising:

a) the modular hash hash(ID), i.e. the remainder of the hash value of the newly added data's ID, is calculated;

b) the database nodes that store the newly added data are obtained by calculating i = hash(ID) + t + 1, where 0 ≤ t ≤ n−1;

c) an index from the ID of the newly added data to the real address is established;

4) heat-based allocation, comprising:

a) the newly added data is checked for hotspot query data;

b) if it contains any, the hotspot query data, together with the id of the newly added data, is placed in a separate table, and the remaining newly added data is placed in another table; otherwise, the newly added data is sharded into tables directly;

c) an index to the real address is established.
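A minimal sketch of the modular-hash sequential allocation in strategy 3) is given below. The text only gives i = hash(ID) + t + 1; the choice of SHA-256, taking the remainder modulo n, limiting t to the k replicas (t = 0, …, k−1), and wrapping around past node n are all assumptions made so that the example is self-contained, and the function names are hypothetical.

```python
import hashlib


def mod_hash(record_id: str, n: int) -> int:
    """Remainder of the ID's hash modulo the number of database nodes n."""
    digest = hashlib.sha256(record_id.encode("utf-8")).digest()
    return int.from_bytes(digest, "big") % n


def replica_nodes(record_id: str, n: int, k: int) -> list:
    """1-based indices of the k consecutive nodes that store the record.

    Follows i = hash(ID) + t + 1 for t = 0..k-1, wrapping around after
    node n (the wrap-around is an assumption, not stated in the text).
    """
    base = mod_hash(record_id, n)
    return [(base + t) % n + 1 for t in range(k)]


def build_index(record_ids, n: int, k: int) -> dict:
    """Index from record ID to the nodes holding it (stand-in for the real-address index)."""
    return {rid: replica_nodes(rid, n, k) for rid in record_ids}
```

Because the node indices are derived from the ID alone, any party that knows n and k can recompute where a record lives without consulting a per-record index table.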

A blockchain-based database management system, comprising:

a monitoring server, configured to split the data in the database into m sub-tables, make k backups of each sub-table, and store each backup in a database node P_i, where 2 ≤ k ≤ n ≤ m·k and 1 ≤ i ≤ n; to return the address of the database server where the data is located in response to a request from a user terminal; and to receive the block-output information for blocks containing Q transactions;

a user terminal, configured to, when data needs to be acquired, query the monitoring server and/or a local mapping table of data to database server addresses, and send a data operation request to the database server at the address so obtained;

a sequencing server, configured to receive the set of pre-execution results {T_j}, order the transactions and package them into blocks, send each block containing Q transactions to the database nodes P_i, and synchronize the block-output information to the monitoring server, where 1 ≤ j ≤ Q;

database nodes P_i, configured to pre-execute transaction B_j and send the pre-execution result T_j to the sequencing server, and to execute each transaction B_j in turn according to the block containing Q transactions and record the execution results in a local database.

Compared with the prior art, the invention has the following advantages:

1) the invention provides a secure, fast, and efficient consortium blockchain consensus algorithm, which allows the method to cope with large-scale, highly concurrent production environments;

2) the invention provides a monitoring cluster service as an auxiliary mechanism of the system, which improves data storage efficiency and security without affecting the characteristics of the blockchain or the database and without taking part in the system's actual business logic, and also makes the system more flexible and versatile;

3) the invention provides a dedicated private data storage and access control scheme, which addresses data storage security, access control, and user privacy.

Drawings

FIG. 1 is an example architecture diagram of the system.

FIG. 2 shows the main workflow of the monitoring server (monitor).

FIG. 3 is a flow chart of database and table sharding.

FIG. 4 shows the random allocation mode of the monitoring server.

FIG. 5 shows the ID allocation mode of the monitoring server.

FIG. 6 shows the modular-hash sequential allocation mode of the monitoring server.

FIG. 7 shows the heat-based allocation mode of the monitoring server.

FIG. 8 is a basic flow chart of the consensus mechanism.

FIG. 9 shows the structure of the access control system.

Detailed Description

In order to make the objects, technical solutions and advantages of the present invention more apparent, the present invention is described in further detail below with reference to the accompanying drawings and embodiments. It should be understood that the specific embodiments described herein are merely illustrative of the invention and do not limit the invention.

The database management system of the invention manages various existing database systems using blockchain concepts. It retains the decentralization, trustworthiness, and security of the blockchain while preserving the flexibility, high performance, and economy of the database. On this basis, the invention first provides an efficient, fast, and secure consortium blockchain consensus mechanism so that each node can quickly process a large number of concurrent database operation requests. The consensus algorithm includes a dedicated mechanism for conflicting transactions, which detects and resolves conflicting database read and write operations before the transactions are ordered into blocks. To better handle large-scale transaction concurrency, the sequencing server has a separate component dedicated to transaction ordering, which relies on the mature ZooKeeper and Kafka technologies.

Second, to support diverse system services while reducing redundant data storage on each node, the invention provides a supervisory service node (hereinafter the monitoring service) that allocates and manages data storage. The idea is to divide the database storage duties of different nodes according to business logic or a specific allocation algorithm, so that each piece of data is managed by an appropriate number of nodes. This spares nodes from managing data they do not need, keeps a sufficient number of nodes for each piece of data, and improves data security and reliability while keeping system storage as light as possible. In addition, the monitoring server can provide a message subscription service to users and blockchain nodes and promptly publish information such as block confirmations, the system version number, and the block height, which improves the operating efficiency of the system.

Finally, to address the privacy and auditability of system data, the invention provides a secure private data storage and access control scheme. It stores private data securely while preserving the inherent characteristics of the blockchain (immutability, transparency, decentralization, and so on), grants access as required, and reduces the risk of privacy disclosure.

FIG. 1 shows an example architecture of the system. The main technical solution of the invention covers three aspects: the monitoring service mechanism, the blockchain consensus algorithm, and the private data storage and access control scheme, which are discussed in turn below.

Monitoring service (monitor) mechanism

To save storage space, improve database read and write efficiency, and relieve pressure on the sequencing server, the invention provides a monitoring cluster service, called the monitoring server, as an auxiliary mechanism of the system. Without affecting the characteristics of the blockchain or the database, and without taking part in the system's actual business logic, it improves data storage efficiency and security and reduces the load on the sequencing service. In combination with the sequencing service, the monitoring server can also provide a message subscription service to users and blockchain nodes so that they learn the running state of the system in time, which avoids problems caused by delayed system messages, such as transaction conflicts, user addressing errors, and delayed block acquisition by nodes. The main workflow of the monitoring server is shown in FIG. 2. Its main functions are discussed below in terms of the message subscription and publication service and the database and table sharding service. For convenience, the roles involved are divided into the following three categories:

Monitoring server (monitor): the entity that provides the monitoring service.

Database node (or blockchain node): multiple database nodes jointly maintain one complete table, and each node maintains part of the table. Because a table is backed up several times, a given data item of a table is stored on several database nodes. A database node also checks the legitimacy of user requests: if the user has query permission, the real address of the data is returned; otherwise the user's query is refused.

User: the user terminal requests data-related operations on the database. After its first successful request it stores a mapping table locally, recording the id (serial number) of the data and the corresponding real address. The user also subscribes to messages of interest from the monitoring server.

1. Message subscription and publication service

Without interfering with the core operating logic of the system, the monitoring server continuously monitors the running state of each blockchain node and of the sequencing server. It can tell whether a node is down and obtain information about the system's blocks, so as to coordinate the operation of the system and provide the message subscription and publication service to user terminals and blockchain nodes.

For user terminals, the monitoring server provides a subscription and publication service for messages such as the current block height, the data node index, and address updates. Users can therefore follow the management state of the database in real time, initiate transactions and have them confirmed more quickly, and the system runs efficiently.

For blockchain nodes, the monitoring server provides a subscription and publication service for messages such as the current block height, node running status, and data storage information, and also coordinates the data storage of each node. If a node goes down, the monitoring server can arrange for a new node to back up, from other nodes, the data that the failed node stored, so that every piece of data is held by a sufficient number of nodes and the security of the distributed database is protected. Moreover, when some messages are delayed or a node restarts after an outage, the node may fail to obtain information about newly generated blocks in time; it then cannot endorse user-initiated transactions correctly, the transactions fail, and system resources are wasted. In this case the monitoring server can promptly direct the node to synchronize block data from a neighboring server that is running well and has good connectivity, and to update the corresponding data, which avoids unnecessary resource consumption.

2. Database and table sharding service

The monitoring server provides a chained database and table sharding service for the system. On the one hand, operations can be customized through the monitoring server according to user requirements, which facilitates user operations such as queries. On the other hand, database and table sharding makes data easier to maintain, saves database storage space, and improves read and write efficiency. The flow chart of database and table sharding is shown in FIG. 3, and the specific procedure is as follows:

1) the user terminal requests data by first querying its local mapping table. If the real address of the data is not found there, the user terminal requests the real address from the monitoring server; several data requests, or a compound conditional data request, can be sent to the monitoring server at the same time. If the real address is found, the user terminal requests the data from the corresponding database;

2) after receiving the request from the user terminal, the monitoring server obtains the node information of the data by parsing the id (serial number) of the requested data, and returns all the node information to the user;

3) after obtaining the data node information from the local mapping table or from the monitoring server, the user terminal requests the data from the corresponding database node;

4) after receiving the data request, the database node first verifies the identity of the user and checks whether the user has permission for the data. If so, the node returns the result of the data request; otherwise, it informs the user that the permission is lacking. If the requested data is not found in the database, the node informs the user that the data is missing. Data may be missing because the database has lost it, or because the data has been moved to another database, for example after the database nodes were expanded: the user previously accessed the data successfully through this database, but the user's local mapping was not updated after the expansion, so the data can no longer be reached. In that case the node reports the data as missing and informs the user that the real data address must be requested from the monitoring server again;

5) the user terminal receives the result returned by the database node. If the requested data is returned, the data request ends. If the data is reported missing, the user terminal requests the correct data address from the monitoring server again and informs the monitoring server that the database is missing the data;

6) the monitoring server receives the missing-data report from the user terminal. It first determines whether the data address was updated by the monitoring server because of database node expansion while the user's local address was not updated; if so, it returns the real address of the data to the user again. Otherwise, if the loss is judged to be caused by a problem in the database node, possibly due to external conditions or interference, the monitoring server informs the database node that the data must be reacquired and sends it the backup address of the real data as resolved by the monitoring server. The monitoring server also sends the backup address of the real data to the user terminal;

7) the database node receives the message that the data must be reacquired, requests the data from the backup address supplied by the monitoring server, and updates the corresponding data;

8) the user terminal receives the backup address of the real data from the monitoring server and requests the data again.
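The request flow in steps 1) to 5) can be sketched in memory as follows. The classes `DatabaseNode`, `MonitoringServer`, and `UserClient`, the status strings, and the single-retry policy are hypothetical simplifications; the repair of the failing node from a backup address in steps 6) and 7) is not modeled.

```python
class DatabaseNode:
    def __init__(self, name, rows, allowed_users):
        self.name = name
        self.rows = dict(rows)
        self.allowed_users = set(allowed_users)

    def fetch(self, user, data_id):
        # Step 4: check permission first, then look the data up.
        if user not in self.allowed_users:
            return ("denied", None)
        if data_id not in self.rows:
            return ("missing", None)
        return ("ok", self.rows[data_id])


class MonitoringServer:
    def __init__(self, index):
        self.index = dict(index)  # data id -> name of the node holding it

    def lookup(self, data_id):
        # Step 2: resolve the data id to node information.
        return self.index[data_id]


class UserClient:
    def __init__(self, user, monitor, nodes):
        self.user, self.monitor, self.nodes = user, monitor, nodes
        self.local_map = {}  # local mapping table: data id -> node name

    def request(self, data_id):
        # Step 1: consult the local mapping table, fall back to the monitor.
        if data_id not in self.local_map:
            self.local_map[data_id] = self.monitor.lookup(data_id)
        # Step 3: request the data from the resolved node.
        status, value = self.nodes[self.local_map[data_id]].fetch(self.user, data_id)
        if status == "missing":
            # Step 5: the local address may be stale; refresh it once and retry.
            self.local_map[data_id] = self.monitor.lookup(data_id)
            status, value = self.nodes[self.local_map[data_id]].fetch(self.user, data_id)
        return status, value
```

A client whose local table still points at an old node gets "missing" on the first attempt, refreshes the address from the monitoring server, and succeeds on the retry.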

The data distribution method for chained database nodes works as follows. According to a set rule, m database nodes jointly maintain one table, and the table is backed up k times (2 ≤ k ≤ n, where n is the total number of blockchain nodes). That is, the full data set is split into m sub-tables, denoted D_1, …, D_m, and each sub-table must be stored on at least k servers. In the ideal case each database stores only one sub-table, so m·k database nodes are needed. In actual deployment a database node may store several sub-tables, so the required number of database nodes is at least k and at most m·k. With fewer than k nodes, k backups cannot be made; with more than m·k nodes, database node resources are wasted, and the nodes must be reallocated or the value of m or k adjusted. In particular, when the number of database nodes equals k, every node maintains a complete table, giving k backups; this suits cases where the table is small and need not be split. On this basis, four database and table sharding modes are provided: random allocation, ID allocation, modular-hash sequential allocation, and heat-based allocation, which are discussed as follows:

1) Random distribution mode of the monitoring server

Fig. 4 is a schematic diagram of the random distribution mode of the monitoring server. For newly added data, the monitoring server first checks whether the data has already been recorded; if not, the data is identified as newly added. The monitoring server then randomly selects one of the database real addresses it stores, stores the data there, and builds an index from the id of the data to the real address, so that when a user requests the data by id the corresponding real address is returned.

The advantage of this mode is uniformity: hotspot data requested by users is spread evenly across the database nodes, so the load borne by each node is relatively balanced. In addition, when the database is expanded, existing data does not need to be migrated; the monitoring server only needs to direct a certain proportion of new data to the newly added nodes until the amount of data on them is close to that on the other nodes. After that, the monitoring server again assigns newly added data to the databases at random. The disadvantage is that obtaining a data address depends entirely on the allocation made by the monitoring server.
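
A minimal sketch of the random distribution mode follows, assuming the monitoring server keeps an in-memory index. The class name `RandomAllocator`, the address strings, and the injectable random generator are hypothetical and serve only to make the example reproducible.

```python
import random


class RandomAllocator:
    """Hypothetical monitoring-server index: data id -> randomly chosen real address."""

    def __init__(self, addresses, rng=None):
        self.addresses = list(addresses)
        self.index = {}
        self.rng = rng or random.Random()

    def locate(self, data_id):
        # Data not yet recorded is treated as newly added and assigned at random.
        if data_id not in self.index:
            self.index[data_id] = self.rng.choice(self.addresses)
        # Recorded data always resolves to the address chosen the first time.
        return self.index[data_id]
```

Because the address is fixed only by the stored index, a client cannot compute it on its own, which is the dependency on the monitoring server noted above.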

2) ID distribution mode

Fig. 5 is a schematic diagram of the ID distribution mode. The storage location of data is allocated according to its ID: for example, data entries 1 to 100 are placed in the 1st database node and also in the 8th database node, entries 101 to 200 are placed in the 2nd and 9th database nodes, and so on. The advantage of this mode is that the monitoring server does not need to store a one-to-one index table; it only needs to determine the real address of the data from the interval in which the ID falls. The scheme suits cases where request heat is uniform across database entries, that is, cases that avoid the situation in which new users frequently request new data while old users rarely do, leaving requests for old data sparse and the load unbalanced among database nodes. The scheme also avoids redistributing data when a database node is added. It is only necessary to bias the monitoring server so that data is more likely to be distributed to the newly added node; the bias ends when the number of data entries on the new database node is close to that on the other database nodes.
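
The interval lookup can be sketched as below. The interval width of 100 and the backup offset of 7 are read off the example in the text (entries 1 to 100 on nodes 1 and 8); the function name `nodes_for_id` and the assumption that the same offset applies to every interval are illustrative.

```python
def nodes_for_id(record_id, range_size=100, backup_offset=7):
    """Return (primary node, backup node), both 1-based, for a 1-based record ID."""
    if record_id < 1:
        raise ValueError("record IDs start at 1")
    # IDs 1..range_size map to node 1, the next range_size IDs to node 2, and so on.
    primary = (record_id - 1) // range_size + 1
    return primary, primary + backup_offset
```

For instance, `nodes_for_id(150)` resolves to nodes 2 and 9 without consulting any per-record index.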

3) Modulo Hash sequential distribution mode

Fig. 6 is a schematic diagram of the modulo Hash sequential distribution mode. The ID of newly added data is hashed, and the remainder of the Hash value modulo m, plus (i+1), gives the database location where the data is to be stored; that is, the data is stored on node Hash(ID) mod m + i + 1, where i = 0, …, n-1. The advantage of this mode is that the data is distributed very evenly, and the load of the databases does not differ because new and old data differ in heat. A further advantage is that each database node can broadcast the sub-tables it stores, so that operations on local data after the data is put on chain can be completed without the monitoring server. However, this mode requires data migration and redistribution when new data nodes are added, so it is suited to cases where the number of database nodes does not change frequently.
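
The formula Hash(ID) mod m + i + 1 can be evaluated as in the following sketch. The choice of SHA-256 as the hash function and the name `replica_nodes` are assumptions; the set of offsets i is passed in by the caller rather than fixed here.

```python
import hashlib


def replica_nodes(record_id, m, offsets):
    """Return the 1-based node numbers Hash(ID) mod m + i + 1 for each offset i."""
    digest = hashlib.sha256(str(record_id).encode("utf-8")).digest()
    base = int.from_bytes(digest, "big") % m
    return [base + i + 1 for i in offsets]
```

Any party that knows m and the hash function can recompute the location, which is why this mode does not need a lookup at the monitoring server.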

4) Heat distribution mode

Fig. 7 is a schematic diagram of the heat distribution mode. If a column of data in the database is hotspot query data, that column can be split out, together with the id, into a separate table to relieve the storage and read pressure on the data.

Two, block chain consensus algorithm

The efficient, secure and fast consortium-chain consensus mechanism designed by the invention enables the system to handle database operations quickly under large-scale, high-concurrency conditions. An effective conflicting-transaction preprocessing mechanism finds and handles sets of conflicting transactions in time, ensuring eventual consistency of the database while improving transaction processing efficiency. Finally, the component responsible only for ordering is decoupled as an independent, pluggable application component, and the mature zookeeper and kafka technologies are used to handle transaction ordering, so that the system copes better with complex production environments.

The basic flow of the consensus mechanism of the present invention is shown in Fig. 8. The blockchain consensus mechanism is discussed in more detail below:

The blockchain network is a service cluster composed of n nodes P₁, P₂, ..., P_n. Each node P_i manages one or more databases of different types (such as MySQL, SQL Server and the like), receives database operation requests from users, simulates execution of each request, and sends the request together with the simulated execution result to the sequencing server for ordering; after receiving a transaction block from the sequencing server, it executes the transactions and applies the state changes to its local database server. The database managed by each node P_i can be selected autonomously according to the service module, or can be assigned by the monitoring server according to a specific algorithm.

First, before formally initiating a transaction, the user needs to request from the monitoring server the storage address or node information of the data to be operated on. After receiving the user request, the monitoring server returns to the user the identity ID, server address and other related information of the nodes storing the data. To avoid congesting the monitoring server with frequent user requests, the user side can maintain a local list of data service nodes: the node information obtained with the first data operation request is stored locally, which also speeds up the submission of subsequent transaction instructions.

After receiving the node information of the target database from the monitoring server (or reading it directly from the local cache), the user side sends the transaction to be initiated simultaneously to the management nodes of the target database, denoted {P_i, P_j, …, P_k} for convenience of description.

After receiving a data operation request sent by a user, the blockchain nodes {P_i, P_j, …, P_k} verify the user's read-write permission. Once the user is confirmed to have permission for the corresponding operation, each node pre-executes the transaction initiated by the user against the state of its local database, signs the pre-execution result, and sends the signed result together with the transaction to the system's ordering service.

The ordering service (also called an Orderer node) consists of a transaction processing component and an ordering component. In the ordering stage, the trusted ordering service receives from the kafka server the transactions that have been processed by the preprocessing module. It creates a global order over all received transactions and packages them into blocks containing a certain number of transactions. By default, after transactions pass the endorsement check and the preprocessing check, they are ordered by their time of arrival at the service. The ordering service then sends the blocks, each composed of a plurality of transactions, to the blockchain nodes and synchronizes the block information to the monitoring server. The ordering service includes four steps: transaction statistics/verification, conflicting-transaction preprocessing, transaction ordering by the ordering component, and packaging into blocks according to certain conditions.

In the transaction statistics/verification stage, once the sequencing server has received endorsements of the pre-execution results from P_i, P_j, …, P_k (for example from more than 2/3 of them; the exact threshold can be set according to the system policy), it examines the transaction. The checks include whether most endorsements are consistent, whether the transaction signature is correct, and so on. If the check fails, this indicates that an endorser or the user has maliciously tampered with the transaction, and the system marks the transaction as invalid. If the check succeeds and most endorsements are consistent, the transaction is considered agreed at the endorsement stage and enters the conflicting-transaction preprocessing step. If the network has few nodes, the system can require all endorsements to be consistent before judging the check successful.
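
The statistics/verification step can be sketched as follows. The record type `Endorsement`, the three return labels, and the reading of "most endorsements are consistent" as a strict majority are assumptions for illustration; signature verification itself is represented only by a boolean flag.

```python
from collections import Counter
from dataclasses import dataclass
from fractions import Fraction


@dataclass(frozen=True)
class Endorsement:
    node_id: str
    result_hash: str      # digest of the pre-execution result
    signature_ok: bool    # outcome of a signature check done elsewhere


def check_endorsements(endorsements, endorser_count,
                       quorum=Fraction(2, 3), unanimous=False):
    """Return "pending", "valid" or "invalid" for one transaction."""
    # Wait until more than the configured fraction of endorsers has answered.
    if len(endorsements) <= quorum * endorser_count:
        return "pending"
    if not all(e.signature_ok for e in endorsements):
        return "invalid"
    _, agreeing = Counter(e.result_hash for e in endorsements).most_common(1)[0]
    if unanimous:
        # Stricter rule for small networks: every endorsement must match.
        ok = agreeing == len(endorsements)
    else:
        ok = 2 * agreeing > len(endorsements)
    return "valid" if ok else "invalid"
```

A transaction reported as "valid" would move on to conflicting-transaction preprocessing, while "invalid" corresponds to marking the transaction as invalid.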

In a chained database system, not all transactions can be executed and put on chain: a large number of transactions may be aborted because of serialization conflicts, an effect that worsens under high concurrency. A common ordering strategy is to follow the order in which transactions arrive at the server. Although this establishes the transaction order quickly, it can produce order conflicts and thereby increase the number of invalid transactions, which the user must resubmit for a new round of processing. A conflicting-transaction preprocessing module is therefore added to turn transactions that would otherwise be aborted into successful ones, improving the effective throughput of the whole chained database system. The preprocessing step uses a transaction reordering algorithm. Because the chained database system packs incoming transactions into blocks, puts the blocks on chain, and only then passes the blocks to the databases for execution, reordering transactions at this early stage acts as a preprocessing filter that has little impact on block production and introduces no significant overhead.

The conflicting-transaction preprocessing module builds a conflict graph over all pre-executed transaction results that pass the validity check: it determines which transactions actually conflict with each other in their operations and builds a directed graph. It then applies Tarjan's algorithm to identify the strongly connected subgraphs of the directed graph, which contain all of its cycles, and uses Johnson's algorithm to enumerate all cycles of each such subgraph and to count the number of cycles in which each transaction appears. Based on the directed graph, its strongly connected subgraphs, their cycles and the per-transaction counts, the module iteratively removes, for each subgraph that contains cycles, the transaction that appears in the most cycles, until every cycle in the directed graph has been broken. Finally, a valid transaction order is constructed from the resulting directed acyclic graph.
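
The cycle-breaking idea can be sketched on a small conflict graph. For brevity this sketch enumerates simple cycles with a plain depth-first search instead of the Tarjan and Johnson algorithms named above, so it is suitable only for small graphs; the function names, the edge convention (an edge u → v means u must be ordered before v) and the tie-breaking rule are assumptions.

```python
from collections import Counter


def simple_cycles(graph):
    """List simple cycles; each is reported once, starting at its smallest node."""
    cycles = []

    def dfs(start, node, path, on_path):
        for nxt in graph.get(node, ()):
            if nxt == start:
                cycles.append(list(path))
            elif nxt > start and nxt not in on_path:
                path.append(nxt)
                on_path.add(nxt)
                dfs(start, nxt, path, on_path)
                path.pop()
                on_path.discard(nxt)

    for start in sorted(graph):
        dfs(start, start, [start], {start})
    return cycles


def reorder(graph):
    """Return (serial order of surviving transactions, removed transactions)."""
    graph = {node: set(succ) for node, succ in graph.items()}
    removed = []
    while True:
        cycles = simple_cycles(graph)
        if not cycles:
            break
        counts = Counter(node for cycle in cycles for node in cycle)
        # Remove the transaction appearing in the most cycles (smallest name on ties).
        victim = max(sorted(counts), key=counts.get)
        removed.append(victim)
        del graph[victim]
        for succ in graph.values():
            succ.discard(victim)
    # Kahn's algorithm: topological order of the remaining acyclic graph.
    indegree = {node: 0 for node in graph}
    for succ in graph.values():
        for node in succ:
            indegree[node] += 1
    ready = sorted(node for node, deg in indegree.items() if deg == 0)
    order = []
    while ready:
        node = ready.pop(0)
        order.append(node)
        for nxt in sorted(graph[node]):
            indegree[nxt] -= 1
            if indegree[nxt] == 0:
                ready.append(nxt)
    return order, removed
```

In the graph `{"T1": {"T2"}, "T2": {"T1", "T3"}, "T3": {"T2"}}`, transaction T2 lies on both cycles, so removing it alone leaves T1 and T3 free to commit.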

Transactions that pass the conflicting-transaction preprocessing step are handed to the ordering component, which consists of a kafka server and a zookeeper cluster. The kafka server comprises a plurality of producers (Producer), a plurality of service broker nodes (Broker) and a plurality of consumers (Consumer), and divides messages into topics according to specific business requirements. The ordering service component sends the preprocessed transactions to the kafka broker nodes, and the order in which transactions reach the topic partition is used as the ordering for the subsequent packaging. zookeeper provides an efficient and easy-to-use coordination service for kafka: it manages all broker node servers, records the message topics divided by the kafka server and their corresponding broker nodes, and records the message consumption progress (offset). When a batch of transactions has been packed and sent to the blockchain network, zookeeper promptly moves the offset pointer to the first unconsumed record, ready for the next pull by the kafka consumer.

Finally, the transactions are packaged into blocks according to certain conditions. Considering factors such as system throughput and algorithm performance, the chained database system defines four transaction packaging strategies and packages all received transactions as soon as any one of the following four conditions is met: a certain number of transactions has been reached; the block has reached a certain data size; a certain time has elapsed since the first of these transactions was received; or the received transactions together access a certain number of distinct variables. The four strategies can be chosen to suit various application scenarios, cope flexibly with varying network conditions, maintain the packaging efficiency of the chained database, and effectively avoid message congestion in the ordering module. The packed blocks are sent back to the database management nodes by asynchronous or synchronous communication, and the next data on-chain operation begins. The ordering service component defaults to synchronous communication when sending; it can also check the number of transaction requests in the current message queue and switch flexibly to asynchronous communication, saving processing time while maintaining the system's throughput and performance per unit time.
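
The four packaging conditions can be sketched as a small batching helper. The class names `Tx` and `BlockCutter`, the field names and the explicit `now` timestamps are hypothetical; a real ordering service would obtain transactions from the message queue rather than from direct method calls.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Tx:
    tx_id: str
    size: int            # serialized size in bytes
    keys: frozenset      # variables read or written by the transaction


class BlockCutter:
    """Cuts a block as soon as any one of the four conditions holds."""

    def __init__(self, max_txs, max_bytes, max_wait, max_keys):
        self.max_txs = max_txs
        self.max_bytes = max_bytes
        self.max_wait = max_wait      # seconds since the first pending transaction
        self.max_keys = max_keys
        self.pending = []
        self.first_seen = None

    def add(self, tx, now):
        if not self.pending:
            self.first_seen = now
        self.pending.append(tx)
        return self.tick(now)

    def tick(self, now):
        """Return the list of packaged transactions, or None if no block is due."""
        if not self.pending:
            return None
        total_bytes = sum(tx.size for tx in self.pending)
        distinct_keys = set().union(*(tx.keys for tx in self.pending))
        due = (len(self.pending) >= self.max_txs
               or total_bytes >= self.max_bytes
               or now - self.first_seen >= self.max_wait
               or len(distinct_keys) >= self.max_keys)
        if not due:
            return None
        block, self.pending = self.pending, []
        return block
```

Calling `tick` periodically lets the time-based condition fire even when no new transaction arrives.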

Three, private data storage and access control scheme

The privacy data storage and access control scheme provided by the invention mainly aims to solve the following problems of the chained database system designed by the invention:

Without granting additional permissions or functions to any blockchain participant, the existing structure of the blockchain is used so that the auditing and authorization of data visitors, and the updating of access permissions and policies, can be carried out at the same time as data transactions are published;

Access authorization control of the user. After the private data is published on the blockchain, a data visitor submits an access request and, after auditing and authorization, successfully accesses the private data;

Confirmation that the visitor successfully accessed the data. The openness and decentralization of the blockchain cannot by themselves guarantee data access or the auditing of visitors, so the owner of the private data, although able to authorize a visitor, cannot confirm whether the visitor actually accessed the data;

Data is stored encrypted in the blockchain, and the permissions granted to data visitors at different stages need to differ, so that combining access permissions from different stages does not threaten forward and backward security. Access permissions are therefore updated periodically, or information is re-released to update the access control policy, ensuring that permissions at different stages differ and that the permissions of later visitors are isolated from those of earlier visitors. Under the assumptions that access rights are managed in a distributed manner and that an adversary controls at most some of the managers, visitors are guaranteed to obtain the correct rights.

For convenience of discussion, the system model of the present solution abstracts the entire data storage and access into three parts:

The data owner, denoted Alice. Alice can be a single user or a set of users, and her private data is encrypted and then stored in the blockchain;

The data visitor, denoted Bob. Bob is a visitor to Alice's private data and may be an individual or a group;

The set of nodes P = {P_i : i = 1, …, n}. P manages the blockchain (running the consensus algorithm, recording transactions, etc.).

Each node P_i maintains a local database. Each node is assumed to have a public/private key pair, and data sent to a node by a user is encrypted with the node's public key to ensure security (in practice a secure channel, and if necessary an authenticated channel, can be used).

Fig. 9 is an overall framework diagram of the present embodiment. The basic flow is as follows: the data owner Alice stores the original data (encrypted) using the blockchain. Optionally, Alice may delegate access-control authorization to an AC (an access control node) or may control authorization herself. The data user Bob issues an access request on the chain; the AC, acting for Alice, or Alice herself authorizes Bob and allows him to access the data.

The private data storage and access control scheme can be divided into three stages: a storage stage, an access stage and an updating stage. Hereinafter, H is a public hash algorithm such as SHA256, and E = (Enc, Dec) is a public encryption scheme. PSS is an actively updated threshold secret sharing scheme whose threshold is consistent with the consensus threshold used by the blockchain (for example, 2/3 of the number of nodes). Data is denoted D in plaintext and [D] in ciphertext. The stages are detailed below:

Storage stage: Alice owns private data D to be stored in the blockchain. Alice operates as follows: she randomly selects an encryption key ek and a decryption key dk, and encrypts D to obtain [D] = Enc(ek, D); she shares dk into multiple segments dk_i with the PSS and encrypts each segment with the public key of node P_i to obtain the segment ciphertext [dk_i]; she attaches an access policy (for example, a signature issued by Alice), so that only visitors who satisfy the policy can obtain the data. Finally, she computes h = H([D], policy), creates the transaction T_W = <h, policy>, and attaches her signature σ_W. Alice sends the following to the set of nodes P:

(T_W，σ_W)，[D]，{[dk_i]：i＝1，…，n}

Each node P_i verifies the transaction signature, decrypts [dk_i] and validates the segment [dk_i] (using the PSS). If both verifications succeed, node P_i stores [dk_i] and participates in consensus. Once the transaction T_W is recorded on the chain, more than a threshold number of nodes must hold correct segments. Since the data itself may be large, storing [D] directly on the blockchain is not recommended; it may instead be stored by a set of nodes (all or some of the nodes).

Access stage: Bob searches the blockchain for the data and requests access from the set P by providing an attribute attr. He creates the transaction T_R = <attr, pk_B> and sends it with his signature to the set P. The set P runs the consensus algorithm to verify whether attr satisfies Alice's policy. If not, the request is ignored; if so, the transaction T_R is recorded on the chain and Bob is authorized to access the data. Each node P_i then sends its own segment dk_i (encrypted with pk_B) and [D] to Bob.

Updating stage: the set P periodically runs the update algorithm of the PSS to update all shared segments (global state update). Alternatively, to reduce the computational overhead of the set P, Alice runs the PSS herself and sends new segments (corresponding to the same key dk) to the set P; each P_i deletes its original segment and stores the new one.
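
The sharing of dk and the segment update can be illustrated with plain Shamir secret sharing over a prime field. This is only a sketch of the underlying idea, not the PSS of the invention: the key is treated as an integer, the field modulus and all function names are assumptions, and the refresh is done by a single dealer (as in the variant where Alice recomputes the segments) rather than jointly by the nodes.

```python
import secrets

PRIME = 2**127 - 1  # a Mersenne prime, used here as the field modulus


def _eval(coeffs, x):
    """Evaluate a polynomial (lowest-degree coefficient first) at x modulo PRIME."""
    y = 0
    for c in reversed(coeffs):
        y = (y * x + c) % PRIME
    return y


def split(secret, n, t):
    """Split secret into n segments such that any t of them recover it."""
    coeffs = [secret] + [secrets.randbelow(PRIME) for _ in range(t - 1)]
    return [(i, _eval(coeffs, i)) for i in range(1, n + 1)]


def refresh(shares, t):
    """Re-randomize segments by adding a random polynomial with constant term 0."""
    zero_poly = [0] + [secrets.randbelow(PRIME) for _ in range(t - 1)]
    return [(x, (y + _eval(zero_poly, x)) % PRIME) for x, y in shares]


def recover(shares):
    """Lagrange interpolation at x = 0 over any t consistent segments."""
    secret = 0
    for xi, yi in shares:
        num, den = 1, 1
        for xj, _ in shares:
            if xj != xi:
                num = num * (-xj) % PRIME
                den = den * (xi - xj) % PRIME
        secret = (secret + yi * num * pow(den, -1, PRIME)) % PRIME
    return secret
```

After a refresh the segments still reconstruct the same dk, but segments from before and after the refresh no longer lie on the same polynomial, which is what separates the permissions of earlier and later stages.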

The above embodiments are only intended to illustrate the technical solution of the present invention and not to limit the same, and a person skilled in the art can modify the technical solution of the present invention or substitute the same without departing from the spirit and scope of the present invention, and the scope of the present invention should be determined by the claims.

Claims

1. A database management method based on a block chain is suitable for the block chain consisting of a group of monitoring server clusters, a group of sequencing server clusters, n database nodes and a plurality of user terminals, and comprises the following steps:

2. The method of claim 1, wherein the private data is stored to a database by:

1) the data owner generates an encryption key ek and a decryption key dk;

3) sharing the decryption key dk into segments dk_i through an actively updated threshold secret sharing scheme, and encrypting each segment dk_i with the public key of node P_i to obtain the segment ciphertext [dk_i], wherein 1 ≤ i ≤ n;

6) each database node P_i in the set P verifies the signature σ_W, decrypts and verifies the segment ciphertext [dk_i], stores the transaction T_W to the blockchain, and stores the ciphertext [D] to some or all of the database nodes P_i in the set P.

3. The method of claim 2, wherein the private data in the database is accessed by:

4. The method of claim 2, wherein the set P periodically runs the update algorithm of the actively updated threshold secret sharing scheme to update all segments, thereby performing a global state update.

5. The method of claim 1, wherein the user terminal obtains the address of the database server where the data is located by the following policies:

b) according to the data node information H⁰, requesting the data from the k database nodes;

c) if k database nodes

All can return the data correctly, then k database nodes are connected

From a database node that can return said data correctly

The data is obtained, and k database nodes are connected

As the database server address where the data is located;

b) according to the data node information H¹, requesting the data from the k database nodes;

c) if the database node

All can return the data correctly and connect k database nodes

As the database server address where the data is located;

b) according to the data node information H², requesting the data from the k database nodes;

c) if k database nodes

From a database node that can return said data correctly

The data is obtained, and k database nodes are connected

As the database server address where the data is located;

Requesting the data;

if k database nodes

All can return the data correctly, the database node is connected

As the database server address where the data is located;

if k database nodes

From a database node that can return said data correctly

The data is obtained, and k database nodes are connected

As the database server address where the data is located.

6. The method of claim 1, wherein the sequencing server comprises: a conflict transaction preprocessing module, a kafka server and a zookeeper cluster; the kafka server comprises: a plurality of producers, a plurality of service broker nodes and a plurality of consumers; generating a block containing Q transactions by:

b) recording message consumption progress offset records;

7. The method of claim 6, wherein the pre-processed transaction is obtained by:

8. The method of claim 1, wherein the monitoring server provides a user message subscription publication service to the user side, the user message comprising: updating the current block height, the data node index and the address; the monitoring server provides a node message subscription and publishing service to the database node, wherein the node message comprises: current block height, node operating state and data storage information.

9. The method of claim 1, wherein for the newly added data, the database data is split into sub-databases and sub-tables by the following strategies:

2) the ID distribution mode, the steps comprising:

3) the modulo Hash sequential distribution mode, the steps comprising:

4) the heat distribution mode comprises the following steps:

a) searching whether the newly added data contains hotspot query data;

c) an index is established with the real address.

10. A blockchain-based database management system, comprising:

a monitoring server for splitting data in a database into m sub-tables, performing k backups on each sub-table, and storing each backup in the database nodes P_i; wherein k is more than or equal to 2 and less than or equal to m and k, and i is more than or equal to 1 and less than or equal to n; returning the address of the database server where the data is located according to the request of the user side; receiving block output information for blocks containing Q transactions;