CN111585751A - Data sharing method based on block chain - Google Patents
- Publication number
- CN111585751A
- Authority
- CN
- China
- Prior art keywords
- data
- domain
- index
- data set
- transaction
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04L—TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
- H04L9/00—Cryptographic mechanisms or cryptographic arrangements for secret or secure communications; Network security protocols
- H04L9/08—Key distribution or management, e.g. generation, sharing or updating, of cryptographic keys or passwords
- H04L9/0816—Key establishment, i.e. cryptographic processes or cryptographic protocols whereby a shared secret becomes available to two or more parties, for subsequent use
- H04L9/085—Secret sharing or secret splitting, e.g. threshold schemes
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06Q—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
- G06Q40/00—Finance; Insurance; Tax strategies; Processing of corporate or income taxes
- G06Q40/04—Trading; Exchange, e.g. stocks, commodities, derivatives or currency exchange
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04L—TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
- H04L9/00—Cryptographic mechanisms or cryptographic arrangements for secret or secure communications; Network security protocols
- H04L9/06—Cryptographic mechanisms or cryptographic arrangements for secret or secure communications; Network security protocols the encryption apparatus using shift registers or memories for block-wise or stream coding, e.g. DES systems or RC4; Hash functions; Pseudorandom sequence generators
- H04L9/0618—Block ciphers, i.e. encrypting groups of characters of a plain text message using fixed encryption transformation
- H04L9/0631—Substitution permutation network [SPN], i.e. cipher composed of a number of stages or rounds each involving linear and nonlinear transformations, e.g. AES algorithms
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04L—TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
- H04L9/00—Cryptographic mechanisms or cryptographic arrangements for secret or secure communications; Network security protocols
- H04L9/50—Cryptographic mechanisms or cryptographic arrangements for secret or secure communications; Network security protocols using hash chains, e.g. blockchains or hash trees
Landscapes
- Engineering & Computer Science (AREA)
- Business, Economics & Management (AREA)
- Accounting & Taxation (AREA)
- Finance (AREA)
- Signal Processing (AREA)
- Computer Networks & Wireless Communication (AREA)
- Computer Security & Cryptography (AREA)
- Technology Law (AREA)
- Strategic Management (AREA)
- Physics & Mathematics (AREA)
- General Business, Economics & Management (AREA)
- General Physics & Mathematics (AREA)
- Theoretical Computer Science (AREA)
- Marketing (AREA)
- Economics (AREA)
- Development Economics (AREA)
- Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
Abstract
The invention discloses a data sharing method based on a blockchain. The method addresses data sharing within the same blockchain and switching between different blockchains. A prototype implementation demonstrates the feasibility of the method and lays a research and application foundation for the further development of the value internet. Distributed storage scatters data across the nodes of the network, which improves the security and redundancy of the system: if the data on a node is modified, deleted, or forged, the error can be detected and the data recovered from the data held by other nodes. Experimental analysis shows that the method provides a feasible and efficient sharing mechanism.
Description
Technical Field
The invention relates to the technical field of blockchain data sharing, and in particular to a data sharing method based on a blockchain.
Background
Data sharing is a prerequisite for mining the potential value of data assets. Traditional data management approaches, such as the data market, follow this pattern: the data provider uploads data to a data market, and the data demander downloads and analyzes the data to obtain its value. This mode has the following significant disadvantages: first, searching is limited to keyword retrieval or data browsing, so useful data cannot be found efficiently; second, the data owner loses control of the data, so data ownership and data security cannot be guaranteed; finally, data transactions lack transparency, so fraud such as collusion among transaction participants cannot be effectively detected.
The method establishes a new data management mode by means of blockchain technology. The core idea is that the original data never leaves the control of the data provider: data analysis is completed by the data provider, and only the analysis results are sent. To this end, the method discusses the following problems: 1) to discover data effectively, an index establishing mechanism for data providers, comprising a metadata extraction mechanism and a domain index establishing mechanism; 2) blockchain-based data transactions, including the transaction record format and the consensus mechanism, so as to achieve transparent transactions and prevent collusion and cheating; 3) privacy-preserving secure computation among data providers, carried out through smart contracts according to the computation requirements of the data demander. Experimental analysis shows that the method provides a feasible and secure sharing mechanism.
Disclosure of Invention
The invention aims to provide a data sharing method based on a blockchain.
The technical scheme adopted by the invention to solve the technical problem is as follows:
A data sharing method based on a blockchain comprises a data set index establishing step, a data set retrieval step, a data requirement contract writing step, a data transaction step, and a secure data computation step, specifically as follows:
1) Data set index establishing step: according to a domain index establishing mechanism, a data set is segmented by domain, a domain index is formed after confirming, limiting, or expanding the domains and domain values, and the index is optimized and stored;
Metadata extraction: the domains (metadata or attributes) in the data set are extracted according to a specific rule, the domain values are determined, and the range of the domain values is limited or expanded according to the domain size;
The method judges whether two data sets are the same based on Jaccard similarity. If X and Y are the sets of metadata items of data sets X and Y respectively, the Jaccard similarity is:
$$J(X, Y) = \frac{|X \cap Y|}{|X \cup Y|} \tag{1}$$
The domains are logically divided into several groups, and each domain index is stored on the node whose hash value is adjacent to the LSH value of the domain, according to the adjacency between the LSH values of the domains and the hash values of the nodes;
2) Data set retrieval step: a query domain is formed according to the query requirement of the data demander, and the required data set is retrieved from the domain index of step 1);
Using a domain index retrieval mechanism, a domain similarity search technique measures the degree of overlap between the query terms and the index to obtain the retrieved data set;
If Q is the query domain and I is the index domain, the domain similarity t(Q, I) is given by formula (2);
3) Data requirement contract writing step: a data ordering contract is written according to the characteristics of the data set obtained in step 2) and the requirements of the data demander;
4) Data transaction and evaluation mechanism: the data demander pays a fee to the system, according to the value strategy and the balanced price requirement of the provider of the data set obtained in step 2), to compensate the data provider and the related participants. A transaction refers to the processing logic of the data. The consensus mechanism of the method is an improved delegated Byzantine fault-tolerant algorithm (dBFT). Let the number of nodes in the network be N; the participating nodes are numbered from 0 to N-1, sorted in descending order of credit (trust), and n nodes are taken as consensus nodes. The height of the current consensus block is h and the transaction number is v. The positive evaluation number p and the negative evaluation number f of the two transaction parties are calculated by formulas (3) and (4):
$$p = (h - v) \bmod n \tag{3}$$
The credit degree of node n in the i-th transaction is calculated by formula (5) from the positive and negative evaluation numbers;
5) Secure data computation and privacy protection mechanism: once the system payment of step 4) is complete, the system nodes carry out the secure computation on the data and produce an output that meets the privacy requirement;
Data join and sharing: records with the same key in different data sets are merged and secret-shared among the parties participating in the multi-party computation. Let $\pi_s$ be a family of pseudo-random permutations in which the secret key s uniquely determines a particular permutation. The algorithm is as follows:
Input: secret-shared data tables $T_i$, where $k_i$ denotes the primary key column of table $T_i$
Output: the secret-shared equi-join table $T^*$ of the input tables
a. Each computing party randomly shuffles each database table $T_i$; $T_i^*$ denotes the shuffled table and $k_i^*$ its shuffled primary key column.
b. The parties select a random permutation function $\pi_s$ using the secret-shared key s.
c. Each party in turn evaluates the permutation function $\pi_s$ on the shuffled primary key column $k_i^*$ and passes the resulting values $\pi_s(k_i^*)$ to the next computing party; each computing party joins its result with the result sent by the previous party, and the result table $T^*$ is finally generated.
Sorting: the order of the vectors is determined according to the generated result table $T^*$; in essence this is a random ordering. Let the n computing parties share the secret vector $x_1, x_2, \ldots, x_n$. Without loss of generality, $[\cdot]$ denotes a secret-shared value, so the shared vector is written $[x_1], [x_2], \ldots, [x_n]$.
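The shuffle, permute, and join steps a to c can be sketched in plain Python. This is a single-process illustration under stated assumptions, not a secure multi-party implementation: HMAC-SHA256 keyed with the shared secret s stands in for the pseudo-random permutation π_s, primary keys are assumed unique within each table, and the names `prp`, `blind_table`, and `equi_join` are hypothetical.

```python
import hashlib
import hmac
import random


def prp(secret: bytes, key: str) -> str:
    """Stand-in for pi_s: a keyed pseudo-random function over primary keys."""
    return hmac.new(secret, key.encode("utf-8"), hashlib.sha256).hexdigest()


def blind_table(secret: bytes, table: list[dict], key_col: str) -> list[dict]:
    """Steps a and c: shuffle the rows, then replace each primary key by pi_s(key)."""
    shuffled = table[:]
    random.shuffle(shuffled)
    blinded = []
    for row in shuffled:
        rest = {k: v for k, v in row.items() if k != key_col}
        blinded.append({"tag": prp(secret, str(row[key_col])), **rest})
    return blinded


def equi_join(left: list[dict], right: list[dict]) -> list[dict]:
    """Join two blinded tables on the blinded key; raw keys are never compared."""
    by_tag = {row["tag"]: row for row in right}
    return [{**row, **by_tag[row["tag"]]} for row in left if row["tag"] in by_tag]


secret = b"shared-secret-s"  # step b: key agreed between the parties (assumed value)
education = [{"id": 1, "degree": "master"}, {"id": 2, "degree": "doctorate"}]
tax = [{"id": 2, "dividend": 300}, {"id": 3, "dividend": 50}]

result = equi_join(blind_table(secret, education, "id"),
                   blind_table(secret, tax, "id"))
```

Because both tables are blinded with the same key, equal primary keys map to equal tags, so the join can be computed on tags alone. A real deployment would additionally secret-share the non-key columns, which this sketch leaves in the clear.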
Drawings
FIG. 1: data set retrieval precision
FIG. 2: data set retrieval recall
FIG. 3: statistics of internet access volume over one day
FIG. 4: comparison of time consumption
Detailed Description
To evaluate the system performance, a prototype system was constructed. The experiments show the performance of the core modules of the system, mainly: index performance, transaction blockchain network performance, and secure computation performance.
1) Description of the Experimental Environment
The experiment was performed on 5 Ubuntu cloud servers, each configured as follows: the processor is an Intel(R) Xeon(R) 2.0 GHz with dual CPUs; the memory is 2 GB; the operating system is Ubuntu 14.04.5 LTS server with kernel version 4.4.0-31-generic. The 5 servers are named node1, node2, …, node5. node4 and node5 store the original data sets and act as data providers and as participants in the secure computation. Any of node1 to node3 can act as a data demander. All nodes are candidate consensus nodes for the blocks.
The software implementation relies on open source software: the index part is built on MinHash LSH Ensemble; the blockchain uses Ethereum technology, with a modified block structure and consensus mechanism; the multi-party computation draws on the Obliv-C technology and programming language.
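The idea behind a MinHash-based domain index can be sketched with the standard library alone. The sketch below computes the exact Jaccard similarity of two sets of metadata items and a MinHash estimate of it. The `containment` function is an assumed form of query-to-index overlap, since the text does not spell out formula (2), and all function names and sample attribute names are hypothetical.

```python
import hashlib


def jaccard(x: set, y: set) -> float:
    """Exact Jaccard similarity |X ∩ Y| / |X ∪ Y|."""
    if not x and not y:
        return 1.0
    return len(x & y) / len(x | y)


def containment(q: set, x: set) -> float:
    """Fraction of the query domain Q that also appears in X (assumed measure)."""
    return len(q & x) / len(q) if q else 0.0


def minhash_signature(values: set, num_perm: int = 64) -> list[int]:
    """For each seeded hash function, keep the minimum hash over a non-empty set."""
    sig = []
    for seed in range(num_perm):
        salt = seed.to_bytes(4, "big")
        sig.append(min(
            int.from_bytes(hashlib.sha1(salt + str(v).encode()).digest()[:8], "big")
            for v in values
        ))
    return sig


def estimate_jaccard(sig_a: list[int], sig_b: list[int]) -> float:
    """Estimate Jaccard similarity as the fraction of matching signature slots."""
    return sum(a == b for a, b in zip(sig_a, sig_b)) / len(sig_a)


education = {"personal_id", "course_id", "gender", "school", "admission_date"}
tax = {"personal_id", "year", "month", "dividend_income"}

exact = jaccard(education, tax)  # 1 shared item out of 8 distinct items
query_fit = containment({"personal_id", "year"}, tax)
approx = estimate_jaccard(minhash_signature(education), minhash_signature(tax))
```

The estimate approaches the exact value as `num_perm` grows; an LSH index then groups signatures into bands so that similar domains collide in the same bucket without pairwise comparison.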
2) Description of the Experimental samples
The method uses Python to generate two main test data sets (an education data set and a tax data set) and some interference data sets (used to measure index and retrieval performance). The attributes of the education data set include: personal ID, course ID, gender, year and month of birth, education level (bachelor, master, doctorate, higher vocational education), prescribed program length, school attended, date of admission, learning status (in progress, withdrawn, graduated), and graduation or withdrawal date. The attributes of the tax data set include: personal ID, year, month, social security fee paid, dividend income, and whether the employer is from ICT (or ITL).
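A generator for data of this shape might look like the following sketch. It covers only a subset of the listed attributes, and the field names, value ranges, record counts, and the overlap of personal IDs between the two sets are all assumptions made for illustration; the source does not describe the actual generator.

```python
import random

DEGREES = ["bachelor", "master", "doctorate", "higher vocational"]
STATUSES = ["in progress", "withdrawn", "graduated"]


def make_education_record(rng: random.Random, personal_id: int) -> dict:
    """One synthetic education row (subset of the attributes in the text)."""
    return {
        "personal_id": personal_id,
        "course_id": rng.randint(1, 50),
        "gender": rng.choice(["F", "M"]),
        "birth_year_month": f"{rng.randint(1970, 2000)}-{rng.randint(1, 12):02d}",
        "education_level": rng.choice(DEGREES),
        "status": rng.choice(STATUSES),
    }


def make_tax_record(rng: random.Random, personal_id: int) -> dict:
    """One synthetic tax row (subset of the attributes in the text)."""
    return {
        "personal_id": personal_id,
        "year": rng.randint(2015, 2019),
        "month": rng.randint(1, 12),
        "social_security_paid": round(rng.uniform(0, 2000), 2),
        "dividend_income": round(rng.uniform(0, 10000), 2),
        "employer_in_ict": rng.choice([True, False]),
    }


rng = random.Random(42)  # fixed seed so runs are repeatable
education = [make_education_record(rng, pid) for pid in range(100)]
tax = [make_tax_record(rng, pid) for pid in range(50, 150)]

# Personal IDs present in both sets are the rows a secure join would match.
shared_ids = {r["personal_id"] for r in education} & {r["personal_id"] for r in tax}
```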
3) Index performance
Recall and precision are two important metrics of a search engine; the F value, which depends on both, is another important metric for measuring the system. FIG. 1 and FIG. 2 compare the precision and recall of data set retrieval, under different similarity thresholds t, against the locality-sensitive hashing ensemble (LSH Ensemble) algorithm. As can be seen from FIG. 1 and FIG. 2, the algorithm of the present disclosure adds prefix information to the index, so precision and recall are improved to some extent.
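For reference, the three metrics can be computed from a retrieved set and a ground-truth relevant set as sketched below. The text does not say which F value is used, so the balanced F1 score is assumed here; the function name and the sample data set identifiers are hypothetical.

```python
def precision_recall_f1(retrieved: set, relevant: set) -> tuple[float, float, float]:
    """Return (precision, recall, F1) for one query."""
    hits = len(retrieved & relevant)
    precision = hits / len(retrieved) if retrieved else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


retrieved = {"ds1", "ds2", "ds3", "ds4"}  # data sets returned by the index
relevant = {"ds2", "ds3", "ds5"}          # data sets that truly match the query

p, r, f = precision_recall_f1(retrieved, relevant)
# p = 2/4, r = 2/3, f = 4/7
```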
4) Block chaining network performance
The block production rate is a hard limit for most blockchain networks; for example, Bitcoin produces about one block every 10 minutes. With a constant block size (for example, 1 MB in the Bitcoin network), the block production rate determines the number of transactions that can be processed per unit time, so it is an important factor affecting the real-time performance of the system and the growth rate of the network. In the dBFT consensus algorithm, the time interval, the upper bound on the number of blocks, and the number of network nodes are all related to the block size.
Because the number of transactions in different periods is random, the transaction volume of the system is simulated by borrowing internet access probabilities; FIG. 3 is a statistical chart of accesses by time period for a certain internet service. As can be seen from FIG. 3, treating the time interval alone as the basis for consensus is often unreasonable. Therefore, a limit on the number of transactions in a block is added to the consensus algorithm to increase the transaction rate at peak times.
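The relation between block size, block interval, and throughput is simple arithmetic, worked through below. The 1 MB block size and 10-minute interval come from the text; the 250-byte average transaction size and the 15-second interval are assumptions chosen only to make the numbers concrete.

```python
def transactions_per_second(block_size_bytes: int,
                            avg_tx_bytes: int,
                            block_interval_s: float) -> float:
    """Upper bound on throughput: transactions per block divided by the interval."""
    tx_per_block = block_size_bytes // avg_tx_bytes
    return tx_per_block / block_interval_s


# 1 MB blocks every 10 minutes, assuming 250-byte transactions.
bitcoin_like = transactions_per_second(1_000_000, 250, 600)

# Same block size with an assumed 15-second interval.
faster = transactions_per_second(1_000_000, 250, 15)
```

With these assumptions a block holds 4000 transactions, so the 10-minute interval caps throughput at about 6.7 transactions per second, while the shorter interval raises the cap by a factor of 40.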
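One way to read the added limit is that a block is produced either when the time interval elapses or as soon as enough transactions are pending, whichever comes first. The sketch below illustrates that reading only; the class `BlockCutter`, its parameters, and the thresholds are hypothetical and are not taken from the modified consensus code.

```python
from dataclasses import dataclass, field


@dataclass
class BlockCutter:
    max_interval: float    # seconds between blocks when traffic is low
    max_transactions: int  # cut early once this many transactions are pending
    last_cut: float = 0.0
    pending: list = field(default_factory=list)
    blocks: list = field(default_factory=list)

    def submit(self, tx: str, now: float) -> None:
        """Queue a transaction, then check whether a block should be cut."""
        self.pending.append(tx)
        self.tick(now)

    def tick(self, now: float) -> None:
        """Cut a block if the interval has elapsed or the count limit is reached."""
        timed_out = now - self.last_cut >= self.max_interval
        full = len(self.pending) >= self.max_transactions
        if self.pending and (timed_out or full):
            self.blocks.append(self.pending)
            self.pending = []
            self.last_cut = now


cutter = BlockCutter(max_interval=15.0, max_transactions=3)
for i, t in enumerate([1.0, 1.5, 2.0, 2.5, 30.0]):
    cutter.submit(f"tx{i}", now=t)
```

In this run the first block is cut at t = 2.0 by the count limit, well before the 15-second interval, and the second at t = 30.0 by the timeout, which is the peak-time behaviour the paragraph describes.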
5) Secure computing performance
To ensure data security, the more efficient join query is replaced by sub-queries, and the effect is verified experimentally, as shown in FIG. 4. As can be seen from the figure, the algorithm proposed by the method increases the running time to some extent in exchange for improved data security.
Claims (5)
1. A data sharing method based on a blockchain, characterized by comprising the steps of data set index establishment, data set retrieval, data requirement contract writing, data transaction, and secure data computation, specifically as follows:
S1: Data set index establishing step: according to a domain index establishing mechanism, a data set is segmented by domain, a domain index is formed after confirming, limiting, or expanding the domains and domain values, and the index is optimized and stored;
S2: Data set retrieval step: a query domain is formed according to the query requirement of the data demander, and the required data set is retrieved from the domain index of step S1;
S3: Data requirement contract writing step: a data ordering contract is written according to the characteristics of the data set obtained in step S2 and the requirements of the data demander;
S4: Data transaction and evaluation mechanism: a fee is paid to the system, according to the value strategy and the balanced price requirement of the provider of the data set obtained in step S2, to compensate the data provider and the related participants;
S5: Secure data computation and privacy protection mechanism: once the system payment of step S4 is complete, the system nodes carry out the secure computation on the data and produce an output that meets the privacy requirement.
2. The method according to claim 1, wherein the domain index establishing mechanism of step S1 comprises:
S11: Metadata extraction: the domains (metadata or attributes) in the data set are extracted according to a specific rule, the domain values are determined, and the range of the domain values is limited or expanded according to the domain size;
The method judges whether two data sets are the same based on Jaccard similarity. If X and Y are the sets of metadata items of data sets X and Y respectively, the Jaccard similarity is:
$$J(X, Y) = \frac{|X \cap Y|}{|X \cup Y|} \tag{1}$$
S12: The domains are logically divided into several groups, and each domain index is stored on the node whose hash value is adjacent to the LSH value of the domain, according to the adjacency between the LSH values of the domains and the hash values of the nodes.
3. The method according to claim 1, wherein the data set retrieval of step S2 uses a domain index retrieval mechanism, in which a domain similarity search technique measures the degree of overlap between the query terms and the index to obtain the retrieved data set;
If Q is the query domain and I is the index domain, the domain similarity is denoted t(Q, I).
4. The method according to claim 1, wherein in the data transaction and evaluation mechanism of step S4, a transaction refers to the processing logic of the data, and the consensus mechanism of the method is an improved delegated Byzantine fault-tolerant algorithm (dBFT); let the number of nodes in the network be N; the participating nodes are numbered from 0 to N-1, sorted in descending order of credit (trust), and n nodes are taken as consensus nodes; the height of the current consensus block is h and the transaction number is v; the positive evaluation number p and the negative evaluation number f of the two transaction parties are calculated by formulas (3) and (4):
$$p = (h - v) \bmod n \tag{3}$$
The credit degree of node n in the i-th transaction is calculated by formula (5) from the positive and negative evaluation numbers.
5. The blockchain-based data sharing method according to claim 1, wherein the secure data computation and privacy protection mechanism of step S5 comprises:
S51: Data join and sharing: records with the same key in different data sets are merged and secret-shared among the parties participating in the multi-party computation. Let $\pi_s$ be a family of pseudo-random permutations in which the secret key s uniquely determines a particular permutation. The algorithm is as follows:
Input: secret-shared data tables $T_i$, where $k_i$ denotes the primary key column of table $T_i$
Output: the secret-shared equi-join table $T^*$ of the input tables
a. Each computing party randomly shuffles each database table $T_i$; $T_i^*$ denotes the shuffled table and $k_i^*$ its shuffled primary key column.
b. The parties select a random permutation function $\pi_s$ using the secret-shared key s.
c. Each party in turn evaluates the permutation function $\pi_s$ on the shuffled primary key column $k_i^*$ and passes the resulting values $\pi_s(k_i^*)$ to the next computing party; each computing party joins its result with the result sent by the previous party, and the result table $T^*$ is finally generated.
S52: Sorting: the order of the vectors is determined according to the generated result table $T^*$; in essence this is a random ordering. Let the n computing parties share the secret vector $x_1, x_2, \ldots, x_n$. Without loss of generality, $[\cdot]$ denotes a secret-shared value, so the shared vector is written $[x_1], [x_2], \ldots, [x_n]$.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202010284363.8A CN111585751A (en) | 2020-04-10 | 2020-04-10 | Data sharing method based on block chain |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202010284363.8A CN111585751A (en) | 2020-04-10 | 2020-04-10 | Data sharing method based on block chain |
Publications (1)
Publication Number | Publication Date |
---|---|
CN111585751A | 2020-08-25 |
Family
ID=72111563
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202010284363.8A Pending CN111585751A (en) | 2020-04-10 | 2020-04-10 | Data sharing method based on block chain |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN111585751A (en) |
Cited By (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN112184206A (en) * | 2020-09-30 | 2021-01-05 | 杭州复杂美科技有限公司 | Data acquisition method, device and storage medium |
CN112417480A (en) * | 2020-11-25 | 2021-02-26 | 中国传媒大学 | Data storage system and method based on block chain |
CN113961545A (en) * | 2021-10-26 | 2022-01-21 | 北京市科学技术情报研究所 | Block chain-based information value database construction method |
CN114679466A (en) * | 2021-06-04 | 2022-06-28 | 腾讯云计算(北京)有限责任公司 | Consensus processing method, device, computer equipment and medium for block chain network |
CN116860707A (en) * | 2023-06-13 | 2023-10-10 | 北京科技大学 | Material genetic engineering big data safe sharing method and system based on block chain |
Citations (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN107659429A (en) * | 2017-08-11 | 2018-02-02 | 四川大学 | Data sharing method based on block chain |
CN108519981A (en) * | 2018-02-01 | 2018-09-11 | 四川大学 | A kind of decentralization data sharing method of highly effective and safe |
CN108848081A (en) * | 2018-06-01 | 2018-11-20 | 深圳崀途科技有限公司 | The data sharing method of verification and integral incentive mechanism is stored based on alliance's chain |
CN109873866A (en) * | 2019-02-20 | 2019-06-11 | 北京邮电大学 | A kind of data sharing method based on block chain, device and electronic equipment |
US20190312719A1 (en) * | 2018-04-06 | 2019-10-10 | Crypto Lab Inc. | User device and electronic device for sharing data based on block chain and homomorphic encryption technology and methods thereof |
Non-Patent Citations (1)
Title |
---|
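Dong Xiangqian (董祥千) et al.: "An Efficient and Secure Decentralized Data Sharing Model" (《一种高效安全的去中心化数据共享模型》), Chinese Journal of Computers (《计算机学报》) * |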
Cited By (9)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN112184206A (en) * | 2020-09-30 | 2021-01-05 | 杭州复杂美科技有限公司 | Data acquisition method, device and storage medium |
CN112417480A (en) * | 2020-11-25 | 2021-02-26 | 中国传媒大学 | Data storage system and method based on block chain |
CN112417480B (en) * | 2020-11-25 | 2024-03-19 | 中国传媒大学 | Data storage system and method based on block chain |
CN114679466A (en) * | 2021-06-04 | 2022-06-28 | 腾讯云计算(北京)有限责任公司 | Consensus processing method, device, computer equipment and medium for block chain network |
CN114679466B (en) * | 2021-06-04 | 2023-02-10 | 腾讯云计算(北京)有限责任公司 | Consensus processing method, device, computer equipment and medium for block chain network |
CN113961545A (en) * | 2021-10-26 | 2022-01-21 | 北京市科学技术情报研究所 | Block chain-based information value database construction method |
CN113961545B (en) * | 2021-10-26 | 2022-04-26 | 北京市科学技术情报研究所 | Block chain-based information value database construction method |
CN116860707A (en) * | 2023-06-13 | 2023-10-10 | 北京科技大学 | Material genetic engineering big data safe sharing method and system based on block chain |
CN116860707B (en) * | 2023-06-13 | 2024-02-13 | 北京科技大学 | Material genetic engineering big data safe sharing method and system based on block chain |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | |
SE01 | Entry into force of request for substantive examination | |
WD01 | Invention patent application deemed withdrawn after publication | Application publication date: 20200825 |