CN111585751A - Data sharing method based on block chain - Google Patents


Info

Publication number
CN111585751A
Authority
CN
China
Prior art keywords
data
domain
index
data set
transaction
Legal status
Pending
Application number
CN202010284363.8A
Other languages
Chinese (zh)
Inventor
郭兵
沈艳
董详千
Current Assignee
Sichuan University
Original Assignee
Sichuan University
Priority date
2020-04-10
Filing date
2020-04-10
Publication date
2020-08-25
Application filed by Sichuan University
Priority to CN202010284363.8A
Publication of CN111585751A
Legal status: Pending (current)

Classifications

    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L9/00 Cryptographic mechanisms or cryptographic arrangements for secret or secure communications; Network security protocols
    • H04L9/08 Key distribution or management, e.g. generation, sharing or updating, of cryptographic keys or passwords
    • H04L9/0816 Key establishment, i.e. cryptographic processes or cryptographic protocols whereby a shared secret becomes available to two or more parties, for subsequent use
    • H04L9/085 Secret sharing or secret splitting, e.g. threshold schemes
    • H04L9/06 Cryptographic mechanisms or cryptographic arrangements for secret or secure communications; Network security protocols the encryption apparatus using shift registers or memories for block-wise or stream coding, e.g. DES systems or RC4; Hash functions; Pseudorandom sequence generators
    • H04L9/0618 Block ciphers, i.e. encrypting groups of characters of a plain text message using fixed encryption transformation
    • H04L9/0631 Substitution permutation network [SPN], i.e. cipher composed of a number of stages or rounds each involving linear and nonlinear transformations, e.g. AES algorithms
    • H04L9/50 Cryptographic mechanisms or cryptographic arrangements for secret or secure communications; Network security protocols using hash chains, e.g. blockchains or hash trees
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q40/00 Finance; Insurance; Tax strategies; Processing of corporate or income taxes
    • G06Q40/04 Trading; Exchange, e.g. stocks, commodities, derivatives or currency exchange

Landscapes

  • Engineering & Computer Science (AREA)
  • Business, Economics & Management (AREA)
  • Accounting & Taxation (AREA)
  • Finance (AREA)
  • Signal Processing (AREA)
  • Computer Networks & Wireless Communication (AREA)
  • Computer Security & Cryptography (AREA)
  • Technology Law (AREA)
  • Strategic Management (AREA)
  • Physics & Mathematics (AREA)
  • General Business, Economics & Management (AREA)
  • General Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • Marketing (AREA)
  • Economics (AREA)
  • Development Economics (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention discloses a blockchain-based data sharing method that addresses data sharing both within a single blockchain and when switching between different blockchains. A prototype implementation demonstrates the feasibility of the method and lays a research and application foundation for the further development of the value internet. Distributed storage scatters data across the nodes of the network, which improves the security and redundancy of the system: if the data on one node is modified, deleted or forged, the error can be detected and the data recovered from the copies held by other nodes. Experimental analysis shows that the method provides a feasible and efficient sharing mechanism.

Description

Data sharing method based on block chain
Technical Field
The invention relates to the technical field of blockchain data sharing, and in particular to a blockchain-based data sharing method.
Background
Data sharing is a prerequisite for unlocking the potential value of data assets. Traditional data management approaches, such as data markets, follow this pattern: the data provider uploads data to a data market, and the data demander downloads and analyzes the data to extract its value. This mode has several significant drawbacks: first, search is limited to keyword retrieval or data browsing, so useful data cannot be found efficiently; second, the data owner loses control of the data, so data ownership and data security cannot be guaranteed; finally, data transactions lack transparency, so fraud such as collusion among transaction participants cannot be detected effectively.
The method establishes a new data management mode based on blockchain technology. Its core idea is that the original data never leaves the control of the data provider: the data provider performs the data analysis and sends only the analysis results. To this end, the method addresses the following problems: 1) to discover data effectively, an index establishing mechanism for data providers is designed, comprising a metadata extraction mechanism and a domain index establishing mechanism; 2) blockchain-based data transactions are analyzed, including the transaction record format and the consensus mechanism, so as to achieve transparent transactions and prevent collusion and data fraud; 3) according to the computation requirements of data demanders, privacy-preserving secure computation is carried out among the data providers by means of smart contracts. Experimental analysis shows that the method provides a feasible and secure sharing mechanism.
Disclosure of Invention
The invention aims to provide a blockchain-based data sharing method.
The technical scheme adopted by the invention for solving the technical problem is as follows:
A blockchain-based data sharing method comprises a data set index establishing step, a data set retrieval step, a data requirement contract writing step, a data transaction step and a data security calculation step, specifically as follows:
1) Data set index establishment: according to a domain index establishing mechanism, the data set is segmented by domain, a domain index is formed after the domains and domain values are confirmed, limited or expanded, and the index is optimized and stored;
Metadata extraction: the domains (metadata or attributes) of the data set are extracted according to specific rules, the domain values are determined, and the range of domain values is limited or expanded according to the domain size;
Whether two data sets are the same is judged by the Jaccard similarity. If X and Y denote the sets of metadata items of the two data sets being compared, the Jaccard similarity is:
$$J(X, Y) = \frac{|X \cap Y|}{|X \cup Y|} \qquad (1)$$
The domains are logically divided into several groups, and, based on the proximity between each domain's LSH value and the hash values of all nodes, each domain index is stored on the node whose hash value is adjacent to that domain's LSH value;
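As a rough illustration of step 1, the sketch below computes the Jaccard similarity of formula (1), a MinHash signature whose slot agreement estimates that similarity, an LSH key taken from one signature band, and a placement rule that stores the domain index on the first node at or after that key on a hash ring. The hash function (BLAKE2b), signature length, band size, function names and the ring-successor placement rule are all assumptions not specified in the patent; this is not the MinHash LSH Ensemble code used in the prototype.

```python
import hashlib
from bisect import bisect_left


def h64(text: str) -> int:
    """Stable 64-bit hash (BLAKE2b) used for MinHash slots, LSH keys and node IDs."""
    return int.from_bytes(hashlib.blake2b(text.encode(), digest_size=8).digest(), "big")


def jaccard(x: set, y: set) -> float:
    """Exact Jaccard similarity |X ∩ Y| / |X ∪ Y| (formula (1)); two empty sets count as identical."""
    union = x | y
    return len(x & y) / len(union) if union else 1.0


def minhash(domain: set, num_perm: int = 16) -> list:
    """MinHash signature of a non-empty domain: one salted hash per simulated permutation."""
    return [min(h64(f"{i}:{value}") for value in domain) for i in range(num_perm)]


def estimated_jaccard(sig_x: list, sig_y: list) -> float:
    """Fraction of matching signature slots, which estimates the Jaccard similarity."""
    return sum(a == b for a, b in zip(sig_x, sig_y)) / len(sig_x)


def lsh_key(signature: list, band: int = 0, rows: int = 4) -> int:
    """Hash one band of `rows` slots; similar domains are likely to share this key."""
    return h64(",".join(map(str, signature[band * rows:(band + 1) * rows])))


def responsible_node(key: int, nodes: list) -> str:
    """Hypothetical placement: the first node whose hash is at or after `key` (wrapping around)."""
    ring = sorted((h64(name), name) for name in nodes)
    idx = bisect_left(ring, (key, ""))
    return ring[idx % len(ring)][1]
```

For example, the hypothetical metadata set {"personal_id", "course_id", "gender", "birth_month"} and a variant that replaces "birth_month" with "school" share 3 of 5 distinct items, so `jaccard` returns 0.6. Identical domains always produce the same signature, the same LSH key and therefore the same responsible node.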
2) Data set retrieval: a query domain is formed from the data demander's query requirements, and the required data sets are retrieved from the domain index built in step 1);
Using the domain index retrieval mechanism, a domain similarity search measures the degree of overlap between the query terms and the index to obtain the retrieved data sets. If Q is the query domain and I is an index domain, the domain similarity t(Q, I) is given by formula (2):
$$t(Q, I) = \frac{|Q \cap I|}{|Q|} \qquad (2)$$
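Assuming that formula (2) is the containment score used by LSH Ensemble, t(Q, I) = |Q ∩ I| / |Q|, i.e. the fraction of the query domain covered by an index domain, retrieval reduces to a threshold filter. The sketch below scans a small in-memory index exhaustively instead of probing LSH buckets; the function names, attribute names and threshold value are hypothetical.

```python
def containment(query: set, index_domain: set) -> float:
    """Assumed formula (2): share of the query domain Q that the index domain I covers."""
    return len(query & index_domain) / len(query) if query else 0.0


def search(query: set, index: dict, t: float = 0.5) -> list:
    """Return (data set name, score) pairs with score >= t, best match first."""
    scored = [(name, containment(query, domain)) for name, domain in index.items()]
    return sorted((pair for pair in scored if pair[1] >= t), key=lambda pair: pair[1], reverse=True)


# Hypothetical domain index built from the attribute names of the two test data sets.
index = {
    "education": {"personal_id", "course_id", "gender", "birth_month", "degree_level", "school"},
    "tax": {"personal_id", "year", "month", "social_security_fee", "dividend_income", "ict_employer"},
}
print(search({"personal_id", "year", "dividend_income"}, index))  # [('tax', 1.0)]
```

Containment, unlike Jaccard, is not penalized when an index domain is much larger than the query, which suits queries that name only a few wanted attributes.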
3) Data requirement contract writing: a data order contract is written according to the characteristics of the data sets obtained in step 2) and the requirements of the data demander;
4) Data transaction and evaluation mechanism: according to the value strategy and the equilibrium-price requirements of the providers of the data sets obtained in step 2), the data demander pays a fee to the system to compensate the data providers and other participants. Here a transaction refers to the data-processing logic, and the consensus mechanism of the method is an improved delegated Byzantine fault-tolerant algorithm (dBFT). Let the number of nodes in the network be N; the participating nodes are numbered 0 to N-1, sorted in descending order of trust (reliability), and the top n nodes are taken as consensus nodes. Let h be the height of the current consensus block and v the transaction number. The positive and negative evaluation counts p and f of the two transaction parties are calculated by formulas (3) and (4):
$$p = (h - v) \bmod n \qquad (3)$$
(4) [formula image not reproduced]
(5) [formula image not reproduced]
The credit degree of node n in the i-th transaction is then calculated from the positive and negative evaluation counts by formula (5);
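A minimal sketch of the node-selection part of step 4: the N candidate nodes are ranked by trust, the top n become consensus nodes, and formula (3) is evaluated as written. In the standard dBFT protocol used by NEO, the same expression (with v as the view number) selects the speaker (primary) node for block height h; formulas (4) and (5) are not available in this text, so they are not implemented. The function names and trust scores are hypothetical.

```python
def consensus_nodes(trust: dict, n: int) -> list:
    """Rank the N candidate nodes by trust (descending) and keep the top n."""
    return sorted(trust, key=lambda node: trust[node], reverse=True)[:n]


def formula_3(h: int, v: int, n: int) -> int:
    """p = (h - v) mod n; Python's % is non-negative for n > 0, even when v > h."""
    return (h - v) % n


# Hypothetical trust scores for the five prototype nodes.
trust = {"node1": 0.90, "node2": 0.50, "node3": 0.70, "node4": 0.95, "node5": 0.60}
members = consensus_nodes(trust, n=4)      # ['node4', 'node1', 'node3', 'node5']
p = formula_3(h=10, v=2, n=len(members))   # 0
print(members, members[p])                 # speaker under the standard dBFT reading: node4
```

Because p rotates with h, each new block height hands the primary role to a different consensus node, which limits how long any single node can stall the round.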
5) Data security calculation and privacy protection: after the system payment in step 4) is completed, the system nodes perform the secure computation on the data and produce an output that meets the privacy requirements;
Data join and sharing: records with the same key in different data sets are joined and secret-shared among the parties participating in the multi-party computation. Let $\Pi_S$ be a family of pseudo-random permutations in which a secret key s uniquely determines a particular permutation $\pi_s$. The algorithm is as follows:
Input: secret-shared data tables $T_i$, where $k_i$ denotes the primary-key column of $T_i$
Output: the secret-shared equi-join table $T^*$ of the input tables
a. Each computing party randomly shuffles each database table $T_i$; $T_i^*$ denotes the shuffled table and $k_i^*$ its shuffled primary-key column.
b. The parties use a shared secret key s to select a random permutation function $\pi_s$.
c. Each party in turn applies the permutation function $\pi_s$ to the queried primary-key column [formula image not reproduced] and passes the resulting values [formula image not reproduced] to the next computing party; each party joins the result received from the previous party, and finally the result table $T^*$ is produced.
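The data flow of steps a to c can be sketched in plaintext as follows. This is not secure multi-party computation: the key columns are not secret-shared, HMAC-SHA256 keyed by s stands in for the pseudo-random permutation π_s (it is a pseudo-random function, injective only with overwhelming probability), and primary keys are assumed unique within each table. The sketch only shows why applying the same keyed mapping lets parties match records with equal keys without exchanging raw key values; all names are hypothetical.

```python
import hashlib
import hmac
import random


def pi_s(s: bytes, key: str) -> str:
    """Stand-in for pi_s: HMAC-SHA256 under the shared key s (a PRF, not a true permutation)."""
    return hmac.new(s, key.encode(), hashlib.sha256).hexdigest()


def equi_join(tables: list, key: str, s: bytes, rng: random.Random) -> list:
    """Join tables on `key` by comparing pi_s(key) tokens instead of the raw key values."""
    tokenized = []
    for table in tables:
        rows = table[:]
        rng.shuffle(rows)  # step a: each party shuffles its table before processing
        # Steps b-c: replace the primary key by its token; keys are assumed unique per table.
        tokenized.append({pi_s(s, row[key]): {k: v for k, v in row.items() if k != key}
                          for row in rows})
    common = set.intersection(*(set(t) for t in tokenized))
    joined = []
    for token in sorted(common):  # sorting tokens makes output order independent of input order
        record = {"token": token}
        for t in tokenized:
            record.update(t[token])
        joined.append(record)
    return joined


education = [{"personal_id": "A1", "degree": "master"}, {"personal_id": "B2", "degree": "bachelor"}]
tax = [{"personal_id": "A1", "dividend_income": 100}, {"personal_id": "C3", "dividend_income": 50}]
result = equi_join([education, tax], "personal_id", b"shared-secret", random.Random(0))
print(len(result), result[0]["degree"], result[0]["dividend_income"])  # 1 master 100
```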
Sorting: the order of the vectors is determined from the generated result table $T^*$; this is essentially a random ordering. Suppose n computing parties share secret vectors $x_1, x_2, \ldots, x_n$; without loss of generality, $[\cdot]$ denotes a secret-shared vector, so the shared vectors are written $[x_1], [x_2], \ldots, [x_n]$.
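The patent does not name the sharing scheme behind the [x] notation. A common choice is additive secret sharing over a ring, sketched below under that assumption (ring Z_{2^32}; function names hypothetical): each party holds one share, any subset of fewer than all shares is uniformly random, and additions can be done locally share by share.

```python
import secrets

MOD = 2 ** 32  # assumed ring Z_{2^32}; the patent does not specify one


def share(vector: list, parties: int) -> list:
    """Split a vector into `parties` additive shares; any parties-1 of them look random."""
    shares = [[secrets.randbelow(MOD) for _ in vector] for _ in range(parties - 1)]
    last = [(v - sum(s[i] for s in shares)) % MOD for i, v in enumerate(vector)]
    return shares + [last]


def reconstruct(shares: list) -> list:
    """Open [x]: add the parties' shares element-wise modulo MOD."""
    return [sum(column) % MOD for column in zip(*shares)]


def add_shared(a: list, b: list) -> list:
    """[x] + [y] computed locally: each party adds its own two shares, no communication."""
    return [[(u + w) % MOD for u, w in zip(sa, sb)] for sa, sb in zip(a, b)]


x, y = [5, 3, 9], [1, 2, 3]
sx, sy = share(x, parties=3), share(y, parties=3)
print(reconstruct(sx), reconstruct(add_shared(sx, sy)))  # [5, 3, 9] [6, 5, 12]
```

Comparisons and swaps, which a sort needs, are not local in this scheme and require interactive protocols between the parties; the sketch covers only the representation.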
Drawings
FIG. 1 Data set retrieval precision
FIG. 2 Data set retrieval recall
FIG. 3 Statistics of Internet access volume over one day
FIG. 4 Comparison of time consumption
Detailed Description
To evaluate system performance, a prototype system was built. The experiments show the performance of the system's core modules, mainly: indexing performance, transaction blockchain network performance, and secure computation performance.
1) Description of the Experimental Environment
The experiments were run on five Ubuntu cloud servers, each configured as follows: dual Intel(R) Xeon(R) 2.0 GHz CPUs; 2 GB of memory; Ubuntu 14.04.5 LTS Server with kernel version 4.4.0-31-generic. The five servers are named node1, node2, …, node5. node4 and node5 store the original data sets and act as data providers and secure-computation participants; any of node1 to node3 can act as a data demander. All nodes are candidate block-consensus nodes.
The software implementation builds on open-source software: the index module is based on MinHash LSH Ensemble; the blockchain is based on Ethereum, with a modified block structure and consensus mechanism; and the multi-party computation draws on the Obliv-C language and framework.
2) Description of the Experimental samples
Two main test data sets (an education data set and a tax data set) and several interference data sets (used to measure indexing and retrieval performance) were generated with Python. The attributes of the education data set are: personal ID, course ID, gender, year and month of birth, education level (bachelor, master, doctorate, higher vocational), standard length of study, school attended, enrollment date, study status (enrolled, withdrawn, graduated), and graduation/withdrawal date. The attributes of the tax data set are: personal ID, year, month, social security contributions paid, dividend income, and whether the employer is from ICT (or ITL).
3) Index performance
Recall and precision are two important metrics for a search engine; a third important metric, the F-measure, is derived from them and is also used to evaluate the system. Fig. 1 and Fig. 2 compare the precision and recall of data set retrieval under different similarity thresholds t with those of the locality-sensitive hashing ensemble (LSH Ensemble) algorithm. As Fig. 1 and Fig. 2 show, because the algorithm of this disclosure adds prefix information to the index, precision and recall are improved to some extent.
4) Blockchain network performance
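To make the three retrieval metrics concrete, here is a minimal sketch computing precision, recall and the F-measure for a single query from the set of retrieved data sets and the set of truly relevant ones. Using F1 (the harmonic mean, equal weights) rather than another F-beta, and the function name, are assumptions.

```python
def precision_recall_f1(retrieved: set, relevant: set) -> tuple:
    """Precision, recall and F1 (harmonic mean of the two) for a single query."""
    hits = len(retrieved & relevant)
    precision = hits / len(retrieved) if retrieved else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


# Hypothetical query: 4 data sets returned, 3 truly relevant, 2 of them found.
# precision = 2/4, recall = 2/3, F1 = 4/7
print(precision_recall_f1({"d1", "d2", "d3", "d4"}, {"d1", "d2", "d5"}))
```

Raising the similarity threshold t usually trades recall for precision, which is why the F-measure is useful for comparing settings with a single number.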
Block generation speed is a hard limit for most blockchain networks; for example, Bitcoin produces about one block every 10 minutes. With a fixed block size (1 MB in the Bitcoin network, for example), the block generation speed determines how many transactions can be processed per unit time, so it is clearly an important factor in the real-time performance of the system and the growth rate of the network. In the dBFT consensus algorithm, the time interval, the upper limit on the number of transactions per block, and the number of network nodes are all related to block size.
Because the number of transactions in different periods is random, the system's transaction volume is simulated using Internet access probabilities; Fig. 3 shows the access statistics of a certain Internet service by time period over one day. As Fig. 3 shows, simply using a fixed transaction time interval as the consensus trigger is often unreasonable. Therefore, an upper limit on the number of transactions per block is added to the consensus algorithm to better handle peak transaction frequency.
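The relationship between block size, block interval and throughput can be checked with simple arithmetic. The sketch below uses the Bitcoin-like figures from the text (1 MB blocks, one block per 10 minutes) together with an assumed average transaction size of 250 bytes, which is not from the patent; the function name is hypothetical.

```python
def max_tps(block_size_bytes: int, avg_tx_bytes: int, block_interval_s: float) -> float:
    """Upper bound on transactions per second for a fixed block size and block interval."""
    txs_per_block = block_size_bytes // avg_tx_bytes
    return txs_per_block / block_interval_s


# 1 MB blocks every 600 s with an assumed 250-byte average transaction: 4000 txs per block.
print(round(max_tps(1_000_000, 250, 600), 2))  # 6.67
```

With the block size held constant, halving the block interval doubles this bound, which is why block generation speed limits real-time performance.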
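One way to read the combined trigger is: seal a block when the time interval has elapsed or when the pending transactions reach the per-block limit, whichever comes first. The sketch below is an assumption-based illustration of that rule (class and field names hypothetical; it skips empty blocks, which a real dBFT node might still produce).

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class BlockTrigger:
    """Seal a block when the interval elapses OR the pending count hits max_txs."""
    interval_s: float                 # time-based trigger, as in plain dBFT
    max_txs: int                      # assumed per-block transaction limit
    pending: List[str] = field(default_factory=list)
    last_block_at: float = 0.0

    def submit(self, tx: str) -> None:
        self.pending.append(tx)

    def seal_if_ready(self, now: float) -> Optional[List[str]]:
        """Return the transactions of a new block, or None if neither limit is reached."""
        full = len(self.pending) >= self.max_txs
        timed_out = now - self.last_block_at >= self.interval_s
        if self.pending and (full or timed_out):
            block, self.pending = self.pending[: self.max_txs], self.pending[self.max_txs:]
            self.last_block_at = now
            return block
        return None


trigger = BlockTrigger(interval_s=15.0, max_txs=3)
for tx in ["t0", "t1", "t2"]:
    trigger.submit(tx)
print(trigger.seal_if_ready(now=2.0))   # ['t0', 't1', 't2'] (count limit reached)
trigger.submit("t3")
print(trigger.seal_if_ready(now=10.0))  # None (only 8 s since the last block)
print(trigger.seal_if_ready(now=17.0))  # ['t3'] (interval elapsed)
```

During a traffic peak the count limit fires first and blocks follow each other quickly; in quiet periods the interval still bounds confirmation latency.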
5) Secure computing performance
To ensure data security, the efficient join query is replaced by sub-queries, and the resulting performance is verified experimentally, as shown in Fig. 4. As the figure shows, the proposed algorithm improves data security at the cost of some additional running time.

Claims (5)

1. A blockchain-based data sharing method, characterized by comprising the steps of data set index establishment, data set retrieval, data requirement contract writing, data transaction and data security calculation, specifically:
S1: Data set index establishment: according to a domain index establishing mechanism, the data set is segmented by domain, a domain index is formed after the domains and domain values are confirmed, limited or expanded, and the index is optimized and stored;
S2: Data set retrieval: a query domain is formed from the data demander's query requirements, and the required data sets are retrieved from the domain index of step S1;
S3: Data requirement contract writing: a data order contract is written according to the characteristics of the data sets obtained in step S2 and the requirements of the data demander;
S4: Data transaction and evaluation mechanism: according to the value strategy and the equilibrium-price requirements of the providers of the data sets obtained in step S2, a fee is paid to the system to compensate the data providers and other participants;
S5: Data security calculation and privacy protection: after the system payment of step S4 is completed, the system nodes perform the secure computation on the data and obtain an output that meets the privacy requirements.
2. The method according to claim 1, wherein the domain index establishing mechanism of step S1 comprises:
S11: Metadata extraction: the domains (metadata or attributes) of the data set are extracted according to specific rules, the domain values are determined, and the range of domain values is limited or expanded according to the domain size;
whether two data sets are the same is judged by the Jaccard similarity: if X and Y denote the sets of metadata items of the two data sets being compared, the Jaccard similarity is:
$$J(X, Y) = \frac{|X \cap Y|}{|X \cup Y|} \qquad (1)$$
S12: The domains are logically divided into several groups, and, based on the proximity between each domain's LSH value and the hash values of all nodes, each domain index is stored on the node whose hash value is adjacent to that domain's LSH value.
3. The method according to claim 1, wherein in the data set retrieval of step S2, a domain index retrieval mechanism is used, and a domain similarity search measures the degree of overlap between the query terms and the index to obtain the retrieved data sets;
$$t(Q, I) = \frac{|Q \cap I|}{|Q|} \qquad (2)$$
where Q is the query domain, I is an index domain, and t(Q, I) denotes the domain similarity.
4. The method according to claim 1, wherein in the data transaction and evaluation mechanism of step S4, a transaction refers to the data-processing logic, and the consensus mechanism of the method is an improved delegated Byzantine fault-tolerant algorithm (dBFT); the number of nodes in the network is N, the participating nodes are numbered 0 to N-1 and sorted in descending order of trust (reliability), and the top n nodes are taken as consensus nodes; h is the height of the current consensus block and v is the transaction number; the positive and negative evaluation counts p and f of the two transaction parties are calculated by formulas (3) and (4):
$$p = (h - v) \bmod n \qquad (3)$$
(4) [formula image not reproduced]
(5) [formula image not reproduced]
and the credit degree of node n in the i-th transaction is calculated from the positive and negative evaluation counts by formula (5).
5. The blockchain-based data sharing method according to claim 1, wherein the data security calculation and privacy protection mechanism of step S5 comprises:
S51: Data join and sharing: records with the same key in different data sets are joined and secret-shared among the parties participating in the multi-party computation; let $\Pi_S$ be a family of pseudo-random permutations in which a secret key s uniquely determines a particular permutation $\pi_s$; the algorithm is as follows:
Input: secret-shared data tables $T_i$, where $k_i$ denotes the primary-key column of $T_i$
Output: the secret-shared equi-join table $T^*$ of the input tables
a. Each computing party randomly shuffles each database table $T_i$; $T_i^*$ denotes the shuffled table and $k_i^*$ its shuffled primary-key column.
b. The parties use a shared secret key s to select a random permutation function $\pi_s$.
c. Each party in turn applies the permutation function $\pi_s$ to the queried primary-key column [formula image not reproduced] and passes the resulting values [formula image not reproduced] to the next computing party; each party joins the result received from the previous party, and finally the result table $T^*$ is produced.
S52: Sorting: the order of the vectors is determined from the generated result table $T^*$; this is essentially a random ordering. Suppose n computing parties share secret vectors $x_1, x_2, \ldots, x_n$; without loss of generality, $[\cdot]$ denotes a secret-shared vector, so the shared vectors are written $[x_1], [x_2], \ldots, [x_n]$.
CN202010284363.8A 2020-04-10 2020-04-10 Data sharing method based on block chain Pending CN111585751A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010284363.8A CN111585751A (en) 2020-04-10 2020-04-10 Data sharing method based on block chain

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202010284363.8A CN111585751A (en) 2020-04-10 2020-04-10 Data sharing method based on block chain

Publications (1)

Publication Number Publication Date
CN111585751A true CN111585751A (en) 2020-08-25

Family

ID=72111563

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010284363.8A Pending CN111585751A (en) 2020-04-10 2020-04-10 Data sharing method based on block chain

Country Status (1)

Country Link
CN (1) CN111585751A (en)

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107659429A (en) * 2017-08-11 2018-02-02 四川大学 Data sharing method based on block chain
CN108519981A (en) * 2018-02-01 2018-09-11 四川大学 A kind of decentralization data sharing method of highly effective and safe
CN108848081A (en) * 2018-06-01 2018-11-20 深圳崀途科技有限公司 The data sharing method of verification and integral incentive mechanism is stored based on alliance's chain
CN109873866A (en) * 2019-02-20 2019-06-11 北京邮电大学 A kind of data sharing method based on block chain, device and electronic equipment
US20190312719A1 (en) * 2018-04-06 2019-10-10 Crypto Lab Inc. User device and electronic device for sharing data based on block chain and homomorphic encryption technology and methods thereof

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
Dong Xiangqian (董祥千) et al.: "An Efficient and Secure Decentralized Data Sharing Model", Chinese Journal of Computers (《计算机学报》) *

Cited By (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112184206A (en) * 2020-09-30 2021-01-05 杭州复杂美科技有限公司 Data acquisition method, device and storage medium
CN112417480A (en) * 2020-11-25 2021-02-26 中国传媒大学 Data storage system and method based on block chain
CN112417480B (en) * 2020-11-25 2024-03-19 中国传媒大学 Data storage system and method based on block chain
CN114679466A (en) * 2021-06-04 2022-06-28 腾讯云计算(北京)有限责任公司 Consensus processing method, device, computer equipment and medium for block chain network
CN114679466B (en) * 2021-06-04 2023-02-10 腾讯云计算(北京)有限责任公司 Consensus processing method, device, computer equipment and medium for block chain network
CN113961545A (en) * 2021-10-26 2022-01-21 北京市科学技术情报研究所 Block chain-based information value database construction method
CN113961545B (en) * 2021-10-26 2022-04-26 北京市科学技术情报研究所 Block chain-based information value database construction method
CN116860707A (en) * 2023-06-13 2023-10-10 北京科技大学 Material genetic engineering big data safe sharing method and system based on block chain
CN116860707B (en) * 2023-06-13 2024-02-13 北京科技大学 Material genetic engineering big data safe sharing method and system based on block chain

Similar Documents

Publication Publication Date Title
CN111585751A (en) Data sharing method based on block chain
CN107659429A (en) Data sharing method based on block chain
CN108519981B (en) Cross-chain intelligent contract cooperation possibility evaluation method
US11238339B2 (en) Predictive neural network with sentiment data
Aher et al. Applicability of data mining algorithms for recommendation system in e-learning
Spurr et al. Challenging practical features of Bitcoin by the main altcoins
Yang et al. Optimising column family for OLAP queries in HBase
WO2022018574A1 (en) System and method for assessment of crypto and digital assests
Asonov Querying databases privately: a new approach to private information retrieval
Barabesi et al. Forum on Benford’s law and statistical methods for the detection of frauds
Navdeep et al. Role of big data analytics in analyzing e-Governance projects
Jensen et al. A synthetic data set to benchmark anti-money laundering methods
EP4006795A1 (en) Collaborative big data analysis framework using load balancing
Karger et al. Blockchain for AI Data-State of the Art and Open Research.
Botos Bitcoin Intelligence–Business Intelligence meets Crypto Currency
Yu et al. Popularity prediction for artists based on user songs dataset
CN115358753A (en) Cross-platform credit index association model generation method and system
Voitovych et al. Detection of Fake Accounts in Social Media
Wai et al. An indexing approach of historical states on hyperledger fabric
Huang et al. An empirical case study of internet usage on student performance based on fuzzy association rules
Simcharoen Cooperatively Managing and Exploiting Distributed Co-occurrence Graphs
Coulter The Impact of News Media on Cryptocurrency Prices: Modelling Data Driven Discourses in the Crypto-Economy
Indira et al. Parallel clarans algorithm for recommendation system in multi-cloud environment
Siedlecka-Lamch Secure Medical Data Storage with Blockchain Technology
Mahboubeh et al. Predicting changes in Bitcoin price using grey system theory

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
WD01 Invention patent application deemed withdrawn after publication

Application publication date: 20200825