CN113127463A

CN113127463A - Data deduplication and sharing auditing method for decentralized storage based on block chain

Info

Publication number: CN113127463A
Application number: CN202110275312.3A
Authority: CN
Inventors: 陈晓峰; 田国华; 姚雨松; 王连海
Original assignee: Xidian University
Current assignee: Xidian University
Priority date: 2021-03-15
Filing date: 2021-03-15
Publication date: 2021-07-16
Anticipated expiration: 2041-03-15
Also published as: CN113127463B

Abstract

The invention belongs to the technical field of blockchain and information security, and discloses a blockchain-based data deduplication and shared auditing method for decentralized storage. The method comprises: system establishment; file-level copy detection, in which a file key is generated and the blockchain is searched to determine whether the file is already stored in the decentralized storage system; initial file upload, in which the outsourced data is preprocessed and uploaded to a storage server SSP, and the SSP performs data-block-level copy detection; subsequent file upload, in which a PoWs challenge and PoWs evidence are generated and the PoWs evidence is verified; data auditing, in which an audit challenge and audit evidence are generated and the audit evidence is verified; audit tag update, in which the audit tags are preprocessed, generated, and checked; and download and decryption. The invention ensures data security under copy forgery attacks and single points of failure, and saves a large amount of computing and storage resources through techniques such as data deduplication and a lightweight tag generation algorithm.

Description

Data deduplication and sharing auditing method for decentralized storage based on block chain

Technical Field

The invention belongs to the technical field of block chains and information security, and particularly relates to a data deduplication and sharing auditing method for decentralized storage based on block chains.

Background

At present, the continuous development and popularization of the cloud computing technology provide wide and convenient data storage service for enterprises and personal users, effectively relieve the storage pressure of data users, and promote the continuous growth of data. However, the storage pressure brought by the increasing data forces cloud service providers to try to reduce the service cost and improve the quality of cloud computing services through various technologies.

As a data compression technology with excellent performance, data deduplication detects and deletes redundant copies of the same data, helping a cloud server save storage resources without affecting data access; it has therefore received extensive attention and research. In terms of data representation, existing deduplication technology can be divided into plaintext deduplication and ciphertext deduplication, where the latter consumes some computing resources to protect the data privacy of users. In terms of mode, existing deduplication technology can be divided into server-side deduplication and client-side deduplication. Server-side deduplication requires every user to upload the whole file, whereas client-side deduplication allows users other than the first uploader to complete the upload through a proof of ownership (PoWs) instead; it thus saves more network bandwidth, but faces security threats such as copy forgery attacks and evidence replay attacks. In terms of data granularity, existing deduplication technology can be divided into file-level deduplication and block-level deduplication; the latter achieves not only file-level duplicate detection and deletion but also cross-file duplicate detection and deletion of data blocks, and thus saves more storage resources than the former. Although block-level client-side ciphertext deduplication has good application prospects and many of its security risks have been addressed to varying degrees, deduplication in a centralized storage architecture still faces security threats and performance bottlenecks caused by copy forgery attacks and single points of failure.

First, in most existing client-side ciphertext deduplication schemes, the server cannot check the consistency between the outsourced ciphertext and its tag, so a malicious initial uploader can launch a copy forgery attack by uploading the tag of the target file together with a poisoned copy. A subsequent uploader who completes the upload through PoWs can then download only the poisoned copy; that is, the original data is lost. Existing solutions mitigate this threat by embedding a traceable signature, or avoid the resulting data loss through a dual-tag data storage strategy. However, in these methods both the initial uploader and the subsequent uploaders must encrypt the whole outsourced data, which incurs a large computation cost and limits performance. Second, the essence of data deduplication is to keep only one physical copy of a file, so the data of all users is stored centrally on one server, which makes a centralized deduplication system more vulnerable to single points of failure. A simple approach is for the user to store multiple copies of the data on different servers, but the user then faces additional storage costs. Another approach is for the user to encrypt and split the file using threshold secret sharing and store the shares on several different servers; even if a single server fails, the user can recover the outsourced data from any set of shares whose number reaches the threshold. However, this approach hinders retrieval and integrity verification of user data.

The commonly used data integrity verification technology is public auditing: a user authorizes a third-party auditor (TPA) to carry out the data audit task and return the audit result. Commonly used audit tag generation algorithms are built mainly on RSA and BLS short signatures, but these methods are constrained by system parameters: data must be divided into small blocks to compute audit tags, which consumes large amounts of computing and storage resources and is unsuitable for deduplication and auditing of large files. A simple solution is for the user to compute the audit tag from the hash value of the data block, which removes the limit that system parameters place on block size, but a malicious audited server can then also use the hash value alone to generate evidence and pass the challenge. Adding a random challenge seed can solve this problem, but it requires the user to update the audit tags periodically, which is impractical in a centralized data storage service. In addition, traditional public auditing relies heavily on a fully trusted TPA. On the one hand, the audit tasks of all users in the system are handed to the TPA, which inevitably causes high service pressure and service delay; in a deduplication setting in particular, different users of the same data submit multiple audit requests within the same period, and the repeated audit tasks waste considerable computing resources. On the other hand, a fully trusted TPA that meets the assumptions of public auditing is hard to find in practice, so current auditing techniques lack sufficient practical value.

Through the above analysis, the problems and defects of the prior art are as follows:

(1) the existing data deduplication technology is restricted by a centralized storage architecture, large-scale resource optimization and space saving cannot be realized, the feasibility of a multi-copy storage mode is reduced, and single-point faults cannot be well prevented.

(2) The existing audit tag generation algorithm is limited by system parameters, so that the generation and the updating of lightweight tags cannot be realized, and the compatibility of a data integrity audit technology and a data deduplication technology is restricted.

(3) In the existing auditing technology, the strong trust hypothesis of TPA and the centralized TPA proxy auditing mode restrict the practicability of the data integrity auditing technology.

The difficulty in solving the above problems and defects is as follows. The key to solving the problem that existing data deduplication cannot prevent single points of failure lies in how to reduce the data storage cost and improve the feasibility of a multi-copy storage strategy: on the one hand, how to enlarge the service scope of data deduplication and achieve more efficient resource saving; on the other hand, how to optimize resource allocation across servers with limited computing and storage resources, provide a reliable outsourced data storage service, and further reduce both the storage service cost and the centralization of data storage.

The key for solving the problem that the existing audit tag generation method cannot avoid resource waste lies in breaking the limitation of system parameters on the size of data blocks. The method for generating the audit label by adding the hash value of the data block and the random challenge seed has certain feasibility, but the key point is how to realize the periodic audit label updating.

The key for solving the problem of low practicability caused by a strong TPA trust hypothesis and a centralized TPA proxy audit mode in the prior art is how to find an easily obtained low trust hypothesis auditor.

The significance of solving the problems and the defects is as follows: firstly, efficient resource saving and optimal configuration are realized through large-scale data deduplication, the data storage cost can be reduced, and the feasibility and single-point fault resistance of a double (multiple) copy data storage strategy are improved. Secondly, by adopting the data block hash value as an input to calculate the audit tag, the limitation of system parameters on the data block size in the traditional scheme can be broken, and thus the number of tags is reduced to save the storage space. And if the audit tag is updated regularly, the potential safety hazard caused by the method can be solved, and the reliability of data audit is ensured. Finally, the requirement on the trust assumption of the auditor is reduced, so that the auditor can be simpler and more easily obtained, and the practicability of the data auditing scheme is improved. Particularly, in the bidirectional shared auditing method, the public auditing without completely trusting TPA is realized, the verifiable sharing of the auditing label and the auditing result is realized among different users with the same data, and the waste of computing resources and storage resources is avoided.

Disclosure of Invention

Aiming at the problems in the prior art, the invention provides a data deduplication and sharing auditing method for decentralized storage based on a block chain.

The invention is realized as follows: a blockchain-based data deduplication and shared auditing method for decentralized storage, comprising the following steps:

Step one, system establishment: generating system parameters and initializing the decentralized storage system;

Step two, file-level copy detection: generating a file key and a first tag, and using the first tag of the file to search the blockchain for whether the file is already stored in the decentralized storage system;

Step three, initial file upload: preprocessing the outsourced data to generate data block tags, detecting data block copies, and finally uploading only the file data blocks that do not yet exist at the SSP together with other related data;

Step four, subsequent file upload: generating a PoWs challenge and PoWs evidence, and verifying the PoWs evidence;

Step five, data auditing: generating an audit challenge and audit evidence, and verifying the audit evidence;

Step six, audit tag update: preprocessing, generating, and checking the audit tags;

Step seven, download and decryption: downloading the ciphertext data, recovering the data block keys, and decrypting the ciphertext to obtain the plaintext data.

Further, in the step one, the specific process of system establishment is as follows:

(1) The system manager SM selects two multiplicative cyclic groups of order p, constructs a bilinear map on them, and selects a generator g and a random element u as system parameters;

(2) SM defines two hash functions, H₁ and H₂, and two pseudo-random functions, π₁ and π₂;

(3) SM publishes the system parameters on the blockchain.

Further, in the second step, the specific process of file-level copy detection is as follows:

(1) File key generation: the user U computes a public/private key pair (sk, pk) for the file F, where the file private key sk = H₁(F) is used for data encryption and audit tag generation, and the file public key pk = g^sk serves as the first tag t = pk of the file, enabling file-level copy detection and audit evidence verification;

(2) File copy detection: U uses t to search the blockchain for whether the file F is already stored in the decentralized storage system; if t does not exist, U selects SSP₁ and SSP₂ according to its service requirements and executes the initial upload step with each of them; otherwise, U obtains the information of SSP₁ and SSP₂ from the blockchain and executes the subsequent upload step with each of them.
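The file-level detection step can be illustrated with a minimal Python sketch. It assumes a toy modular group (prime modulus `P`, base `G`) in place of the pairing group, SHA-256 as a stand-in for H₁, and an in-memory dictionary `chain_index` as a hypothetical substitute for the blockchain index; none of these names come from the patent.

```python
import hashlib

# Toy parameters (assumptions for illustration only): a Mersenne prime modulus
# and a small base. A real system would use a pairing-friendly group.
P = 2**127 - 1
G = 3

def h1(data: bytes) -> int:
    """Stand-in for H1: hash arbitrary bytes to an exponent."""
    return int.from_bytes(hashlib.sha256(data).digest(), "big") % (P - 1)

def file_keys(file_bytes: bytes):
    """Return (sk, pk) with sk = H1(F) and pk = g^sk; the first tag t equals pk."""
    sk = h1(file_bytes)
    pk = pow(G, sk, P)
    return sk, pk

def choose_upload_path(chain_index: dict, t: int) -> str:
    """File-level copy detection: look the first tag up in the on-chain index."""
    return "subsequent" if t in chain_index else "initial"

chain_index = {}                       # hypothetical stand-in for the blockchain
sk, t = file_keys(b"example file contents")
first = choose_upload_path(chain_index, t)
chain_index[t] = {"ssps": ("SSP1", "SSP2")}   # index entry created after initial upload
second = choose_upload_path(chain_index, file_keys(b"example file contents")[1])
```

Because sk is derived from the file content alone, two users holding the same file derive the same tag t, which is what lets the second user take the subsequent-upload path.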

Further, in the third step, the specific process of initial file uploading is as follows:

(1) Data preprocessing: U first divides the file F into n equal-sized data blocks {m_i}, then performs the following operations for each data block m_i, 1 ≤ i ≤ n:

① Compute the data block encryption key: k_i = H₁(m_i);

② Compute the data block ciphertext: c_i = Enc(k_i, m_i);

③ Compute the data block tag: T_i = H₁(c_i);

④ Compute the data block audit tag σ_i, where M_i = H₂(t||i);

(2) Requesting to upload data: the user U computes the block key ciphertext Ck = Enc(sk, k₁||k₂||...||k_n) and sends an initial upload request <t, Ck, {T_i}, {σ_i}> to the SSP over a secure off-chain channel.

(3) Block-level duplicate detection: the SSP uses the data block tags {T_i} to search its local storage for the data blocks {c_i}^* of the target file F that have not yet been uploaded, and asks U to return them;

(4) Consistency check: after U returns the data blocks, the SSP checks the consistency of the returned data blocks {c_i}^* with the tags already received; if the check fails, the SSP aborts, otherwise it proceeds to the next step;

(5) The SSP generates the second tag of the file, t^* = H₁(T₁||T₂||...||T_n), and returns it to U, then stores the outsourced data F = <t, t^*, Ck, {T_i}, {σ_i}> and the initially uploaded data blocks {c_i}^*, while creating an index entry <t, t^*, U, SSP> for the file F on the blockchain for data deduplication by subsequent users.

(6) The user U deletes the local data copy and keeps only (sk, pk, t^*) for subsequent data access.

The audit tags {σ_i} generated at initial upload are used only for PoWs and auditing during the first audit period; when the number of used audit tags reaches the threshold, the SSP updates the audit tags and starts a new audit period.
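The preprocessing, block-level duplicate detection, and consistency check above can be sketched as follows. This is a minimal illustration under stated assumptions: SHA-256 stands in for H₁, `toy_enc` is a simple deterministic XOR stream used only as a placeholder for Enc (it is not a vetted cipher), and `ssp_store` is a hypothetical in-memory dictionary standing in for the SSP's local storage. Audit tags σ_i are omitted.

```python
import hashlib

def h1(data: bytes) -> bytes:
    """Stand-in for H1 (assumption: SHA-256)."""
    return hashlib.sha256(data).digest()

def toy_enc(key: bytes, msg: bytes) -> bytes:
    """Deterministic XOR stream keyed by SHA-256 in counter mode.
    Placeholder for Enc only; not a vetted cipher."""
    out = bytearray()
    for offset in range(0, len(msg), 32):
        pad = hashlib.sha256(key + offset.to_bytes(8, "big")).digest()
        out.extend(a ^ b for a, b in zip(msg[offset:offset + 32], pad))
    return bytes(out)

toy_dec = toy_enc  # XOR stream: decryption is the same operation

def preprocess(file_bytes: bytes, block_size: int):
    blocks = [file_bytes[i:i + block_size]
              for i in range(0, len(file_bytes), block_size)]
    keys = [h1(m) for m in blocks]                           # k_i = H1(m_i)
    ciphers = [toy_enc(k, m) for k, m in zip(keys, blocks)]  # c_i = Enc(k_i, m_i)
    tags = [h1(c) for c in ciphers]                          # T_i = H1(c_i)
    return keys, ciphers, tags

def missing_blocks(ssp_store: dict, tags: list) -> list:
    """Block-level duplicate detection: indices whose tag the SSP does not hold."""
    return [i for i, tag in enumerate(tags) if tag not in ssp_store]

def accept_upload(ssp_store: dict, tags: list, returned: dict) -> bytes:
    """Consistency check, then store; returns t* = H1(T_1||...||T_n)."""
    for i, c in returned.items():
        if h1(c) != tags[i]:
            raise ValueError("returned block does not match its tag")
        ssp_store[tags[i]] = c
    return h1(b"".join(tags))

ssp_store = {}
keys, ciphers, tags = preprocess(b"A" * 64 + b"B" * 64 + b"A" * 64, 64)
need = missing_blocks(ssp_store, tags)
t_star = accept_upload(ssp_store, tags, {i: ciphers[i] for i in need})

# A second file sharing the "B" block only needs to upload its new block.
keys2, ciphers2, tags2 = preprocess(b"B" * 64 + b"C" * 64, 64)
need2 = missing_blocks(ssp_store, tags2)
```

Because each block key is derived from the block itself, identical plaintext blocks in different files produce identical ciphertexts and tags, which is what makes cross-file block deduplication possible on encrypted data.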

Further, in the fourth step, the specific subsequent file uploading process is as follows:

(1) sending an uploading request: a user U sends a subsequent uploading request of the data F to an SSP;

(2) Generating a PoWs challenge: the SSP randomly selects three random numbers, z ∈ [1, n] together with r₁ and r₂, then forms the PoWs challenge Chal_x = (z, r₁, r₂) and sends it to U;

(3) Generating PoWs evidence: U divides the file F into blocks and performs the following for each index i ∈ [1, z]:

① Compute the sequence number of the i-th challenged data block: a_i = π₁(i, r₁);

② Compute the coefficient of the i-th challenged data block: b_i = π₂(i, r₂);

③ Compute the tag of the i-th challenged data block;

Finally, U computes the PoWs evidence corresponding to Chal_x and returns it to the SSP.

(4) Verifying PoWs evidence: from the challenge Chal_x, the SSP computes the sequence numbers {a_i} and coefficients {b_i} of the challenged data blocks, then computes the aggregate audit tag of the challenged data blocks and the associated auxiliary value, and uses the bilinear map to check whether the verification equation holds.

If the equation holds, the SSP returns the second tag t^* of F to U for data access; otherwise it aborts. In particular, if U is a true data owner but fails the PoWs, meaning that the outsourced file F has suffered a copy forgery attack, U can execute the initial upload step to upload the data. Because the SSP can verify that the second tag t^* is consistent with the outsourced ciphertext data, U avoids the data loss that a copy forgery attack would otherwise cause.
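The way both sides derive the same challenged block indices and coefficients from Chal_x = (z, r₁, r₂) can be sketched as follows. This is only an illustration: HMAC-SHA256 is assumed as the instantiation of the pseudo-random functions π₁ and π₂, and `COEFF_MOD` is a hypothetical modulus for the coefficients; the patent text does not specify either.

```python
import hashlib
import hmac
import secrets

COEFF_MOD = 2**127 - 1  # toy coefficient modulus (assumption)

def prf(key: bytes, i: int, modulus: int) -> int:
    """HMAC-SHA256 used as a pseudo-random function (assumed instantiation)."""
    mac = hmac.new(key, i.to_bytes(8, "big"), hashlib.sha256).digest()
    return int.from_bytes(mac, "big") % modulus

def make_challenge(n: int):
    """SSP side: Chal_x = (z, r1, r2) with z in [1, n]."""
    z = secrets.randbelow(n) + 1
    return z, secrets.token_bytes(32), secrets.token_bytes(32)

def expand_challenge(chal, n: int):
    """Both sides: derive (a_i, b_i) for i in [1, z] from the same challenge."""
    z, r1, r2 = chal
    pairs = []
    for i in range(1, z + 1):
        a_i = prf(r1, i, n) + 1       # a_i = pi_1(i, r1), a block index in [1, n]
        b_i = prf(r2, i, COEFF_MOD)   # b_i = pi_2(i, r2), a coefficient
        pairs.append((a_i, b_i))
    return pairs

chal = make_challenge(10)
user_view = expand_challenge(chal, 10)
ssp_view = expand_challenge(chal, 10)
```

Sending only the short triple (z, r₁, r₂) keeps the challenge compact; note that a plain PRF may select the same block index more than once, which a pseudo-random permutation would avoid.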

Further, in step five, the specific process of data auditing is as follows: the user U employs server SSP₁ (SSP₂) as the auditor to verify, through a challenge-response model, the integrity of the data copy stored by the audited party SSP₂ (SSP₁), thereby realizing bidirectional shared auditing without a fully trusted TPA. The specific steps are as follows:

(1) Generating an audit challenge: the auditor randomly selects three random numbers, z ∈ [1, n] and two further random values, then forms an audit challenge Chal_y and publishes it on the blockchain network;

(2) Generating audit evidence: the audited party computes, from Chal_y, the sequence numbers {a_i} and coefficients {b_i} of the challenged data blocks, then generates the audit evidence Proof_y and publishes it to the blockchain;

(3) Verifying the audit evidence: the auditor obtains Proof_y from the blockchain and, according to Chal_y, computes the aggregate audit tag of the challenged data blocks and the auxiliary information, then uses the bilinear map to check whether the verification equation holds.

If the equation holds, the auditor outputs result_y = 1, indicating that the outsourced data stored by the audited party is intact; otherwise it outputs result_y = 0. The auditor publishes the audit result <Chal_y, Proof_y, result_y> to the blockchain.
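The tag and verification equations are not reproduced in this text, so the following Python sketch only illustrates the general idea of an aggregated challenge-response check. It assumes a common BLS-style homomorphic tag of the form σ_i = (M_i · u^{T_i})^sk, which may differ from the patent's actual construction. To stay runnable without a pairing library, each group element g^x is represented by its exponent x, so the pairing e(g^a, g^b) = e(g, g)^{ab} is simulated as a·b mod `Q`. This representation is insecure and serves only to demonstrate the algebra; all function names are hypothetical.

```python
import hashlib

Q = 2**127 - 1  # toy prime group order (assumption)

def h_int(*parts: bytes) -> int:
    """Hash to an exponent; stands in for both H1 and H2 in this toy model."""
    return int.from_bytes(hashlib.sha256(b"|".join(parts)).digest(), "big") % Q

# Toy model: g^x is represented by x. Multiplication becomes addition,
# exponentiation becomes multiplication, and the pairing becomes a product.
def pairing(a: int, b: int) -> int:
    return (a * b) % Q

G_EXP = 1  # exponent representing the generator g itself

def make_tags(sk: int, u: int, t: bytes, block_tags: list) -> list:
    """Assumed tag form: sigma_i = (M_i * u^{T_i})^sk with M_i = H2(t || i)."""
    return [((h_int(t, str(i).encode()) + u * T) * sk) % Q
            for i, T in enumerate(block_tags, start=1)]

def prove(sigmas: list, block_tags: list, challenge: list):
    """Aggregate over challenged pairs (a_i, b_i)."""
    sigma = sum(b * sigmas[a - 1] for a, b in challenge) % Q
    mu = sum(b * block_tags[a - 1] for a, b in challenge) % Q
    return sigma, mu

def verify(pk: int, u: int, t: bytes, challenge: list, proof) -> bool:
    """Check e(sigma, g) == e(prod M_{a_i}^{b_i} * u^mu, pk) in the toy model."""
    sigma, mu = proof
    agg_m = sum(b * h_int(t, str(a).encode()) for a, b in challenge) % Q
    return pairing(sigma, G_EXP) == pairing((agg_m + u * mu) % Q, pk)

sk = h_int(b"file contents")
pk = sk                      # g^sk is represented by the exponent sk
u = h_int(b"random element u")
t = b"first-tag"
block_tags = [h_int(b"c1"), h_int(b"c2"), h_int(b"c3")]
sigmas = make_tags(sk, u, t, block_tags)
challenge = [(1, 5), (3, 7)]
proof = prove(sigmas, block_tags, challenge)
ok = verify(pk, u, t, challenge, proof)
bad = verify(pk, u, t, challenge, (proof[0], (proof[1] + 1) % Q))
```

The check succeeds because raising the aggregated base to sk on the left matches pairing it with pk = g^sk on the right, so the verifier never needs sk itself.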

Further, in the sixth step, the specific process of updating the audit tag is as follows:

When the number of used audit tags of the file reaches the threshold, the auditor SSP selects a valid user U to execute the audit tag update protocol, obtains new audit tags, and starts a new audit period, so as to keep the auditing method efficient and reliable. The specific steps are as follows:

(1) Preprocessing the audit tags: the SSP randomly generates a challenge parameter s and performs the following operations:

① For each data block c_i, 1 ≤ i ≤ n, generate a challenge seed s_i = H₁(i||s);

② Generate the audit tag to be signed ρ_i, where M_i = H₂(t||i);

③ Compute the auxiliary information aux;

④ Publish the update information <t, t^*, aux> on the blockchain, and send the audit tags to be signed ρ = {ρ_i} to the user U over an off-chain channel;

(2) Generating the audit tags: the user U obtains the update information <t, t^*, aux> from the blockchain and ρ = {ρ_i} from the SSP, and performs the following operations:

① Check the consistency of ρ and aux; if the check succeeds, proceed to the next step, otherwise abort;

② Compute the new audit tags σ = {σ_i} using the file private key sk;

③ Compute the proof of work P_w = aux^sk;

④ Publish the proof of work <t, t^*, aux, P_w> on the blockchain, and return the audit tags σ = {σ_i} to the auditor SSP over the off-chain channel.

(3) Checking the audit tags: the SSP verifies the correctness and validity of the audit tags based on the auxiliary information aux and the proof of work P_w; if the verification fails, the SSP aborts, blacklists the user, and selects a new valid user to execute the audit tag update; otherwise, the SSP pays a reward to the valid user U who took part in the audit tag update and starts a new audit period.
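The control flow of the update protocol (seed derivation, user selection, blacklisting on failure) can be sketched as below. The formulas for ρ_i, aux, and the new tags are not reproduced in this text, so the signing and checking steps are passed in as hypothetical callbacks (`sign_with_user`, `check_tags`); SHA-256 stands in for H₁, and the demo callbacks are placeholders that do not reflect the patent's actual verification.

```python
import hashlib
import secrets

def challenge_seeds(s: bytes, n: int) -> list:
    """s_i = H1(i || s) for 1 <= i <= n (SHA-256 assumed for H1)."""
    return [hashlib.sha256(i.to_bytes(8, "big") + s).digest()
            for i in range(1, n + 1)]

def run_update(users, blacklist, n, sign_with_user, check_tags):
    """Try valid users in turn until one returns tags that pass the SSP's check."""
    s = secrets.token_bytes(32)          # random challenge parameter s
    seeds = challenge_seeds(s, n)
    for user in users:
        if user in blacklist:
            continue
        new_tags = sign_with_user(user, seeds)
        if check_tags(seeds, new_tags):
            return user, new_tags        # SSP rewards this user, new period starts
        blacklist.add(user)              # failed verification: blacklist the user
    return None, None

# Hypothetical stand-ins for the user's signing step and the SSP's check.
def demo_sign(user, seeds):
    if user == "mallory":
        return [b"bad"] * len(seeds)
    return [hashlib.sha256(b"sk" + seed).digest() for seed in seeds]

def demo_check(seeds, tags):
    return tags == [hashlib.sha256(b"sk" + seed).digest() for seed in seeds]

blacklist = set()
winner, new_tags = run_update(["mallory", "alice"], blacklist, 4,
                              demo_sign, demo_check)
```

Passing the cryptographic steps in as callbacks keeps the sketch honest about what the text specifies: only the surrounding protocol logic is modelled here.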

Further, in step seven, the specific download and decryption process is as follows:

(1) Data download: U uses t and t^* to download the data ciphertext C = {c_i} and the block key ciphertext Ck from the SSP;

(2) Key recovery: U recovers the data block keys using sk: k₁||k₂||...||k_n = Dec(sk, Ck);

(3) Data decryption: U uses the block keys to decrypt the ciphertext data blocks: m_i = Dec(k_i, c_i), 1 ≤ i ≤ n, and finally obtains the complete plaintext data F = m₁||m₂||...||m_n.
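The download path can be illustrated end to end with a short Python sketch. As assumptions for illustration, SHA-256 stands in for H₁, a deterministic XOR stream (`toy_xor`) stands in for Enc/Dec and is not a vetted cipher, and block keys are fixed at 32 bytes so that the concatenation k₁||...||k_n can be split again after decryption.

```python
import hashlib

KEY_LEN = 32  # length of each block key k_i (SHA-256 output, an assumption)

def h1(data: bytes) -> bytes:
    return hashlib.sha256(data).digest()

def toy_xor(key: bytes, msg: bytes) -> bytes:
    """Placeholder for Enc/Dec: XOR with a SHA-256 counter-mode keystream."""
    out = bytearray()
    for offset in range(0, len(msg), 32):
        pad = hashlib.sha256(key + offset.to_bytes(8, "big")).digest()
        out.extend(a ^ b for a, b in zip(msg[offset:offset + 32], pad))
    return bytes(out)

def outsource(file_bytes: bytes, block_size: int):
    """Upload side, for context: returns (sk, {c_i}, Ck)."""
    sk = h1(file_bytes)                                   # sk = H1(F)
    blocks = [file_bytes[i:i + block_size]
              for i in range(0, len(file_bytes), block_size)]
    keys = [h1(m) for m in blocks]                        # k_i = H1(m_i)
    ciphers = [toy_xor(k, m) for k, m in zip(keys, blocks)]
    ck = toy_xor(sk, b"".join(keys))                      # Ck = Enc(sk, k_1||...||k_n)
    return sk, ciphers, ck

def download_and_decrypt(sk: bytes, ciphers: list, ck: bytes) -> bytes:
    joined = toy_xor(sk, ck)                              # k_1||...||k_n = Dec(sk, Ck)
    keys = [joined[i:i + KEY_LEN] for i in range(0, len(joined), KEY_LEN)]
    return b"".join(toy_xor(k, c) for k, c in zip(keys, ciphers))  # m_i = Dec(k_i, c_i)

data = b"hello blockchain dedup " * 10
sk, ciphers, ck = outsource(data, 64)
recovered = download_and_decrypt(sk, ciphers, ck)
```

The user only needs to retain sk locally: every block key is recoverable from Ck, which is why the local copy of the file and the block keys can be deleted after upload.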

It is another object of the present invention to provide a computer-readable storage medium storing a computer program which, when executed by a processor, causes the processor to perform the steps of:

establishing a system, generating system parameters, and initializing a decentralized storage system;

detecting the file-level copy, generating a file key and a first label, and searching whether the file is stored in the decentralized storage system in the blockchain by using the first label of the file;

initially uploading a file, namely preprocessing the outsourced data to generate data block tags and detect data block copies, and finally uploading only the file data blocks that do not yet exist at the SSP together with other related data;

the file is subsequently uploaded to generate PoWs challenge and PoWs evidence, and the PoWs evidence is verified;

data auditing, namely generating an auditing challenge and an auditing evidence and verifying the auditing evidence;

updating the audit label, preprocessing the audit label, generating the audit label, and checking the audit label;

downloading and decrypting, downloading the ciphertext data, recovering the data block key, decrypting the ciphertext and obtaining the plaintext data.

The invention also aims to provide an information data processing terminal, which is used for realizing the data deduplication and sharing auditing method based on the block chain for decentralized storage.

By combining all the technical schemes, the invention has the advantages and positive effects that:

Secure and efficient data deduplication based on blockchain: in terms of efficiency, the invention realizes highly scalable data deduplication based on the blockchain, and is suitable for data deduplication and optimized resource allocation over a wider scope. Meanwhile, fine-grained block-level deduplication achieves more efficient resource saving, so the method is also suitable for deduplication of large files. In terms of security, the invention uses a message-locked encryption algorithm to encrypt the outsourced data, ensuring its confidentiality and privacy against both the storage servers and external adversaries. In addition, the invention adopts a dual-tag data storage mode, which ensures the security of the outsourced data under copy forgery attacks and allows subsequent users to complete the upload without encrypting the whole outsourced data, saving a large amount of computing and storage resources compared with existing solutions. Finally, the invention adopts a dual-server data storage model to obtain better resistance to single points of failure while still saving resources efficiently. Notably, the dual-server data storage model provides the basic framework for the subsequent generation and update of lightweight audit tags and for public auditing without a TPA, and runs through the whole invention;

Generation and update of lightweight audit tags: the invention uses the hash value of the data block as the input for computing the audit tag, which removes the limit that system parameters place on the data block size in traditional methods and makes it possible to reduce tag redundancy and improve the compatibility of auditing and deduplication. However, this introduces a security problem: the storage server could delete the data and keep only the data hash values to answer audits. The invention therefore provides a secure and efficient audit tag update protocol based on the dual-server data storage model, which ensures the reliability of integrity auditing by adding random challenge seeds and updating the tags periodically;

TPA-free shared auditing based on blockchain: under the security assumption that the servers do not collude, the method allows a user to employ one of the servers storing the target data as an auditor to check the integrity of the data copy on the other server; because the entire audit process is recorded on the blockchain, the audit is reliable and traceable. In addition, the invention uses the same audit tags for both integrity auditing and the PoWs in the subsequent upload process, so a subsequent uploader learns through PoWs that the audit tags are consistent with the data it holds. Therefore, different users of the same data need not upload audit tags repeatedly; by sharing the audit tags they can learn the integrity of the data from the audit results on the blockchain, which avoids the resource waste caused by repeated audit requests.

Therefore, the invention realizes the sharing of data physical copies (data deduplication), the generation and sharing of lightweight audit tags and the sharing of audit results among different users of the same data, saves a large amount of computing resources and storage resources, and has higher practicability compared with the prior method.

Drawings

In order to more clearly illustrate the technical solutions of the embodiments of the present application, the drawings needed to be used in the embodiments of the present application will be briefly described below, and it is obvious that the drawings described below are only some embodiments of the present application, and it is obvious for those skilled in the art that other drawings can be obtained from the drawings without creative efforts.

Fig. 1 is a basic flowchart of a data deduplication and sharing auditing method based on a block chain for decentralized storage according to an embodiment of the present invention.

Fig. 2 is a system model diagram of a block chain-based secure data deduplication and shared auditing method in decentralized storage according to an embodiment of the present invention.

FIG. 3 is a flowchart of the operation of a block chain-based method for deduplication and shared auditing of secure data in decentralized storage according to an embodiment of the present invention.

FIG. 4 is a graph comparing the performance of a block chain-based secure data deduplication and shared auditing method in decentralized storage according to an embodiment of the present invention with that of a prior art scheme at the data outsourcing stage; (a) initial upload; (b) subsequent upload; (c) download and decryption.

FIG. 5 is a graph comparing the performance of a block chain-based secure data deduplication and shared auditing method in decentralized storage according to an embodiment of the present invention with that of a prior art scheme in data integrity auditing; (a) evidence generation; (b) evidence verification; (c) tag update.

Detailed Description

In order to make the objects, technical solutions and advantages of the present invention more apparent, the present invention is further described in detail with reference to the following embodiments. It should be understood that the specific embodiments described herein are merely illustrative of the invention and are not intended to limit the invention.

Aiming at the problems in the prior art, the invention provides a data deduplication and sharing auditing method for decentralized storage based on a block chain, and the invention is described in detail below with reference to the attached drawings.

As shown in fig. 1, a method for data deduplication and shared auditing based on a block chain for decentralized storage according to an embodiment of the present invention includes:

S101: System establishment: generate the system parameters and initialize the decentralized storage system.

S102: File-level copy detection: generate a file key and a first tag, and use the first tag of the file to search the blockchain for whether the file is already stored in the decentralized storage system.

S103: Initial file upload: first preprocess the outsourced data to obtain the encrypted data blocks, data block tags, audit tags and other related data; then upload the data block tags to the SSP for data-block-level copy detection; finally upload only the file data blocks that do not yet exist at the SSP, together with the audit tags and other related data.

S104: Subsequent file upload: generate a PoWs challenge and PoWs evidence, and verify the PoWs evidence.

S105: Data auditing: generate an audit challenge and audit evidence, and verify the audit evidence.

S106: Audit tag update: preprocess, generate, and check the audit tags.

S107: Download and decryption: download the ciphertext data, recover the data block keys, and decrypt to obtain the plaintext data.

Those skilled in the art may also implement the blockchain-based data deduplication and shared auditing method for decentralized storage provided by the present invention using other steps; the method shown in fig. 1 is only one specific embodiment.

As shown in fig. 2, the system model of the blockchain-based data deduplication and shared auditing method for decentralized storage provided by the embodiment of the present invention includes four types of entities: a storage service provider module (SSP) 1, a data user module (U) 2, a blockchain module (Blockchain) 3 and a system manager (SM) 4. The SM module is responsible only for initializing the system parameters when the storage system is established; the Blockchain module is a blockchain distributed ledger system maintained by miners and is used to record the necessary data involved in the outsourced data storage service. The storage system model therefore mainly involves the SSP module and the U module. A distributed, decentralized storage system is constructed through the blockchain network, and the storage node SSPs in the system do not collude with one another. For each outsourced file, its first uploader selects SSP₁ and SSP₂ that meet the service requirements from the candidate SSPs and completes data encryption and outsourcing; subsequent uploaders must complete the data upload with SSP₁ and SSP₂ through the PoWs protocol, thereby achieving client-side data deduplication. In addition, a user node U may employ SSP₁ (SSP₂) as the auditor to check the integrity of the data stored at the audited party SSP₂ (SSP₁); the whole audit process is published on the blockchain, which realizes bidirectional shareable auditing without a fully trusted TPA.

As shown in fig. 3, in S101 provided by the embodiment of the present invention, a specific process of system establishment is as follows:

(1) The system manager SM selects two multiplicative cyclic groups of order p

and

constructs a bilinear map

and, from

selects a generator g and a random element u as system parameters;

(2) SM defines two hash functions:

and

and two pseudo-random functions:

and

(3) SM publishes the parameters

on the blockchain.

In S102 provided by the embodiment of the present invention, a specific process of file-level copy detection is as follows:

(1) File key generation: user U computes a public/private key pair (sk, pk) for file F, where the file private key sk = H₁(F) is used for data encryption and audit tag generation, and the file public key pk = g^sk serves as the first tag t = pk of the file to enable file-level copy detection and audit evidence verification;
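The key derivation above can be illustrated with a minimal Python sketch. It is not the patented implementation: the modulus `P`, the base `G`, the use of SHA-256 for H₁, and the in-memory `index` dictionary (standing in for the on-chain index) are all assumptions made for illustration, and a plain modular group replaces the pairing group published by SM.

```python
import hashlib

# Toy parameters (assumptions): a Mersenne-prime modulus and base 3 stand in
# for the pairing group and generator g that SM publishes on the blockchain.
P = 2**127 - 1
G = 3

def file_keys(data: bytes) -> tuple:
    """Return (sk, pk) with sk = H1(F) and pk = g^sk, both derived from content."""
    sk = int.from_bytes(hashlib.sha256(data).digest(), "big") % (P - 1)
    pk = pow(G, sk, P)
    return sk, pk

def is_duplicate(index: dict, data: bytes) -> bool:
    """File-level copy detection: look up the first tag t = pk in the index."""
    _, t = file_keys(data)
    return t in index

# The first uploader registers the file tag; a dict stands in for the
# blockchain index list here.
index = {}
sk, t = file_keys(b"quarterly report")
index[t] = "first uploader"
```

Because sk and pk depend only on the file content, any later user holding the same file derives the same tag t and finds it in the index, which is what makes file-level deduplication possible without sharing keys in advance.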

In S103 provided by the embodiment of the present invention, the specific process of initial file uploading is as follows:

① Calculating the data block encryption key: k_i = H₁(m_i);

② Calculating the data block ciphertext: c_i = Enc(k_i, m_i);

③ Calculating the data block tag: T_i = H₁(c_i);

④ Calculating the audit tag of the data block:

where M_i = H₂(t||i);

(5) SSP generates the second tag of the file, t^* = H₁(T₁||T₂||...||T_n), and returns it to U; it then stores the outsourced data <t, t^*, Ck, {T_i}, {σ_i}> of F and the initially uploaded data blocks {c_i}^*, while creating an index list <t, t^*, U, SSP> for file F on the blockchain for data deduplication by subsequent users.

The audit tags {σ_i} generated at initial upload are used only for the PoWs and auditing processes in the first audit period; once the number of used audit tags reaches a certain threshold, the SSP executes the audit tag update protocol to obtain new tags and starts a new audit period.
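The block-level steps of the initial upload (k_i = H₁(m_i), c_i = Enc(k_i, m_i), T_i = H₁(c_i) and t^* = H₁(T₁||...||T_n)) can be sketched as follows. This is a hedged illustration only: SHA-256 stands in for H₁, the XOR keystream function `keystream_xor` is a toy stand-in for Enc/Dec rather than a vetted cipher, and the audit tags σ_i are omitted because their formula is not reproduced in this text.

```python
import hashlib

def h1(data: bytes) -> bytes:
    """Stand-in for the hash function H1 (SHA-256 here, an assumption)."""
    return hashlib.sha256(data).digest()

def keystream_xor(key: bytes, data: bytes) -> bytes:
    """Toy symmetric cipher: XOR with a SHA-256 counter keystream.
    Stand-in for Enc/Dec; applying it twice with the same key decrypts."""
    out = bytearray()
    for offset in range(0, len(data), 32):
        pad = hashlib.sha256(key + (offset // 32).to_bytes(8, "big")).digest()
        chunk = data[offset:offset + 32]
        out.extend(b ^ p for b, p in zip(chunk, pad))
    return bytes(out)

def prepare_upload(blocks: list) -> tuple:
    """Return (ciphertexts {c_i}, block tags {T_i}, second file tag t*)."""
    ciphertexts, tags = [], []
    for m in blocks:
        k = h1(m)                    # k_i = H1(m_i): key derived from content
        c = keystream_xor(k, m)      # c_i = Enc(k_i, m_i)
        ciphertexts.append(c)
        tags.append(h1(c))           # T_i = H1(c_i)
    t_star = h1(b"".join(tags))      # t* = H1(T_1 || ... || T_n)
    return ciphertexts, tags, t_star
```

Since every key is derived from the block content itself, two users who hold the same block produce the same ciphertext and the same tag T_i, which is what allows the SSP to detect duplicate blocks without seeing the plaintext.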

In S104 provided by the embodiment of the present invention, the specific subsequent file uploading process is as follows:

then forms the PoWs challenge Chal_x = (z, r₁, r₂) and sends it to U;

③ Calculating the tag of the i-th challenged data block:

finally, U calculates the PoWs evidence corresponding to Chal_x

and returns it to the SSP.
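How a compact challenge Chal_x = (z, r₁, r₂) expands into per-block indices a_i = π₁(i, r₁) and coefficients b_i = π₂(i, r₂) can be sketched in Python. Several things here are assumptions rather than details taken from the patent: z is treated as the number of challenged blocks, HMAC-SHA256 plays the role of both pseudo-random functions, and `P` is a toy stand-in for the group order p. The evidence aggregation itself is not shown because its formula is not reproduced in this text.

```python
import hashlib
import hmac
import secrets

P = 2**127 - 1  # toy stand-in for the group order p (assumption)

def prf(seed: bytes, i: int, modulus: int) -> int:
    """HMAC-SHA256 used as a pseudo-random function, reduced mod `modulus`."""
    mac = hmac.new(seed, i.to_bytes(8, "big"), hashlib.sha256).digest()
    return int.from_bytes(mac, "big") % modulus

def new_challenge(n: int) -> tuple:
    """SSP side: Chal_x = (z, r1, r2) with z in [1, n] and two random seeds."""
    z = secrets.randbelow(n) + 1
    return z, secrets.token_bytes(16), secrets.token_bytes(16)

def expand_challenge(chal: tuple, n: int) -> list:
    """Both sides: derive (a_i, b_i) for i = 1..z from the seeds."""
    z, r1, r2 = chal
    return [(prf(r1, i, n) + 1,      # a_i = pi_1(i, r1), a block index in [1, n]
             prf(r2, i, P))          # b_i = pi_2(i, r2), a coefficient
            for i in range(1, z + 1)]
```

Sending only the seeds keeps the challenge constant-size: the user and the SSP each recompute the same list of indices and coefficients locally.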

and auxiliary information:

In S105 provided by the embodiment of the present invention, the specific process of data auditing is as follows:

and publishes it to the blockchain;

and auxiliary information:

In S106 provided by the embodiment of the present invention, the specific process of updating the audit tag is as follows:

When the number of used audit tags of the file reaches the threshold, the auditor SSP selects a valid user U to execute the audit tag update protocol, obtains new audit tags and starts a new audit period. The specific steps are as follows:

(1) Audit tag preprocessing: SSP randomly generates challenge parameters

and performs the following operations:

Generating the audit tags to be signed

where M_i = H₂(t||i);

Calculating the auxiliary information

① Checking the consistency of ρ and aux; if the check succeeds, proceeding to the next step, otherwise aborting;

② Calculating the new audit tags using the file private key sk

③ Calculating the proof of work P_w = aux^sk;

④ Publishing the proof of work <t, t^*, aux, P_w> on the blockchain, and returning the audit tags σ = {σ_i} to the auditor SSP over the off-chain channel.

In the invention, the subsequent file upload and data auditing processes adopt a "challenge-response" verification mechanism. In addition, the invention computes audit tags from the hash values of the data. Compared with traditional auditing mechanisms this saves more computation and storage resources, but it faces the security risk that a malicious challenged party can pass verification by keeping only the hash values of the data blocks. The invention therefore provides an audit tag update protocol based on a dual-server data storage model, which ensures the efficiency and reliability of the auditing method. Unlike the PoWs and audit challenge processes in the first period, after the audit tags are updated the challenger needs to add the set of challenge seeds corresponding to the challenged data blocks to the PoWs and audit challenges

while the prover needs to embed the relevant challenge seeds when computing the evidence:

only then can it pass the verification.

In S107 provided by the embodiment of the present invention, the specific downloading and decrypting process includes:

To better demonstrate the technical performance of the blockchain-based data deduplication and shared auditing method for decentralized storage, this embodiment selects the two most advanced current blockchain-based data deduplication and auditing schemes [1] [2] as comparison schemes, and measures the computation cost (time) of each scheme on a Windows 10 laptop with a 2.30 GHz Intel Core i7-1068NG7 CPU and 16 GB of memory, calling the GMP and PBC cryptographic libraries from Python. As shown in fig. 4 (a)-(c), each comparison scheme mainly involves three stages in data outsourcing (deduplication): initial upload, subsequent upload, and download and decryption.

The computation cost of the initial upload stage is mainly spent on encrypting data and generating audit tags. As shown in fig. 4 (a), since scheme [1] does not encrypt data, the 351.76 s, 1730.61 s and 3565.92 s it consumes when processing 1 MB, 5 MB and 10 MB of data are spent mainly on generating audit tags. The computation cost of scheme [2] is essentially the same as that of scheme [1], so its data encryption time is negligible relative to the cost of computing the audit tags. Because the present invention adopts a lightweight audit tag generation algorithm, it consumes only 1.89 s, 9.47 s and 18.89 s respectively, about 0.52% of the cost of schemes [1] and [2]. The reason is that the data block size in schemes [1] and [2] is limited by the system parameters (160 bits = 20 B), whereas the data block size in the scheme of the invention is not limited and is therefore set to 4 KB to achieve the most efficient data deduplication [3].
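A rough sanity check on the block-size argument: if the cost is dominated by one audit tag per data block, the ratio of block counts should be of the same order as the measured ratio of about 0.52%. The sketch below only counts blocks; it assumes that "MB" in the measurements means 2^20 bytes and that "4 KB" means 4096 bytes, neither of which is stated in the text, and it ignores every other cost factor.

```python
import math

MIB = 1024 * 1024  # assumption: "MB" in the measurements means 2**20 bytes

def block_count(file_bytes: int, block_bytes: int) -> int:
    """Number of blocks, and hence audit tags, needed for a file."""
    return math.ceil(file_bytes / block_bytes)

for size_mb in (1, 5, 10):
    small = block_count(size_mb * MIB, 20)      # 160-bit blocks, schemes [1] and [2]
    large = block_count(size_mb * MIB, 4096)    # 4 KB blocks, scheme of the invention
    print(f"{size_mb} MB: {small} vs {large} tags, ratio {large / small:.2%}")
```

Under these assumptions a 1 MB file needs 52429 tags with 20 B blocks but only 256 tags with 4 KB blocks, a ratio of roughly 0.5%, which is consistent in magnitude with the reported timings.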

The computation cost of the subsequent upload stage differs with the structure of each scheme. Scheme [2] supports server-side deduplication, so the user must spend the same computation cost as in the initial upload stage; scheme [1] and the scheme of the invention adopt client-side deduplication, so a subsequent user can complete the upload merely by generating correct ownership evidence. The computation cost of scheme [2] is therefore far higher than that of scheme [1] and the scheme of the invention, and fig. 4 (b) shows only the latter two: when 300 and 460 data blocks are challenged, scheme [1] consumes 2.65 s and 4.62 s, while the scheme of the invention consumes 1.82 s and 2.71 s, saving about 37.7% of the computation cost in total compared with scheme [1] and thus achieving better performance.
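The 37.7% figure can be reproduced from the four timings quoted above if the two challenge sizes are summed before comparing; the per-setting savings differ from it. The short Python check below uses only the numbers in the paragraph; the helper name `saving` is illustrative.

```python
def saving(baseline_s: float, ours_s: float) -> float:
    """Fraction of the baseline time saved."""
    return 1 - ours_s / baseline_s

# Timings in seconds for 300 and 460 challenged blocks, as quoted in the text.
scheme1 = {300: 2.65, 460: 4.62}
invention = {300: 1.82, 460: 2.71}

per_setting = {n: saving(scheme1[n], invention[n]) for n in scheme1}
combined = saving(sum(scheme1.values()), sum(invention.values()))
print({n: f"{s:.1%}" for n, s in per_setting.items()}, f"combined {combined:.1%}")
```

The saving is about 31.3% at 300 blocks and 41.3% at 460 blocks, and 37.7% when both runs are combined, so the advantage grows with the number of challenged blocks.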

The computation cost of the download and decryption stage is mainly spent on decrypting data and verifying tags. As shown in fig. 4 (c), since scheme [1] does not decrypt data, its computation cost is essentially 0. Scheme [2] consumes 1.21 s, 6.02 s and 10.82 s when decrypting 1 MB, 5 MB and 10 MB of data, while the scheme of the invention consumes 0.02 s, 0.11 s and 0.21 s respectively. This is because scheme [2] must perform tag verification to check data integrity, whereas the scheme of the invention provides multiple data integrity guarantees and does not need this operation, saving a large amount of computation cost.

In addition, this embodiment also tests the computation cost of each comparison scheme in data integrity auditing; as shown in fig. 5 (a)-(c), this mainly covers three aspects: evidence generation, evidence verification and tag update.

In the evidence generation stage, when 300 and 460 data blocks are challenged, the computation cost of scheme [1] is 0.49 s and 0.75 s, that of scheme [2] is 0.46 s and 0.69 s, and that of the scheme of the invention is 0.02 s and 0.03 s. The scheme of the invention clearly costs less than schemes [1] and [2] in generating evidence.

However, in the evidence verification stage, the corresponding computation costs are 1.51 s and 2.22 s for scheme [1], 1.39 s and 2.09 s for scheme [2], and 2.01 s and 3.08 s for the scheme of the invention, which is higher than schemes [1] and [2]. This is because schemes [1] and [2] require the audited party to calculate the verification value while generating the evidence, whereas the scheme of the invention requires the auditor to calculate the verification value itself. From the overall point of view of data auditing, the computation cost of the three schemes is essentially the same.

Finally, since only the scheme of the invention supports audit tag update, fig. 5 (c) shows only the computation costs of the SSP and the client U in the scheme of the invention. When the data size is 1 MB, 5 MB and 10 MB, the computation cost at the SSP side is 1.40 s, 7.13 s and 13.18 s respectively, and that at the U side is 0.43 s, 2.20 s and 4.08 s. Audit tag update guarantees the effectiveness and feasibility of the lightweight tag generation algorithm, and that algorithm not only reduces the user's computation cost in the initial upload stage but also helps the SSP save a large amount of storage space, so the cost of updating the audit tags is acceptable.

In conclusion, the blockchain-based data deduplication and shared auditing method for decentralized storage not only improves the security of outsourced data but also saves a large amount of computing and storage resources, so it is more practical than other existing technical schemes.

The blockchain-based data deduplication and shared auditing method for decentralized storage achieves efficient resource saving and optimized allocation while effectively preventing security problems in data deduplication such as loss of confidentiality, single-point failure and copy-forgery attacks. In addition, the lightweight audit tag generation algorithm and the update protocol save a large amount of computation, communication and storage resources while ensuring the reliability of data integrity auditing. On this basis, the invention realizes bidirectional public auditing without a TPA based on the dual-server data storage model, and the sharing of audit tags and audit results effectively avoids the resource waste caused by repeated audit requests within a certain time. The invention therefore provides a relatively complete framework combining data deduplication and auditing: it realizes not only the sharing of physical data copies but also the verifiable sharing of audit tags and audit results among different users of the same data, and in particular provides a public auditing mechanism without a TPA (third-party auditor) in decentralized storage. This ensures the security of data in decentralized storage scenarios, avoids unnecessary cost overhead in the data outsourcing process, and improves the practicability of data deduplication and auditing technology.

It should be noted that the embodiments of the present invention can be realized by hardware, software, or a combination of software and hardware. The hardware portion may be implemented using dedicated logic; the software portions may be stored in a memory and executed by a suitable instruction execution system, such as a microprocessor or specially designed hardware. Those skilled in the art will appreciate that the apparatus and methods described above may be implemented using computer executable instructions and/or embodied in processor control code, such code being provided on a carrier medium such as a disk, CD-or DVD-ROM, programmable memory such as read only memory (firmware), or a data carrier such as an optical or electronic signal carrier, for example. The apparatus and its modules of the present invention may be implemented by hardware circuits such as very large scale integrated circuits or gate arrays, semiconductors such as logic chips, transistors, or programmable hardware devices such as field programmable gate arrays, programmable logic devices, etc., or by software executed by various types of processors, or by a combination of hardware circuits and software, e.g., firmware.

The above description is only a specific embodiment of the present invention, and the protection scope of the invention is not limited thereto; any modification, equivalent replacement or improvement made by those skilled in the art within the spirit and principle of the invention shall fall within the protection scope defined by the appended claims.

References:

[1] Y. Xu, C. Zhang, G. Wang, Z. Qin, and Q. Zeng, "A blockchain-enabled deduplicatable data auditing mechanism for network storage services," IEEE Transactions on Emerging Topics in Computing, 2020.

[2] H. Yuan, X. Chen, J. Wang, J. Yuan, H. Yan, and W. Susilo, "Blockchain-based public auditing and secure deduplication with fair arbitration," Information Sciences, vol. 541, pp. 409-425, 2020.

[3] C. Ng and P. P. Lee, "Revdedup: A reverse deduplication storage system optimized for reads to latest backups," in Proceedings of the 4th Asia-Pacific Workshop on Systems, 2013, pp. 1-7.

Claims

1. A blockchain-based data deduplication and shared auditing method for decentralized storage, characterized in that the method comprises the following steps:

2. The de-centralized storage block chain-based data deduplication and sharing auditing method of claim 1, wherein the specific process of system establishment is as follows:

(1) The system manager SM selects two multiplicative cyclic groups of order p

and

constructs a bilinear map e:

and, from

selects a generator g and a random element u as system parameters;

(2) SM defines two hash functions: H₁:

and H₂:

and two pseudo-random functions: π₁:

and π₂:

(3) SM publishes the parameters

on the blockchain.

3. The de-centralized storage block chain-based data de-duplication and sharing auditing method according to claim 1, characterized in that the specific file-level copy detection process is as follows:

(1) File key generation: user U computes a public/private key pair (sk, pk) for file F, where the file private key sk = H₁(F) is used for data encryption and audit tag generation, and the file public key pk = g^sk serves as the first tag t = pk of the file to enable file-level copy detection and audit evidence verification;

4. The de-centralized storage block chain-based data de-duplication and sharing auditing method according to claim 1, characterized in that the specific process of file initial uploading is as follows:

① Calculating the data block encryption key: k_i = H₁(m_i);

② Calculating the data block ciphertext: c_i = Enc(k_i, m_i);

③ Calculating the data block tag: T_i = H₁(c_i);

④ Calculating the audit tag of the data block:

where M_i = H₂(t||i);

(2) Requesting data upload: user U calculates the block key ciphertext Ck = Enc(sk, k₁||k₂||...||k_n) and sends the initial upload request <t, Ck, {T_i}, {σ_i}> to the SSP over a secure off-chain channel.

(5) SSP generates the second tag of the file, t^* = H₁(T₁||T₂||...||T_n), and returns it to U; it then stores the outsourced data <t, t^*, Ck, {T_i}, {σ_i}> of F and the initially uploaded data blocks {c_i}^*, while creating an index list <t, t^*, U, SSP> for file F on the blockchain for data deduplication by subsequent users.

In particular, the audit tags {σ_i} of the initial upload stage are used only for the PoWs and auditing processes in the first audit period; when the number of used audit tags reaches the threshold, the SSP updates the audit tags and starts a new audit period.

5. The de-centralized storage block chain-based data de-duplication and sharing auditing method according to claim 1, characterized in that the specific process of file subsequent uploading is as follows:

(2) Generating a PoWs challenge: SSP randomly selects three random numbers z ∈ [1, n] and r₁,

then forms the PoWs challenge Chal_x = (z, r₁, r₂) and sends it to U;

① Calculating the index of the i-th challenged data block: a_i = π₁(i, r₁);

② Calculating the coefficient of the i-th challenged data block: b_i = π₂(i, r₂);

③ Calculating the tag of the i-th challenged data block:

finally, U calculates the PoWs evidence corresponding to Chal_x

and returns it to the SSP.

(4) Verifying PoWs evidence: SSP calculates the indices {a_i} and coefficients {b_i} of the challenged data blocks from the challenge Chal_x, then calculates the aggregate audit tag of the challenged data blocks

and

If the equation holds, SSP returns the second tag t^* of F to U for data access; otherwise it aborts. In particular, if U is the real data owner but fails the PoWs, that is, the outsourced file F has suffered a copy-forgery attack, U can perform the initial upload step to upload the data; since the SSP can verify that the second tag t^* of the file is consistent with the outsourced ciphertext data, U avoids the data loss caused by the copy-forgery attack.

6. The de-centralized storage block chain-based data de-duplication and sharing auditing method according to claim 1, characterized in that the specific process of data auditing is as follows: user U hires server SSP₁ (SSP₂) as the auditor and verifies, by means of a "challenge-response" model, the integrity of the data copy stored by the audited party SSP₂ (SSP₁), finally realizing bidirectional shared auditing without a fully trusted third-party auditor TPA. The specific steps are as follows:

(1) Generating an audit challenge: the auditor randomly selects three random numbers z ∈ [1, n] and r₁,

and publishes it to the blockchain;

and auxiliary information:

finally, the bilinear map is used to check whether the following equation holds:

If the equation holds, the auditor outputs result_y = 1, indicating that the outsourced data stored by the audited party is intact; otherwise it outputs result_y = 0. The auditor publishes the audit result <Chal_y, Proof_y, result_y> to the blockchain.

7. The de-centralized storage block chain-based data de-duplication and sharing auditing method of claim 1, characterized in that the specific process of audit tag update is as follows: when the number of used audit tags of the file reaches the threshold, the auditor SSP selects a valid user U to execute the audit tag update protocol, obtains new audit tags and starts a new audit period, so as to ensure the efficiency and reliability of the auditing method. The specific steps are as follows:

(1) Audit tag preprocessing: SSP randomly generates challenge parameters

and performs the following operations:

② Generating the audit tags to be signed

1 ≤ i ≤ n, where M_i = H₂(t||i);

③ Calculating the auxiliary information

④ Publishing the update information <t, t^*, aux> on the blockchain, and sending the audit tags to be signed ρ = {ρ_i} to user U over the off-chain channel;

(2) Audit tag generation: user U obtains the update information <t, t^*, aux> from the blockchain and ρ = {ρ_i} from the SSP, respectively, and performs the following operations:

② Calculating the new audit tags using the file private key sk

1 ≤ i ≤ n;

③ Calculating the proof of work P_w = aux^sk;

④ Publishing the proof of work <t, t^*, aux, P_w> on the blockchain, and returning the audit tags σ = {σ_i} to the auditor SSP over the off-chain channel.

8. The de-centralized storage block chain-based data de-duplication and sharing auditing method according to claim 1, characterized in that the specific downloading and decryption process is as follows:

(2) Key recovery: U recovers the data block keys using sk: k₁||k₂||...||k_n = Dec(sk, Ck);

(3) Data decryption: U decrypts the ciphertext data blocks using the block keys: m_i = Dec(k_i, c_i), 1 ≤ i ≤ n, and finally obtains the complete plaintext data F = m₁||m₂||...||m_n.

9. A computer-readable storage medium storing a computer program which, when executed by a processor, causes the processor to perform the steps of:

10. An information data processing terminal, characterized in that the information data processing terminal is used for implementing the data deduplication and sharing auditing method based on block chain for decentralized storage according to any one of claims 1 to 8.