CN113094753B - Big data platform hive data modification method and system based on block chain - Google Patents

Big data platform hive data modification method and system based on block chain

Info

Publication number
CN113094753B
CN113094753B CN202110497644.6A CN202110497644A CN113094753B CN 113094753 B CN113094753 B CN 113094753B CN 202110497644 A CN202110497644 A CN 202110497644A CN 113094753 B CN113094753 B CN 113094753B
Authority
CN
China
Prior art keywords
information
modification
data
data table
verification
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202110497644.6A
Other languages
Chinese (zh)
Other versions
CN113094753A (en)
Inventor
舒海
杨文逸
罗小东
白慧静
陈静
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Bank Of Chongqing Co ltd
Original Assignee
Bank Of Chongqing Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Bank Of Chongqing Co ltd
Priority to CN202110497644.6A
Publication of CN113094753A
Application granted
Publication of CN113094753B
Legal status: Active
Anticipated expiration

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F21/00 Security arrangements for protecting computers, components thereof, programs or data against unauthorised activity
    • G06F21/60 Protecting data
    • G06F21/62 Protecting access to data via a platform, e.g. using keys or access control rules
    • G06F21/6218 Protecting access to data via a platform, e.g. using keys or access control rules to a system of files or objects, e.g. local or distributed file system or database
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/10 File systems; File servers
    • G06F16/18 File system types
    • G06F16/182 Distributed file systems
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F21/00 Security arrangements for protecting computers, components thereof, programs or data against unauthorised activity
    • G06F21/60 Protecting data
    • G06F21/64 Protecting data integrity, e.g. using checksums, certificates or signatures

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • General Engineering & Computer Science (AREA)
  • Computer Security & Cryptography (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Bioethics (AREA)
  • General Health & Medical Sciences (AREA)
  • Computer Hardware Design (AREA)
  • Software Systems (AREA)
  • Health & Medical Sciences (AREA)
  • Databases & Information Systems (AREA)
  • Data Mining & Analysis (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
  • Storage Device Security (AREA)

Abstract

The invention discloses a block chain-based method for modifying hive data on a big data platform, comprising the following steps: generating a block chain for storing operation information of a shared data table; when an initiating end performs a modification operation, writing the operation information into a new block, linking the new block to the block chain, and broadcasting the block chain to each sharing end; each sharing end separately verifies the authenticity of the operation, and if all sharing ends pass the verification, the modified content is updated; otherwise, the data is restored to its state before modification. In the invention, the modification information of the data is recorded in the block chain, every modification of the data is verified by each sharing end, and the modification is completed only when the verification by every sharing end passes. This ensures that the operation content recorded by the initiating end is consistent with the actual operation result after modification, prevents an account from tampering with data by bypassing hive, and guarantees the authenticity of the modification record, thereby facilitating audit.

Description

Big data platform hive data modification method and system based on block chain
Technical Field
The invention relates to the technical field of big data platforms, in particular to a block chain-based hive data modification method and system for a big data platform.
Background
Hadoop is a distributed storage and processing platform for big data, and Hive is a Hadoop-based data warehouse tool that supports data organization, ad hoc queries and analysis over data sets stored in files on HDFS. In the hive data warehouse of a Hadoop big data platform, some data can be modified by multiple accounts. These may be accounts that jointly publish the data, superior accounts with broader authority, administrator accounts, and so on. Such accounts can modify the data by operating directly on the data files of the distributed file system hdfs that correspond to a hive table, without going through the hive component. The big data platform cannot provide enough information to prove which operation of which account a data modification came from, so the legality of the operation cannot be checked and the account responsible for the modification cannot be traced. The main reasons are as follows:
1) The owner account and role information of files in the Hive data warehouse is lost. Under the multi-account management system of a big data platform, a permission management component such as Sentry is almost always required. But once a permission management component such as Sentry is enabled, data written into the hive data warehouse path (/user/hive/warehouse/) by any means is shown at the hdfs layer of the distributed file system as owned by hive; the account and role that initiated the write are not shown, so the account and role from which the operation came cannot be located.
2) The audit log recorded by hdfs provides only the time, source and target object of an operation; it lacks the operation result and the range of data affected, so it cannot be used to trace the provider of specific data. The audit log recorded by hive contains only the SQL statements submitted through hive; if an account bypasses hive and modifies data by operating directly on hdfs files, hive generates no audit record. Auditing operation compliance and tracing data sources are therefore difficult.
For data modification auditing of a hive data warehouse of a big data platform, the currently adopted technical scheme mainly comprises the following three types:
1) Recording data provider information as a signature at a fixed location in the data. For example, a signature field is added to the hive shared data table to record the data provider and the time the data was provided; during auditing and tracing, the corresponding data provider is found simply by reading this field.
2) Placing encrypted signature information at a hidden location in the data. Encrypted information strings, such as numbers or pictures, are added at hidden positions in the data. However, another account with permission to modify the data can overwrite the content of the whole data file, which invalidates the signature so that the data source can no longer be proved.
3) Computing a hash value over the full data to generate a block chain that identifies the data uniquely. The hash values of the modified data contents are computed pairwise to build a hash tree, so that the data yields a unique hash value; the hash value is then written into a block, and the blocks for successive modifications are connected in sequence to form a block chain for tracing the content of each modification. However, this method must compute hash values over all the data, which incurs a large computational overhead. It is suitable only for recording and tracing modifications of small data volumes; for a big data platform where a single task processes a large amount of data, block generation takes a long time and efficiency is low.
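The pairwise hashing described in the third scheme can be sketched as follows. This is a minimal illustration, not the prior-art implementation: SHA-256, the function name `merkle_root`, and the rule of duplicating the last node on odd-sized levels are all assumptions made for the example. Note that every chunk of data has to be hashed, which is the overhead the text refers to.

```python
import hashlib


def merkle_root(chunks: list[bytes]) -> str:
    """Hash the chunks pairwise into a tree and return the root as a hex string."""
    if not chunks:
        raise ValueError("at least one chunk is required")
    # Leaf level: one hash per data chunk (cost grows with the full data size).
    level = [hashlib.sha256(chunk).digest() for chunk in chunks]
    while len(level) > 1:
        if len(level) % 2 == 1:
            # Assumption: duplicate the last node when a level has an odd size.
            level.append(level[-1])
        level = [
            hashlib.sha256(level[i] + level[i + 1]).digest()
            for i in range(0, len(level), 2)
        ]
    return level[0].hex()
```

Changing any single chunk changes the root, which is what lets the root identify the data uniquely.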
Disclosure of Invention
The invention aims to provide a big data platform hive data modification method and system based on a block chain, which can prevent data from being tampered, so that the authenticity of data modification records is guaranteed.
The technical scheme of the invention is as follows:
A block chain-based method for modifying hive data on a big data platform, wherein the big data platform corresponds to a control center and a distributed file system, and the distributed file system stores data in the form of shared data tables; the big data platform further corresponds to an initiating end for modifying a shared data table of the distributed file system, a plurality of sharing ends for verifying the authenticity of the modification, and block chains generated for the initiating end and each sharing end respectively; the block chains are used for storing operation information of the shared data table, and the operation information comprises operation content and operation results; the data modification method comprises the following steps:
S1, the initiating end sends a modification request for the shared data table to the control center;
S2, the control center verifies whether the initiating end has the modification authority for the shared data table;
S3, the control center checks whether each sharing end meets the operation requirement of the current modification; if the requirement is met, step S4 is executed; otherwise, the control center refuses the current modification request;
S4, the initiating end modifies the data of the shared data table, writes the operation information of the modification into a new block, connects the new block to the end of the block chain corresponding to the initiating end to generate a new block chain, and broadcasts the newly generated block chain to each sharing end;
S5, after receiving the newly generated block chain, each sharing end verifies the authenticity of the operation according to the information recorded in the block chain and returns information on whether the verification passed; if all sharing ends return that the verification passed, step S6 is executed; otherwise, step S7 is executed;
S6, each sharing end updates its block chain to the latest state; meanwhile, the distributed file system updates the modified content;
S7, the data of the shared data table is restored to its state before modification, and each sharing end discards the new block chain; the initiating end discards the current block and returns its block chain to the state before modification.
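The control flow of steps S1 to S7 can be summarized in a short sketch. The function `run_modification` and its callback parameters are hypothetical names introduced only for illustration; the patent does not prescribe an interface, and each callback stands in for work done by the control center, the initiating end or the sharing ends.

```python
from typing import Callable, Sequence


def run_modification(
    has_permission: bool,
    peer_approvals: Sequence[bool],
    apply_and_broadcast: Callable[[], None],
    peer_verifications: Callable[[], Sequence[bool]],
    commit: Callable[[], None],
    rollback: Callable[[], None],
) -> str:
    """Drive one modification request (S1) through steps S2 to S7."""
    if not has_permission:            # S2: permission check by the control center
        return "rejected: no permission"
    if not all(peer_approvals):       # S3: every sharing end must agree
        return "rejected: sharing end refused"
    apply_and_broadcast()             # S4: modify data, append block, broadcast chain
    if all(peer_verifications()):     # S5: unanimous authenticity verification
        commit()                      # S6: sharing ends adopt chain, data is updated
        return "committed"
    rollback()                        # S7: restore data, discard the new block
    return "rolled back"
```

A single failed verification in S5 is enough to trigger the rollback in S7, which mirrors the requirement that all sharing ends pass.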
Furthermore, the distributed file system corresponds to a plurality of physical nodes and a plurality of account numbers for operating the distributed file system, each account number corresponds to at least one control end, the control ends are located on the physical nodes, and the initiating end and the sharing end are both control ends of the account numbers; when an account initiates a modification request for a shared data table to a control center, the control center takes the account as an operation account modified this time, takes a control end of the operation account as an initiating end of the modification, takes other accounts sharing the shared data table as a shared account modified this time, and takes a control end of the shared account as a sharing end of the modification; the block chains are in one-to-one correspondence with the account numbers, and the block chains are stored in the control ends of the corresponding account numbers.
Further, in the step S3, the method for checking whether each sharing end meets the operation requirement of the current modification includes:
a control center of the Hadoop big data platform sends a notification requesting for data modification to each sharing end, wherein the notification requesting for data modification comprises the content of a modification request;
after receiving the notification, each sharing end checks whether the control end where the block chain is located meets the operation requirement, if so, the sharing end returns the information of approving modification to the control center, otherwise, the sharing end returns the information of refusing modification to the control center; and if the sharing ends all return the information which agrees to be modified, judging that the sharing ends all meet the operation requirement.
Further, the operation information further includes an operation source account, a recording path, an operation type, and an operation completion time.
Further, the content of the modification request includes the account signature, the operation type and the operation content of the current modification request, and the operation type is one of an append operation (adding content to the end of an existing file), an add operation (adding a new file) and a delete operation.
Further, in step S4, the method for initiating an end to modify the data in the shared data table includes:
the control center sends the modification request of the initiating terminal to the distributed file system, and simultaneously returns a notification of approving modification to the initiating terminal to allow the modification operation of the shared data table; the initiating end firstly copies a share of data of the shared data table from the distributed file system as copy data to be stored in an independent path of a cache region used for modification of the distributed file system, and then the copy data is operated in the cache region.
Further, in step S5, the method for verifying the authenticity of the current operation by each sharing terminal is as follows:
after receiving the broadcast of the newly generated block chain modified this time, each sharing end compares the data information of the modified shared data table with the operation result of the last block in the block chain of the account number to find out the modified part; the data information and the operation result of the shared data table comprise file names, generation time and occupied storage capacity information; and reading the operation information of the current modification recorded in the current block in the newly generated block chain, checking whether the modified part is consistent with the operation information recorded in the current block, if so, returning the information which passes the verification, otherwise, returning the information which does not pass the verification.
Further, in step S3, the control center further generates a one-time ticket token, where the one-time ticket token is used to identify and mutually trust the modified request session among the control center, the distributed file system, the originating end, and each sharing end.
Further, the distributed file system generates an audit data table for recording modification information of the data table, and the audit data table has a unique row number field 'rowId', a unique field 'provider account signature' and a unique field 'generation time'; the 'rowId' field is an auto-incrementing sequence with a step size of 1 and is used to record the modified data range; the 'provider account signature' field records the account signature of the initiating end, and the 'generation time' field records the modification time of the data; after step S7 is executed, the relevant operation information is written into the audit data table.
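As a rough illustration of the three audit fields, the sketch below keeps the audit data table in memory. The class names `AuditRow` and `AuditTable` and the string timestamp are assumptions for the example; the patent only names the fields and states that the row number grows in steps of 1.

```python
from dataclasses import dataclass
from itertools import count


@dataclass(frozen=True)
class AuditRow:
    row_id: int                       # the 'rowId' field
    provider_account_signature: str   # account signature of the initiating end
    generation_time: str              # modification time of the data


class AuditTable:
    """In-memory stand-in for the audit data table (hypothetical helper)."""

    def __init__(self) -> None:
        self._ids = count(start=1, step=1)  # sequence growing in steps of 1
        self.rows: list[AuditRow] = []

    def record(self, signature: str, generation_time: str) -> int:
        row = AuditRow(next(self._ids), signature, generation_time)
        self.rows.append(row)
        return row.row_id
```

Because the row numbers are contiguous, a pair of row numbers is enough to describe the range of rows written by one modification.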
A block chain-based hive data modification system for a big data platform, used for modifying shared data in the big data platform, comprises:
the system comprises an initiating end, a control center and a sharing end which shares corresponding shared data with the initiating end;
the initiating terminal is used for initiating a modification request to the control center, modifying the shared file stored in the big data platform under the condition that the control center allows modification, writing the operation information of the modification into a new block, connecting the new block to the tail end of a block chain corresponding to the initiating terminal to generate a new block chain, and broadcasting the newly generated block chain to the shared terminal for inspection;
the control center is used for verifying whether an initiating terminal has the modification authority of the shared data table, detecting whether the shared terminal meets the operation requirement of the modification after the initiating terminal is verified to have the modification authority, and allowing the initiating terminal to modify the shared file if the shared terminal meets the requirement;
the sharing end is used for receiving the newly generated block chain from the initiating end, carrying out authenticity check according to the newly generated block chain, updating the new block chain to the latest state if the check is passed, and otherwise discarding the new block chain;
and the control center is also used for causing the initiating end to discard the current block when the verification by a sharing end fails, and to return its block chain to the state before modification.
In the invention, the modification information of the data is recorded in the block chain, every modification of the data is verified by each sharing end, and the modification is completed only when the verification by every sharing end passes. This ensures that the operation content recorded by the initiating end is consistent with the actual operation result after modification, prevents an account from tampering with data by bypassing hive, and guarantees the authenticity of the modification record, thereby facilitating audit.
Drawings
FIG. 1 is a flow chart of a preferred embodiment of the block chain-based hive data modification method for a big data platform according to the present invention;
FIG. 2 is a logic diagram of a preferred embodiment of the block chain-based hive data modification method for a big data platform according to the present invention.
Detailed Description
In order to make the technical solutions in the embodiments of the present invention better understood and make the above objects, features and advantages of the embodiments of the present invention more obvious and understandable, the technical solutions in the embodiments of the present invention are further described in detail below with reference to the accompanying drawings.
As shown in fig. 1 and 2, the hive data modification method of the big data platform based on the block chain is used for data modification of a hive data warehouse of a Hadoop big data platform, the Hadoop big data platform corresponds to a control center and a distributed file system, the distributed file system stores data in the form of a shared data table, the distributed file system corresponds to a plurality of physical nodes and a plurality of account numbers used for operating the distributed file system, each account number corresponds to at least one control end, and the control ends are located on the physical nodes; each physical node can be provided with a plurality of control ends of the account, each account can also be provided with a plurality of control ends, and the plurality of control ends are respectively arranged on different physical nodes. The distributed file system corresponds to an initiating terminal used for modifying the shared data table and a plurality of sharing terminals used for verifying the authenticity of modification, and block chains are respectively generated corresponding to the initiating terminal and each sharing terminal. The initiating terminal and the sharing terminal are both control terminals of the account; when an account initiates a modification request for the shared data table to the control center, the control center takes the account as an operation account for the modification, takes a control end of the operation account as an initiating end of the modification, takes other accounts sharing the shared data table as shared accounts for the modification, and takes the control end of the shared accounts as a sharing end of the modification. The block chains are in one-to-one correspondence with the account numbers, and the block chains are stored in one control end of the corresponding account numbers. 
The block chain is used for storing operation information of the corresponding account on the shared data table, wherein the operation information comprises operation content and operation results; of course, the operation information may also include an operation source account number, a recording path, an operation type, an operation completion time, and a one-time ticket token. The control center synchronizes in real time and stores the operable shared data table range of each account on the big data platform.
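One possible in-memory shape for a block carrying this operation information is sketched below. The field names, the JSON serialization and the use of a SHA-256 hash of the previous block as the link are assumptions for illustration; the patent lists the recorded items but does not specify an encoding or a linking mechanism.

```python
import hashlib
import json
from dataclasses import asdict, dataclass


@dataclass(frozen=True)
class Block:
    prev_hash: str        # assumption: blocks are linked by the previous block's hash
    source_account: str   # operation source account
    record_path: str      # recording path
    op_type: str          # operation type
    op_content: list      # operation content, e.g. the files operated on
    op_result: dict       # operation result, e.g. file name -> [generation time, size]
    completed_at: str     # operation completion time
    token: str            # one-time ticket token

    def digest(self) -> str:
        payload = json.dumps(asdict(self), sort_keys=True).encode("utf-8")
        return hashlib.sha256(payload).hexdigest()


def link_block(chain: list, **fields) -> list:
    """Return a new chain with one more block; the input chain is left untouched."""
    prev_hash = chain[-1].digest() if chain else "0" * 64
    return chain + [Block(prev_hash=prev_hash, **fields)]
```

Returning a new list instead of mutating the old one keeps the pre-modification chain available, so discarding a rejected block amounts to dropping the new list.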
A preferred embodiment of the big data platform hive data modification method based on the block chain comprises the following steps:
s1, an initiating end sends a modification request for a shared data table to a control center.
The control center synchronizes the operation authority of each account on the shared data table. The content of the modification request comprises the account signature, the operation type and the operation content of the current modification request, and the operation type is one of an append operation, an add operation and a delete operation.
The append operation adds content to a file of the shared data table; to prevent data tampering, content can be appended only at the end of the file and cannot be inserted in the middle of the file.
The add operation adds a new file to the shared data table.
The delete operation deletes some or all of the files in the shared data table. When all files are deleted, the operation content is 'All'; when only some files are deleted, the operation content is a file list, and the files in the list are the files requested to be deleted.
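A modification request with these three parts could be checked as in the sketch below. The labels `"append"` (content added at the end of an existing file), `"add"` (a new file) and `"delete"`, as well as both function names, are hypothetical; only the 'All' convention for the delete operation comes from the text.

```python
VALID_OP_TYPES = {"append", "add", "delete"}


def validate_request(signature: str, op_type: str, op_content) -> None:
    """Raise ValueError if the request is malformed; return None otherwise."""
    if not signature:
        raise ValueError("missing account signature")
    if op_type not in VALID_OP_TYPES:
        raise ValueError(f"unknown operation type: {op_type!r}")
    if op_type == "delete" and op_content != "All" and not isinstance(op_content, list):
        raise ValueError("delete content must be 'All' or a file list")


def delete_targets(op_content, existing_files) -> list:
    """Resolve the operation content of a delete request to concrete file names."""
    if op_content == "All":
        return sorted(existing_files)
    return list(op_content)
```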
S2, the control center verifies whether the initiating end has the modification authority for the shared data table; if the modification authority exists, step S3 is executed; otherwise, the current modification request is refused.
S3, the control center generates a one-time ticket token and checks whether each sharing end meets the operation requirement of the current modification; if the requirement is met, step S4 is executed; otherwise, the control center refuses the modification request and informs the initiating end. The purpose of generating the one-time ticket token is to identify the session of the current modification request and establish mutual trust for it among the control center, the distributed file system and each account. The method for checking whether each sharing end meets the operation requirement of the current modification comprises the following steps:
the control center of the Hadoop big data platform sends the one-time ticket token and a notification requesting data modification to each sharing end, wherein the notification comprises the content of the modification request;
after receiving the notification, each sharing end checks whether the control end where the block chain is located meets the operation requirement, if so, the sharing end returns the information of approving modification to the control center, otherwise, the sharing end returns the information of refusing modification to the control center; and if the sharing ends all return the information which agrees to be modified, judging that the sharing ends all meet the operation requirement.
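The token handling and the unanimous-agreement rule of step S3 might look like the following sketch. `ControlCenter`, `issue_token`, `retire_token` and `all_sharing_ends_agree` are hypothetical names, and generating the token with `secrets.token_hex` is an assumption; the patent does not say how the one-time ticket token is produced.

```python
import secrets


class ControlCenter:
    """Tracks the one-time ticket tokens that are currently in use."""

    def __init__(self) -> None:
        self._live_tokens: set[str] = set()

    def issue_token(self) -> str:
        token = secrets.token_hex(16)  # assumption: a random 128-bit token
        self._live_tokens.add(token)
        return token

    def retire_token(self, token: str) -> bool:
        """Retire a token; returns False if it was unknown or already retired."""
        if token in self._live_tokens:
            self._live_tokens.remove(token)
            return True
        return False


def all_sharing_ends_agree(replies: dict) -> bool:
    """replies maps a sharing end's name to True (agree) or False (refuse)."""
    return all(replies.values())
```

A token that has been retired cannot be retired again, which is what makes it usable for only one modification session.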
S4, the control center sends a modification request and a one-time ticket token of the initiating terminal to the distributed file system hdfs, and simultaneously returns a notification of modification approval to the initiating terminal to allow the initiating terminal to modify the shared data table; the initiating end firstly copies a share of data of the shared data table from the distributed file system hdfs as copy data to be stored in an independent path of a cache region used for modification of the distributed file system hdfs, and then operates the copy data in the cache region through an SQL component or a non-SQL component. And writing the operation information modified this time into a new block, connecting the new block to the tail end of the block chain corresponding to the initiating end to generate a new block chain, and broadcasting the newly generated block chain to each sharing end.
S5, after each sharing end receives the newly generated block chain, checking the authenticity of the operation according to the information recorded by the block chain, and returning the information whether the check is passed; if all the sharing ends return verification passes, executing the step S6; otherwise, step S7 is executed. The method for verifying the authenticity of the operation by each sharing end comprises the following steps:
after receiving the broadcast of modifying the newly generated block chain, each sharing end compares the data information of the modified shared data table with the operation result of the last block in the block chain of the account number to find out the modified part; the data information and the operation result of the shared data table comprise file names, generation time and occupied storage capacity information; and reading the operation information of the current modification recorded in the current block in the newly generated block chain, checking whether the modified part is consistent with the operation information recorded in the current block, if so, returning the information which passes the verification, otherwise, returning the information which does not pass the verification.
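Finding the modified part amounts to comparing two metadata snapshots: the operation result stored in the last block and the current state of the shared data table. In the sketch below, each snapshot is assumed to map a file name to a tuple of generation time and occupied storage size; the function name and this representation are illustrative only.

```python
def diff_snapshots(previous: dict, current: dict) -> dict:
    """Compare two snapshots of the form file name -> (generation_time, size_bytes)."""
    common = previous.keys() & current.keys()
    return {
        "added": sorted(current.keys() - previous.keys()),
        "removed": sorted(previous.keys() - current.keys()),
        "changed": sorted(name for name in common if previous[name] != current[name]),
    }
```

The sharing end can then check each of the three lists against the operation information recorded in the current block.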
Specifically, verifying the authenticity of the append operation (adding content to the end of an existing file) includes:
finding out the files modified at this time in the shared data table from the distributed file system hdfs, and comparing an operation file (namely, the file requested to be modified in the current modification request) list recorded in the operation content of the block information at this time to determine whether the files are completely contained in the files modified at this time; if yes, the files recorded in the operation content of the current block are all modified, and whether the operation content is consistent or not is continuously verified; otherwise, the operation content of the current block is not modified completely, the verification is stopped, and the information that the verification fails is returned.
Then, for each unmodified file, sequentially comparing whether the file name, the generation time and the occupied storage capacity information of the file are consistent with the file name, the generation time and the occupied storage capacity information recorded in the operation result of the block at the last time; if the verification result is consistent, the unmodified file is not tampered, the verification is continued, otherwise, the unmodified file is tampered, the verification is stopped, and information that the verification cannot pass is returned.
Then, each operation file recorded in the operation content of the current block is read, the data appended to each operation file this time is located in turn, and the account signature written in that data is checked against the account recorded in the current block. If they are inconsistent, information that the verification fails is returned. If they are consistent, the verification continues by checking whether the account signatures already present in the row number range written before the current append are consistent with the information recorded in the operation result of the last block. If they are consistent, the existing data of the original file has not been modified, and the verification continues by checking whether the appended content is consistent with the operation content; otherwise, the existing data of the original file has been tampered with, the verification stops, and information that the verification fails is returned.
Finally, each operation file is read in turn, and the appended information is checked against the operation content and the operation result recorded in the current block; if they are consistent, information that the verification passes is returned; otherwise, information that the verification fails is returned.
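The checks for this operation, in which content is added to the end of existing files, can be condensed into one function. This is a simplified sketch under stated assumptions: each file is modeled as a list of `(account_signature, payload)` rows, the block is a dict with the hypothetical keys `"account"` and `"appended"`, and whole rows are compared instead of file name, generation time and storage size.

```python
def verify_append(prev_rows: dict, curr_rows: dict, block: dict) -> bool:
    """prev_rows and curr_rows map file name -> list of (signature, payload) rows.
    block["appended"] maps file name -> list of payloads claimed to be appended."""
    op_files = set(block["appended"])
    modified = {n for n, rows in curr_rows.items() if rows != prev_rows.get(n)}
    removed = set(prev_rows) - set(curr_rows)
    # Every file named in the block must really have been modified.
    if not op_files <= modified:
        return False
    # Files outside the operation must be untouched and still present.
    if removed or modified - op_files:
        return False
    for name in op_files:
        if name not in prev_rows:
            return False  # appending requires an existing file
        old, new = prev_rows[name], curr_rows[name]
        if new[: len(old)] != old:
            return False  # existing rows were altered
        tail = new[len(old):]
        if any(signature != block["account"] for signature, _ in tail):
            return False  # appended rows carry a foreign signature
        if [payload for _, payload in tail] != block["appended"][name]:
            return False  # appended content differs from the recorded content
    return True
```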
The method for verifying the authenticity of the add operation (adding a new file) comprises the following steps:
First, the file name, generation time and occupied storage capacity of each file recorded in the operation result of the last block are checked in turn against the file name, generation time and occupied storage capacity recorded in the operation content of the current block, and the file range is checked for completeness. If the information is consistent and the file range is complete, the original files in the shared data table have not been modified, no inconsistency between the operation result and the operation content has been found, and the verification continues; otherwise, the original files in the shared data table have been tampered with, the verification stops, and information that the verification fails is returned.
Next, it is checked whether every file not recorded in the operation result of the last block is recorded in the operation content of the current block; whether the account signature and generation time in the content of each such data file are consistent with the operation source account and generation time recorded in the current block; and whether the operation result corresponding to the operation content is consistent with the information recorded in the operation result of the current block. If all are consistent, the operation result matches the operation content, and information that the verification passes is returned; otherwise, the operation result does not match the operation content, and information that the verification fails is returned.
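For the operation that adds new files, the two stages above reduce to "old files intact" and "new files exactly as recorded". The sketch below assumes metadata dicts with the hypothetical keys `"created"`, `"size"` and `"signature"`, and a block dict with the keys `"account"`, `"files"` and `"result"`; none of these names come from the patent.

```python
def verify_add(prev_meta: dict, curr_meta: dict, block: dict) -> bool:
    """prev_meta and curr_meta map file name -> metadata dict.
    block["files"] lists the new files; block["result"] maps them to metadata."""
    # Stage 1: every previously recorded file is present with identical metadata.
    for name, meta in prev_meta.items():
        if curr_meta.get(name) != meta:
            return False
    # Stage 2: the new files are exactly those named in the block.
    new_files = set(curr_meta) - set(prev_meta)
    if new_files != set(block["files"]):
        return False
    for name in new_files:
        meta = curr_meta[name]
        if meta["signature"] != block["account"]:
            return False  # file was not written by the recorded account
        if meta != block["result"].get(name):
            return False  # actual result differs from the recorded result
    return True
```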
The method for verifying the authenticity of the deletion operation comprises the following steps:
when the operation content is 'All', checking whether any file is still stored under the path of the shared data table; if no file is stored, the data of the shared data table has been completely deleted, the operation result is consistent with the operation content, and the information that the verification is passed is returned; otherwise, the operation result is inconsistent with the operation content, and the information that the verification fails is returned.
When the operation content is a file list, checking whether any file in the list is still stored under the path of the shared data table; if none of the listed files is found, the files in the list have been completely deleted, no inconsistency between the operation result and the operation content has been found, and the verification continues; otherwise, the operation result is inconsistent with the operation content, the verification stops, and the information that the verification fails is returned.
Detecting whether the file name, generation time and occupied storage capacity of each file still stored under the path are consistent with the file name, generation time and occupied storage capacity recorded in the operation result of the last block; if so, the operation result is consistent with the operation content, and the information that the verification is passed is returned; otherwise, a file that was not deleted from the shared data table has been tampered with, and the information that the verification fails is returned.
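The deletion check described above can be sketched as follows. This is a simplified illustration under assumptions: file metadata is reduced to a `(generation time, size)` tuple keyed by file name, and the function name `verify_delete` is hypothetical.

```python
from typing import Dict, List, Tuple, Union

# Hypothetical metadata shape: (generation time, occupied storage in bytes).
FileMeta = Tuple[str, int]


def verify_delete(
    operation_content: Union[str, List[str]],
    prev_result: Dict[str, FileMeta],
    on_disk: Dict[str, FileMeta],
) -> bool:
    # Case 1: operation content "All" means the table path must be empty.
    if operation_content == "All":
        return len(on_disk) == 0

    # Case 2: every listed file must be gone from the table path.
    if any(name in on_disk for name in operation_content):
        return False

    # Remaining files must match what the last block recorded for them.
    for name, meta in on_disk.items():
        if prev_result.get(name) != meta:
            return False  # a file that was not deleted has been altered
    return True
```

The sketch only performs the comparisons named in the text; whether a remaining file that is missing from disk should also fail verification is not stated there and is left out.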
S6, each sharing end updates the block chain to the latest state and stores the block chain; meanwhile, the distributed file system hdfs updates the modified content. The method comprises the following specific steps:
the control center notifies all sharing ends of the shared data table T that the verification is passed; each sharing end of the shared data table T updates its block chain to the latest state, stores it, and returns confirmation information to the control center; the control center notifies the account A that the modification is complete, and archives and reclaims the one-time token, which completes the modification operation. Meanwhile, the control center issues an instruction to overwrite the table with the new data: it first locks the path of the shared data table T in the distributed file system hdfs and refuses any application access, then deletes the data files under the path, then moves the modified copy data of the shared data table from the cache region used for modification to the path of the shared data table T on the big data file system, and finally unlocks the path and allows application access again, which completes the data overwriting operation. Because deleting and moving data files in the distributed file system hdfs only modifies hdfs metadata and does not actually move any data, the operation is very efficient (basically completed within milliseconds), and accounts and applications are basically unaware of it.
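The lock, delete, move, unlock sequence can be sketched in Python. The sketch uses the local file system and an in-process `threading.Lock` purely as stand-ins for hdfs and for the path lock held by the control center; both substitutions, and the name `overwrite_table`, are assumptions for illustration only.

```python
import shutil
import threading
from pathlib import Path

# Stand-in for the path lock; a real deployment would need a lock that all
# applications accessing the table path respect.
_table_lock = threading.Lock()


def overwrite_table(table_path: Path, cache_path: Path) -> None:
    """Replace the files under table_path with the modified copy in cache_path."""
    with _table_lock:                                # 1. lock the table path
        for old in list(table_path.iterdir()):       # 2. delete current data files
            if old.is_file():
                old.unlink()
        for new in list(cache_path.iterdir()):       # 3. move the modified copy in
            shutil.move(str(new), str(table_path / new.name))
    # 4. leaving the with-block releases the lock and access resumes
```

On hdfs the delete and move steps would be metadata operations, which is the reason the text gives for the millisecond-level completion time; the local stand-in does not reproduce that property.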
S7, restoring the data of the shared data table to a state before modification, and enabling each sharing end to discard a new block chain; and the initiating end discards the block of this time and returns the block chain to the state before modification. The method comprises the following specific steps:
the control center broadcasts operation failure information to each sharing end of the shared data table, initiates a rollback operation on the distributed file system hdfs, deletes the copy data of the shared data table in the cache region, and copies the data of the shared data table again from the original path on hdfs into the path of the cache region, so that the data is restored to the state before modification. After receiving the failure information, each sharing end discards the new block chain; the initiating end also discards the current block and returns its block chain to the state before modification.
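The rollback in step S7 can be sketched in the same style. Again the local file system stands in for hdfs, the block chain is modeled as a plain Python list, and the function name `rollback` is hypothetical.

```python
import shutil
from pathlib import Path
from typing import List


def rollback(table_path: Path, cache_path: Path, initiator_chain: List[dict]) -> List[dict]:
    """Restore the cache copy from the untouched original and drop the new block."""
    # Delete the modified copy held in the cache region.
    if cache_path.exists():
        shutil.rmtree(cache_path)
    # Copy the unmodified table data from the original path into the cache again.
    shutil.copytree(table_path, cache_path)
    # The initiating end discards the block it appended for this modification.
    return initiator_chain[:-1]
```

Because the modification was applied only to the cached copy, the original table path never changes during a failed attempt, so the rollback only has to refresh the cache and shorten the chain.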
Once modification of the hive data of the big data platform is restricted by this modification method, tampering with the data requires tampering with the data of all sharing ends; otherwise the modification cannot succeed. Private tampering can therefore be prevented simply by setting an appropriate number of sharing ends, and the modifying account and the modification content of every data modification are recorded to facilitate auditing. For example, when the number of sharing ends of a shared data table is set to no fewer than 20, anyone tampering with the data privately must first locate those 20 sharing ends and then modify the data of each of them, which is very difficult and almost impossible to accomplish; for important data, the number of sharing ends can be increased further. The authenticity of data modification information can therefore be guaranteed even on a big data platform with few accounts.
To facilitate auditing, an audit data table T for recording modification information, on which a plurality of accounts have operation authority, can be generated in the distributed file system. The table T has a unique row number field 'rowId', whose type is an auto-increment sequence with a step of 1, used for recording the modified data range; a unique field 'provider account signature', used for recording the account signature of the initiating end; and a unique field 'generation time', used for recording the modification time of the data. After the modification is completed, the relevant operation information is also written into the table T. During auditing, the source account and the operation time of each data modification can be found simply by using hive to view the 'rowId', 'provider account signature' and 'generation time' fields in the shared data table, which makes auditing convenient.
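The structure of the audit table can be illustrated with a small in-memory model. This is only a sketch: the class names, the extra `operation` column, and starting `rowId` at 1 are assumptions, and a real deployment would define the table in hive rather than in Python.

```python
from dataclasses import dataclass
from itertools import count
from typing import List


@dataclass(frozen=True)
class AuditRow:
    row_id: int                       # 'rowId': auto-increment, step 1
    provider_account_signature: str   # account signature of the initiating end
    generation_time: str              # modification time of the data
    operation: str                    # hypothetical free-text operation summary


class AuditTable:
    def __init__(self) -> None:
        self._next_id = count(start=1, step=1)   # start value is an assumption
        self.rows: List[AuditRow] = []

    def record(self, signature: str, generation_time: str, operation: str) -> AuditRow:
        row = AuditRow(next(self._next_id), signature, generation_time, operation)
        self.rows.append(row)
        return row

    def by_account(self, signature: str) -> List[AuditRow]:
        """Return every modification recorded for one account signature."""
        return [r for r in self.rows if r.provider_account_signature == signature]
```

An auditor's query then reduces to filtering on the signature and reading the row number and generation time of each match.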
The invention discloses a block chain-based big data platform hive data modification system, which comprises the following preferred embodiments:
the system comprises an initiating end, a control center and a sharing end which shares corresponding shared data with the initiating end;
the initiating terminal is used for initiating a modification request to the control center, modifying the shared file stored in the big data platform under the condition that the control center allows modification, writing the operation information of the modification into a new block, connecting the new block to the tail end of a block chain corresponding to the initiating terminal to generate a new block chain, and broadcasting the newly generated block chain to the shared terminal for inspection;
the control center is used for verifying whether an initiating terminal has the modification authority of the shared data table, detecting whether the shared terminal meets the operation requirement of the modification after the initiating terminal is verified to have the modification authority, and allowing the initiating terminal to modify the shared file if the shared terminal meets the requirement;
the sharing end is used for receiving the newly generated block chain from the initiating end, carrying out authenticity check according to the newly generated block chain, updating the new block chain to the latest state if the check is passed, and otherwise discarding the new block chain;
and the control center is also used for enabling the initiating end to discard the current block and returning the block chain to a state before modification when the sharing end fails to pass the inspection.
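How the three roles interact can be summarized in a short sketch. It models only the decision flow (permission check, readiness check, unanimous verification, commit or discard); the function name, the list-based chain, and the callable verifiers are assumptions for illustration, not the claimed system.

```python
from typing import Callable, List, Sequence, Tuple

Chain = List[dict]


def run_modification(
    chain: Chain,
    new_block: dict,
    has_permission: bool,
    sharers_ready: bool,
    verifiers: Sequence[Callable[[Chain], bool]],
) -> Tuple[Chain, str]:
    # Control center: verify modification authority, then sharing-end readiness.
    if not has_permission:
        return chain, "rejected: no modification authority"
    if not sharers_ready:
        return chain, "rejected: sharing ends do not meet the operation requirement"

    # Initiating end: append the new block and broadcast the candidate chain.
    candidate = chain + [new_block]

    # Sharing ends: every one of them must pass the authenticity check.
    if all(verify(candidate) for verify in verifiers):
        return candidate, "committed"
    # Any failure discards the new block and keeps the chain as it was.
    return chain, "rolled back"
```

The unanimous `all(...)` condition reflects the requirement that the modification proceeds only if every sharing end returns a pass.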
In particular, according to a preferred embodiment of the present invention, the process described above with reference to fig. 1 may be implemented as a computer software program. For example, a third preferred embodiment of the present invention comprises a computer program product comprising a computer program carried on a computer readable medium, the computer program comprising program code for performing the method illustrated in fig. 1. In such a preferred embodiment, the computer program may be downloaded and installed from a network via the communication section, and/or installed from a removable medium. The computer program, when executed by a Central Processing Unit (CPU), performs the above-described functions defined in the system of the present application.
In the flowchart shown in fig. 1 and the block diagram shown in fig. 2, each block in the flowchart or block diagrams may represent a module, a program segment, or a portion of code, which comprises one or more executable instructions for implementing the specified logical function(s). It should also be noted that, in some alternative implementations, the functions noted in the block may occur out of the order noted in the figures. It will also be noted that each block of the block diagrams or flowchart illustration, and combinations of blocks in the block diagrams or flowchart illustration, can be implemented by special purpose hardware-based systems which perform the specified functions or acts, or combinations of special purpose hardware and computer instructions.
The units described in the embodiments of the present invention may be implemented by software or hardware, and the described units may also be disposed in a processor. Wherein the names of the elements do not in some way constitute a limitation on the elements themselves.
The undescribed parts of the present invention are consistent with the prior art, and are not described herein. The above description is only an embodiment of the present invention, and not intended to limit the scope of the present invention, and all equivalent structures made by using the contents of the present specification and the drawings can be directly or indirectly applied to other related technical fields, and are within the scope of the present invention.

Claims (7)

1. A big data platform hive data modification method based on block chains is characterized in that a big data platform corresponds to a control center and a distributed file system, the distributed file system stores data in a form of a shared data table, the big data platform also corresponds to an initiating end used for modifying the shared data table of the distributed file system, a plurality of sharing ends used for verifying modification authenticity and block chains respectively generated by the initiating end and each sharing end, the block chains are used for storing operation information of the shared data table, and the operation information comprises operation content and operation results; the big data modification method comprises the following steps:
S1, an initiating terminal sends a modification request for a shared data table to a control center; the content of the modification request comprises an account signature, an operation type and operation content which are required to be modified, and the modified operation type comprises an appending operation, a new adding operation and a deleting operation; wherein the appending operation is used for appending the content at the end of the file of the shared data table; the new adding operation is used for newly adding files in the shared data table; the deleting operation is used for deleting part or all of the files in the shared data table, when part of the files are deleted, the operation content is a file list, and the files in the list are files requiring deletion;
S2, the control center verifies whether the initiating end has the modification authority of the shared data table;
S3, the control center checks whether each sharing end meets the operation requirement of the current modification; if the requirement is met, executing the step S4; otherwise, the control center refuses the current modification request;
S4, the initiating terminal modifies the data of the shared data table, writes the modified operation information into a new block, connects the new block to the tail end of the block chain corresponding to the initiating terminal to generate a new block chain, and broadcasts the newly generated block chain to each shared terminal; the method for modifying the data of the shared data table by the initiating terminal comprises the following steps:
the control center sends the modification request of the initiating terminal to the distributed file system, and simultaneously returns a notification of approving modification to the initiating terminal to allow modification operation on the shared data table; the initiating end firstly copies a share of data of the shared data table from the distributed file system as copy data to be stored in an independent path of a cache region used for modification of the distributed file system, and then operates the copy data in the cache region;
S5, after each sharing end receives the newly generated block chain, checking the authenticity of the operation according to the information recorded by the block chain, and returning the information whether the check is passed; if all the sharing ends return verification passes, executing the step S6; otherwise, executing step S7;
wherein, verifying the authenticity of the appending operation comprises:
finding out the files modified this time in the shared data table from the distributed file system, and comparing an operation file list recorded in the operation content of the block information this time to determine whether the files are completely contained in the files modified this time; if yes, continuing to check; otherwise, stopping checking and returning the information that the checking fails;
for each unmodified file, sequentially comparing whether the file name, the generation time and the occupied storage capacity information of the file are consistent with the file name, the generation time and the occupied storage capacity information recorded in the operation result of the last block; if they are consistent, continuing the verification, otherwise, stopping the verification and returning the information that the verification fails;
reading each operation file recorded in the operation content of the current block, sequentially finding the data appended to each operation file this time, and checking whether the account signature written in the data is consistent with the account number recorded in the current block; if they are inconsistent, returning the information that the verification fails; if they are consistent, continuing to check whether the account numbers already written in the data line number range that precedes the current appending to the operation file are consistent with the information recorded in the operation result of the last block; if they are consistent, continuing the verification, otherwise, stopping the verification and returning the information that the verification fails;
sequentially reading each operation file, and checking whether the appended information is consistent with the operation content and the operation result recorded in the current block; if they are consistent, returning the information that the verification is passed; otherwise, returning the information that the verification fails;
the method for verifying the authenticity of the new operation comprises the following steps:
checking whether the file name, the generation time and the occupied storage capacity information of each file recorded in the operation result of the last block are consistent with the file name, the generation time and the occupied storage capacity information recorded in the operation content of the block at this time in sequence, and detecting whether the file range is complete; if the information is consistent and the file range is complete, continuously checking; otherwise, stopping checking and returning the information that the checking fails;
checking whether every file which is not recorded in the operation result of the last block is recorded in the operation content of the current block, and sequentially checking whether the account signature and the generation time in the content of each such data file are consistent with the operation source account and the generation time information recorded in the current block, and whether the operation result corresponding to the operation content is consistent with the information recorded in the operation result of the current block; if they are consistent, returning the information that the verification is passed; otherwise, returning the information that the verification fails;
the method for verifying the authenticity of the deletion operation comprises the following steps:
when all files are deleted, whether the files are stored under the path of the shared data table is checked; if the file is not stored, the data of the shared data table is completely deleted, the operation result is in accordance with the operation content, and the information that the verification is passed is returned; otherwise, the operation result is not consistent with the operation content, and the information that the verification fails is returned;
when part of files are deleted and the operation content is a file list, checking whether the files in the list are stored under the path of the shared data table, and if the files in the list are not found, continuing to check; otherwise, stopping checking and returning the information that the checking fails;
detecting whether the file name, the generation time and the occupied storage capacity information of each file still stored under the path are consistent with the file name, the generation time and the occupied storage capacity recorded in the operation result of the last block; if they are consistent, returning the information that the verification is passed; otherwise, returning the information that the verification fails;
S6, each sharing end updates the block chain to the latest state; meanwhile, the distributed file system updates the modified content; the method for updating the modified content by the distributed file system comprises the following steps:
the control center issues an instruction for covering and writing in new data, firstly, the path of a shared data table to be modified in the distributed file system is locked, any application access is refused, then, a data file under the path is deleted, the copy data of the modified shared data table in the cache area for modification is moved to the path of the shared data table on the big data file system, and finally, the locking is released, the application access is allowed, and the data covering operation is completed;
S7, restoring the data of the shared data table to a state before modification, and enabling each sharing end to discard a new block chain; enabling the initiating end to discard the current block and returning the block chain to the state before modification; the method for restoring the data of the shared data table to the state before modification comprises the following steps:
the control center broadcasts operation failure information to each sharing end of the shared data table, initiates rollback operation to the distributed file system, deletes duplicate data of the shared data table in the cache region, and copies a copy of data of the shared data table from the original path of the distributed file system again to be placed in the path of the cache region, so that the data is restored to a state before modification.
2. The big data platform hive data modification method based on the block chain as claimed in claim 1, wherein the distributed file system has a plurality of physical nodes and a plurality of account numbers for operating the distributed file system, each account number has at least one control end, the control end is located on a physical node, and the initiating end and the sharing end are control ends of account numbers; when an account initiates a modification request for a shared data table to a control center, the control center takes the account as an operation account for the modification, takes a control end of the operation account as an initiating end of the modification, takes other accounts sharing the shared data table as shared accounts for the modification, and takes the control end of the shared accounts as a sharing end of the modification; the block chains are in one-to-one correspondence with the account numbers, and the block chains are stored in the control ends of the corresponding account numbers.
3. The method for modifying big data platform hive data based on a block chain as claimed in claim 2, wherein in the step S3, the method for checking whether each sharing end meets the operation requirement of the current modification includes:
a control center of the Hadoop big data platform sends a notification requesting for data modification to each sharing end, wherein the notification requesting for data modification comprises the content of a modification request;
after receiving the notification, each sharing end checks whether the control end where the block chain is located meets the operation requirement, if the control end meets the operation requirement, the sharing end returns information of agreeing to modification to the control center, otherwise, the sharing end returns information of refusing modification to the control center; and if the sharing ends all return the information which agrees to be modified, judging that the sharing ends all meet the operation requirement.
4. The big data platform hive data modification method based on the block chain as claimed in claim 1, wherein the operation information further includes an operation source account, a record path, an operation type, and an operation completion time.
5. The big data platform hive data modification method based on the block chain as claimed in claim 1, wherein in step S5, the method for verifying the authenticity of the current operation by each sharing end is as follows:
after receiving the broadcast of the newly generated block chain modified this time, each sharing end compares the data information of the modified shared data table with the operation result of the last block in the block chain of the account number to find out the modified part; the data information of the shared data table and the operation result both comprise file names, generation time and occupied storage capacity information; and reading the operation information of the current modification recorded in the current block in the newly generated block chain, checking whether the modified part is consistent with the operation information recorded in the current block, if so, returning the information that the verification is passed, and otherwise, returning the information that the verification is not passed.
6. The big data platform hive data modification method based on block chains according to claim 2, wherein in step S3, the control center further generates a one-time ticket token, and the one-time ticket token is used to identify and mutually trust the request session of this modification among the control center, the distributed file system, the initiating end, and the sharing ends.
7. The big data platform hive data modification method based on the block chain as claimed in claim 1, wherein the distributed file system generates an audit data table for recording modification information of the data table, and the audit data table corresponds to a unique row number field "rowId", a unique field "provider account signature" and a unique field "generation time"; the "rowId" field is a growth sequence with a self-step size of 1, used to record the modified data range; the 'provider account signature' field is used for recording the account signature of an initiator, and the 'generation time' is used for recording the modification time of data; after the step S7 is executed, relevant operation information is written into an audit data table.
CN202110497644.6A 2021-05-08 2021-05-08 Big data platform hive data modification method and system based on block chain Active CN113094753B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110497644.6A CN113094753B (en) 2021-05-08 2021-05-08 Big data platform hive data modification method and system based on block chain

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202110497644.6A CN113094753B (en) 2021-05-08 2021-05-08 Big data platform hive data modification method and system based on block chain

Publications (2)

Publication Number Publication Date
CN113094753A CN113094753A (en) 2021-07-09
CN113094753B true CN113094753B (en) 2023-02-24

Family

ID=76681743

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110497644.6A Active CN113094753B (en) 2021-05-08 2021-05-08 Big data platform hive data modification method and system based on block chain

Country Status (1)

Country Link
CN (1) CN113094753B (en)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113835931B (en) * 2021-10-11 2022-08-26 长春嘉诚信息技术股份有限公司 Data modification discovery method applied to block chain

Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107391291A (en) * 2017-03-24 2017-11-24 北京瑞卓喜投科技发展有限公司 Modification block chain is the block chain corrigenda method and system for having block volume data
CN108446407A (en) * 2018-04-12 2018-08-24 北京百度网讯科技有限公司 Database audit method based on block chain and device
CN108599963A (en) * 2018-05-11 2018-09-28 招商局重庆交通科研设计院有限公司 A kind of detection data based on block chain technology is traced to the source verification method
CN109145275A (en) * 2018-08-07 2019-01-04 广东工业大学 A kind of block chain electronic contract management and intelligent generating system and method
CN110417781A (en) * 2019-07-30 2019-11-05 中国工商银行股份有限公司 File encryption management method, client and server based on block chain
CN111429191A (en) * 2018-12-24 2020-07-17 航天信息股份有限公司 Block chain-based electronic invoice flow management method, device and system
CN112256799A (en) * 2020-11-12 2021-01-22 腾讯科技(深圳)有限公司 Data processing method and device based on block chain, server and storage medium

Family Cites Families (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107040585B (en) * 2017-02-22 2020-06-19 创新先进技术有限公司 Service checking method and device
CN106951185B (en) * 2017-03-01 2019-12-06 武汉爱宁智慧科技有限公司 health detection data management system and method based on block chain technology
CN107133532A (en) * 2017-05-31 2017-09-05 无锡井通网络科技有限公司 A kind of block chain logistics based on NFC is traced to the source tracking method for anti-counterfeit
CN107493340B (en) * 2017-08-23 2020-02-11 广州市易彩乐网络科技有限公司 Data distribution verification method, device and system in block chain network
CN111061769B (en) * 2019-12-24 2021-09-10 腾讯科技(深圳)有限公司 Consensus method of block chain system and related equipment
CN111431707B (en) * 2020-03-19 2021-03-26 腾讯科技(深圳)有限公司 Service data information processing method, device, equipment and readable storage medium
CN111683101B (en) * 2020-06-16 2021-01-22 铭数科技(青岛)有限公司 Autonomous cross-domain access control method based on block chain
CN111915308A (en) * 2020-07-04 2020-11-10 中信银行股份有限公司 Transaction processing method of blockchain network and blockchain network
CN112084486A (en) * 2020-09-08 2020-12-15 中国平安财产保险股份有限公司 User information verification method and device, electronic equipment and storage medium

Also Published As

Publication number Publication date
CN113094753A (en) 2021-07-09

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant