Batch real-time protection data verification method
Technical Field
The invention relates to a batch real-time protection data verification method, and belongs to the technical field of data verification.
Background
For important information systems, there is an RPO (recovery point objective) requirement for securing user data, so the data is protected in real time. To ensure that the protected data is correct, it must be verified. In scheduled data protection, the common data verification methods are direct comparison, message digest comparison, and application verification.
Direct comparison: the source data is compared with the protection data item by item.
Message digest comparison: message digests (for example MD5 or SHA-1) are calculated for the source data and for the protection data, and the two digests are then compared.
Application verification: the protection data is handed to an application system, and the application system completes the verification of the protection data.
Direct comparison requires that both the source data and the protection data be read once and then compared. Message digest comparison likewise reads both the source data and the protection data, generates a digest for each, and compares the digests.
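The two comparison modes for static data can be illustrated with a minimal Python sketch. The function names and sample byte strings are hypothetical and chosen only for illustration; the sketch uses the standard `hashlib` module and assumes that both copies fit in memory.

```python
import hashlib


def digest(data: bytes, algorithm: str = "sha1") -> str:
    """Return the hexadecimal message digest of a piece of data."""
    return hashlib.new(algorithm, data).hexdigest()


def direct_compare(source: bytes, protected: bytes) -> bool:
    """Direct comparison: both copies are read and compared byte for byte."""
    return source == protected


def digest_compare(source: bytes, protected: bytes, algorithm: str = "sha1") -> bool:
    """Digest comparison: both copies are read, digested, and the digests compared."""
    return digest(source, algorithm) == digest(protected, algorithm)


source = b"block contents at backup time"
good_copy = b"block contents at backup time"
bad_copy = b"block contents at backup tim3"

print(direct_compare(source, good_copy), digest_compare(source, good_copy))
print(direct_compare(source, bad_copy), digest_compare(source, bad_copy))
```

In both modes the source data is read in full, which is the cost discussed in the text.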
Both direct comparison and message digest comparison need to read the source data at least once, which takes a long time when the source data is large.
Application verification depends on functions exposed by the application and is therefore not generally applicable.
These three verification modes suit scheduled data protection: the source data does not change during a single verification, so the source data and the protection data read at any time have consistent content, that is, the data is static. In real-time protection, however, the source data changes constantly and so does the stored real-time protection data. The source data and the real-time protection data at the same position are read at different times and their contents may differ, that is, the data is dynamic. A new verification scheme is therefore needed to verify the correctness of dynamic data.
The degree of informatization in enterprises and institutions is increasing, and more information systems exist in each institution. For important information systems, data needs to be protected in real time, and data loss is reduced as much as possible. After the information system fails, the information system can be rebuilt by recovering the real-time protection data.
The real-time protection data may be incomplete or erroneous because of program failure, network fluctuation, hardware damage, and the like. After a failure of the information system, incomplete or erroneous real-time protection data either cannot be used for recovery or leaves part of the data unavailable after recovery. It is therefore necessary to verify the real-time protection data to ensure that it is recoverable and consistent with the information system.
Disclosure of Invention
In view of the deficiencies of the prior art, the invention aims to provide a batch real-time protection data verification method for the case in which data changes continuously while an information system is in use and the changed data is captured and stored.
To achieve this aim, the invention adopts the following technical scheme:
The invention discloses a batch real-time protection data verification method, which comprises the following steps:
calculating the number of data blocks to sample in a single verification according to the size of the user source data, the verification period and the user bandwidth, and performing stratified sampling according to the data change heat (how frequently a block changes) to generate the data block number information table for that verification;
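The text names the inputs to the sample-count calculation but does not give a formula. The following Python sketch shows one possible calculation under loudly stated assumptions: a fixed block size, a fixed verification window per run, a known number of runs per period, and a bandwidth budget expressed in bytes per second. All names and figures are hypothetical.

```python
def ceil_div(a: int, b: int) -> int:
    """Integer division rounded up."""
    return -(-a // b)


def blocks_per_verification(
    source_bytes: int,
    block_bytes: int,
    runs_per_period: int,
    bandwidth_bytes_per_s: int,
    window_s: int,
) -> int:
    """Assumed rule, not taken from the source text: sample as many blocks as
    the bandwidth budget allows in one window, provided that this is enough
    to cover every block at least once within the period."""
    total_blocks = ceil_div(source_bytes, block_bytes)
    # Minimum per run so that all blocks can be covered once in the period.
    coverage_floor = ceil_div(total_blocks, runs_per_period)
    # Maximum per run that can be read within the window at the given bandwidth.
    bandwidth_cap = (bandwidth_bytes_per_s * window_s) // block_bytes
    if bandwidth_cap < coverage_floor:
        raise ValueError("window or bandwidth too small to cover all blocks in one period")
    return min(bandwidth_cap, total_blocks)


# Hypothetical figures: 1 TiB source, 4 MiB blocks, 30 runs per period,
# 100 MiB/s budget, 30-minute window.
sample_count = blocks_per_verification(1 << 40, 4 << 20, 30, 100 << 20, 1800)
print(sample_count)  # 45000
```

Any capacity above the coverage floor is what allows blocks with a high change heat to be sampled more than once per period.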
reading the source data content of each listed block number from the source information system according to the data block number information table, and generating a source data digest;
searching the real-time protection data for the corresponding backup data, searching backward from the latest data time point; each time the block data of a time point is found, generating a real-time data digest and comparing it with the source data digest for consistency; one search is finished when the block data of a consistent time point is found or when all time points have been searched;
when all data blocks in the data block number information table have been verified, one data verification is finished.
The data blocks listed in the data block number information table are verified one by one. For example, a source data block A changes during real-time protection, so that data blocks A1, A2, A3, A4 and A5 are generated;
when the source data block A is verified, the block content is read from the source data to form a source data digest; the content of data block A is then read from the real-time protection data to form a data digest, which is compared with the source data digest;
the latest time point of data block A in the real-time protection data (A5) is read first; when the digest of A5 is inconsistent with the source data digest, the contents of A4, A3, A2 and A1 are searched in sequence, and the digest of each is calculated and compared; if any one of them matches, the verification passes; otherwise the verification result is abnormal, which indicates that the real-time protection is abnormal.
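The search for block A can be sketched in Python as follows. This is a minimal illustration only: the function name is hypothetical, SHA-1 is used as an example digest, and the stored versions are assumed to be available as a list ordered from oldest (A1) to newest (A5).

```python
import hashlib
from typing import Optional, Sequence


def digest(data: bytes) -> str:
    """Message digest of one block version (SHA-1 chosen for illustration)."""
    return hashlib.sha1(data).hexdigest()


def find_matching_version(
    source_content: bytes, versions_oldest_first: Sequence[bytes]
) -> Optional[int]:
    """Search the stored versions from the newest time point backward and
    return the index of the first version whose digest equals the source
    digest, or None if no time point matches (verification abnormal)."""
    source_digest = digest(source_content)
    for index in range(len(versions_oldest_first) - 1, -1, -1):
        if digest(versions_oldest_first[index]) == source_digest:
            return index
    return None


# Stand-in contents for the versions A1..A5 of block A.
history_a = [b"A1", b"A2", b"A3", b"A4", b"A5"]

print(find_matching_version(b"A4", history_a))       # 3, i.e. A4 matched after A5 did not
print(find_matching_version(b"corrupt", history_a))  # None, i.e. abnormal
```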
Batch real-time protection data verification consists of multiple batches of data verification, which together form a verification period.
Within one verification period, all source data is verified at least once and data with a high change heat is verified multiple times, which ensures data accuracy and allows data errors to be found promptly.
Taking two verifications in one verification period as an example, the first verification checks data blocks A, D, E and F, and the corresponding data blocks A4, D1, E2 and F1 are found in the real-time protection data;
the second verification checks data blocks A, B, C and E, and the corresponding data blocks A5, B1, C1 and E3 are found in the real-time protection data;
after the two data verifications, all data blocks A to F have been verified at least once, and blocks A and E, which have a high change heat, have been verified twice, so that the correctness of the real-time protection data is effectively verified.
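The coverage claim for this example can be checked with a short Python sketch. The block letters come from the example; the variable names are illustrative only.

```python
from collections import Counter

all_blocks = set("ABCDEF")
batches = [
    ["A", "D", "E", "F"],  # first verification
    ["A", "B", "C", "E"],  # second verification
]

# Count how many times each block was verified across the period.
counts = Counter(block for batch in batches for block in batch)

unverified = all_blocks - set(counts)
rechecked = sorted(block for block, n in counts.items() if n > 1)

print(unverified)  # set(), every block was verified at least once
print(rechecked)   # ['A', 'E'], the blocks with a high change heat
```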
The invention supports the verification of dynamic data. Verification is carried out in batches, which shortens each verification window and reduces the impact on the information system. Data with a high change heat is verified multiple times, so that errors are found as early as possible after they occur.
Drawings
FIG. 1 is a flowchart illustrating a method for batch real-time protection data verification according to the present invention;
FIG. 2 is a flowchart of a work process for verifying specified data;
FIG. 3 is a flowchart of the operation of a verification period.
Detailed Description
To make the technical means, inventive features, objectives and effects of the invention easy to understand, the invention is further described below with reference to specific embodiments.
Referring to FIG. 1, the batch real-time protection data verification method of the present invention includes the following steps:
generating the data block number information table for a single verification according to the size of the user source data, the data change heat and the verification period;
reading the source data content of each listed block number from the information system according to the data block number information table, and generating a source data digest;
searching the real-time protection data for the backup data of the corresponding block, searching backward from the latest data time point; each time the block data of a time point is found, generating a real-time data digest and comparing it with the source data digest for consistency; one search is finished when the block data of a consistent time point is found or when all time points have been searched;
when all data blocks in the data block number information table have been verified, one data verification is finished.
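The steps above can be sketched as one batch loop in Python. The two reader callables stand in for the information system and the real-time protection store; they, the function name and the sample data are hypothetical, and SHA-1 is used only as an example digest.

```python
import hashlib
from typing import Callable, Iterable, List, Sequence


def digest(data: bytes) -> str:
    return hashlib.sha1(data).hexdigest()


def verify_batch(
    block_numbers: Iterable[int],
    read_source: Callable[[int], bytes],
    read_versions: Callable[[int], Sequence[bytes]],
) -> List[int]:
    """Verify every block listed in the table and return the block numbers
    for which no stored time point matches the source digest."""
    failed: List[int] = []
    for number in block_numbers:
        source_digest = digest(read_source(number))
        # Versions are assumed to be ordered oldest first, so reversed()
        # walks from the latest time point backward.
        matched = any(
            digest(version) == source_digest
            for version in reversed(read_versions(number))
        )
        if not matched:
            failed.append(number)
    return failed


# In-memory stand-ins for the source system and the protection store.
source = {0: b"x2", 1: b"y1", 2: b"z9"}
protected = {0: [b"x1", b"x2", b"x3"], 1: [b"y1"], 2: [b"z1", b"z2"]}

failed_blocks = verify_batch([0, 1, 2], source.__getitem__, lambda n: protected.get(n, []))
print(failed_blocks)  # [2]
```

Block 0 passes even though its latest stored version (x3) differs from the source, because an earlier time point (x2) matches; block 2 fails because no time point matches.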
Batch real-time protection data verification consists of multiple batches of data verification, which together form a verification period.
Within one verification period, all source data is verified at least once and data with a high change heat is verified multiple times, which ensures data accuracy and allows data errors to be found promptly.
Both the source data and the real-time protection data to be verified change constantly, that is, both are dynamic data.
Referring to FIG. 2: the source data block A changes during real-time protection, generating data blocks A1, A2, A3, A4 and A5.
When data block A is verified, the block content is read from the source data to form a source data digest. The content of data block A is then read from the real-time protection data to form a data digest, which is compared with the source data digest.
Because the source data and the real-time protection data are read at different times, the latest time point of data block A in the real-time protection data (A5 in the figure) may not correspond to the content of source data block A that was read. Therefore, when the digest of A5 is inconsistent with the source data digest, the contents of A4, A3, A2 and A1 are searched in sequence, and the digest of each is calculated and compared. If any one of them matches, the verification passes; otherwise the verification result is abnormal, which indicates that the real-time protection is abnormal.
Referring to FIG. 3: take two verifications in one verification period as an example.
The first verification checks data blocks A, D, E and F, and the corresponding data blocks A4, D1, E2 and F1 are found in the real-time protection data;
the second verification checks data blocks A, B, C and E, and the corresponding data blocks A5, B1, C1 and E3 are found in the real-time protection data;
after the two data verifications, all data blocks A to F have been verified at least once, and blocks A and E, which have a high change heat, have been verified twice, so that the correctness of the real-time protection data is effectively verified.
The principle of the scheme is that data read from the source data at a given point in time must also be present in the real-time protection data. Therefore, data can be read from the source data, the data block corresponding to that time point can be searched for in the real-time protection data, and the two can be compared to complete the verification.
Real-time volume backup divides a source volume into blocks, monitors and captures the IO changes of each data block, and stores the IO stream. If the source data volume becomes abnormal, the information system can be reconstructed from the stored IO stream. To ensure that the stored data is correct, it must be verified.
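A minimal in-memory Python sketch of such an IO stream follows. The class and method names are hypothetical and do not describe any particular backup product; a real implementation would persist the records and handle partial-block writes, which this sketch ignores.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class IoRecord:
    sequence: int      # capture order, used here as the time point
    block_number: int  # which block of the source volume was written
    content: bytes     # full block content after the write


class IoJournal:
    """Stores every captured block write in arrival order."""

    def __init__(self) -> None:
        self.records: List[IoRecord] = []

    def capture(self, block_number: int, content: bytes) -> None:
        self.records.append(IoRecord(len(self.records), block_number, content))

    def versions(self, block_number: int) -> List[bytes]:
        """All stored time points of one block, oldest first."""
        return [r.content for r in self.records if r.block_number == block_number]

    def rebuild(self) -> Dict[int, bytes]:
        """Replay the stream; the last write to each block wins."""
        volume: Dict[int, bytes] = {}
        for record in self.records:
            volume[record.block_number] = record.content
        return volume


journal = IoJournal()
journal.capture(0, b"aaaa")
journal.capture(1, b"bbbb")
journal.capture(0, b"cccc")

print(journal.versions(0))  # [b'aaaa', b'cccc']
print(journal.rebuild())    # {0: b'cccc', 1: b'bbbb'}
```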
In this embodiment, the verification method of the present invention is used with verification performed between 02:00 and 02:30 every day and a configured verification period of one month. The verification module starts at 02:00 every day. Of the data blocks to be verified in that run, 60% are selected preferentially from the blocks that are hot on the current day, 30% from the blocks that are hot in the current month, and the remaining 10% from unchanged blocks; together they form the data block number information table for that verification. The contents of the corresponding data blocks in the source volume are read in sequence according to the table, and a digest is generated for each. The content of the corresponding data block is then searched for in the stored IO stream, and each time the data of a time point is found, a digest is generated for comparison. If the digest of the data block at some time point in the IO stream is consistent with that of the source volume, the block passes verification; if no matching time point is found, verification fails and the user is alerted. The verification is finished when all data blocks in the table have been verified. Within the one-month verification period, 30 verifications are started in total, which ensures that every data block in the source volume is verified at least once and that the real-time protection data of the volume is correct.
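The 60/30/10 selection in this embodiment can be sketched in Python as follows. The function name, the representation of blocks as integers and the seeded random generator are assumptions made for illustration; the sketch also does not track which blocks have already been covered earlier in the period, which a complete implementation would need in order to guarantee that every block is verified at least once.

```python
import random
from typing import Iterable, List, Set


def build_check_table(
    hot_today: Iterable[int],
    hot_month: Iterable[int],
    unchanged: Iterable[int],
    sample_size: int,
    rng: random.Random,
) -> List[int]:
    """Stratified sampling: 60% from blocks hot today, 30% from blocks hot
    this month, 10% from unchanged blocks. A block is never chosen twice."""
    strata = [(hot_today, 60), (hot_month, 30), (unchanged, 10)]
    chosen: List[int] = []
    seen: Set[int] = set()
    for stratum, percent in strata:
        candidates = [block for block in stratum if block not in seen]
        # Integer arithmetic avoids floating-point rounding surprises.
        take = min(len(candidates), sample_size * percent // 100)
        picked = rng.sample(candidates, take)
        chosen.extend(picked)
        seen.update(picked)
    return chosen


rng = random.Random(0)  # seeded only to make the illustration repeatable
table = build_check_table(range(0, 20), range(20, 60), range(60, 100), 10, rng)
print(len(table))  # 10
```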
The foregoing shows and describes the general principles, main features and advantages of the present invention. Those skilled in the art will understand that the present invention is not limited to the embodiments described above; the embodiments and the description merely illustrate the principle of the invention, and various changes and modifications may be made without departing from the spirit and scope of the invention, all of which fall within the scope of the invention as claimed. The scope of the invention is defined by the appended claims and their equivalents.