CN112860741B - Data sampling detection method, device, equipment and storage medium - Google Patents

Data sampling detection method, device, equipment and storage medium

Info

Publication number
CN112860741B
Authority
CN
China
Prior art keywords
sampling
data
rule
server
target
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202110064517.7A
Other languages
Chinese (zh)
Other versions
CN112860741A (en)
Inventor
罗国强
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Ping An Technology Shenzhen Co Ltd
Original Assignee
Ping An Technology Shenzhen Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Ping An Technology Shenzhen Co Ltd filed Critical Ping An Technology Shenzhen Co Ltd
Priority to CN202110064517.7A priority Critical patent/CN112860741B/en
Priority to PCT/CN2021/083590 priority patent/WO2022151590A1/en
Publication of CN112860741A publication Critical patent/CN112860741A/en
Application granted granted Critical
Publication of CN112860741B publication Critical patent/CN112860741B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F21/00 Security arrangements for protecting computers, components thereof, programs or data against unauthorised activity
    • G06F21/30 Authentication, i.e. establishing the identity or authorisation of security principals
    • G06F21/45 Structures or tools for the administration of authentication
    • G06F21/46 Structures or tools for the administration of authentication by designing passwords or checking the strength of passwords
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F11/00 Error detection; Error correction; Monitoring
    • G06F11/07 Responding to the occurrence of a fault, e.g. fault tolerance
    • G06F11/14 Error detection or correction of the data by redundancy in operation
    • G06F11/1479 Generic software techniques for error detection or fault masking
    • G06F11/1482 Generic software techniques for error detection or fault masking by means of middleware or OS functionality
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/24 Querying
    • G06F16/245 Query processing
    • G06F16/2455 Query execution
    • G06F16/24564 Applying rules; Deductive queries
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F21/00 Security arrangements for protecting computers, components thereof, programs or data against unauthorised activity
    • G06F21/60 Protecting data
    • G06F21/62 Protecting access to data via a platform, e.g. using keys or access control rules
    • G06F21/6218 Protecting access to data via a platform, e.g. using keys or access control rules to a system of files or objects, e.g. local or distributed file system or database
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02P CLIMATE CHANGE MITIGATION TECHNOLOGIES IN THE PRODUCTION OR PROCESSING OF GOODS
    • Y02P90/00 Enabling technologies with a potential contribution to greenhouse gas [GHG] emissions mitigation
    • Y02P90/30 Computing systems specially adapted for manufacturing

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Physics & Mathematics (AREA)
  • Computer Security & Cryptography (AREA)
  • Databases & Information Systems (AREA)
  • Computer Hardware Design (AREA)
  • Software Systems (AREA)
  • General Health & Medical Sciences (AREA)
  • Bioethics (AREA)
  • Health & Medical Sciences (AREA)
  • Computational Linguistics (AREA)
  • Data Mining & Analysis (AREA)
  • Quality & Reliability (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
  • Storage Device Security (AREA)

Abstract

The invention relates to the field of cloud technology and discloses a data sampling detection method, apparatus, device, and storage medium that uniformly and intelligently identify and calculate unrelated services in original data migration, thereby improving sampling detection efficiency and reducing deployment cost. The data sampling detection method comprises: creating an account and generating key pair information and a target sampling rule; acquiring initial data and generating sampling data; performing checksum calculation on the storage sampling data to generate a target storage check result; performing checksum calculation on the host sampling data to generate a target host check result; performing checksum calculation on the database sampling data to generate a target database check result; and merging the target storage check result, the target host check result, and the target database check result, and outputting a final sampling detection result. The invention also relates to blockchain technology: the final sampling detection result can be stored in a blockchain.

Description

Data sampling detection method, device, equipment and storage medium
Technical Field
The present invention relates to the field of parallel computing, and in particular, to a method, an apparatus, a device, and a storage medium for sampling and detecting data.
Background
Sampling selects a subset of samples from a whole population by some method. It is one of the basic steps of data processing, and an economical and effective working and research method widely used in scientific experiments, quality inspection, and social surveys.
With the spread of cloud computing, data migration and reconciliation have become routine for databases, storage, hosts, and the like. In cloud computing the data volume is large, unified data processing is difficult to achieve, and data sampling detection models are relatively independent of one another, so data sampling detection is inefficient and costly to deploy.
Disclosure of Invention
The invention provides a data sampling detection method, apparatus, device, and storage medium that extract data information from hosts, storage, and databases so that services with different logic and no relation to one another in original data migration and reconciliation are identified and calculated uniformly and intelligently, thereby improving sampling detection efficiency and reducing deployment cost.
The first aspect of the present invention provides a method for sampling and detecting data, including: receiving an account creating instruction sent by a user terminal, creating an account according to the account creating instruction, generating key pair information, and configuring sampling rules to obtain target sampling rules, wherein the target sampling rules comprise client rules and server rules; acquiring initial data, generating intermediate data according to the secret key pair information and the client rule, verifying the intermediate data to obtain basic verification data, classifying the basic verification data based on the server rule to generate sampling data, wherein the sampling data comprises storage sampling data, host sampling data and database sampling data; sending a first instruction to a storage sampling calculation server, wherein the first instruction is used for instructing the storage sampling calculation server to carry out checksum calculation on the storage sampling data according to the server-side rule and generate a target storage check result; sending a second instruction to a host computer sampling calculation server, wherein the second instruction is used for instructing the host computer sampling calculation server to carry out checksum calculation on the host computer sampling data according to the server-side rule and generate a target host computer check result; sending a third instruction to a database sampling calculation server, wherein the third instruction is used for instructing the database sampling calculation server to check and calculate the database sampling data according to the server-side rule, and generating a target database check result; and merging the target storage verification result, the target host verification result and the target database verification result, and outputting a final sampling detection result.
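As an illustration of the final merging step, the sketch below combines three per-server results into one report. The `CheckResult` shape, the `merge_results` function, and the "passed" flag are hypothetical names chosen for this example; the text does not specify the format of the check results or of the final sampling detection result.

```python
from dataclasses import dataclass, field


@dataclass
class CheckResult:
    """Result returned by one sampling calculation server (hypothetical shape)."""
    source: str                                     # "storage", "host" or "database"
    digests: dict = field(default_factory=dict)     # item name -> digest string
    mismatches: list = field(default_factory=list)  # items whose digests differ


def merge_results(*results):
    """Merge per-server check results into one final sampling detection result."""
    final = {"passed": True, "sources": {}}
    for result in results:
        final["sources"][result.source] = {
            "digests": dict(result.digests),
            "mismatches": list(result.mismatches),
        }
        if result.mismatches:
            final["passed"] = False  # any mismatch fails the whole detection
    return final


# Illustrative values only; the digests are placeholders, not real hashes.
storage = CheckResult("storage", {"a.txt": "9f2c"})
host = CheckResult("host", {"os": "41ab"})
database = CheckResult("database", {"orders": "77de"}, mismatches=["orders"])
final_result = merge_results(storage, host, database)
```

Keeping each source under its own key means a failure can be traced back to the storage, host, or database check that produced it.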
Optionally, in a first implementation manner of the first aspect of the present invention, the receiving an account creating instruction sent by a user terminal, creating an account according to the account creating instruction, generating key pair information, and configuring a sampling rule to obtain a target sampling rule, where the target sampling rule includes a client rule and a server rule, and includes: receiving an account creating instruction sent by a user terminal, creating an account according to a preset account creating flow, and generating key pair information, wherein the key pair information comprises public key information and private key information; configuring the sampling amount and the sampling proportion in a preset rule template to obtain a client rule, wherein the client rule is a rule for transmitting data to a sampling calculation server; configuring sampling data types and reconciliation information in a preset rule template to obtain a server-side rule, wherein the server-side rule is a rule between a sampling platform server and a sampling calculation server; and combining the client rule and the server rule to generate a target sampling rule.
Optionally, in a second implementation manner of the first aspect of the present invention, the obtaining initial data, generating intermediate data according to the secret key pair information and the client rule, verifying the intermediate data to obtain basic verification data, and classifying the basic verification data based on the server rule to generate sample data, where the sample data includes storage sample data, host sample data, and database sample data includes: acquiring initial data, loading a client sub-rule matched with a preset sampling service address and the secret key pair information to obtain a target client sub-rule, screening the initial data according to the target client sub-rule to generate intermediate data, wherein the client rule comprises a plurality of client sub-rules; authority and rule verification is carried out on the legality of the intermediate data, whether the intermediate data meet a preset authority verification standard or not is judged, and if yes, basic verification data are generated; and judging whether the basic check data is matched with the server side rule or not, and classifying the basic check data according to a preset sampling service type to obtain sampling data, wherein the sampling data comprises storage sampling data, host sampling data and database sampling data.
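The classification step can be sketched as follows, assuming each verified record carries a `service_type` field and the server-side rule lists the enabled data types. Both the field name and the rule layout are assumptions made for this example, as is the choice to skip records of other types.

```python
SERVICE_TYPES = ("storage", "host", "database")


def classify(records, server_rule):
    """Group verified records by sampling service type.

    Records whose type is not enabled in the server-side rule are skipped.
    """
    enabled = [t for t in SERVICE_TYPES if t in server_rule["data_types"]]
    groups = {t: [] for t in enabled}
    for record in records:
        kind = record.get("service_type")
        if kind in groups:
            groups[kind].append(record)
    return groups


records = [
    {"service_type": "storage", "name": "a.txt"},
    {"service_type": "host", "name": "node-1"},
    {"service_type": "database", "name": "orders"},
    {"service_type": "cache", "name": "ignored"},  # not a known sampling service type
]
sampling_data = classify(records, {"data_types": {"storage", "host", "database"}})
```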
Optionally, in a third implementation manner of the first aspect of the present invention, the obtaining initial data, loading a client sub-rule matched with a preset sampling service address and the key pair information to obtain a target client sub-rule, and screening the initial data according to the target client sub-rule to generate intermediate data, where the client rule includes a plurality of client sub-rules including: acquiring initial data, a preset sampling service address and the key pair information, and loading a client sub-rule matched with the preset sampling service address and the key pair information from the client rule to obtain a target client sub-rule, wherein the client rule comprises a plurality of client sub-rules; and screening the initial data according to the target client sub-rule, preprocessing the initial data through a principal component analysis algorithm, deleting redundant data, retaining data conforming to the target client sub-rule, and generating intermediate data.
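A minimal sketch of sub-rule selection and screening follows. It assumes a sub-rule is keyed by a service address and a key identifier and carries a sampling amount and proportion; these field names and the address value are hypothetical. The principal component analysis preprocessing named in the text is not reproduced here: duplicate removal stands in for "deleting redundant data".

```python
import math
import random


def select_sub_rule(client_rule, service_address, key_id):
    """Return the first sub-rule bound to this service address and key, else None."""
    for sub_rule in client_rule["sub_rules"]:
        if sub_rule["service_address"] == service_address and sub_rule["key_id"] == key_id:
            return sub_rule
    return None


def screen(initial_data, sub_rule, seed=0):
    """Drop duplicate rows, then sample according to the sub-rule."""
    unique_rows = list(dict.fromkeys(initial_data))  # keeps first occurrence, preserves order
    by_ratio = math.ceil(len(unique_rows) * sub_rule["sample_ratio"])
    count = min(len(unique_rows), sub_rule["sample_amount"], by_ratio)
    return random.Random(seed).sample(unique_rows, count)


# Hypothetical rule values for illustration.
client_rule = {"sub_rules": [
    {"service_address": "10.0.0.5:9000", "key_id": "k1",
     "sample_amount": 3, "sample_ratio": 0.5},
]}
rule = select_sub_rule(client_rule, "10.0.0.5:9000", "k1")
intermediate = screen(["r1", "r2", "r2", "r3", "r4", "r5"], rule)
```

A fixed seed makes the sample reproducible, which helps when the same selection has to be repeated on both sides of a migration.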
Optionally, in a fourth implementation manner of the first aspect of the present invention, the sending a first instruction to a storage sampling calculation server, where the first instruction is used to instruct the storage sampling calculation server to perform checksum calculation on the storage sampling data according to the server-side rule and generate a target storage verification result, includes: sending a first instruction to the storage sampling calculation server, comparing file information of the storage sampling data according to the server-side rule, where the file information includes a file name, a file size, a file format, and a last modification time, and calculating a final unique value of the file information according to a preset message digest algorithm to generate first target storage data; dividing the files of the storage sampling data into classes according to the amount of memory they occupy to obtain memory classification data, and calculating a unique value for each class of the memory classification data according to the preset message digest algorithm to obtain second target storage data; and merging the first target storage data and the second target storage data to generate the target storage verification result.
Optionally, in a fifth implementation manner of the first aspect of the present invention, the sending a second instruction to a host sampling calculation server, where the second instruction is used to instruct the host sampling calculation server to perform checksum calculation on the host sampling data according to the server-side rule and generate a target host verification result, includes: sending a second instruction to the host sampling calculation server; when the server-side rule is set to verify operating system information, calculating a unique value of the operating system information according to a preset message digest algorithm, and randomly calculating and verifying information other than the operating system information, to generate first target host data; when the server-side rule is set to verify all information, calculating a final unique value over all field information according to the preset message digest algorithm to generate second target host data; and merging the first target host data and the second target host data to generate the target host verification result.
Optionally, in a sixth implementation manner of the first aspect of the present invention, the sending a third instruction to a database sampling calculation server, where the third instruction is used to instruct the database sampling calculation server to perform checksum calculation on the database sampling data according to the server-side rule and generate a target database verification result, includes: sending a third instruction to the database sampling calculation server, and verifying, according to the server-side rule, whether preset database basic information and table structure information match preset standard information, to obtain a consistency result, where the database basic information includes database users, stored procedures, and triggers, and the table structure information includes field names, field lengths, and field types; performing single record verification according to the consistency result and the database sampling data by randomly selecting a single record field and calculating a unique value of the single record field according to a preset message digest algorithm, to obtain single record verification data; performing batch record verification according to the consistency result and the database sampling data by verifying all field information according to the preset message digest algorithm and calculating a final unique value of all field information, to obtain batch record verification data; and merging the single record verification data and the batch record verification data to generate the target database verification result.
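The storage check can be sketched as two digests, one over file metadata and one over file content classified by size. The text does not name the preset digest algorithm, so SHA-256 is used here as an assumption. The size threshold and the head/middle/tail strategy for large content are likewise illustrative choices, not taken from the text.

```python
import hashlib


def file_info_digest(name, size, fmt, mtime):
    """Digest over file name, size, format and last modification time."""
    h = hashlib.sha256()
    for part in (name, str(size), fmt, str(mtime)):
        h.update(part.encode("utf-8"))
        h.update(b"\x00")  # separator so ("ab", "c") and ("a", "bc") differ
    return h.hexdigest()


def content_digest(data, large_threshold=1024, block=256):
    """Classify content by size and digest it per class.

    Small content is hashed in full; large content is hashed over its
    head, middle and tail blocks (an assumed, cheaper strategy).
    """
    h = hashlib.sha256()
    if len(data) <= large_threshold:
        h.update(data)
        return "small", h.hexdigest()
    middle = len(data) // 2
    for start in (0, middle, len(data) - block):
        h.update(data[start:start + block])
    return "large", h.hexdigest()


def storage_check(name, size, fmt, mtime, data):
    """Merge the metadata digest and the content digest into one result."""
    size_class, digest = content_digest(data)
    return {"file_info": file_info_digest(name, size, fmt, mtime),
            "size_class": size_class, "content": digest}
```

Running `storage_check` on the source and the migrated copy of a file and comparing the two dictionaries gives a per-file pass or fail.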
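The two host check modes can be sketched as below. The mode names, the operating system field names, and the number of spot-checked fields are hypothetical. SHA-256 again stands in for the unnamed digest algorithm.

```python
import hashlib
import random


def digest_fields(fields):
    """Digest a mapping of field name -> value in a stable key order."""
    h = hashlib.sha256()
    for key in sorted(fields):
        h.update(f"{key}={fields[key]}\n".encode("utf-8"))
    return h.hexdigest()


def host_check(host_info, mode, os_keys=("os_name", "os_version"), spot_checks=1, seed=0):
    """Check a host record in "os" mode or "all" mode."""
    if mode == "all":
        # Rule set to check everything: one digest over every field.
        return {"mode": "all", "all_fields": digest_fields(host_info)}
    # Rule set to check OS information: digest the OS fields,
    # then spot-check a random selection of the remaining fields.
    os_part = {k: host_info[k] for k in os_keys if k in host_info}
    others = sorted(k for k in host_info if k not in os_keys)
    picked = random.Random(seed).sample(others, min(spot_checks, len(others)))
    return {"mode": "os", "os": digest_fields(os_part),
            "spot": {k: digest_fields({k: host_info[k]}) for k in picked}}


source = {"os_name": "Linux", "os_version": "5.10", "cpu": 8, "memory_gb": 32}
target = dict(source, os_version="5.15")  # same host after a hypothetical migration
```

Using the same seed on both hosts ensures the same extra fields are spot-checked, so the two results are directly comparable.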
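The database check can be sketched as a schema comparison followed by a single record digest and a batch digest. The schema representation (field name mapped to type and length), the function names, and the use of SHA-256 are assumptions for illustration. Database users, stored procedures, and triggers are left out to keep the example short.

```python
import hashlib
import random


def digest(values):
    """Digest a sequence of values using their repr and a separator byte."""
    h = hashlib.sha256()
    for value in values:
        h.update(repr(value).encode("utf-8"))
        h.update(b"\x1f")
    return h.hexdigest()


def schema_consistent(source_schema, target_schema):
    """Compare field name -> (type, length) mappings."""
    return source_schema == target_schema


def single_record_check(rows, seed=0):
    """Pick one record and one of its fields at random and digest that value."""
    rng = random.Random(seed)
    index = rng.randrange(len(rows))       # rows must be non-empty
    name = rng.choice(sorted(rows[index]))
    return {"row": index, "field": name, "digest": digest([rows[index][name]])}


def batch_check(rows):
    """Digest every field of every record."""
    return digest(sorted(row.items()) for row in rows)


def database_check(source_schema, target_schema, rows, seed=0):
    """Merge the single record and batch results, gated on schema consistency."""
    if not schema_consistent(source_schema, target_schema):
        return {"consistent": False}
    return {"consistent": True,
            "single": single_record_check(rows, seed),
            "batch": batch_check(rows)}
```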
A second aspect of the present invention provides a data sampling detection apparatus, including: a receiving module, configured to receive an account creating instruction sent by a user terminal, create an account according to the account creating instruction, generate key pair information, and configure sampling rules to obtain a target sampling rule, where the target sampling rule includes a client rule and a server rule; an acquisition module, configured to acquire initial data, generate intermediate data according to the key pair information and the client rule, verify the intermediate data to obtain basic verification data, and classify the basic verification data based on the server rule to generate sampling data, where the sampling data includes storage sampling data, host sampling data, and database sampling data; a first checking module, configured to send a first instruction to a storage sampling calculation server, where the first instruction is used to instruct the storage sampling calculation server to check and calculate the storage sampling data according to the server-side rule to generate a target storage check result; a second checking module, configured to send a second instruction to a host sampling calculation server, where the second instruction is used to instruct the host sampling calculation server to check and calculate the host sampling data according to the server-side rule to generate a target host check result; a third checking module, configured to send a third instruction to a database sampling calculation server, where the third instruction is used to instruct the database sampling calculation server to check and calculate the database sampling data according to the server-side rule to generate a target database check result; and an output module, configured to merge the target storage check result, the target host check result, and the target database check result and output a final sampling detection result.
Optionally, in a first implementation manner of the second aspect of the present invention, the receiving module includes: a receiving unit, configured to receive an account creating instruction sent by a user terminal, create an account according to a preset account creating flow, and generate key pair information, where the key pair information includes public key information and private key information; a first configuration unit, configured to configure the sampling amount and the sampling proportion in a preset rule template to obtain a client rule, where the client rule is a rule for transmitting data to a sampling calculation server; a second configuration unit, configured to configure the sampling data type and the reconciliation information in the preset rule template to obtain a server-side rule, where the server-side rule is a rule between a sampling platform server and the sampling calculation server; and a first merging unit, configured to merge the client rule and the server-side rule to generate a target sampling rule.
Optionally, in a second implementation manner of the second aspect of the present invention, the acquisition module includes: an acquisition unit, configured to acquire initial data, load a client sub-rule matched with a preset sampling service address and the key pair information to obtain a target client sub-rule, and screen the initial data according to the target client sub-rule to generate intermediate data, where the client rule includes a plurality of client sub-rules; a judging unit, configured to perform authority and rule verification on the legality of the intermediate data, judge whether the intermediate data meets a preset authority verification standard, and if so, generate basic verification data; and a classification unit, configured to judge whether the basic verification data matches the server-side rule and classify the basic verification data according to a preset sampling service type to obtain sampling data, where the sampling data includes storage sampling data, host sampling data, and database sampling data.
Optionally, in a third implementation manner of the second aspect of the present invention, the obtaining unit is specifically configured to: acquiring initial data, a preset sampling service address and the secret key pair information, and loading a client sub-rule matched with the preset sampling service address and the secret key pair information from the client rule to obtain a target client sub-rule, wherein the client rule comprises a plurality of client sub-rules; and screening the initial data according to the target client sub-rule, preprocessing the initial data through a principal component analysis algorithm, deleting redundant data, retaining data conforming to the target client sub-rule, and generating intermediate data.
Optionally, in a fourth implementation manner of the second aspect of the present invention, the first checking module includes: a first calculation unit, configured to send a first instruction to a storage sampling calculation server, compare file information of the storage sampling data according to the server-side rule, where the file information includes a file name, a file size, a file format, and a last modification time, and calculate a final unique value of the file information according to a preset message digest algorithm to generate first target storage data; a second calculation unit, configured to divide the files of the storage sampling data into classes according to the amount of memory they occupy to obtain memory classification data, and calculate a unique value for each class of the memory classification data according to the preset message digest algorithm to obtain second target storage data; and a second merging unit, configured to merge the first target storage data and the second target storage data to generate a target storage verification result.
Optionally, in a fifth implementation manner of the second aspect of the present invention, the second checking module includes: a third calculation unit, configured to send a second instruction to a host sampling calculation server and, when the server-side rule is set to verify operating system information, calculate a unique value of the operating system information according to a preset message digest algorithm and randomly calculate and verify information other than the operating system information, to generate first target host data; a fourth calculation unit, configured to, when the server-side rule is set to verify all information, calculate a final unique value over all field information according to the preset message digest algorithm to generate second target host data; and a third merging unit, configured to merge the first target host data and the second target host data to generate a target host verification result.
Optionally, in a sixth implementation manner of the second aspect of the present invention, the third checking module includes: a matching unit, configured to send a third instruction to a database sampling calculation server and verify, according to the server-side rule, whether preset database basic information and table structure information match preset standard information, to obtain a consistency result, where the database basic information includes database users, stored procedures, and triggers, and the table structure information includes field names, field lengths, and field types; a fifth calculation unit, configured to perform single record verification according to the consistency result and the database sampling data by randomly selecting a single record field and calculating a unique value of the single record field according to a preset message digest algorithm, to obtain single record verification data; a sixth calculation unit, configured to perform batch record verification according to the consistency result and the database sampling data by verifying all field information according to the preset message digest algorithm and calculating a final unique value of all field information, to obtain batch record verification data; and a fourth merging unit, configured to merge the single record verification data and the batch record verification data to generate a target database verification result.
A third aspect of the present invention provides a sampling detection apparatus for data, comprising: a memory and at least one processor, the memory having instructions stored therein; the at least one processor invokes the instructions in the memory to cause the sample detection device of the data to perform the sample detection method of the data described above.
A fourth aspect of the present invention provides a computer-readable storage medium having stored therein instructions, which when run on a computer, cause the computer to execute the above-described method of sample detection of data.
In the technical scheme provided by the invention, an account creating instruction sent by a user terminal is received, account creating is carried out according to the account creating instruction, secret key pair information is generated, sampling rule configuration is carried out, and a target sampling rule is obtained, wherein the target sampling rule comprises a client rule and a server rule; acquiring initial data, generating intermediate data according to the secret key pair information and the client rule, verifying the intermediate data to obtain basic verification data, classifying the basic verification data based on the server rule to generate sampling data, wherein the sampling data comprises storage sampling data, host sampling data and database sampling data; sending a first instruction to a storage sampling calculation server, wherein the first instruction is used for instructing the storage sampling calculation server to carry out checksum calculation on the storage sampling data according to the server-side rule and generate a target storage check result; sending a second instruction to a host computer sampling calculation server, wherein the second instruction is used for instructing the host computer sampling calculation server to carry out checksum calculation on the host computer sampling data according to the server-side rule and generate a target host computer verification result; sending a third instruction to a database sampling calculation server, wherein the third instruction is used for instructing the database sampling calculation server to check and calculate the database sampling data according to the server-side rule, and generating a target database check result; and merging the target storage verification result, the target host verification result and the target database verification result, and outputting a final sampling detection result. 
In the embodiment of the invention, by extracting the data information of the host, the storage and the database, different logic and irrelevant services in the original data migration and reconciliation are uniformly intelligently identified and calculated, the sampling detection efficiency is improved, and the deployment cost is reduced.
Drawings
FIG. 1 is a schematic diagram of an embodiment of a method for sampling data according to an embodiment of the present invention;
FIG. 2 is a schematic diagram of another embodiment of a method for sampling data according to an embodiment of the present invention;
FIG. 3 is a schematic diagram of an embodiment of a sampling detection apparatus for data in an embodiment of the present invention;
FIG. 4 is a schematic diagram of another embodiment of a sampling detection apparatus for data according to an embodiment of the present invention;
FIG. 5 is a schematic diagram of an embodiment of a sampling detection device for data in an embodiment of the present invention.
Detailed Description
The embodiment of the invention provides a data sampling detection method, a data sampling detection device, data sampling detection equipment and a data storage medium, which are used for uniformly carrying out intelligent identification and calculation on different logic and irrelevant services in original data migration and reconciliation by extracting data information of a host, a storage and a database, so that the sampling detection efficiency is improved, and the deployment cost is reduced.
The terms "first," "second," "third," "fourth," and the like in the description and in the claims, as well as in the drawings, if any, are used for distinguishing between similar elements and not necessarily for describing a particular sequential or chronological order. It will be appreciated that the data so used may be interchanged under appropriate circumstances such that the embodiments described herein may be practiced otherwise than as specifically illustrated or described herein. Furthermore, the terms "comprises," "comprising," or "having," and any variations thereof, are intended to cover non-exclusive inclusions, such that a process, method, system, article, or apparatus that comprises a list of steps or elements is not necessarily limited to those steps or elements expressly listed, but may include other steps or elements not expressly listed or inherent to such process, method, article, or apparatus.
For ease of understanding, a detailed flow of an embodiment of the present invention is described below. Referring to FIG. 1, an embodiment of the data sampling detection method in an embodiment of the present invention includes:
101. receiving an account creating instruction sent by a user terminal, creating an account according to the account creating instruction, generating key pair information, and configuring a sampling rule to obtain a target sampling rule, wherein the target sampling rule comprises a client rule and a server rule.
The sampling platform server receives an account creating instruction sent by a user terminal, creates an account according to the account creating instruction, generates key pair information, and configures sampling rules to obtain target sampling rules, wherein the target sampling rules comprise client rules and server rules. Specifically, the sampling platform server receives an account creating instruction sent by a user terminal, creates an account according to a preset account creating flow, and generates key pair information, wherein the key pair information comprises public key information and private key information; the sampling platform server configures the sampling amount and the sampling proportion in a preset rule template to obtain a client rule, wherein the client rule is a rule for transmitting data to a sampling calculation server; the sampling platform server configures the type of the sampling data and the reconciliation information in a preset rule template to obtain a server-side rule, wherein the server-side rule is a rule between the sampling platform server and the sampling calculation server; and the sampling platform server combines the client rule and the server rule to generate a target sampling rule.
After the account creation process is completed, a paired public key and private key are generated, and the relevant sampling calculation rules are then configured. The rules are divided into two parts. One part, called the client rule, governs the transmission of data to the sampling calculation server, for example the sampling amount or the sampling proportion. The other part, called the server-side rule, comprises the reconciliation and calculation rules used on the sampling platform server and the sampling calculation server, for example which type of sampling the data belongs to and which information is to be checked. After the sampling rules are configured, the server-side rules are issued by the sampling platform server to the sampling calculation server, which performs the specific sampling calculation.
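The two-part rule described above can be sketched as a small data structure. This is a minimal illustration only: the class names, field names and the three sampling type strings are hypothetical and do not come from the patent, which does not specify a concrete rule template format.

```python
from dataclasses import dataclass
from typing import List

# Hypothetical sampling service types; the description names storage, host and database.
SAMPLING_TYPES = ("storage", "host", "database")


@dataclass(frozen=True)
class ClientRule:
    """Rule for transmitting data to the sampling calculation server."""
    sampling_amount: int
    sampling_ratio: float


@dataclass(frozen=True)
class ServerRule:
    """Reconciliation and calculation rule used on the server side."""
    sampling_type: str
    check_items: List[str]


@dataclass(frozen=True)
class TargetSamplingRule:
    """Combination of the client rule and the server-side rule."""
    client: ClientRule
    server: ServerRule


def build_target_rule(sampling_amount, sampling_ratio, sampling_type, check_items):
    """Validate the configured values and combine them into one target rule."""
    if sampling_amount < 0:
        raise ValueError("sampling_amount must be non-negative")
    if not 0.0 < sampling_ratio <= 1.0:
        raise ValueError("sampling_ratio must be in (0, 1]")
    if sampling_type not in SAMPLING_TYPES:
        raise ValueError("unknown sampling type: %r" % (sampling_type,))
    return TargetSamplingRule(
        client=ClientRule(sampling_amount, sampling_ratio),
        server=ServerRule(sampling_type, list(check_items)),
    )


def server_part(rule):
    """Return the part of the rule that would be issued to the calculation server."""
    return rule.server
```

For example, `build_target_rule(1000, 0.1, "database", ["table_structure", "records"])` yields a rule whose `server` part is the only portion that would be issued to the sampling calculation server.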
102. Obtaining initial data, generating intermediate data according to the key pair information and the client rule, verifying the intermediate data to obtain basic verification data, and classifying the basic verification data based on the server-side rule to generate sampling data, wherein the sampling data comprises storage sampling data, host sampling data and database sampling data.
The sampling platform server obtains initial data, generates intermediate data according to the key pair information and the client rule, verifies the intermediate data to obtain basic verification data, and classifies the basic verification data based on the server-side rule to generate sampling data, wherein the sampling data comprises storage sampling data, host sampling data and database sampling data. The corresponding client rule is loaded according to the configured sampling service address and key pair information, and the data to be sampled is transmitted to the sampling platform server according to the client rule. After receiving the client request, the sampling platform server performs authority and rule verification to ensure the legitimacy of the authority and the data. After completing this basic verification, it starts verification against the previously configured server-side rule, further splits the request and confirms which type of sampling service it belongs to, and then forwards the request to the corresponding sampling calculation server for sampling calculation. The sampling calculation server performs verification according to the information forwarded by the sampling platform server and the reconciliation rules configured by the user. The sampling platform server mainly addresses the problem that the reconciliation of databases, storage and hosts during data migration cannot be unified; drawing on the ideas of AI data analysis, it establishes a universal data transmission interface.
103. Sending a first instruction to the storage sampling calculation server, wherein the first instruction is used for instructing the storage sampling calculation server to carry out checksum calculation on the storage sampling data according to the server-side rule and generate a target storage verification result.
The sampling platform server sends a first instruction to the storage sampling calculation server, wherein the first instruction is used for instructing the storage sampling calculation server to carry out checksum calculation on the storage sampling data according to the server-side rule and generate a target storage verification result. Specifically, the sampling platform server sends the first instruction to the storage sampling calculation server; file information of the storage sampling data is compared according to the server-side rule, the file information comprising a file name, a file size, a file format and a last modification time, and the final unique value of the file information is calculated according to a preset message digest algorithm to generate first target storage data; the sampling platform server classifies the files in the storage sampling data according to the size of the memory they occupy to obtain memory classification data, and calculates unique values for the memory classification data by category according to the preset message digest algorithm to obtain second target storage data; and the sampling platform server combines the first target storage data and the second target storage data to generate the target storage verification result.
The storage sampling calculation server mainly compares file information. The JSON string information comprises the file information to be compared and the file content. For the information to be compared, the final unique value of the file information is checked. The file content is also checked in the sampling reconciliation logic, and calculation over file content usually occupies memory. For a small file (set in the system as within 500 MB), only the message digest algorithm (MD5) value needs to be calculated. For a large file, the unique value needs to be calculated in slices, the unique value being MD5(MD5(1) + MD5(2) + MD5(3) + … + MD5(n))-n, where MD5(i) is the MD5 value of the i-th slice and n is the number of slices. When the check results calculated for the two unique values (file information and file content) are the same, the sampling calculation result is considered normal; otherwise, the check fails.
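The small-file and sliced large-file calculations can be sketched as follows. This is an illustrative sketch under several assumptions that the description does not settle: the "+" in the formula is read as concatenation of hexadecimal digests, "-n" is read as a suffix carrying the slice count, and the slice size, the field separator and all function names are hypothetical.

```python
import hashlib
import os

SMALL_FILE_LIMIT = 500 * 1024 * 1024  # 500 MB threshold stated in the description
SLICE_SIZE = 64 * 1024 * 1024         # assumed slice size; not given in the source


def file_info_value(name, size, fmt, mtime):
    """Unique value of the file information (name, size, format, last modified)."""
    joined = "|".join(str(part) for part in (name, size, fmt, mtime))
    return hashlib.md5(joined.encode("utf-8")).hexdigest()


def content_unique_value(path, small_limit=SMALL_FILE_LIMIT, slice_size=SLICE_SIZE):
    """Unique value of the file content.

    Small files get a plain MD5. Large files are hashed slice by slice and the
    slice digests are combined as MD5(MD5(1) + ... + MD5(n)) with "-n" appended.
    """
    if os.path.getsize(path) <= small_limit:
        digest = hashlib.md5()
        with open(path, "rb") as handle:
            # Stream in 1 MiB chunks so the whole file is never held in memory.
            for chunk in iter(lambda: handle.read(1 << 20), b""):
                digest.update(chunk)
        return digest.hexdigest()

    slice_digests = []
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(slice_size), b""):
            slice_digests.append(hashlib.md5(chunk).hexdigest())
    combined = hashlib.md5("".join(slice_digests).encode("ascii")).hexdigest()
    return "%s-%d" % (combined, len(slice_digests))


def storage_check(info_before, info_after, content_before, content_after):
    """The check is normal only if both unique values agree before and after."""
    return info_before == info_after and content_before == content_after
```

Passing smaller `small_limit` and `slice_size` values makes it easy to exercise the sliced branch on test files without creating files larger than 500 MB.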
104. Sending a second instruction to the host sampling calculation server, wherein the second instruction is used for instructing the host sampling calculation server to carry out checksum calculation on the host sampling data according to the server-side rule and generate a target host verification result.
The sampling platform server sends a second instruction to the host sampling calculation server, wherein the second instruction is used for instructing the host sampling calculation server to carry out checksum calculation on the host sampling data according to the server-side rule and generate a target host verification result. Specifically, the sampling platform server sends the second instruction to the host sampling calculation server; when the server-side rule is set to verify the operating system information, a unique value is calculated for the operating system information according to a preset message digest algorithm, and the information other than the operating system information is randomly calculated and verified, to generate first target host data; when the server-side rule is set to verify all information, the sampling platform server calculates the final unique value of all field information according to the preset message digest algorithm to generate second target host data; and the sampling platform server combines the first target host data and the second target host data to generate the target host verification result.
The host sampling calculation server checks information such as the operating system version information, host name, IP, network card and route according to the client rule. When the rule is set, mandatory check items, such as host operating system information and network card information, can be set, or all information can be set to be verified. If the operating system information is set as a mandatory check item, the operating system information is checked, and the other information is selectively and randomly calculated and checked. If all information is set to be verified, all field information is calculated and verified. No matter how much data is verified, each calculation item generates an MD5 value, all the MD5 values are added together, and a total MD5 value is then generated. The calculation is as follows: MD5(MD5(1) + MD5(2) + MD5(3) + … + MD5(n))-n. This value is calculated for both the data before and the data after, and the final MD5 value and the calculation count n are checked for consistency. If they are consistent, the sampling calculation check is considered normal; otherwise, the check fails.
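A minimal sketch of the host check follows. It assumes that "adding" the MD5 values means concatenating their hexadecimal digests and that "-n" is a suffix recording the number of calculation items; the field names and helper functions are hypothetical and only illustrate the mandatory-plus-random selection described above.

```python
import hashlib
import random


def md5_hex(text):
    """MD5 of a text value as a hexadecimal string."""
    return hashlib.md5(text.encode("utf-8")).hexdigest()


def aggregate_value(values):
    """MD5(MD5(1) + ... + MD5(n)) with the item count n appended as "-n"."""
    digests = [md5_hex(str(value)) for value in values]
    return "%s-%d" % (md5_hex("".join(digests)), len(digests))


def select_fields(required, optional, extra_count, rng):
    """Mandatory fields plus a random selection of the remaining fields."""
    candidates = sorted(optional)
    extra = rng.sample(candidates, min(extra_count, len(candidates)))
    return list(required) + extra


def host_check(before, after, fields):
    """Compare the aggregate value of the selected fields before and after."""
    value_before = aggregate_value(before.get(name) for name in fields)
    value_after = aggregate_value(after.get(name) for name in fields)
    return value_before == value_after
```

Setting `required` to every field name and `extra_count` to zero corresponds to the "verify all information" setting.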
105. Sending a third instruction to the database sampling calculation server, wherein the third instruction is used for instructing the database sampling calculation server to carry out checksum calculation on the database sampling data according to the server-side rule and generate a target database verification result.
The sampling platform server sends a third instruction to the database sampling calculation server, wherein the third instruction is used for instructing the database sampling calculation server to carry out verification and calculation on the database sampling data according to the server-side rule and generate a target database verification result. Specifically, the sampling platform server sends the third instruction to the database sampling calculation server; the consistency of preset basic database information and table structure information is verified according to the server-side rule, that is, whether the basic database information and the table structure information match preset standard information, to obtain a consistency result, wherein the basic database information comprises database users, stored procedures and triggers, and the table structure information comprises field names, field lengths and field types; the sampling platform server performs single-record verification according to the consistency result and the database sampling data, randomly selects a single record field, and calculates a unique value of the single record field according to a preset message digest algorithm to obtain single-record verification data; the sampling platform server performs batch-record verification according to the consistency result and the database sampling data, verifies all field information according to the preset message digest algorithm, and calculates the final unique value of all field information to obtain batch-record verification data; and the sampling platform server combines the single-record verification data and the batch-record verification data to generate the target database verification result.
The database information generally comprises table structure information and table record information, such as field names, field lengths and field types, and the calculation is still performed according to the client rule. First, the consistency of the basic database information before and after is checked, such as database users, stored procedures and triggers. Then the consistency of the table structure information is calculated. After the table structure check is completed, the specific data is checked. Because the total data volume may reach hundreds of billions or even trillions, the check is divided into single-record verification and batch-record verification. Single-record verification checks whether the content of a single record is consistent. Batch-record verification does not check the content item by item; instead, a final unique value is calculated over the contents as follows: MD5(MD5(1) + MD5(2) + MD5(3) + … + MD5(n))-n, where n is less than or equal to 10000. The final unique values before and after are calculated for each batch; if they are the same, the batch verification is considered to pass, and if they are different, the batch verification is considered to fail.
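The batch-record verification can be sketched as below. The sketch assumes that each record is a sequence of field values, that MD5(i) is the MD5 of the i-th record, that "+" is digest concatenation and that "-n" is a suffix with the batch size; the unit-separator character used to join fields and all function names are hypothetical.

```python
import hashlib
from itertools import islice

MAX_BATCH = 10000  # the description limits n to at most 10000 per batch


def record_md5(record):
    """MD5 of one record, joining its fields with a separator character."""
    joined = "\x1f".join(str(value) for value in record)
    return hashlib.md5(joined.encode("utf-8")).hexdigest()


def batch_values(records, batch_size=MAX_BATCH):
    """Yield MD5(MD5(1) + ... + MD5(n))-n for each batch of at most batch_size records."""
    iterator = iter(records)
    while True:
        batch = list(islice(iterator, batch_size))
        if not batch:
            break
        digests = [record_md5(record) for record in batch]
        total = hashlib.md5("".join(digests).encode("ascii")).hexdigest()
        yield "%s-%d" % (total, len(digests))


def batches_match(before, after, batch_size=MAX_BATCH):
    """Batch verification passes only if every batch value is identical."""
    return list(batch_values(before, batch_size)) == list(batch_values(after, batch_size))
```

Because the records are consumed through an iterator, the same functions can be fed from a database cursor without loading a whole table into memory.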
106. Merging the target storage verification result, the target host verification result and the target database verification result, and outputting a final sampling detection result.
The sampling platform server merges the target storage verification result, the target host verification result and the target database verification result and outputs a final sampling detection result. By extracting the data information of storage, hosts and databases, the sampling platform server performs sampling calculation with a sampling model that follows the same calculation mode, so that services with different logic that are unrelated to one another in existing data migration and reconciliation can be identified and calculated in a unified, intelligent manner; the check results are then merged and the final identification result is output.
In the embodiment of the invention, by extracting the data information of hosts, storage and databases, services with different logic that are unrelated to one another in existing data migration and reconciliation are identified and calculated in a unified, intelligent manner, which improves sampling detection efficiency and reduces deployment cost.
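One possible way to merge the three verification results is sketched below. The description does not define the format of the final sampling detection result, so the `CheckResult` type and the dictionary layout are purely hypothetical; the sketch assumes the final result passes only when every individual check passes.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CheckResult:
    kind: str        # "storage", "host" or "database"
    passed: bool
    detail: str = ""


def merge_results(results):
    """Combine per-service results into one final sampling detection result."""
    results = list(results)
    return {
        "passed": bool(results) and all(result.passed for result in results),
        "by_kind": {result.kind: result.passed for result in results},
        "failures": [result.kind for result in results if not result.passed],
    }
```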
Referring to fig. 2, another embodiment of the method for sampling data according to the embodiment of the present invention includes:
201. receiving an account creating instruction sent by a user terminal, creating an account according to the account creating instruction, generating key pair information, and configuring a sampling rule to obtain a target sampling rule, wherein the target sampling rule comprises a client rule and a server rule.
The sampling platform server receives an account creating instruction sent by a user terminal, creates an account according to the account creating instruction, generates key pair information, and configures sampling rules to obtain target sampling rules, wherein the target sampling rules comprise client rules and server rules. Specifically, the sampling platform server receives an account creating instruction sent by a user terminal, creates an account according to a preset account creating flow, and generates key pair information, wherein the key pair information comprises public key information and private key information; the sampling platform server configures the sampling amount and the sampling proportion in a preset rule template to obtain a client rule, wherein the client rule is a rule for transmitting data to the sampling calculation server; the sampling platform server configures the type of the sampling data and the reconciliation information in a preset rule template to obtain a server-side rule, wherein the server-side rule is a rule between the sampling platform server and the sampling calculation server; and the sampling platform server combines the client rule and the server rule to generate a target sampling rule.
After the account creation process is completed, a paired public key and private key are generated, and the relevant sampling calculation rules are then configured. The rules are divided into two parts. One part, called the client rule, governs the transmission of data to the sampling calculation server, for example the sampling amount or the sampling proportion. The other part, called the server-side rule, comprises the reconciliation and calculation rules used on the sampling platform server and the sampling calculation server, for example which type of sampling the data belongs to and which information is to be checked. After the sampling rules are configured, the server-side rules are issued by the sampling platform server to the sampling calculation server, which performs the specific sampling calculation.
202. Obtaining initial data, loading the client sub-rule matching the preset sampling service address and the key pair information to obtain a target client sub-rule, and screening the initial data according to the target client sub-rule to generate intermediate data, wherein the client rule comprises a plurality of client sub-rules.
The sampling platform server obtains initial data, loads client-side sub-rules matched with preset sampling service addresses and key pair information to obtain target client-side sub-rules, and screens the initial data according to the target client-side sub-rules to generate intermediate data, wherein the client-side rules comprise a plurality of client-side sub-rules. Specifically, the sampling platform server acquires initial data, a preset sampling service address and key pair information, and loads a client sub-rule matched with the preset sampling service address and key pair information from a client rule to obtain a target client sub-rule, wherein the client rule comprises a plurality of client sub-rules; the sampling platform server screens the initial data according to the target client sub-rule, preprocesses the initial data through a principal component analysis algorithm, deletes redundant data, retains data conforming to the target client sub-rule, and generates intermediate data.
In this embodiment, data screening is implemented by a principal component analysis (PCA) algorithm. PCA analyzes the principal components in a vector space, extracts them and discards unimportant components, thereby achieving dimension reduction and information compression. Through this algorithm, data conforming to the client sub-rule is retained and redundant data is deleted.
203. Performing authority and rule verification on the legitimacy of the intermediate data, judging whether the intermediate data meets a preset authority verification standard, and if so, generating basic verification data.
The sampling platform server performs authority and rule verification on the intermediate data and judges whether the intermediate data meets the preset authority verification standard; if so, basic verification data is generated. Because the sampling platform server performs authority and rule verification after receiving the request, the legitimacy of both the authority and the data is ensured, which completes the basic verification.
204. Judging whether the basic verification data matches the server-side rule, and classifying the basic verification data according to a preset sampling service type to obtain sampling data, wherein the sampling data comprises storage sampling data, host sampling data and database sampling data.
The sampling platform server judges whether the basic verification data matches the server-side rule, and classifies the basic verification data according to the preset sampling service type to obtain sampling data, wherein the sampling data comprises storage sampling data, host sampling data and database sampling data. After receiving the request, the sampling platform server starts verification against the previously configured server-side rule, further splits the request and confirms which type of sampling service it belongs to, and then forwards the request to the corresponding sampling calculation server for sampling calculation. The sampling calculation server performs verification according to the information forwarded by the sampling platform server and the reconciliation rules configured by the user.
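Steps 203 and 204 can be sketched as a basic check followed by classification into the three sampling service types. The patent does not define the request format or the authority verification standard, so the dictionary keys (`account`, `type`, `payload`), the set of authorized accounts and the function names below are all hypothetical stand-ins.

```python
SAMPLING_TYPES = ("storage", "host", "database")


def passes_basic_check(item, authorized_accounts):
    """Stand-in for authority and rule verification of one piece of intermediate data."""
    return item.get("account") in authorized_accounts and "payload" in item


def classify(items, authorized_accounts, allowed_types=SAMPLING_TYPES):
    """Split verified items by sampling service type; collect everything else as rejected."""
    sampled = {kind: [] for kind in allowed_types}
    rejected = []
    for item in items:
        if not passes_basic_check(item, authorized_accounts):
            rejected.append(item)
            continue
        kind = item.get("type")
        if kind in sampled:
            sampled[kind].append(item)  # would be forwarded to the matching calculation server
        else:
            rejected.append(item)       # does not match any server-side rule type
    return sampled, rejected
```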
205. Sending a first instruction to the storage sampling calculation server, wherein the first instruction is used for instructing the storage sampling calculation server to carry out checksum calculation on the storage sampling data according to the server-side rule and generate a target storage verification result.
The sampling platform server sends a first instruction to the storage sampling calculation server, wherein the first instruction is used for instructing the storage sampling calculation server to carry out checksum calculation on the storage sampling data according to the server-side rule and generate a target storage verification result. Specifically, the sampling platform server sends the first instruction to the storage sampling calculation server; file information of the storage sampling data is compared according to the server-side rule, the file information comprising a file name, a file size, a file format and a last modification time, and the final unique value of the file information is calculated according to a preset message digest algorithm to generate first target storage data; the sampling platform server classifies the files in the storage sampling data according to the size of the memory they occupy to obtain memory classification data, and calculates unique values for the memory classification data by category according to the preset message digest algorithm to obtain second target storage data; and the sampling platform server combines the first target storage data and the second target storage data to generate the target storage verification result.
The storage sampling calculation server mainly compares file information. The JSON string information comprises the file information to be compared and the file content. For the information to be compared, the final unique value of the file information is checked. The file content is also checked in the sampling reconciliation logic, and calculation over file content usually occupies memory. For a small file (set in the system as within 500 MB), only the message digest algorithm (MD5) value needs to be calculated. For a large file, the unique value needs to be calculated in slices, the unique value being MD5(MD5(1) + MD5(2) + MD5(3) + … + MD5(n))-n, where MD5(i) is the MD5 value of the i-th slice and n is the number of slices. When the check results calculated for the two unique values (file information and file content) are the same, the sampling calculation result is considered normal; otherwise, the check fails.
206. Sending a second instruction to the host sampling calculation server, wherein the second instruction is used for instructing the host sampling calculation server to carry out checksum calculation on the host sampling data according to the server-side rule and generate a target host verification result.
The sampling platform server sends a second instruction to the host sampling calculation server, wherein the second instruction is used for instructing the host sampling calculation server to carry out checksum calculation on the host sampling data according to the server-side rule and generate a target host verification result. Specifically, the sampling platform server sends the second instruction to the host sampling calculation server; when the server-side rule is set to verify the operating system information, a unique value is calculated for the operating system information according to a preset message digest algorithm, and the information other than the operating system information is randomly calculated and verified, to generate first target host data; when the server-side rule is set to verify all information, the sampling platform server calculates the final unique value of all field information according to the preset message digest algorithm to generate second target host data; and the sampling platform server combines the first target host data and the second target host data to generate the target host verification result.
The host sampling calculation server checks information such as the operating system version information, host name, IP, network card and route according to the client rule. When the rule is set, mandatory check items, such as host operating system information and network card information, can be set, or all information can be set to be verified. If the operating system information is set as a mandatory check item, the operating system information is checked, and the other information is selectively and randomly calculated and checked. If all information is set to be verified, all field information is calculated and verified. No matter how much data is verified, each calculation item generates an MD5 value, all the MD5 values are added together, and a total MD5 value is then generated. The calculation is as follows: MD5(MD5(1) + MD5(2) + MD5(3) + … + MD5(n))-n. This value is calculated for both the data before and the data after, and the final MD5 value and the calculation count n are checked for consistency. If they are consistent, the sampling calculation check is considered normal; otherwise, the check fails.
207. Sending a third instruction to the database sampling calculation server, wherein the third instruction is used for instructing the database sampling calculation server to carry out checksum calculation on the database sampling data according to the server-side rule and generate a target database verification result.
The sampling platform server sends a third instruction to the database sampling calculation server, wherein the third instruction is used for instructing the database sampling calculation server to carry out verification and calculation on the database sampling data according to the server-side rule and generate a target database verification result. Specifically, the sampling platform server sends the third instruction to the database sampling calculation server; the consistency of preset basic database information and table structure information is verified according to the server-side rule, that is, whether the basic database information and the table structure information match preset standard information, to obtain a consistency result, wherein the basic database information comprises database users, stored procedures and triggers, and the table structure information comprises field names, field lengths and field types; the sampling platform server performs single-record verification according to the consistency result and the database sampling data, randomly selects a single record field, and calculates a unique value of the single record field according to a preset message digest algorithm to obtain single-record verification data; the sampling platform server performs batch-record verification according to the consistency result and the database sampling data, verifies all field information according to the preset message digest algorithm, and calculates the final unique value of all field information to obtain batch-record verification data; and the sampling platform server combines the single-record verification data and the batch-record verification data to generate the target database verification result.
The database information generally comprises table structure information and table record information, such as field names, field lengths and field types, and the calculation is still performed according to the client rule. First, the consistency of the basic database information before and after is checked, such as database users, stored procedures and triggers. Then the consistency of the table structure information is calculated. After the table structure check is completed, the specific data is checked. Because the total data volume may reach hundreds of billions or even trillions, the check is divided into single-record verification and batch-record verification. Single-record verification checks whether the content of a single record is consistent. Batch-record verification does not check the content item by item; instead, a final unique value is calculated over the contents as follows: MD5(MD5(1) + MD5(2) + MD5(3) + … + MD5(n))-n, where n is less than or equal to 10000. The final unique values before and after are calculated for each batch; if they are the same, the batch verification is considered to pass, and if they are different, the batch verification is considered to fail.
208. Merging the target storage verification result, the target host verification result and the target database verification result, and outputting a final sampling detection result.
The sampling platform server merges the target storage verification result, the target host verification result and the target database verification result and outputs a final sampling detection result. By extracting the data information of storage, hosts and databases, the sampling platform server performs sampling calculation with a sampling model that follows the same calculation mode, so that services with different logic that are unrelated to one another in existing data migration and reconciliation can be identified and calculated in a unified, intelligent manner; the check results are then merged and the final identification result is output.
In the embodiment of the invention, by extracting the data information of hosts, storage and databases, services with different logic that are unrelated to one another in existing data migration and reconciliation are identified and calculated in a unified, intelligent manner, which improves sampling detection efficiency and reduces deployment cost.
The data sampling detection method in the embodiment of the present invention is described above. Referring to fig. 3, the data sampling detection device in the embodiment of the present invention is described below; an embodiment of the data sampling detection device includes:
the receiving module 301 is configured to receive an account creating instruction sent by a user terminal, create an account according to the account creating instruction, generate key pair information, and configure a sampling rule to obtain a target sampling rule, where the target sampling rule includes a client rule and a server rule;
an obtaining module 302, configured to obtain initial data, generate intermediate data according to the key pair information and the client rule, verify the intermediate data to obtain basic verification data, and classify the basic verification data based on the server-side rule to generate sampling data, where the sampling data includes storage sampling data, host sampling data, and database sampling data;
the first checking module 303 is configured to send a first instruction to the storage sampling calculation server, where the first instruction is used to instruct the storage sampling calculation server to perform checksum calculation on the storage sampling data according to a server-side rule, so as to generate a target storage checking result;
the second checking module 304 is configured to send a second instruction to the host sampling calculation server, where the second instruction is used to instruct the host sampling calculation server to check and calculate the host sampling data according to the server-side rule, and generate a target host checking result;
the third checking module 305 is configured to send a third instruction to the database sampling calculation server, where the third instruction is used to instruct the database sampling calculation server to perform checksum calculation on the database sampling data according to the server-side rule, so as to generate a target database checking result;
and the output module 306 is configured to combine the target storage verification result, the target host verification result and the target database verification result, and output a final sampling detection result.
In this embodiment of the invention, the data information of the host, the storage and the database is extracted, so that services with different logic that were previously unrelated in data migration and reconciliation are identified and calculated in a unified, intelligent manner. This improves sampling detection efficiency and reduces deployment cost.
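The module flow above (classify, send one instruction per category, merge) can be illustrated with a minimal Python sketch. It is not the patented implementation: the three sampling calculation servers are replaced by plain local callables, and the names `run_sampling_detection`, `count_checker` and the `min_records` rule key are hypothetical.

```python
from typing import Callable, Dict, List

# Hypothetical stand-in for a remote sampling calculation server: a callable
# that takes one category of sampling data plus the server-side rule and
# returns a result dictionary containing at least a "passed" flag.
Checker = Callable[[List[dict], dict], dict]

KINDS = ("storage", "host", "database")


def run_sampling_detection(
    sample_data: Dict[str, List[dict]],
    server_rule: dict,
    checkers: Dict[str, Checker],
) -> dict:
    """Send each category to its checker, then merge the three results."""
    results = {
        kind: checkers[kind](sample_data.get(kind, []), server_rule)
        for kind in KINDS
    }
    # Assumed merge policy: the final result passes only if every part passes.
    return {
        "passed": all(r["passed"] for r in results.values()),
        "details": results,
    }


def count_checker(records: List[dict], rule: dict) -> dict:
    # Toy check: a category passes when it holds at least `min_records` items.
    return {
        "passed": len(records) >= rule.get("min_records", 1),
        "count": len(records),
    }
```

In a real deployment each checker would wrap a network call to the corresponding server; the merge policy shown here is only one plausible choice.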
Referring to fig. 4, another embodiment of the apparatus for sampling data according to the embodiment of the present invention includes:
the receiving module 301 is configured to receive an account creating instruction sent by a user terminal, create an account according to the account creating instruction, generate key pair information, and configure a sampling rule to obtain a target sampling rule, where the target sampling rule includes a client rule and a server rule;
an obtaining module 302, configured to obtain initial data, generate intermediate data according to the key pair information and the client rule, verify the intermediate data to obtain basic verification data, and classify the basic verification data based on the server rule to generate sampling data, where the sampling data includes storage sampling data, host sampling data, and database sampling data;
the first checking module 303 is configured to send a first instruction to the storage sampling calculation server, where the first instruction is used to instruct the storage sampling calculation server to perform checksum calculation on the storage sampling data according to a server-side rule, so as to generate a target storage checking result;
the second checking module 304 is configured to send a second instruction to the host sampling calculation server, where the second instruction is used to instruct the host sampling calculation server to perform checksum calculation on the host sampling data according to the server-side rule, so as to generate a target host checking result;
the third checking module 305 is configured to send a third instruction to the database sampling calculation server, where the third instruction is used to instruct the database sampling calculation server to perform checksum calculation on the database sampling data according to the server-side rule, so as to generate a target database checking result;
and the output module 306 is configured to combine the target storage verification result, the target host verification result, and the target database verification result, and output a final sampling detection result.
Optionally, the receiving module 301 includes:
the receiving unit 3011 is configured to receive an account creation instruction sent by a user terminal, create an account according to a preset account creation process, and generate key pair information, where the key pair information includes public key information and private key information;
a first configuration unit 3012, configured to configure the sample size and the sample proportion in a preset rule template to obtain a client rule, where the client rule is a rule for transmitting data to a sample computation server;
a second configuration unit 3013, configured to configure the sampling data type and the reconciliation information in the preset rule template to obtain a server-side rule, where the server-side rule is a rule between the sampling platform server and the sampling computation server;
and the first merging unit 3014 is configured to merge the client-side rule and the server-side rule to generate a target sampling rule.
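As a rough illustration of the configuration units above, the following Python sketch fills a preset rule template with a client rule (sample size and proportion) and a server-side rule (sampling data types and reconciliation information), then merges them. The field names, the `TEMPLATE` layout and `build_target_rule` are assumptions made for the example and do not come from the patent.

```python
from dataclasses import asdict, dataclass


@dataclass(frozen=True)
class ClientRule:
    sample_size: int          # maximum number of records to transmit
    sample_proportion: float  # fraction of the initial data to sample, in (0, 1]


@dataclass(frozen=True)
class ServerRule:
    data_types: tuple         # e.g. ("storage", "host", "database")
    reconciliation: dict      # hypothetical reconciliation settings


# Hypothetical preset rule template holding default values.
TEMPLATE = {
    "client": {"sample_size": 1000, "sample_proportion": 0.1},
    "server": {"data_types": ("storage", "host", "database"), "reconciliation": {}},
}


def build_target_rule(template: dict, client_cfg: dict, server_cfg: dict) -> dict:
    """Override template defaults and merge both parts into one target rule."""
    client = ClientRule(**{**template["client"], **client_cfg})
    server = ServerRule(**{**template["server"], **server_cfg})
    if not 0.0 < client.sample_proportion <= 1.0:
        raise ValueError("sample_proportion must be in (0, 1]")
    return {"client": asdict(client), "server": asdict(server)}
```

Keeping the two parts under separate keys of the merged rule makes it easy to hand only the client part to the data sender and only the server part to the sampling calculation servers.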
Optionally, the obtaining module 302 includes:
an obtaining unit 3021, configured to obtain initial data, load a client sub-rule that matches a preset sampling service address and key pair information, obtain a target client sub-rule, screen the initial data according to the target client sub-rule, and generate intermediate data, where the client rule includes multiple client sub-rules;
a judging unit 3022, configured to perform authority and rule verification on the validity of the intermediate data, judge whether the intermediate data meets a preset authority verification standard, and if so, generate basic verification data;
a classification unit 3023, configured to determine whether the basic verification data matches the server-side rule, and classify the basic verification data according to a preset sampling service type to obtain sampling data, where the sampling data includes storage sampling data, host sampling data, and database sampling data.
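The classification step can be sketched as follows in Python. The sketch assumes that each verified record carries a hypothetical `service_type` field and that the server-side rule lists the enabled types under a hypothetical `data_types` key; the patent does not specify either detail.

```python
from typing import Dict, Iterable, List

SERVICE_TYPES = ("storage", "host", "database")


def classify(records: Iterable[dict], server_rule: dict) -> Dict[str, List[dict]]:
    """Group verified records by sampling service type.

    Records whose type is unknown or not enabled in the server-side rule
    are dropped rather than classified.
    """
    enabled = set(server_rule.get("data_types", SERVICE_TYPES))
    groups: Dict[str, List[dict]] = {t: [] for t in SERVICE_TYPES}
    for rec in records:
        kind = rec.get("service_type")
        if kind in enabled and kind in groups:
            groups[kind].append(rec)
    return groups
```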
Optionally, the obtaining unit 3021 is specifically configured to:
acquiring initial data, a preset sampling service address and the key pair information, and loading, from the client rule, a client sub-rule matched with the preset sampling service address and the key pair information to obtain a target client sub-rule; and screening the initial data according to the target client sub-rule, preprocessing the initial data through a principal component analysis algorithm, deleting redundant data, retaining the data meeting the target client sub-rule, and generating intermediate data.
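A minimal Python sketch of the sub-rule lookup and screening step is given below. It assumes sub-rules are keyed by the pair (sampling service address, public key) and that each sub-rule exposes a hypothetical `keep` predicate. The principal component analysis preprocessing is not reproduced; removing exact duplicate records stands in for "deleting redundant data" here, which is a simplification.

```python
from typing import Callable, Dict, List, Tuple

# Hypothetical layout: each sub-rule is a dict holding a "keep" predicate.
SubRule = Dict[str, Callable[[dict], bool]]


def load_sub_rule(
    client_rule: Dict[Tuple[str, str], SubRule], address: str, public_key: str
) -> SubRule:
    """Return the sub-rule matching the service address and key, or fail."""
    try:
        return client_rule[(address, public_key)]
    except KeyError:
        raise LookupError(f"no client sub-rule for {address!r}") from None


def screen(initial_data: List[dict], sub_rule: SubRule) -> List[dict]:
    """Keep records accepted by the sub-rule and drop exact duplicates.

    Record values must be hashable for the duplicate check to work.
    """
    seen = set()
    kept = []
    for rec in initial_data:
        key = tuple(sorted(rec.items()))
        if key in seen or not sub_rule["keep"](rec):
            continue
        seen.add(key)
        kept.append(rec)
    return kept
```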
Optionally, the first checking module 303 includes:
a first calculating unit 3031, configured to send a first instruction to a storage sampling calculation server, compare file information of the storage sampling data according to a server-side rule, where the file information includes a file name, a file size, a file format, and a last modification time, calculate a final unique value of the file information according to a preset information digest algorithm, and generate first target storage data;
a second calculating unit 3032, configured to perform file distinguishing on the storage sample data according to the size of the occupied memory to obtain memory classification data, and perform unique value calculation on the memory classification data according to a preset information digest algorithm and different classes to obtain second target storage data;
a second merging unit 3033, configured to merge the first target storage data and the second target storage data, and generate a target storage verification result.
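The two storage calculations can be sketched in Python as below. The sketch assumes MD5 (via the standard `hashlib` module) as the preset information digest algorithm, a hypothetical 64 MiB boundary between "small" and "large" files, and that the per-class unique value is computed over the per-file metadata digests rather than over file contents. None of these choices is fixed by the text.

```python
import hashlib
from typing import Dict, List

SIZE_LIMIT = 64 * 1024 * 1024  # assumed boundary between "small" and "large"


def file_info_digest(info: dict) -> str:
    """Digest of file name, size, format and last modification time."""
    parts = (info["name"], str(info["size"]), info["format"], str(info["mtime"]))
    return hashlib.md5("|".join(parts).encode("utf-8")).hexdigest()


def storage_check(files: List[dict]) -> dict:
    # First target storage data: one digest per file's metadata.
    per_file = {f["name"]: file_info_digest(f) for f in files}
    # Second target storage data: one digest per size class, built from the
    # per-file digests in a stable (name-sorted) order.
    classes: Dict[str, List[str]] = {"small": [], "large": []}
    for f in sorted(files, key=lambda item: item["name"]):
        size_class = "large" if f["size"] > SIZE_LIMIT else "small"
        classes[size_class].append(per_file[f["name"]])
    per_class = {
        name: hashlib.md5("".join(digests).encode("ascii")).hexdigest()
        for name, digests in classes.items()
    }
    # Merged result, playing the role of the target storage verification result.
    return {"file_info": per_file, "size_classes": per_class}
```

Running `storage_check` on the source and on the migrated copy and comparing the two dictionaries shows which files or size classes differ.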
Optionally, the second check module 304 includes:
a third calculating unit 3041, configured to send a second instruction to the host sampling calculation server, when the server-side rule is set to verify the operating system information, perform unique value calculation on the operating system information according to a preset information digest algorithm, and perform calculation and verification at random on other information except the operating system information to generate first target host data;
a fourth calculating unit 3042, configured to, when the server-side rule is set to check all information, perform final unique value calculation on all field information according to a preset information digest algorithm, and generate second target host data;
a third merging unit 3043, configured to merge the first target host data and the second target host data, and generate a target host verification result.
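The two host verification modes can be illustrated with the Python sketch below. It assumes MD5 as the preset information digest algorithm, a host record represented as a flat dictionary with an `os` key, and hypothetical mode names `"os"` and `"all"`. For the random spot check to be comparable between source and target hosts, both sides would need to pick the same fields, which the sketch achieves by passing a seeded random generator.

```python
import hashlib
import random
from typing import Optional


def md5_hex(text: str) -> str:
    return hashlib.md5(text.encode("utf-8")).hexdigest()


def host_check(host: dict, mode: str, picks: int = 2,
               rng: Optional[random.Random] = None) -> dict:
    if mode == "all":
        # Digest over every field, in sorted key order so that the result
        # does not depend on dictionary ordering.
        joined = "|".join(f"{k}={host[k]}" for k in sorted(host))
        return {"all_fields": md5_hex(joined)}
    if mode == "os":
        rng = rng or random.Random()
        others = sorted(k for k in host if k != "os")
        # Spot-check a few randomly chosen fields besides the OS information.
        chosen = sorted(rng.sample(others, min(picks, len(others))))
        return {
            "os": md5_hex(str(host["os"])),
            "sampled": {k: md5_hex(str(host[k])) for k in chosen},
        }
    raise ValueError(f"unknown mode: {mode!r}")
```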
Optionally, the third checking module 305 includes:
the matching unit 3051 is configured to send a third instruction to the database sampling calculation server, verify consistency of preset database basic information and table structure information according to a server-side rule, verify whether the database basic information and the table structure information match preset standard information, and obtain a consistency result, where the database basic information includes database users, stored procedures, and triggers, and the table structure information includes a field name, a field length, and a field type;
the fifth calculation unit 3052, configured to perform single record verification according to the consistency result and the database sample data, randomly select a single record field, and calculate a unique value of the single record field according to a preset information digest algorithm to obtain single record verification data;
a sixth calculation unit 3053, configured to perform batch record verification according to the consistency result and the database sampling data, verify all field information according to a preset information digest algorithm, and calculate a final unique value of all field information to obtain batch record verification data;
and a fourth merging unit 3054, configured to merge the single record verification data and the batch record verification data, and generate a target database verification result.
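The database verification units can be sketched in Python as follows. The sketch assumes MD5 as the preset information digest algorithm, represents schema information as plain dictionaries, and skips the record checks when the schema comparison fails; that last policy is an assumption, since the text only says the record checks are performed "according to the consistency result". All function names are hypothetical.

```python
import hashlib
import random
from typing import List, Optional, Tuple


def md5_hex(text: str) -> str:
    return hashlib.md5(text.encode("utf-8")).hexdigest()


def single_record_check(record: dict, rng: random.Random) -> Tuple[str, str]:
    """Pick one field at random and return (field name, digest of its value)."""
    field = rng.choice(sorted(record))
    return field, md5_hex(str(record[field]))


def batch_record_check(records: List[dict]) -> str:
    """Digest over every field of every record, in a stable field order."""
    h = hashlib.md5()
    for rec in records:
        line = ";".join(f"{k}={rec[k]}" for k in sorted(rec))
        h.update((line + "\n").encode("utf-8"))
    return h.hexdigest()


def database_check(schema: dict, expected_schema: dict, records: List[dict],
                   rng: Optional[random.Random] = None) -> dict:
    # Consistency result: does the observed schema match the preset standard?
    consistent = schema == expected_schema
    if not consistent or not records:
        return {"consistent": consistent, "single": None, "batch": None}
    rng = rng or random.Random()
    return {
        "consistent": True,
        "single": single_record_check(rng.choice(records), rng),
        "batch": batch_record_check(records),
    }
```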
In this embodiment of the invention, the data information of the host, the storage and the database is extracted, so that services with different logic that were previously unrelated in data migration and reconciliation are identified and calculated in a unified, intelligent manner. This improves sampling detection efficiency and reduces deployment cost.
Fig. 3 and fig. 4 above describe the data sampling detection apparatus in the embodiment of the present invention in detail from the perspective of modular functional entities; the data sampling detection device in the embodiment of the present invention is described in detail below from the perspective of hardware processing.
Fig. 5 is a schematic structural diagram of a data sampling detection device 500 according to an embodiment of the present invention. The data sampling detection device 500 may vary considerably depending on configuration or performance, and may include one or more central processing units (CPUs) 510 (e.g., one or more processors), a memory 520, and one or more storage media 530 (e.g., one or more mass storage devices) storing applications 533 or data 532. The memory 520 and the storage media 530 may be transient storage or persistent storage. The program stored on the storage medium 530 may include one or more modules (not shown), each of which may include a series of instruction operations for the data sampling detection device 500. Further, the processor 510 may be configured to communicate with the storage medium 530 and to execute, on the data sampling detection device 500, the series of instruction operations in the storage medium 530.
The data sampling detection device 500 may also include one or more power supplies 540, one or more wired or wireless network interfaces 550, one or more input-output interfaces 560, and/or one or more operating systems 531, such as Windows Server, Mac OS X, Unix, Linux, FreeBSD, and the like. Those skilled in the art will appreciate that the device configuration shown in fig. 5 does not constitute a limitation of the data sampling detection device, which may include more or fewer components than shown, combine some components, or use a different arrangement of components.
The present invention also provides a data sampling detection device, which includes a memory and a processor, where the memory stores computer readable instructions which, when executed by the processor, cause the processor to execute the steps of the data sampling detection method in the above embodiments.
The present invention also provides a computer readable storage medium, which may be a non-volatile computer readable storage medium or a volatile computer readable storage medium, having stored therein instructions which, when run on a computer, cause the computer to perform the steps of the data sampling detection method.
A blockchain is a new application mode of computer technologies such as distributed data storage, peer-to-peer transmission, consensus mechanisms and encryption algorithms. A blockchain is essentially a decentralized database: a series of data blocks linked by cryptographic methods, where each data block contains information about a batch of network transactions and is used to verify the validity (anti-counterfeiting) of that information and to generate the next block. The blockchain may include a blockchain underlying platform, a platform product service layer, an application service layer, and the like.
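The idea of data blocks linked by a cryptographic method can be illustrated with a small, generic hash chain in Python. This is a textbook-style sketch, not part of the described embodiment: SHA-256, the JSON block layout, and the function names are all assumptions, and consensus and peer-to-peer distribution are left out entirely.

```python
import hashlib
import json
from typing import List

GENESIS_PREV = "0" * 64  # placeholder "previous hash" for the first block


def block_hash(index: int, prev_hash: str, transactions: list) -> str:
    payload = json.dumps(
        {"index": index, "prev": prev_hash, "tx": transactions}, sort_keys=True
    )
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


def append_block(chain: List[dict], transactions: list) -> None:
    """Add a block whose hash covers its content and the previous block's hash."""
    prev = chain[-1]["hash"] if chain else GENESIS_PREV
    index = len(chain)
    chain.append({
        "index": index,
        "prev": prev,
        "tx": transactions,
        "hash": block_hash(index, prev, transactions),
    })


def is_valid(chain: List[dict]) -> bool:
    """Recompute every hash and check every link to the previous block."""
    for i, block in enumerate(chain):
        expected_prev = chain[i - 1]["hash"] if i else GENESIS_PREV
        if block["prev"] != expected_prev:
            return False
        if block["hash"] != block_hash(block["index"], block["prev"], block["tx"]):
            return False
    return True
```

Because each hash covers the previous block's hash, altering any earlier batch of transactions invalidates that block and every later link.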
It is clear to those skilled in the art that, for convenience and brevity of description, the specific working processes of the above-described systems, apparatuses and units may refer to the corresponding processes in the foregoing method embodiments, and are not described herein again.
The integrated unit, if implemented in the form of a software functional unit and sold or used as a stand-alone product, may be stored in a computer readable storage medium. Based on such understanding, the technical solution of the present invention may be embodied in the form of a software product, which is stored in a storage medium and includes instructions for causing a computer device (which may be a personal computer, a server, or a network device) to execute all or part of the steps of the method according to the embodiments of the present invention. The aforementioned storage medium includes various media capable of storing program code, such as a USB flash drive, a removable hard disk, a read-only memory (ROM), a random access memory (RAM), a magnetic disk, or an optical disk.
The above-mentioned embodiments are only used for illustrating the technical solutions of the present invention, and not for limiting the same; although the present invention has been described in detail with reference to the foregoing embodiments, it will be understood by those of ordinary skill in the art that: the technical solutions described in the foregoing embodiments may still be modified, or some technical features may be equivalently replaced; and such modifications or substitutions do not depart from the spirit and scope of the corresponding technical solutions of the embodiments of the present invention.

Claims (10)

1. A data sampling detection method, the method comprising:
receiving an account creating instruction sent by a user terminal, creating an account according to the account creating instruction, generating key pair information, and configuring sampling rules to obtain target sampling rules, wherein the target sampling rules comprise client rules and server rules;
acquiring initial data, generating intermediate data according to the key pair information and the client rule, verifying the intermediate data to obtain basic verification data, classifying the basic verification data based on the server rule to generate sampling data, wherein the sampling data comprises storage sampling data, host sampling data and database sampling data, and the sampling data is used for indicating reconciliation rules configured for the user;
sending a first instruction to a storage sampling calculation server, wherein the first instruction is used for instructing the storage sampling calculation server to perform checksum calculation on the storage sampling data according to the server-side rule and generate a target storage verification result;
sending a second instruction to a host sampling calculation server, wherein the second instruction is used for instructing the host sampling calculation server to perform checksum calculation on the host sampling data according to the server-side rule and generate a target host verification result;
sending a third instruction to a database sampling calculation server, wherein the third instruction is used for instructing the database sampling calculation server to perform checksum calculation on the database sampling data according to the server-side rule and generate a target database verification result;
and merging the target storage verification result, the target host verification result and the target database verification result, and outputting a final sampling detection result.
2. The data sampling detection method according to claim 1, wherein the receiving an account creating instruction sent by a user terminal, creating an account according to the account creating instruction, generating key pair information, and configuring sampling rules to obtain target sampling rules, wherein the target sampling rules comprise client rules and server rules, comprises:
receiving an account creating instruction sent by a user terminal, creating an account according to a preset account creating flow, and generating key pair information, wherein the key pair information comprises public key information and private key information;
configuring a sampling amount and a sampling proportion in a preset rule template to obtain a client rule, wherein the client rule is a rule for transmitting data to a sampling calculation server, and the sampling calculation server comprises a storage sampling calculation server, a host sampling calculation server and a database sampling calculation server;
configuring sampling data types and reconciliation information in a preset rule template to obtain a server-side rule, wherein the server-side rule is a rule between a sampling platform server and a sampling calculation server;
and combining the client rule and the server rule to generate a target sampling rule.
3. The data sampling detection method according to claim 1, wherein the acquiring initial data, generating intermediate data according to the key pair information and the client rule, verifying the intermediate data to obtain basic verification data, and classifying the basic verification data based on the server rule to generate sampling data, wherein the sampling data comprises storage sampling data, host sampling data and database sampling data, comprises:
acquiring initial data, loading a client sub-rule matched with a preset sampling service address and the key pair information to obtain a target client sub-rule, screening the initial data according to the target client sub-rule to generate intermediate data, wherein the client rule comprises a plurality of client sub-rules;
performing authority and rule verification on the validity of the intermediate data, judging whether the intermediate data meets a preset authority verification standard, and if so, generating basic verification data;
and judging whether the basic verification data matches the server-side rule, and classifying the basic verification data according to a preset sampling service type to obtain sampling data, wherein the sampling data comprises storage sampling data, host sampling data and database sampling data.
4. The data sampling detection method according to claim 3, wherein the acquiring initial data, loading a client sub-rule matched with a preset sampling service address and the key pair information to obtain a target client sub-rule, and screening the initial data according to the target client sub-rule to generate intermediate data, wherein the client rule comprises a plurality of client sub-rules, comprises:
acquiring initial data, a preset sampling service address and the key pair information, and loading a client sub-rule matched with the preset sampling service address and the key pair information from the client rule to obtain a target client sub-rule, wherein the client rule comprises a plurality of client sub-rules;
and screening the initial data according to the target client sub-rule, preprocessing the initial data through a principal component analysis algorithm, deleting redundant data, retaining data conforming to the target client sub-rule, and generating intermediate data.
5. The data sampling detection method according to claim 1, wherein the sending a first instruction to a storage sampling calculation server, the first instruction being used for instructing the storage sampling calculation server to perform checksum calculation on the storage sampling data according to the server-side rule and generate a target storage verification result, comprises:
sending a first instruction to a storage sampling calculation server, and comparing file information of the storage sampling data according to the server-side rule, wherein the file information comprises a file name, a file size, a file format and a last modification time; calculating a final unique value of the file information according to a preset information digest algorithm, and generating first target storage data;
distinguishing files in the storage sampling data according to the size of occupied memory to obtain memory classification data, and calculating a unique value for each class of the memory classification data according to the preset information digest algorithm to obtain second target storage data;
and merging the first target storage data and the second target storage data to generate a target storage verification result.
6. The method for sampling data according to claim 1, wherein the sending a second instruction to a host sampling computation server, the second instruction being used for instructing the host sampling computation server to perform checksum computation on the host sampling data according to the server-side rule, and generating a target host verification result includes:
sending a second instruction to a host sampling calculation server, and when the server-side rule is set to verify operating system information, calculating a unique value of the operating system information according to a preset information digest algorithm, and randomly calculating and verifying information other than the operating system information, to generate first target host data;
when the server-side rule is set to verify all information, performing final unique value calculation on all field information according to the preset information digest algorithm to generate second target host data;
and merging the first target host data and the second target host data to generate a target host verification result.
7. The method for sampling and detecting data according to any one of claims 1-6, wherein the sending a third instruction to the database sample calculation server, the third instruction being used for instructing the database sample calculation server to perform checksum calculation on the database sample data according to the server-side rule, and generating a target database check result comprises:
sending a third instruction to a database sampling calculation server, verifying the consistency of preset database basic information and table structure information according to the server-side rule, verifying whether the database basic information and the table structure information match preset standard information, and obtaining a consistency result, wherein the database basic information comprises database users, stored procedures and triggers, and the table structure information comprises field names, field lengths and field types;
performing single record verification according to the consistency result and the database sampling data, randomly selecting a single record field, and calculating a unique value of the single record field according to a preset information digest algorithm to obtain single record verification data;
performing batch record verification according to the consistency result and the database sampling data, verifying all field information according to the preset information digest algorithm, and calculating final unique values of all field information to obtain batch record verification data;
and merging the single record verification data and the batch record verification data to generate a target database verification result.
8. A sample detection device for data, said sample detection device comprising:
a receiving module, used for receiving an account creating instruction sent by a user terminal, creating an account according to the account creating instruction, generating key pair information, and configuring sampling rules to obtain target sampling rules, wherein the target sampling rules comprise client rules and server rules;
an acquisition module, used for acquiring initial data, generating intermediate data according to the key pair information and the client rule, verifying the intermediate data to obtain basic verification data, and classifying the basic verification data based on the server rule to generate sampling data, wherein the sampling data comprises storage sampling data, host sampling data and database sampling data, and the sampling data is used for indicating reconciliation rules configured for the user;
a first checking module, used for sending a first instruction to a storage sampling calculation server, wherein the first instruction is used for instructing the storage sampling calculation server to perform checksum calculation on the storage sampling data according to the server-side rule and generate a target storage verification result;
a second checking module, used for sending a second instruction to a host sampling calculation server, wherein the second instruction is used for instructing the host sampling calculation server to perform checksum calculation on the host sampling data according to the server-side rule and generate a target host verification result;
a third checking module, used for sending a third instruction to a database sampling calculation server, wherein the third instruction is used for instructing the database sampling calculation server to perform checksum calculation on the database sampling data according to the server-side rule and generate a target database verification result;
and the output module is used for merging the target storage verification result, the target host verification result and the target database verification result and outputting a final sampling detection result.
9. A data sampling detection device, the device comprising: a memory and at least one processor, the memory having instructions stored therein;
the at least one processor invoking the instructions in the memory to cause the data sampling detection device to perform the data sampling detection method according to any one of claims 1-7.
10. A computer-readable storage medium having instructions stored thereon, wherein the instructions, when executed by a processor, implement the data sampling detection method according to any one of claims 1-7.
CN202110064517.7A 2021-01-18 2021-01-18 Data sampling detection method, device, equipment and storage medium Active CN112860741B (en)

Priority Applications (2)

Application Number Priority Date Filing Date Title
CN202110064517.7A CN112860741B (en) 2021-01-18 2021-01-18 Data sampling detection method, device, equipment and storage medium
PCT/CN2021/083590 WO2022151590A1 (en) 2021-01-18 2021-03-29 Method, apparatus and device for performing sampling inspection on data, and storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202110064517.7A CN112860741B (en) 2021-01-18 2021-01-18 Data sampling detection method, device, equipment and storage medium

Publications (2)

Publication Number Publication Date
CN112860741A CN112860741A (en) 2021-05-28
CN112860741B true CN112860741B (en) 2022-08-23

Family

ID=76006711

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110064517.7A Active CN112860741B (en) 2021-01-18 2021-01-18 Data sampling detection method, device, equipment and storage medium

Country Status (2)

Country Link
CN (1) CN112860741B (en)
WO (1) WO2022151590A1 (en)

Families Citing this family (10)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114153647B (en) * 2021-09-24 2022-08-02 深圳市木浪云科技有限公司 Rapid data verification method, device and system for cloud storage system
CN114764347A (en) * 2022-04-14 2022-07-19 重庆长安汽车股份有限公司 Program verification system and method of multi-core controller and storage medium
CN114611473B (en) * 2022-05-11 2022-08-12 希维科技(广州)有限公司 Generation method of inspection execution file and electronic equipment
CN115577379B (en) * 2022-11-09 2023-05-09 中孚安全技术有限公司 Hierarchical protection security analysis method, system and equipment
CN115577867B (en) * 2022-12-09 2023-04-18 深圳海智创科技有限公司 Method and system for creating spot check task, computer equipment and storage medium
CN116032830B (en) * 2023-03-24 2023-07-21 微网优联科技(成都)有限公司 Network switch interaction method, network switch and network system
CN117275654B (en) * 2023-11-02 2024-05-03 中世康恺科技有限公司 Inspection and inspection mutual recognition data acquisition method and device
CN117435630B (en) * 2023-12-21 2024-03-29 杭银消费金融股份有限公司 Rule preposition-based data verification method and system
CN117706260B (en) * 2024-02-06 2024-04-30 禹创半导体(深圳)有限公司 ESD event detection method
CN118278959A (en) * 2024-06-03 2024-07-02 广东省食品检验所(广东省酒类检测中心) Food safety spot check data verification method, storage medium and system

Family Cites Families (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US8744937B2 (en) * 2005-02-25 2014-06-03 Sap Ag Consistent set of interfaces derived from a business object model
US10929384B2 (en) * 2017-08-16 2021-02-23 Walmart Apollo, Llc Systems and methods for distributed data validation
CN107704436A (en) * 2017-10-30 2018-02-16 平安科技(深圳)有限公司 Sampling of data method, terminal, equipment and computer-readable recording medium
CN108055281B (en) * 2017-12-27 2021-05-18 百度在线网络技术(北京)有限公司 Account abnormity detection method, device, server and storage medium
CN110413635A (en) * 2019-06-20 2019-11-05 口碑(上海)信息技术有限公司 A kind of data processing method and device
CN110618986A (en) * 2019-09-04 2019-12-27 水晶球教育信息技术有限公司 Big data statistical sampling method and device, server and storage medium
CN110781173A (en) * 2019-10-12 2020-02-11 杭州城市大数据运营有限公司 Data identification method and device, computer equipment and storage medium
CN111581197B (en) * 2020-04-30 2023-06-13 中国工商银行股份有限公司 Method and device for sampling and checking data table in data set
CN111726359B (en) * 2020-06-18 2023-04-07 五八有限公司 Account information detection method and device

Also Published As

Publication number Publication date
CN112860741A (en) 2021-05-28
WO2022151590A1 (en) 2022-07-21

Similar Documents

Publication Publication Date Title
CN112860741B (en) Data sampling detection method, device, equipment and storage medium
US20160098340A1 (en) Method and system for comparing different versions of a cloud based application in a production environment using segregated backend systems
CN112054941B (en) Automatic testing method, device and equipment for private domain name and storage medium
CN108510396B (en) Method and device for insurance verification, computer equipment and storage medium
CN111651347B (en) Jump verification method, device, equipment and storage medium of test system
CN110764980A (en) Log processing method and device
CN112506747A (en) Business process monitoring method and device, electronic equipment and storage medium
CN109815112B (en) Data debugging method and device based on functional test and terminal equipment
CN113867782A (en) Gray scale distribution method and device, computer equipment and storage medium
CN115099792A (en) Method, device and equipment for auditing project declaration form and storage medium
CN114491555A (en) Equipment safety detection method and device, computer equipment and storage medium
CN111858658A (en) Data acquisition method, device, equipment and storage medium
CN114827161A (en) Service calling request sending method and device, electronic equipment and readable storage medium
CN108076092A (en) Web server resources balance method and device
CN114006749A (en) Security verification method, device, equipment and storage medium
CN113204747A (en) Account management method, account management device, server and storage medium
CN112596919A (en) Model calling method, device, equipment and storage medium
CN111680303A (en) Vulnerability scanning method and device, storage medium and electronic equipment
KR102010442B1 (en) Total monitoring method and system for cloud virtual machines
CN111314326A (en) Method, device, equipment and medium for confirming HTTP vulnerability scanning host
CN113839956B (en) Data security assessment method, device, equipment and storage medium
CN114003784A (en) Request recording method, device, equipment and storage medium
CN112765010A (en) Method, device, equipment and storage medium for centralized management of service parameters
CN107704557B (en) Processing method and device for operating mutually exclusive data, computer equipment and storage medium
CN111582832A (en) Fair competition examination method and system based on block chain

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant