CN112685765A - Data quality evaluation method and device for data quality evaluation - Google Patents


Publication number
CN112685765A
Authority
CN
China
Prior art keywords
data
ciphertext
party
quality evaluation
security
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202010220629.2A
Other languages
Chinese (zh)
Inventor
王天雨
王蜀洪
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Huakong Tsingjiao Information Technology Beijing Co Ltd
Original Assignee
Huakong Tsingjiao Information Technology Beijing Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date

Landscapes

  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)

Abstract

An embodiment of the invention provides a data quality evaluation method, a data quality evaluation device, and an apparatus for data quality evaluation. The method comprises: receiving security tags and ciphertext spot-check data from a second party, wherein the security tags are generated by the second party based on the ciphertexts of the data entries in the data to be provided, the spot-check data is part or all of the data to be provided, and the data to be provided comprises ciphertext data; querying the received security tags to obtain the first security tags corresponding to the data entries in the spot-check data; generating second security tags corresponding to the data entries in the spot-check data based on the ciphertexts of those data entries; matching the first security tags against the second security tags one by one; and, if the matching succeeds, performing quality evaluation on the spot-check data to obtain a quality evaluation result for the spot-check data, thereby determining the quality evaluation result of the data to be provided. The embodiment of the invention can improve the reliability of ciphertext data transactions.

Description

Data quality evaluation method and device for data quality evaluation
Technical Field
The present invention relates to the field of computer technologies, and in particular, to a data quality assessment method and apparatus, and an apparatus for data quality assessment.
Background
With the development and the rise of technologies such as internet, internet of things and the like, the sources, contents and forms of data are more and more diversified, and the capacity of the data is also increased year by year at a very fast speed. Data has been considered an important asset as a basis for user behavior analysis, business value mining, artificial intelligence training, and the like. Since different organizations or individuals have mastered data of different types, it is a necessary condition for the development of digital economy to integrate, exchange, and share data resources through data transactions.
In order to ensure the reliability of data transaction, quality inspection needs to be performed on data provided by a data provider, so as to inspect whether the data provided by the data provider is real and complete, and whether the requirements of a data demander can be met.
Currently, in markets where data is traded in plaintext, data quality can be verified against the plaintext. In markets where data is traded as ciphertext, however, quality cannot be verified directly from the ciphertext, so the reliability of ciphertext data transactions is difficult to guarantee.
Disclosure of Invention
The embodiment of the invention provides a data quality evaluation method and device and a device for evaluating data quality, which can be used for checking the quality of data provided by a data provider on the basis of a ciphertext to improve the reliability of ciphertext data transaction.
In order to solve the above problem, an embodiment of the present invention discloses a data quality evaluation method, which is applied to a data transaction scenario for transacting data usage rights, and is applicable to a first-party device, where the method includes:
receiving security tags and ciphertext spot-check data from a second party, wherein the security tags are generated by the second party based on the ciphertexts of the data entries in the data to be provided, the spot-check data is part or all of the data to be provided, and the data to be provided comprises ciphertext data;
querying the received security tags to obtain the first security tags corresponding to the data entries in the spot-check data;
generating second security tags corresponding to the data entries in the spot-check data based on the ciphertexts of the data entries in the spot-check data;
matching the first security tags against the second security tags one by one;
and, if the matching succeeds, performing quality evaluation on the spot-check data to obtain a quality evaluation result for the spot-check data, thereby determining the quality evaluation result of the data to be provided.
In another aspect, an embodiment of the invention discloses a data quality evaluation device, which is applied to a data transaction scenario for transacting data usage rights and is applicable to a first-party device, the device comprising:
a data receiving module, configured to receive security tags and ciphertext spot-check data from a second party, wherein the security tags are generated by the second party based on the ciphertexts of the data entries in the data to be provided, the spot-check data is part or all of the data to be provided, and the data to be provided comprises ciphertext data;
a tag query module, configured to query the received security tags to obtain the first security tags corresponding to the data entries in the spot-check data;
a tag calculation module, configured to generate second security tags corresponding to the data entries in the spot-check data based on the ciphertexts of the data entries in the spot-check data;
a tag matching module, configured to match the first security tags against the second security tags one by one;
and a quality evaluation module, configured to perform quality evaluation on the spot-check data if the matching succeeds, to obtain a quality evaluation result for the spot-check data and thereby determine the quality evaluation result of the data to be provided.
In yet another aspect, an embodiment of the present invention discloses an apparatus for data quality evaluation, applied to a data transaction scenario for transacting data usage rights, adapted to a first-party device, comprising a memory, and one or more programs, wherein the one or more programs are stored in the memory, and configured to be executed by one or more processors, the one or more programs including instructions for:
receiving security tags and ciphertext spot-check data from a second party, wherein the security tags are generated by the second party based on the ciphertexts of the data entries in the data to be provided, the spot-check data is part or all of the data to be provided, and the data to be provided comprises ciphertext data;
querying the received security tags to obtain the first security tags corresponding to the data entries in the spot-check data;
generating second security tags corresponding to the data entries in the spot-check data based on the ciphertexts of the data entries in the spot-check data;
matching the first security tags against the second security tags one by one;
and, if the matching succeeds, performing quality evaluation on the spot-check data to obtain a quality evaluation result for the spot-check data, thereby determining the quality evaluation result of the data to be provided.
In yet another aspect, embodiments of the invention disclose a machine-readable medium having instructions stored thereon, which when executed by one or more processors, cause an apparatus to perform a data quality assessment method as described in one or more of the preceding.
The embodiment of the invention has the following advantages:
The data quality evaluation method provided by the embodiment of the invention can be applied to a data transaction scenario for transacting data usage rights, in which the first-party device evaluates the quality of the second party's ciphertext data. The second party sends the security tags and the ciphertext spot-check data to the first party so that the first party can evaluate the quality of the ciphertext data owned by the second party. The security tags are generated by the second party based on the ciphertexts of the data entries in the data to be provided and can be used to verify data integrity. If the second party's security tags pass verification, the first-party device performs quality evaluation on the second party's ciphertext spot-check data. The spot-check data may be part or all of the data to be provided, and its quality reflects, to a certain extent, the overall quality of the data to be provided, so the quality evaluation result of the data to be provided by the second party can be determined from the quality evaluation result of the spot-check data.
Therefore, before purchasing the right of use of the ciphertext data provided by the data provider, the data demander can check whether the quality of the ciphertext data provided by the data provider meets the requirement or not according to the data quality evaluation result of the data provider and then decide whether to purchase or not, so that the data quality based on ciphertext data transaction can be improved, and the reliability of the ciphertext data transaction can be improved.
Drawings
To illustrate the technical solutions of the embodiments of the present invention more clearly, the drawings used in the description of the embodiments are briefly introduced below. The drawings described below show only some embodiments of the present invention, and those skilled in the art can obtain other drawings from them without inventive effort.
FIG. 1 is a flow chart of the steps of an embodiment of a data quality assessment method of the present invention;
FIG. 2 is a flow chart of the steps of another data quality assessment method embodiment of the present invention;
FIG. 3 is a block diagram of an embodiment of a data quality evaluation apparatus according to the present invention;
FIG. 4 is a block diagram of an embodiment of a data quality assessment system of the present invention;
FIG. 5 is a block diagram of an apparatus 800 for data quality assessment in accordance with the present invention; and
FIG. 6 is a schematic diagram of a server in some embodiments of the invention.
Detailed Description
The technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are some, not all, embodiments of the present invention. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
Method embodiment
Referring to fig. 1, a flowchart illustrating steps of an embodiment of a data quality evaluation method according to the present invention is shown, where the method is applied to a data transaction scenario for transacting data usage rights, and is applicable to a first-party device, and specifically may include the following steps:
step 101, receiving security tags and ciphertext spot-check data from a second party, wherein the security tags are generated by the second party based on the ciphertexts of the data entries in the data to be provided, the spot-check data is part or all of the data to be provided, and the data to be provided comprises ciphertext data;
step 102, querying the received security tags to obtain the first security tags corresponding to the data entries in the spot-check data;
step 103, generating second security tags corresponding to the data entries in the spot-check data based on the ciphertexts of the data entries in the spot-check data;
step 104, matching the first security tags against the second security tags one by one;
and step 105, if the matching succeeds, performing quality evaluation on the spot-check data to obtain a quality evaluation result for the spot-check data, thereby determining the quality evaluation result of the data to be provided.
The embodiment of the invention can be applied to a data transaction scenario for transacting data usage rights. The main carrier of data transactions may be a data transaction platform. A data provider submits to the platform information about the data whose usage rights it wishes to sell, and a data demander submits its data requirements to the platform. The platform recommends to the data demander the data providers whose data meets those requirements, and the data demander and the data provider can then complete the transaction of data usage rights through the platform.
To protect the privacy and security of data during a transaction, the data transaction platform can support transactions of usage rights over ciphertext data. For example, the data provider may be an automobile manufacturer, an automobile repair shop, an insurance company, a vehicle owner, and the like, and the ciphertext data whose usage rights are sold may include the ciphertext of vehicle condition data, such as vehicle model, vehicle age, and engine speed. Of course, the embodiment of the present invention does not limit the industry or application field to which the data provider belongs. In the medical and health field, the ciphertext data whose usage rights are sold may include the ciphertext of data recorded by medical devices, such as blood pressure, heart rate, and blood concentration. In Internet of Things fields such as smart home, the ciphertext data whose usage rights are sold may include the ciphertext of data such as the temperature, fan speed, geographical position, and start-up time of an air conditioner.
The embodiment of the invention can carry out quality evaluation on the ciphertext data of the second party through the first party equipment. Wherein, the first party refers to the checking party for evaluating the quality of the ciphertext data. The second party refers to the party to be inspected who accepts the quality evaluation of the ciphertext data that it owns. The first party can be a data demand party, namely a party purchasing the ciphertext data use right. Optionally, the first party may also be any other institution or platform, such as a third-party evaluation institution or a data transaction platform. The second party may be a data provider, i.e., a party that sells the usage rights for the ciphertext data.
In this embodiment, the second party may send the security tags and the ciphertext spot-check data to the first party, so that the first party can evaluate the quality of the ciphertext data owned by the second party. The security tags are generated by the second party based on the ciphertexts of the data entries in the data to be provided, the spot-check data may be part or all of the data to be provided, and the data to be provided includes ciphertext data.
The data to be provided may be the ciphertext data whose usage rights the second party is selling. The second party may generate a corresponding security tag for the ciphertext of each data entry in the data to be provided, and the security tags may be used to verify the integrity of the data to be provided. Checking integrity serves two purposes: it verifies that the data values of the data to be provided are correct, and it verifies that the data to be provided is the real data supplied by the second party. A data entry is the smallest unit of data that cannot be split further, such as a single field in a database.
In an optional embodiment of the present invention, the security token may be obtained by calculating a ciphertext of a data entry in the data to be provided by using an irreversible encryption algorithm.
An irreversible encryption algorithm requires no key. The system feeds the original data through the algorithm and obtains the encrypted output directly, and that output cannot be decrypted. The only way to verify a value is to feed the original data through the same irreversible encryption algorithm again and check that the same output is produced.
The second party applies an irreversible encryption algorithm to the ciphertext of each data entry in the data to be provided to generate a security tag, denoted Tab, where Tab = hash(ID) and ID may represent the index value of the data entry.
It should be noted that the embodiment of the present invention does not limit the type of the irreversible encryption algorithm used. For example, the irreversible encryption algorithm may include SM3, MD5, SHA1, SHA256, and the like.
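The tag generation described above can be sketched as follows. This is a minimal illustration, assuming SHA-256 as the irreversible algorithm and assuming the tag is computed over the raw ciphertext bytes of each entry; the function names and the placeholder ciphertexts are hypothetical and do not come from the patent.

```python
import hashlib
from typing import Dict


def make_security_tag(ciphertext: bytes) -> str:
    """Return the hex SHA-256 digest of one entry's ciphertext (assumed tag format)."""
    return hashlib.sha256(ciphertext).hexdigest()


def make_all_tags(entries: Dict[int, bytes]) -> Dict[int, str]:
    """Map each index value to the security tag of its ciphertext."""
    return {index: make_security_tag(ct) for index, ct in entries.items()}


# Hypothetical ciphertexts keyed by index value (placeholder bytes,
# not the output of a real encryption scheme).
entries = {1: b"\x8f\x02ciphertext-1", 2: b"\x11\xa7ciphertext-2"}
tags = make_all_tags(entries)
```

Keying the tags by index value lets the receiving party look up the tag for any entry it later spot-checks.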
The second party sends the security tags of the data to be provided and the ciphertext spot-check data to the first party for data quality evaluation. The first party checks the integrity of the data to be provided according to the received security tags. If the integrity meets the requirement, the first party evaluates the quality of the received ciphertext spot-check data, which may be part or all of the data to be provided, and thereby evaluates the quality of the data to be provided by the second party.
After receiving the security tags and the ciphertext spot-check data from the second party, the first party queries the received security tags to obtain the first security tags corresponding to the data entries in the spot-check data.
In one example, the data to be provided by the second party includes data entries with index values from 1 to 1000000, and the second party uses an irreversible encryption algorithm to generate a security tag for the ciphertext of each of the 1000000 data entries, obtaining 1000000 security tags. Assuming that the spot-check data consists of the data entries with index values from 1 to 100, the second party sends to the first party the 1000000 security tags for index values 1 to 1000000 together with the spot-check data for index values 1 to 100.
The first party queries the 1000000 received security tags to obtain the first security tags corresponding to the data entries with index values from 1 to 100. In addition, the first party uses the same irreversible encryption algorithm as the second party to compute security tags for the data entries in the spot-check data, obtaining the second security tags.
The first security tags of the 100 data entries are received by the first party from the second party, and the second security tags of the 100 data entries are computed by the first party. The first party matches the first security tags against the second security tags one by one: it compares the first and second security tags of the data entry with index value 1 and determines whether they are the same, then does the same for the data entry with index value 2, and so on up to the data entry with index value 100. If the first and second security tags match for all 100 data entries, the spot-check data provided by the second party passes the integrity check, and the quality evaluation of the spot-check data can proceed. Otherwise, the spot-check data provided by the second party fails the integrity check, and the data transaction with the second party can be terminated.
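The one-by-one matching in the example above might look like the following sketch. It assumes SHA-256 hex digests as tags and dictionaries keyed by index value; `verify_spot_check` and the sample data are hypothetical names introduced only for illustration.

```python
import hashlib
import hmac
from typing import Dict, List


def sha256_tag(ciphertext: bytes) -> str:
    """Compute a security tag with the same algorithm the second party is assumed to use."""
    return hashlib.sha256(ciphertext).hexdigest()


def verify_spot_check(received_tags: Dict[int, str],
                      spot_check: Dict[int, bytes]) -> List[int]:
    """Return the index values whose tags do not match (empty list means the check passed)."""
    mismatched = []
    for index, ciphertext in spot_check.items():
        first_tag = received_tags.get(index)   # tag sent by the second party
        second_tag = sha256_tag(ciphertext)    # tag recomputed by the first party
        if first_tag is None or not hmac.compare_digest(first_tag, second_tag):
            mismatched.append(index)
    return mismatched


# Hypothetical data: five entries, of which entries 1 to 3 are spot-checked.
data = {i: f"ct-{i}".encode() for i in range(1, 6)}
all_tags = {i: sha256_tag(ct) for i, ct in data.items()}
sample = {i: data[i] for i in (1, 2, 3)}

ok = verify_spot_check(all_tags, sample)        # no mismatches expected

tampered = dict(sample)
tampered[2] = b"altered"                        # simulate a modified entry
bad = verify_spot_check(all_tags, tampered)     # entry 2 should be reported
```

Returning the mismatched index values, rather than a single boolean, is a design choice of this sketch; the text only requires knowing whether every pair matches.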
It should be noted that, to protect the privacy and security of the data, the first party evaluates the quality of the second party's spot-check data on the basis of ciphertext, and the operations performed on the ciphertext spot-check data during quality evaluation may be carried out by invoking a secure multi-party computation engine. The engine provides secure multi-party computation services. Secure multi-party computation operates on ciphertext: the data participating in the computation includes ciphertext data, and the intermediate results and the final result are also ciphertext data. Because no plaintext is exposed during ciphertext-based computation, the privacy of the data is preserved.
Optionally, the ciphertext-based computation may be performed by at least two of four computing nodes, which communicate with the other computing nodes participating in the ciphertext computation according to a secure multi-party computation protocol to complete the computing task.
In the embodiment of the invention, one computing task may be distributed to four computing nodes, referred to as Y1, Y2, Ya, and Yb, which complete it cooperatively. Some computing tasks need only two computing nodes; for example, addition requires only nodes Y1 and Y2 to participate. The embodiment of the invention does not limit the number of computing nodes.
It can be understood that the secure multi-party computation engine in the embodiment of the present invention can support any ciphertext-based privacy-preserving computation. Examples include arithmetic on ciphertext such as addition, subtraction, multiplication, division, and averaging; comparison, sorting, and grouping on ciphertext; ciphertext-based model training and prediction for machine learning and artificial intelligence; and ciphertext-based database queries.
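To make the two-node addition case concrete, the sketch below uses additive secret sharing, a common textbook technique. The patent does not state which protocol its engine uses, so this is a stand-in under that assumption; the modulus and all function names are hypothetical.

```python
import secrets
from typing import Tuple

MODULUS = 2 ** 64  # assumed ring size for the shares


def share(value: int) -> Tuple[int, int]:
    """Split a value into two random-looking shares, one per computing node."""
    r = secrets.randbelow(MODULUS)
    return r, (value - r) % MODULUS


def add_shares(a: int, b: int) -> int:
    """Each node adds the shares it holds locally, without talking to the other node."""
    return (a + b) % MODULUS


def reconstruct(s1: int, s2: int) -> int:
    """Combine the two result shares to recover the sum."""
    return (s1 + s2) % MODULUS


x1, x2 = share(120)          # shares of the first input
y1, y2 = share(35)           # shares of the second input
z1 = add_shares(x1, y1)      # computed by node Y1
z2 = add_shares(x2, y2)      # computed by node Y2
total = reconstruct(z1, z2)  # 120 + 35
```

Neither node sees either input in the clear, and addition needs no interaction between them, which is consistent with the remark that addition requires only two nodes.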
In an optional embodiment of the present invention, before receiving the security tags and the ciphertext spot-check data from the second party, the method may further include: calculating, according to a preset sampling ratio, the index values corresponding to the spot-check data, and sending the index values to the second party;
the receiving of the security tags and the ciphertext spot-check data from the second party may include: receiving the security tags from the second party and the ciphertext spot-check data corresponding to the index values.
In the embodiment of the present invention, the first party may determine which ciphertext spot-check data the second party should provide. For example, the first party may calculate the index values corresponding to the spot-check data according to a preset sampling ratio and send them to the second party, and the second party then sends to the first party the security tags of the data to be provided and the ciphertext spot-check data corresponding to those index values.
The preset sampling ratio can be set flexibly according to the actual situation. It may be negotiated between the first party and the second party or set by the first party alone; to ensure that the second party's data is not leaked, a ratio set by the first party must not exceed the ratio negotiated by the two parties. In general, the sampling ratio is small, for example 0.01% or 0.001%. The first party generates the index values of the data entries to be spot-checked according to the preset sampling ratio. For example, if the data to be provided by the second party contains data entries with index values from 1 to 1000000 and the two parties negotiate a sampling ratio of 0.01%, the first party calculates the index values corresponding to the spot-check data according to that ratio; the index values may be from 1 to 100, or from 50 to 150, and so on.
It should be noted that the sampling ratio may be set so that only part of the data to be provided is checked, or set to 1 (that is, 100%) so that all of the data to be provided is checked.
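One way the first party might derive index values from a sampling ratio is sketched below. Random selection without replacement is an assumption of this sketch, as are the function name and the optional seed; the text only requires that the number of selected entries follow the agreed ratio.

```python
import random
from typing import List, Optional


def spot_check_indices(total: int, ratio: float,
                       seed: Optional[int] = None) -> List[int]:
    """Pick index values in 1..total according to a sampling ratio in (0, 1]."""
    if total < 1:
        raise ValueError("total must be at least 1")
    if not 0 < ratio <= 1:
        raise ValueError("ratio must be in (0, 1]")
    count = max(1, round(total * ratio))      # at least one entry is checked
    rng = random.Random(seed)                 # seed only for reproducible demos
    return sorted(rng.sample(range(1, total + 1), count))


# 0.01% of 1,000,000 entries.
indices = spot_check_indices(1_000_000, 0.0001, seed=42)

# A ratio of 1 selects every entry.
everything = spot_check_indices(10, 1.0)
```

A contiguous range such as 1 to 100 would satisfy the same ratio; random selection simply makes it harder for a provider to predict which entries will be inspected.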
In the embodiment of the present invention, the first party may include any one or more of a data demanding party, a third-party evaluation institution, and a data transaction platform. In practical applications, the quality evaluation result of the second party's spot check data may include the evaluation result of any one of the above three first parties, or a combination of the evaluation results of any multiple parties.
In an optional embodiment of the present invention, the quality evaluation result includes any one or more of the following information: an evaluation mode, a ratio of evaluation data, and an extraction mode.
The evaluation mode indicates the specific evaluation method used to produce the current quality evaluation result, such as cross validation or model prediction. The ratio of evaluation data indicates the proportion of the data to be provided that was spot-checked in the quality evaluation. The extraction mode indicates how the spot-check data was selected from the data to be provided, such as random selection or designated selection.
In an optional embodiment of the present invention, the performing quality evaluation on the spot check data specifically may include:
step S11, obtaining same-type data corresponding to the spot-check data from a trusted third party, wherein the same-type data comprises ciphertext data;
step S12, performing a ciphertext-based comparison between the key data items of the data entries in the spot-check data and the key data items of the data entries in the same-type data to obtain a comparison result;
and step S13, obtaining a quality evaluation result corresponding to the spot-check data according to the comparison result.
The trusted third party may be an institution that holds authoritative data, such as a regulatory body, for example a public security bureau. The first party can obtain from the trusted third party the same-type data corresponding to the spot-check data, and the same-type data comprises ciphertext data.
Same-type data refers to ciphertext data that is owned by a trusted third party and has the same data items as the spot-check data. Because it is authoritative data owned by the trusted third party, it is generally true and reliable, so it can be used to cross-validate the accuracy of the second party's spot-check data. The first party can perform a ciphertext-based comparison between the key data items of the data entries in the spot-check data and the key data items of the data entries in the same-type data to obtain a comparison result.
The key data items may be specified by the first party; for example, they may include identification numbers, income, and the like, all in ciphertext form. The data entries for the key data items are identified in the ciphertext spot-check data provided by the second party and denoted E1.
The first party obtains from the trusted third party the same-type data corresponding to the second party's spot-check data, and the same-type data comprises ciphertext data. The first party obtains the data entries for the key data items (for example, identification numbers and income) from the same-type data and denotes them E2. Both E1 and E2 are ciphertext data.
The first party performs a ciphertext-based comparison between E1 in the second party's spot-check data and E2 in the same-type data owned by the trusted third party, and obtains the quality evaluation result of the spot-check data from the comparison result. For example, the ciphertext-based comparison of E1 and E2 yields the proportion of data entries that are equal in E1 and E2, from which the quality evaluation result of the spot-check data can be determined. Optionally, a quality level for the spot-check data may also be determined from that proportion.
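The proportion-based evaluation can be illustrated with the sketch below. It compares plain Python values as a stand-in for the ciphertext-based comparison, which in the described system would be delegated to the secure multi-party computation engine; the thresholds, level names, and sample values are all hypothetical.

```python
from typing import Sequence


def match_ratio(e1: Sequence, e2: Sequence) -> float:
    """Fraction of positions at which the two aligned entry lists are equal."""
    if len(e1) != len(e2) or len(e1) == 0:
        raise ValueError("E1 and E2 must be non-empty and of equal length")
    equal = sum(1 for a, b in zip(e1, e2) if a == b)
    return equal / len(e1)


def quality_level(ratio: float) -> str:
    """Map a match ratio to a quality level (thresholds are assumptions)."""
    if ratio >= 0.95:
        return "high"
    if ratio >= 0.80:
        return "medium"
    return "low"


# Stand-in values for one key data item (for example, income), aligned by entry.
e1 = [5200, 6100, 4800, 7300]   # from the second party's spot-check data
e2 = [5200, 6100, 5000, 7300]   # from the trusted third party

ratio = match_ratio(e1, e2)     # three of four entries agree
level = quality_level(ratio)
```

In a ciphertext setting, only the aggregate ratio would need to be revealed, not which individual entries differ.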
In an optional embodiment of the present invention, the performing quality evaluation on the spot-check data to obtain a quality evaluation result corresponding to the spot-check data may specifically include:
step S21, performing ciphertext-based model training using the spot-check data to obtain a target model;
step S22, testing the target model on a preset test set to obtain a prediction result of the target model;
and step S23, obtaining a quality evaluation result corresponding to the spot-check data according to the prediction result.
The embodiment of the invention can use the second party's spot-check data to perform ciphertext-based model training and obtain a target model, and can then evaluate the quality of the spot-check data by checking how much it improves model prediction.
After the target model has been trained with the second party's ciphertext spot-check data, it may be tested on a preset test set to obtain its prediction result. The preset test set may include test data provided by the first party, test data provided by the second party, or test data provided by both parties.
A quality evaluation result corresponding to the spot-check data is obtained according to the prediction result. The criterion for deriving the quality evaluation result from the prediction result may be set by the data demander.
In one example, the criterion may be an improvement at the technical level, such as the accuracy of the target model's predictions. For example, the quality of the spot-check data may be determined by checking whether the prediction accuracy of the target model has improved and whether the percentage improvement reaches a preset threshold.
In another example, the criterion may be an improvement at the application level, such as the user conversion rate or the revenue obtained by using the data predicted by the target model. For example, after the recommendation information predicted by the target model is put to use, the quality of the spot-check data can be determined by checking whether the user conversion rate has improved and whether the percentage improvement reaches a preset threshold.
In specific application, the quality of the spot check data can be comprehensively determined according to various promotion percentages. For example, the percentage of the model prediction accuracy improvement, the percentage of the user conversion rate improvement, and the percentage of the income improvement can be weighted to obtain the quality evaluation result of the spot check data.
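The weighting of several improvement percentages can be illustrated with a short Python sketch. The metric names, the weights, and the choice of a normalized weighted average are assumptions made for illustration; the source only states that the percentages may be weighted.

```python
from typing import Mapping


def weighted_quality_score(
    improvements: Mapping[str, float], weights: Mapping[str, float]
) -> float:
    """Combine improvement percentages (as fractions) into one weighted score."""
    total_weight = sum(weights.values())
    if total_weight <= 0:
        raise ValueError("weights must sum to a positive value")
    # Normalize by the total weight so the weights need not sum to 1.
    weighted_sum = sum(improvements[name] * w for name, w in weights.items())
    return weighted_sum / total_weight


# Hypothetical figures: 30% accuracy gain, 40% conversion gain, 10% revenue gain.
example_score = weighted_quality_score(
    {"accuracy": 0.30, "conversion": 0.40, "revenue": 0.10},
    {"accuracy": 0.5, "conversion": 0.3, "revenue": 0.2},
)
```

A data demander could compare such a score against its own preset index to decide whether the spot check data is acceptable.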
In an optional embodiment of the present invention, performing quality evaluation on the spot check data to obtain a quality evaluation result corresponding to the spot check data may specifically include:
step S31, performing any one or more of the following check operations on the spot check data: duplicate value check, missing value check, data skew check, and outlier check, to obtain a check result;
and step S32, obtaining a quality evaluation result corresponding to the spot check data according to the check result.
In the embodiment of the invention, the multi-party secure computation engine can perform privacy-preserving, ciphertext-based operations on data. For example, it can perform ciphertext-based arithmetic such as addition, subtraction, multiplication, division, and averaging; ciphertext-based comparison, sorting, and grouping; ciphertext-based model training and prediction for machine learning and artificial intelligence; and ciphertext-based database queries.
Therefore, through the multi-party secure computation engine, quality evaluation methods designed for plaintext can also be applied to evaluating data quality on ciphertext. For example, the embodiment of the present invention may invoke the multi-party secure computation engine to perform any one or more of the following check operations on the ciphertext spot check data: duplicate value check, missing value check, data skew check, outlier check, and the like, to obtain a check result, and then obtain a quality evaluation result corresponding to the spot check data according to the check result.
The duplicate value check is used to check, based on the ciphertext, whether duplicate data exists in the spot check data. The missing value check is used to check, based on the ciphertext, whether some data is missing from the spot check data or whether some data values are incomplete. The data skew check is used to check, based on the ciphertext, whether cached data is distributed unevenly in a distributed system, so that a large amount of cached data is concentrated on one or a few service nodes. The outlier check is used to check, based on the ciphertext, whether unreasonable data values exist in the spot check data.
The duplicate value check, missing value check, data skew check, and outlier check are all common quality evaluation methods for plaintext data. It should be noted that these methods are only examples; in practical applications, other quality evaluation methods for plaintext data may also be applied to evaluate the quality of the ciphertext spot check data in the embodiment of the present invention.
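For reference, the four checks can be sketched in plaintext Python as below. This is only an illustration of the plaintext methods the text refers to; in the described scheme the equivalent operations would be carried out on ciphertext by the multi-party secure computation engine, which is not modeled here. All function names, the use of `None` for a missing value, and the z-score rule for outliers are assumptions.

```python
from collections import Counter
from statistics import mean, pstdev
from typing import List, Optional, Sequence


def duplicate_ratio(values: Sequence) -> float:
    """Fraction of entries that repeat an earlier entry."""
    if not values:
        return 0.0
    counts = Counter(values)
    return sum(c - 1 for c in counts.values()) / len(values)


def missing_ratio(values: Sequence) -> float:
    """Fraction of entries that are missing (represented here as None)."""
    if not values:
        return 0.0
    return sum(1 for v in values if v is None) / len(values)


def max_node_share(node_counts: Sequence[int]) -> float:
    """Share of all records held by the busiest node; near 1.0 means heavy skew."""
    total = sum(node_counts)
    return max(node_counts) / total if total else 0.0


def outlier_indices(values: Sequence[Optional[float]], z: float = 3.0) -> List[int]:
    """Indices whose distance from the mean exceeds z standard deviations."""
    present = [v for v in values if v is not None]
    if len(present) < 2:
        return []
    mu, sigma = mean(present), pstdev(present)
    if sigma == 0:
        return []
    return [i for i, v in enumerate(values)
            if v is not None and abs(v - mu) / sigma > z]
```

Each function returns a number or a list that a caller could fold into a check result, for example by comparing it with an agreed limit.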
In an optional embodiment of the present invention, performing quality evaluation on the spot check data to obtain a quality evaluation result corresponding to the spot check data may specifically include:
step S41, obtaining historical feedback information corresponding to the spot check data;
and step S42, obtaining a quality evaluation result corresponding to the spot check data according to the historical feedback information.
In the embodiment of the invention, in addition to the quality evaluation result that the first party (such as a data demander, a third-party evaluation organization, or a data transaction platform) obtains by evaluating the spot check data, the quality evaluation result of the spot check data can also be determined according to historical feedback information from data users.
In one example, the historical feedback information may include: the percentage improvement in a data user's model prediction accuracy after using the data provider's ciphertext data, or the percentage improvement in the user conversion rate or revenue obtained by the data user after using the data provider's ciphertext data.
The data user may include any one or more of a data demander, a third-party evaluation organization, a data transaction platform, and the like. For example, suppose data user A and data user B have both purchased the usage rights for the ciphertext data of data provider C. After data user A uses the ciphertext data of data provider C, its model prediction accuracy improves by 30%. Data user A may upload feedback information info1 about its use of the ciphertext data of data provider C to the data transaction platform, where info1 may include the percentage improvement in model prediction accuracy after data user A used that ciphertext data. As another example, after data user B uses the ciphertext data of data provider C, the user conversion rate obtained by data user B improves by 40%. Data user B may upload feedback information info2 about its use of the ciphertext data of data provider C to the data transaction platform, where info2 may include the percentage improvement in user conversion rate after data user B used that ciphertext data.
The data transaction platform can store info1 and info2 as historical feedback information of a data user, and when quality evaluation is performed on data to be provided by a data provider C, the historical feedback information info1 and the historical feedback information info2 stored by the data transaction platform can be used as reference for the quality evaluation.
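The storage and reuse of info1 and info2 can be pictured with a small in-memory sketch. The `Feedback` record, the `FeedbackStore` class, and the metric labels are hypothetical names introduced for illustration; the source does not specify how the data transaction platform stores feedback.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Feedback:
    data_user: str      # who reported the feedback, e.g. "A"
    metric: str         # what improved, e.g. "model_accuracy"
    improvement: float  # improvement as a fraction, e.g. 0.30 for 30%


class FeedbackStore:
    """Keeps historical feedback grouped by data provider."""

    def __init__(self) -> None:
        self._by_provider = defaultdict(list)

    def upload(self, provider: str, feedback: Feedback) -> None:
        self._by_provider[provider].append(feedback)

    def history(self, provider: str) -> List[Feedback]:
        return list(self._by_provider.get(provider, []))

    def mean_improvement(self, provider: str, metric: str) -> Optional[float]:
        values = [f.improvement for f in self.history(provider) if f.metric == metric]
        return sum(values) / len(values) if values else None


store = FeedbackStore()
store.upload("C", Feedback("A", "model_accuracy", 0.30))   # info1
store.upload("C", Feedback("B", "user_conversion", 0.40))  # info2
```

When data provider C later offers new data, `store.history("C")` would return both records for use as a reference in the quality evaluation.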
After the first party performs quality evaluation on the sampling data to obtain a quality evaluation result of the sampling data, the quality evaluation result of the data to be provided of the second party can be determined according to the quality evaluation result of the sampling data. Since the selective examination data may be partial data randomly extracted by the first party from the data to be provided of the second party according to a sampling ratio, or the selective examination data may also be all data in the data to be provided of the second party, the quality of the selective examination data may represent the overall quality of the data to be provided by the data provider to a certain extent, and the quality evaluation result of the selective examination data may be used as the quality evaluation result of the data to be provided of the second party.
It should be noted that, in a specific application, a first party (e.g., a data demanding party, a third-party evaluation organization, and a data transaction platform) may perform quality evaluation on ciphertext spot check data of a second party by using any one of the above data quality evaluation methods or a combination of any several data quality evaluation methods.
In an optional embodiment of the present invention, after determining the quality evaluation result of the data to be provided, the method may further include: digitally signing the quality evaluation result of the data to be provided and the received security tag, and then storing them to a data transaction platform.
The first party can digitally sign the quality evaluation result of the data to be provided obtained by evaluation and the security label received from the second party, and store the quality evaluation result of the data to be provided, the security label and the digital signature in a data transaction platform, wherein the quality evaluation result comprises any one or more of the following information: an evaluation mode, a ratio of evaluation data, and an extraction mode. Therefore, before purchasing the right of use of the ciphertext data provided by the data provider, the data demander can query the data quality evaluation result of the data provider stored in the data transaction platform to check whether the quality of the ciphertext data provided by the data provider meets the requirement and then decide whether to purchase the data, so that the data quality based on ciphertext data transaction can be improved, and the reliability of the ciphertext data transaction can be improved.
For example, the data demander may check whether the quality of the ciphertext data provided by the data provider meets the requirement in the following ways. First, the data demander can use the digital certificate of the first party (such as a data demander, a third-party evaluation organization, or a data transaction platform) to verify whether the digital signature on the quality evaluation result is valid. Second, the data demander can use the security tags to verify the integrity of the ciphertext data whose usage right is to be purchased. During integrity verification, the security tags of all data entries in the data to be provided by the data provider may be verified or, to keep verification efficient, only the security tags of some data entries in the spot check data may be verified. Third, the data demander can check whether the data quality evaluation result meets a preset index. For example, according to the quality evaluation result stored in the data transaction platform, the data demander may query whether the ciphertext data provided by a certain data provider meets a preset index for improving model prediction accuracy, or whether it meets a preset index for improving the user conversion rate.
In summary, the data quality evaluation method provided by the embodiment of the invention can be applied to a data transaction scenario in which data usage rights are traded, and the ciphertext data of the second party is evaluated for quality by the first-party device. The second party can send the security tags and the ciphertext spot check data to the first party, so that the first party can perform quality evaluation on the ciphertext data owned by the second party. The security tag is generated by the second party based on the ciphertext of a data entry in the data to be provided and can be used to verify data integrity. If the security tags of the second party pass verification, the first-party device performs quality evaluation on the second party's ciphertext spot check data. The spot check data may be part or all of the data to be provided, and its quality reflects, to a certain extent, the overall quality of the data to be provided by the data provider, so the quality evaluation result of the data to be provided by the second party can be determined according to the quality evaluation result of the spot check data.
Therefore, before purchasing the right of use of the ciphertext data provided by the data provider, the data demander can check whether the quality of the ciphertext data provided by the data provider meets the requirement or not according to the data quality evaluation result of the data provider and then decide whether to purchase or not, so that the data quality based on ciphertext data transaction can be improved, and the reliability of the ciphertext data transaction can be improved.
Method embodiment two
Referring to fig. 2, a flowchart illustrating steps of another embodiment of a data quality evaluation method according to the present invention is shown, where the method is applied to a data transaction scenario for transacting data usage rights, and is applicable to a first-party device, and specifically may include the following steps:
Step 201, receiving a security tag from a second party.
The second party generates a corresponding security tag for the ciphertext of each data entry in the data to be provided. For example, the second party applies an irreversible encryption algorithm to the ciphertext of each data entry in the data to be provided to generate the security tag. The data to be provided includes ciphertext data. A data entry refers to the smallest data unit that cannot be split further, such as a single field in a database.
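A minimal sketch of security tag generation follows. The source only requires an irreversible encryption algorithm; SHA-256 from Python's standard `hashlib` is used here as one possible choice, and the function names are illustrative assumptions.

```python
import hashlib
from typing import Iterable, List


def security_tag(ciphertext: bytes) -> str:
    """Derive a tag from one data entry's ciphertext (SHA-256 is an assumed choice)."""
    return hashlib.sha256(ciphertext).hexdigest()


def generate_tags(ciphertext_entries: Iterable[bytes]) -> List[str]:
    """Produce one tag per data entry, in the same order as the entries."""
    return [security_tag(c) for c in ciphertext_entries]
```

Keeping the tags in entry order lets the receiving party look up the tag for any entry by its index value.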
The second party sends the generated security tags to the first party (e.g., a data demander, a third-party evaluation organization, or a data transaction platform), and the first party receives the security tags from the second party.
Step 202, calculating an index value corresponding to the spot check data according to a preset sampling proportion, and sending the index value to the second party.
It should be noted that, when the spot check data is used for data quality evaluation, the index values of the data entries to be spot checked may be generated by the first party according to a sampling proportion negotiated with the second party (which may be small, such as 0.01% or 0.001%). When the spot check data is used for integrity verification, the sampling proportion may be determined unilaterally by the first party.
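One way the first party might derive index values from a sampling proportion is sketched below. Uniform sampling without replacement, the rounding rule, and the `seed` parameter are assumptions; the source does not prescribe how the index values are chosen. The standard `random` module is not cryptographically secure, so a real deployment might prefer `random.SystemRandom`.

```python
import random
from typing import List, Optional


def sample_indices(total_entries: int, ratio: float,
                   seed: Optional[int] = None) -> List[int]:
    """Pick distinct entry indices covering roughly `ratio` of all entries."""
    if total_entries <= 0:
        raise ValueError("total_entries must be positive")
    if not 0 < ratio <= 1:
        raise ValueError("ratio must be in (0, 1]")
    # Always draw at least one entry, even for very small ratios.
    count = max(1, round(total_entries * ratio))
    rng = random.Random(seed)
    return sorted(rng.sample(range(total_entries), count))
```

For instance, `sample_indices(1_000_000, 0.0001)` selects 100 distinct index values, corresponding to a 0.01% sampling proportion.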
Step 203, receiving the ciphertext spot check data corresponding to the index value sent by the second party.
After receiving the index value corresponding to the spot check data from the first party, the second party sends the ciphertext spot check data corresponding to that index value to the first party, and the first party receives it.
Step 204, querying the received security tags to obtain a first security tag corresponding to each data entry in the spot check data.
According to the index value corresponding to the spot check data, the first party may query the received security tags to obtain the first security tag corresponding to each data entry in the spot check data.
Step 205, generating a second security tag corresponding to each data entry in the spot check data based on the ciphertext of that data entry.
The first party generates the second security tag corresponding to each data entry in the spot check data from the ciphertext of that data entry, using the same irreversible encryption algorithm as the second party.
Step 206, matching the first security tags against the second security tags one by one.
If the first security tag of every data entry in the spot check data is the same as the corresponding second security tag, the matching succeeds, that is, the spot check data passes integrity verification.
Step 207, performing quality evaluation on the spot check data if the matching succeeds.
If the matching succeeds, the spot check data has passed integrity verification and is confirmed to be the data selected by the first party. If the matching fails, the spot check data sent by the second party is not the data selected by the first party, and the data transaction may be terminated.
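The lookup, recomputation, and one-by-one matching of steps 204 to 206 can be sketched as follows. SHA-256 stands in for the unspecified irreversible encryption algorithm, and representing the spot check data as a mapping from index value to ciphertext is an assumption made for illustration.

```python
import hashlib
from typing import Mapping, Sequence


def compute_tag(ciphertext: bytes) -> str:
    """Recompute a tag with the same (assumed) algorithm the second party used."""
    return hashlib.sha256(ciphertext).hexdigest()


def verify_spot_check(received_tags: Sequence[str],
                      spot_check: Mapping[int, bytes]) -> bool:
    """Return True only if every spot check entry matches its received tag."""
    for index, ciphertext in spot_check.items():
        if not 0 <= index < len(received_tags):
            return False  # the index does not refer to any received tag
        first_tag = received_tags[index]        # step 204: look up by index
        second_tag = compute_tag(ciphertext)    # step 205: recompute locally
        if first_tag != second_tag:             # step 206: compare one by one
            return False
    return True
```

A `False` result corresponds to the failure case in which the data transaction may be terminated; a `True` result allows quality evaluation to proceed.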
It should be noted that, to ensure data quality, the first party may select all of the data to be provided by the second party as spot check data for integrity verification. Alternatively, to keep integrity verification efficient, the first party may select only part of the data to be provided by the second party as spot check data for integrity verification.
After the sampling inspection data passes the integrity verification, the first party can perform quality evaluation on the sampling inspection data to obtain a quality evaluation result of the sampling inspection data, and then can determine the quality evaluation result of the data to be provided. For quality evaluation of the spot check data, any one of the data quality evaluation methods or a combination of any two of the data quality evaluation methods in the first embodiment of the method may be adopted, and details are not repeated here.
Step 208, digitally signing the quality evaluation result and the received security tag, and then storing them to a data transaction platform.
The first party can digitally sign the quality evaluation result of the data to be provided obtained by evaluation and the security label received from the second party, and store the quality evaluation result of the data to be provided, the security label and the digital signature in a data transaction platform, wherein the quality evaluation result comprises any one or more of the following information: an evaluation mode, a ratio of evaluation data, and an extraction mode.
Before purchasing the right of use of the ciphertext data provided by the data provider, the data demander can query the data quality evaluation result of the data provider, which is stored in the data transaction platform, so as to check whether the quality of the ciphertext data provided by the data provider for which the right of use is to be purchased meets the requirement, and then decide whether to purchase the ciphertext data.
The embodiment of the invention can carry out quality evaluation on the ciphertext data of the second party through the first party equipment, and the quality evaluation process is based on the ciphertext, so that the privacy and the safety of the data can be ensured. In addition, before purchasing the right of use of the ciphertext data provided by the data provider, the data demander can query the data quality evaluation result of the data provider stored in the data transaction platform to check whether the quality of the ciphertext data provided by the data provider to purchase the right of use meets the requirement, and then determines whether to purchase the data, so that the data quality based on ciphertext data transaction can be improved, and the reliability of ciphertext data transaction can be improved.
It should be noted that, for simplicity of description, the method embodiments are described as a series of acts or a combination of acts, but those skilled in the art will recognize that the present invention is not limited by the described order of acts, as some steps may occur in other orders or concurrently in accordance with the embodiments of the present invention. Further, those skilled in the art will appreciate that the embodiments described in the specification are preferred embodiments and that the acts involved are not necessarily required by the invention.
Device embodiment
Referring to fig. 3, a block diagram of an embodiment of a data quality evaluation apparatus according to the present invention is shown, and is applied to a data transaction scenario for transacting data usage rights, and is applicable to a first-party device, where the apparatus may specifically include:
a data receiving module 301, configured to receive a security tag from a second party and snapshot data of a ciphertext, where the security tag is generated by the second party based on the ciphertext of a data entry in data to be provided, the snapshot data is part or all of the data in the data to be provided, and the data to be provided includes the ciphertext data;
a tag query module 302, configured to query the received security tags to obtain first security tags corresponding to data entries in the spot check data;
a tag calculation module 303, configured to generate a second security tag corresponding to a data entry in the selective examination data based on a ciphertext of the data entry in the selective examination data;
a tag matching module 304, configured to match the first security tag and the second security tag one by one;
and a quality evaluation module 305, configured to, if the matching is successful, perform quality evaluation on the spot check data to obtain a quality evaluation result of the spot check data, so as to determine a quality evaluation result of the data to be provided.
Optionally, the apparatus may further include:
the index calculation module is used for calculating an index value corresponding to the sampling inspection data according to a preset sampling proportion and sending the index value to the second party;
the data receiving module is specifically configured to receive the security tag from the second party and the snapshot data of the ciphertext corresponding to the index value.
Optionally, the quality evaluation module 305 may specifically include:
the data acquisition submodule is used for acquiring the same data corresponding to the spot check data from a trusted third party, and the same data comprises ciphertext data;
the data comparison submodule is used for carrying out comparison operation based on a ciphertext on the key data item of each data item in the spot check data and the key data item of each data item in the same data to obtain a comparison result;
and the first evaluation submodule is used for obtaining a quality evaluation result corresponding to the sampling inspection data according to the comparison result.
Optionally, the quality evaluation module 305 may specifically include:
the model training submodule is used for carrying out ciphertext-based model training by using the spot check data to obtain a target model;
the model prediction submodule is used for testing the target model based on a preset test set to obtain a prediction result of the target model;
and the second evaluation submodule is used for obtaining a quality evaluation result corresponding to the sampling inspection data according to the prediction result.
Optionally, the quality evaluation module 305 may specifically include:
the data checking submodule is used for performing any one or more of the following check operations on the spot check data: duplicate value check, missing value check, data skew check, and outlier check, to obtain a check result;
and the third evaluation submodule is used for obtaining a quality evaluation result corresponding to the sampling inspection data according to the inspection result.
Optionally, the quality evaluation module 305 may specifically include:
the history acquisition submodule is used for acquiring history feedback information corresponding to the spot check data;
and the fourth evaluation submodule is used for obtaining a quality evaluation result corresponding to the sampling inspection data according to the historical feedback information.
Optionally, the apparatus may further include:
and the result uploading module is used for digitally signing the quality evaluation result of the data to be provided and the received security tag, and then storing them to a data transaction platform.
Optionally, the quality evaluation result includes any one or more of the following information: an evaluation mode, a ratio of evaluation data, and an extraction mode.
Optionally, the security tag is obtained by applying an irreversible encryption algorithm to the ciphertext of a data entry in the data to be provided.
System embodiment
Referring to fig. 4, a block diagram of an embodiment of a data quality evaluation system of the present invention is shown, which is applied to a data transaction scenario for transacting data usage rights, and the system includes a first-party device 410, a second-party device 420, and a data transaction platform 430.
The embodiment of the invention can be applied to a data transaction scene of transaction data use right, a main carrier of data transaction can be a data transaction platform, a data provider can provide data of the use right to be sold to the data transaction platform, a data demander can submit the requirement of required data to the data transaction platform, the data transaction platform can recommend data information of the data provider meeting the requirement of the data demander to the data demander according to the requirement provided by the data demander, and then the data demander and the data provider can realize data transaction of the data use right based on the data transaction platform.
In order to ensure the privacy and safety of data in the data transaction process, the data transaction platform can support the transaction of the right of use of ciphertext data. The embodiment of the invention can carry out quality evaluation on the ciphertext data of the second party through the first party equipment. Wherein, the first party refers to the checking party for evaluating the quality of the ciphertext data. The second party refers to the party to be inspected who accepts the quality evaluation of the ciphertext data that it owns.
The first-party device may be an electronic device of a data demander, a third-party evaluation organization, or a data transaction platform. The second-party device may be an electronic device of a data provider. The electronic devices include, but are not limited to, servers, intelligent terminals, and the like.
The second-party device 420 comprises a first mark calculating module and a data sending module;
the first mark calculation module is used for calculating a security mark based on a ciphertext of a data item in data to be provided by a second party;
the data sending module is configured to send the security mark and the ciphertext spot check data to the first-party device, where the spot check data is part or all of the data to be provided, and the data to be provided includes ciphertext data;
the first-party device 410 comprises a data receiving module, a mark query module, a second mark calculation module, a mark matching module, a quality evaluation module and a result storage module;
the data receiving module is used for receiving the security mark and the ciphertext spot check data sent by the second-party device;
the mark query module is used for querying the received security marks to obtain first security marks corresponding to data entries in the spot check data;
the second mark calculation module is configured to generate a second security mark corresponding to a data entry in the selective examination data based on a ciphertext of the data entry in the selective examination data;
the mark matching module is used for matching the first safety mark with the second safety mark one by one;
the quality evaluation module is used for carrying out quality evaluation on the sampling inspection data under the condition of successful matching to obtain a quality evaluation result of the sampling inspection data so as to determine the quality evaluation result of the data to be provided;
and the result storage module is used for storing the quality evaluation result of the data to be provided to a data transaction platform.
The first party device may digitally sign the quality evaluation result of the data to be provided obtained by evaluation and the security label received from the second party device, and store the quality evaluation result of the data to be provided, the security label and the digital signature in the data transaction platform, where the quality evaluation result includes any one or more of the following information: an evaluation mode, a ratio of evaluation data, and an extraction mode. Therefore, before purchasing the right of use of the ciphertext data provided by the data provider, the data demander can query the data quality evaluation result of the data provider stored in the data transaction platform to check whether the quality of the ciphertext data provided by the data provider meets the requirement and then decide whether to purchase the data, so that the data quality based on ciphertext data transaction can be improved, and the reliability of the ciphertext data transaction can be improved.
Since the device embodiment is basically similar to the method embodiment, its description is relatively brief; for relevant details, refer to the corresponding description of the method embodiment.
The embodiments in the present specification are described in a progressive manner, each embodiment focuses on differences from other embodiments, and the same and similar parts among the embodiments are referred to each other.
With regard to the apparatus in the above-described embodiment, the specific manner in which each module performs the operation has been described in detail in the embodiment related to the method, and will not be elaborated here.
An embodiment of the present invention provides an apparatus for data quality assessment, applied to a data transaction scenario for transacting data usage rights, adapted for a first-party device, comprising a memory, and one or more programs, wherein the one or more programs are stored in the memory and configured to be executed by one or more processors, the one or more programs including instructions for: receiving a security marker and ciphertext spot check data from a second party, wherein the security marker is generated by the second party based on the ciphertext of a data entry in data to be provided, the spot check data is part or all of the data in the data to be provided, and the data to be provided comprises the ciphertext data; inquiring to obtain a first security mark corresponding to a data entry in the spot check data from the received security marks; generating a second security label corresponding to the data entry in the selective examination data based on the ciphertext of the data entry in the selective examination data; matching the first security mark with the second security mark one by one; and under the condition of successful matching, performing quality evaluation on the sampling inspection data to obtain a quality evaluation result of the sampling inspection data, thereby determining the quality evaluation result of the data to be provided.
Fig. 5 is a block diagram illustrating an apparatus 800 for data quality assessment in accordance with an example embodiment. For example, the apparatus 800 may be a mobile phone, a computer, a digital broadcast terminal, a messaging device, a game console, a tablet device, a medical device, an exercise device, a personal digital assistant, and the like.
Referring to fig. 5, the apparatus 800 may include one or more of the following components: processing component 802, memory 804, power component 806, multimedia component 808, audio component 810, input/output (I/O) interface 812, sensor component 814, and communication component 816.
The processing component 802 generally controls overall operation of the device 800, such as operations associated with display, telephone calls, data communications, camera operations, and recording operations. The processing component 802 may include one or more processors 820 to execute instructions to perform all or a portion of the steps of the methods described above. Further, the processing component 802 can include one or more modules that facilitate interaction between the processing component 802 and other components. For example, the processing component 802 can include a multimedia module to facilitate interaction between the multimedia component 808 and the processing component 802.
The memory 804 is configured to store various types of data to support operation at the device 800. Examples of such data include instructions for any application or method operating on device 800, contact data, phonebook data, messages, pictures, videos, and so forth. The memory 804 may be implemented by any type or combination of volatile or non-volatile memory devices such as Static Random Access Memory (SRAM), electrically erasable programmable read-only memory (EEPROM), erasable programmable read-only memory (EPROM), programmable read-only memory (PROM), read-only memory (ROM), magnetic memory, flash memory, magnetic or optical disks.
Power components 806 provide power to the various components of device 800. The power components 806 may include a power management system, one or more power supplies, and other components associated with generating, managing, and distributing power for the apparatus 800.
The multimedia component 808 includes a screen that provides an output interface between the device 800 and a user. In some embodiments, the screen may include a Liquid Crystal Display (LCD) and a Touch Panel (TP). If the screen includes a touch panel, the screen may be implemented as a touch screen to receive an input signal from a user. The touch panel includes one or more touch sensors to sense touch, slide, and gestures on the touch panel. The touch sensor may not only sense the boundary of a touch or slide action, but also detect the duration and pressure associated with the touch or slide operation. In some embodiments, the multimedia component 808 includes a front facing camera and/or a rear facing camera. The front-facing camera and/or the rear-facing camera may receive external multimedia data when the device 800 is in an operating mode, such as a shooting mode or a video mode. Each front camera and rear camera may be a fixed optical lens system or have a focal length and optical zoom capability.
The audio component 810 is configured to output and/or input audio signals. For example, the audio component 810 includes a Microphone (MIC) configured to receive external audio signals when the apparatus 800 is in an operational mode, such as a call mode, a recording mode, and a voice information processing mode. The received audio signals may further be stored in the memory 804 or transmitted via the communication component 816. In some embodiments, audio component 810 also includes a speaker for outputting audio signals.
The I/O interface 812 provides an interface between the processing component 802 and peripheral interface modules, which may be keyboards, click wheels, buttons, etc. These buttons may include, but are not limited to: a home button, a volume button, a start button, and a lock button.
The sensor assembly 814 includes one or more sensors for providing various aspects of state assessment for the device 800. For example, the sensor assembly 814 may detect the open/closed state of the device 800 and the relative positioning of components, such as a display and keypad of the apparatus 800. The sensor assembly 814 may also detect a change in position of the apparatus 800 or a component of the apparatus 800, the presence or absence of user contact with the apparatus 800, orientation or acceleration/deceleration of the apparatus 800, and a change in temperature of the apparatus 800. Sensor assembly 814 may include a proximity sensor configured to detect the presence of a nearby object without any physical contact. The sensor assembly 814 may also include a light sensor, such as a CMOS or CCD image sensor, for use in imaging applications. In some embodiments, the sensor assembly 814 may also include an acceleration sensor, a gyroscope sensor, a magnetic sensor, a pressure sensor, or a temperature sensor.
The communication component 816 is configured to facilitate communications between the apparatus 800 and other devices in a wired or wireless manner. The device 800 may access a wireless network based on a communication standard, such as WiFi, 2G or 3G, or a combination thereof. In an exemplary embodiment, the communication component 816 receives a broadcast signal or broadcast related information from an external broadcast management system via a broadcast channel. In an exemplary embodiment, the communication component 816 further includes a Near Field Communication (NFC) module to facilitate short-range communications. For example, the NFC module may be implemented based on Radio Frequency Identification (RFID) technology, Infrared Data Association (IrDA) technology, Ultra Wideband (UWB) technology, Bluetooth (BT) technology, and other technologies.
In an exemplary embodiment, the apparatus 800 may be implemented by one or more Application Specific Integrated Circuits (ASICs), Digital Signal Processors (DSPs), Digital Signal Processing Devices (DSPDs), Programmable Logic Devices (PLDs), Field Programmable Gate Arrays (FPGAs), controllers, micro-controllers, microprocessors or other electronic components for performing the above-described methods.
In an exemplary embodiment, a non-transitory computer-readable storage medium comprising instructions, such as the memory 804 comprising instructions, executable by the processor 820 of the device 800 to perform the above-described method is also provided. For example, the non-transitory computer readable storage medium may be a ROM, a Random Access Memory (RAM), a CD-ROM, a magnetic tape, a floppy disk, an optical data storage device, and the like.
Fig. 6 is a schematic diagram of a server in some embodiments of the invention. The server 1900 may vary widely by configuration or performance and may include one or more Central Processing Units (CPUs) 1922 (e.g., one or more processors) and memory 1932, one or more storage media 1930 (e.g., one or more mass storage devices) storing applications 1942 or data 1944. Memory 1932 and storage medium 1930 can be, among other things, transient or persistent storage. The program stored in the storage medium 1930 may include one or more modules (not shown), each of which may include a series of instructions operating on a server. Still further, a central processor 1922 may be provided in communication with the storage medium 1930 to execute a series of instruction operations in the storage medium 1930 on the server 1900.
The server 1900 may also include one or more power supplies 1926, one or more wired or wireless network interfaces 1950, one or more input-output interfaces 1958, one or more keyboards 1956, and/or one or more operating systems 1941, such as Windows Server, Mac OS X™, Unix™, Linux™, FreeBSD™, etc.
A non-transitory computer-readable storage medium in which instructions, when executed by a processor of an apparatus (server or terminal), enable the apparatus to perform the data quality assessment method shown in fig. 1.
A non-transitory computer-readable storage medium in which instructions, when executed by a processor of an apparatus (server or terminal), enable the apparatus to perform a data quality assessment method, the method comprising: receiving a security mark and ciphertext spot check data from a second party, wherein the security mark is generated by the second party based on the ciphertext of a data entry in data to be provided, the spot check data is part or all of the data to be provided, and the data to be provided comprises ciphertext data; querying the received security marks to obtain a first security mark corresponding to a data entry in the spot check data; generating a second security mark corresponding to the data entry in the spot check data based on the ciphertext of the data entry in the spot check data; matching the first security mark with the second security mark one by one; and, if the matching succeeds, performing quality evaluation on the spot check data to obtain a quality evaluation result of the spot check data, thereby determining the quality evaluation result of the data to be provided.
An embodiment of the invention discloses A1, a data quality evaluation method, applied to a data transaction scenario for transacting data usage rights and adapted for a first-party device, the method comprising:
receiving a security mark and ciphertext spot check data from a second party, wherein the security mark is generated by the second party based on the ciphertext of a data entry in data to be provided, the spot check data is part or all of the data to be provided, and the data to be provided comprises ciphertext data;
querying the received security marks to obtain a first security mark corresponding to a data entry in the spot check data;
generating a second security mark corresponding to the data entry in the spot check data based on the ciphertext of the data entry in the spot check data;
matching the first security mark with the second security mark one by one;
and, if the matching succeeds, performing quality evaluation on the spot check data to obtain a quality evaluation result of the spot check data, thereby determining the quality evaluation result of the data to be provided.
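The mark verification steps of A1 can be sketched as follows. This is only an illustrative sketch, not the patented implementation: it assumes SHA-256 as the irreversible algorithm, assumes the received marks are keyed by entry index, and the names `security_mark` and `verify_spot_check` are hypothetical.

```python
import hashlib
from typing import Mapping


def security_mark(ciphertext: bytes) -> str:
    """Irreversible mark over one ciphertext entry (SHA-256 is an assumption)."""
    return hashlib.sha256(ciphertext).hexdigest()


def verify_spot_check(received_marks: Mapping[int, str],
                      spot_check: Mapping[int, bytes]) -> bool:
    """Return True only if every spot check entry matches its received mark."""
    for index, ciphertext in spot_check.items():
        first_mark = received_marks.get(index)   # queried from the received marks
        second_mark = security_mark(ciphertext)  # recomputed by the first party
        if first_mark != second_mark:
            return False
    return True


# Second party: one mark per ciphertext entry of the data to be provided.
data_to_provide = [b"ct-0", b"ct-1", b"ct-2", b"ct-3"]
marks = {i: security_mark(ct) for i, ct in enumerate(data_to_provide)}

# First party: receives the marks plus a spot check subset.
honest = {1: data_to_provide[1], 3: data_to_provide[3]}
tampered = {1: data_to_provide[1], 3: b"ct-swapped"}
```

Under these assumptions, quality evaluation would proceed only for `honest`; the `tampered` subset fails the one-by-one match because entry 3 no longer hashes to the mark committed earlier.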
A2, the method of A1, wherein before the receiving of the security mark and the ciphertext spot check data from the second party, the method further comprises:
calculating, according to a preset sampling proportion, an index value corresponding to the spot check data, and sending the index value to the second party;
wherein the receiving of the security mark and the ciphertext spot check data from the second party comprises:
receiving, from the second party, the security mark and the ciphertext spot check data corresponding to the index value.
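One way the first party might derive index values from a preset sampling proportion is sketched below. The function name `spot_check_indices`, the uniform random selection, and the optional `seed` parameter are assumptions made for illustration; the text does not specify how the index values are chosen.

```python
import math
import random
from typing import List, Optional


def spot_check_indices(total_entries: int, sampling_proportion: float,
                       seed: Optional[int] = None) -> List[int]:
    """Pick distinct entry indices covering the given proportion of the data."""
    if total_entries <= 0:
        raise ValueError("total_entries must be positive")
    if not 0 < sampling_proportion <= 1:
        raise ValueError("sampling_proportion must be in (0, 1]")
    # Round up so that a non-zero proportion always selects at least one entry.
    sample_size = math.ceil(total_entries * sampling_proportion)
    rng = random.Random(seed)  # seeded only to make the illustration repeatable
    return sorted(rng.sample(range(total_entries), sample_size))


indices = spot_check_indices(total_entries=40, sampling_proportion=0.25, seed=7)
```

Because the first party chooses the indices after the security marks have been fixed, the second party cannot know in advance which entries will be inspected.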
A3, the method of A1, wherein the quality evaluation of the spot check data comprises:
acquiring, from a trusted third party, the same data corresponding to the spot check data, wherein the same data comprises ciphertext data;
performing a ciphertext-based comparison operation on the key data items of the data entries in the spot check data and the key data items of the data entries in the same data to obtain a comparison result;
and obtaining a quality evaluation result corresponding to the spot check data according to the comparison result.
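The comparison in A3 can be pictured as an agreement rate between the spot check data and the third party's copy. The sketch below is a simplification under stated assumptions: the ciphertext-based comparison is abstracted as a caller-supplied callable `ct_equal` (hypothetical; in practice it would be provided by a secure computation platform), and using the fraction of matching key data items as the score is an illustrative choice.

```python
from typing import Callable, Sequence, TypeVar

C = TypeVar("C")  # opaque ciphertext type


def agreement_rate(spot_keys: Sequence[C], reference_keys: Sequence[C],
                   ct_equal: Callable[[C, C], bool]) -> float:
    """Fraction of key data items that agree with the trusted reference."""
    if not spot_keys or len(spot_keys) != len(reference_keys):
        raise ValueError("inputs must be non-empty and of equal length")
    matches = sum(1 for a, b in zip(spot_keys, reference_keys) if ct_equal(a, b))
    return matches / len(spot_keys)


# Plain strings stand in for ciphertexts; equality stands in for the
# ciphertext-based comparison operation.
rate = agreement_rate(["a", "b", "c", "d"], ["a", "x", "c", "d"],
                      lambda a, b: a == b)
```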
A4, the method of A1, wherein the performing of quality evaluation on the spot check data to obtain a quality evaluation result corresponding to the spot check data comprises:
performing ciphertext-based model training using the spot check data to obtain a target model;
testing the target model on a preset test set to obtain a prediction result of the target model;
and obtaining a quality evaluation result corresponding to the spot check data according to the prediction result.
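The idea behind A4 is that data of good quality should train a model that predicts well on a preset test set. The sketch below is a plaintext analogue only: it does not perform ciphertext-based training, and the nearest-centroid classifier, the function names, and the use of accuracy as the score are all assumptions chosen to keep the example small.

```python
from statistics import mean


def train_centroids(samples):
    """samples: list of (feature, label); returns the mean feature per label."""
    by_label = {}
    for x, y in samples:
        by_label.setdefault(y, []).append(x)
    return {y: mean(xs) for y, xs in by_label.items()}


def predict(model, x):
    """Predict the label whose centroid is closest to x."""
    return min(model, key=lambda y: abs(model[y] - x))


def accuracy(model, test_set):
    """Share of test samples predicted correctly; used here as the quality score."""
    correct = sum(1 for x, y in test_set if predict(model, x) == y)
    return correct / len(test_set)


test_set = [(0.9, "low"), (5.0, "high"), (1.1, "low"), (4.7, "high")]
clean = [(1.0, "low"), (1.2, "low"), (4.8, "high"), (5.1, "high")]
mislabeled = [(1.0, "high"), (1.2, "high"), (4.8, "low"), (5.1, "low")]

clean_score = accuracy(train_centroids(clean), test_set)
mislabeled_score = accuracy(train_centroids(mislabeled), test_set)
```

In this toy setting the correctly labeled spot check data yields a high score and the mislabeled data a low one, which is the signal the quality evaluation relies on.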
A5, the method of A1, wherein the performing of quality evaluation on the spot check data to obtain a quality evaluation result corresponding to the spot check data comprises:
performing any one or more of the following checking operations on the spot check data to obtain a check result: a duplicate value check, a missing value check, a data skew check, and an outlier check;
and obtaining a quality evaluation result corresponding to the spot check data according to the check result.
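The four checks named in A5 can be illustrated on a single numeric column. This is a plaintext sketch under assumptions: the function name `check_column`, the use of the most common value's share as a skew indicator, and the z-score outlier rule with threshold `outlier_z` are illustrative choices, and a ciphertext setting would need equivalent operations on encrypted values.

```python
from collections import Counter
from statistics import mean, pstdev


def check_column(values, outlier_z=3.0):
    """Run duplicate, missing, skew, and outlier checks on one column."""
    present = [v for v in values if v is not None]
    missing = len(values) - len(present)
    if not present:
        return {"duplicates": 0, "missing": missing, "top_share": 0.0, "outliers": []}
    counts = Counter(present)
    # Every occurrence beyond the first counts as a duplicate.
    duplicates = sum(c - 1 for c in counts.values())
    # Skew indicator: share of the single most common value.
    top_share = max(counts.values()) / len(present)
    mu, sigma = mean(present), pstdev(present)
    outliers = [v for v in present if sigma > 0 and abs(v - mu) / sigma > outlier_z]
    return {"duplicates": duplicates, "missing": missing,
            "top_share": top_share, "outliers": outliers}


report = check_column([10, 10, 11, None, 12, 10, 500], outlier_z=2.0)
```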
A6, the method of A1, wherein the performing of quality evaluation on the spot check data to obtain a quality evaluation result corresponding to the spot check data comprises:
acquiring historical feedback information corresponding to the spot check data;
and obtaining a quality evaluation result corresponding to the spot check data according to the historical feedback information.
A7, the method of A1, wherein after the determining of the quality evaluation result of the data to be provided, the method further comprises:
digitally signing the quality evaluation result of the data to be provided and the received security mark, and then storing them to a data transaction platform.
A8, the method of any one of A1 to A7, wherein the quality evaluation result comprises any one or more of the following information: an evaluation mode, a proportion of evaluated data, and an extraction mode.
A9, the method of any one of A1 to A7, wherein the security mark is calculated by applying an irreversible encryption algorithm to the ciphertext of the data entry in the data to be provided.
An embodiment of the invention discloses B10, a data quality evaluation apparatus, applied to a data transaction scenario for transacting data usage rights and adapted for a first-party device, the apparatus comprising:
a data receiving module, configured to receive a security mark and ciphertext spot check data from a second party, wherein the security mark is generated by the second party based on the ciphertext of a data entry in data to be provided, the spot check data is part or all of the data to be provided, and the data to be provided comprises ciphertext data;
a mark query module, configured to query the received security marks to obtain a first security mark corresponding to a data entry in the spot check data;
a mark calculation module, configured to generate a second security mark corresponding to the data entry in the spot check data based on the ciphertext of the data entry in the spot check data;
a mark matching module, configured to match the first security mark with the second security mark one by one;
and a quality evaluation module, configured to, if the matching succeeds, perform quality evaluation on the spot check data to obtain a quality evaluation result of the spot check data, thereby determining the quality evaluation result of the data to be provided.
B11, the apparatus of B10, the apparatus further comprising:
an index calculation module, configured to calculate, according to a preset sampling proportion, an index value corresponding to the spot check data, and to send the index value to the second party;
wherein the data receiving module is specifically configured to receive, from the second party, the security mark and the ciphertext spot check data corresponding to the index value.
B12, the apparatus of B10, the quality evaluation module comprising:
a data acquisition submodule, configured to acquire, from a trusted third party, the same data corresponding to the spot check data, wherein the same data comprises ciphertext data;
a data comparison submodule, configured to perform a ciphertext-based comparison operation on the key data items of the data entries in the spot check data and the key data items of the data entries in the same data to obtain a comparison result;
and a first evaluation submodule, configured to obtain a quality evaluation result corresponding to the spot check data according to the comparison result.
B13, the apparatus of B10, the quality evaluation module comprising:
a model training submodule, configured to perform ciphertext-based model training using the spot check data to obtain a target model;
a model prediction submodule, configured to test the target model on a preset test set to obtain a prediction result of the target model;
and a second evaluation submodule, configured to obtain a quality evaluation result corresponding to the spot check data according to the prediction result.
B14, the apparatus of B10, the quality evaluation module comprising:
a data checking submodule, configured to perform any one or more of the following checking operations on the spot check data to obtain a check result: a duplicate value check, a missing value check, a data skew check, and an outlier check;
and a third evaluation submodule, configured to obtain a quality evaluation result corresponding to the spot check data according to the check result.
B15, the apparatus of B10, the quality evaluation module comprising:
a history acquisition submodule, configured to acquire historical feedback information corresponding to the spot check data;
and a fourth evaluation submodule, configured to obtain a quality evaluation result corresponding to the spot check data according to the historical feedback information.
B16, the apparatus of B10, the apparatus further comprising:
a result uploading module, configured to digitally sign the quality evaluation result of the data to be provided and the received security mark, and then store them to a data transaction platform.
B17, the apparatus of any one of B10 to B15, wherein the quality evaluation result comprises any one or more of the following information: an evaluation mode, a proportion of evaluated data, and an extraction mode.
B18, the apparatus of any one of B10 to B15, wherein the security mark is calculated by applying an irreversible encryption algorithm to the ciphertext of the data entry in the data to be provided.
An embodiment of the invention discloses C19, an apparatus for data quality assessment, applied to a data transaction scenario for transacting data usage rights and adapted for a first-party device, comprising a memory and one or more programs, wherein the one or more programs are stored in the memory and configured to be executed by one or more processors, the one or more programs including instructions for:
receiving a security mark and ciphertext spot check data from a second party, wherein the security mark is generated by the second party based on the ciphertext of a data entry in data to be provided, the spot check data is part or all of the data to be provided, and the data to be provided comprises ciphertext data;
querying the received security marks to obtain a first security mark corresponding to a data entry in the spot check data;
generating a second security mark corresponding to the data entry in the spot check data based on the ciphertext of the data entry in the spot check data;
matching the first security mark with the second security mark one by one;
and, if the matching succeeds, performing quality evaluation on the spot check data to obtain a quality evaluation result of the spot check data, thereby determining the quality evaluation result of the data to be provided.
C20, the apparatus of C19, wherein the apparatus is further configured to execute, by the one or more processors, the one or more programs including instructions for:
calculating, according to a preset sampling proportion, an index value corresponding to the spot check data, and sending the index value to the second party;
wherein the receiving of the security mark and the ciphertext spot check data from the second party comprises:
receiving, from the second party, the security mark and the ciphertext spot check data corresponding to the index value.
C21, the apparatus of C19, wherein the quality evaluation of the spot check data comprises:
acquiring, from a trusted third party, the same data corresponding to the spot check data, wherein the same data comprises ciphertext data;
performing a ciphertext-based comparison operation on the key data items of the data entries in the spot check data and the key data items of the data entries in the same data to obtain a comparison result;
and obtaining a quality evaluation result corresponding to the spot check data according to the comparison result.
C22, the apparatus of C19, wherein the performing of quality evaluation on the spot check data to obtain a quality evaluation result corresponding to the spot check data comprises:
performing ciphertext-based model training using the spot check data to obtain a target model;
testing the target model on a preset test set to obtain a prediction result of the target model;
and obtaining a quality evaluation result corresponding to the spot check data according to the prediction result.
C23, the apparatus of C19, wherein the performing of quality evaluation on the spot check data to obtain a quality evaluation result corresponding to the spot check data comprises:
performing any one or more of the following checking operations on the spot check data to obtain a check result: a duplicate value check, a missing value check, a data skew check, and an outlier check;
and obtaining a quality evaluation result corresponding to the spot check data according to the check result.
C24, the apparatus of C19, wherein the performing of quality evaluation on the spot check data to obtain a quality evaluation result corresponding to the spot check data comprises:
acquiring historical feedback information corresponding to the spot check data;
and obtaining a quality evaluation result corresponding to the spot check data according to the historical feedback information.
C25, the apparatus of C19, wherein the apparatus is further configured to execute, by the one or more processors, the one or more programs including instructions for:
digitally signing the quality evaluation result of the data to be provided and the received security mark, and then storing them to a data transaction platform.
C26, the apparatus of any one of C19 to C25, wherein the quality evaluation result comprises any one or more of the following information: an evaluation mode, a proportion of evaluated data, and an extraction mode.
C27, the apparatus of any one of C19 to C25, wherein the security mark is calculated by applying an irreversible encryption algorithm to the ciphertext of the data entry in the data to be provided.
An embodiment of the present invention discloses D29, a machine-readable medium having instructions stored thereon which, when executed by one or more processors, cause an apparatus to perform the data quality evaluation method described in one or more of A1 to A9.
Other embodiments of the invention will be apparent to those skilled in the art from consideration of the specification and practice of the invention disclosed herein. This invention is intended to cover any variations, uses, or adaptations of the invention following, in general, the principles of the invention and including such departures from the present disclosure as come within known or customary practice within the art to which the invention pertains. It is intended that the specification and examples be considered as exemplary only, with a true scope and spirit of the invention being indicated by the following claims.
It will be understood that the invention is not limited to the precise arrangements described above and shown in the drawings and that various modifications and changes may be made without departing from the scope thereof. The scope of the invention is limited only by the appended claims.
The above description is only for the purpose of illustrating the preferred embodiments of the present invention and is not to be construed as limiting the invention, and any modifications, equivalents, improvements and the like that fall within the spirit and principle of the present invention are intended to be included therein.
The data quality evaluation method, the data quality evaluation device and the device for data quality evaluation provided by the invention are described in detail, specific examples are applied in the text to explain the principle and the implementation mode of the invention, and the description of the above examples is only used to help understand the method and the core idea of the invention; meanwhile, for a person skilled in the art, according to the idea of the present invention, there may be variations in the specific embodiments and the application scope, and in summary, the content of the present specification should not be construed as a limitation to the present invention.

Claims (10)

1. A data quality assessment method, applied to a data transaction scenario for transacting data usage rights and adapted for a first-party device, the method comprising:
receiving a security mark and ciphertext spot check data from a second party, wherein the security mark is generated by the second party based on the ciphertext of a data entry in data to be provided, the spot check data is part or all of the data to be provided, and the data to be provided comprises ciphertext data;
querying the received security marks to obtain a first security mark corresponding to a data entry in the spot check data;
generating a second security mark corresponding to the data entry in the spot check data based on the ciphertext of the data entry in the spot check data;
matching the first security mark with the second security mark one by one;
and, if the matching succeeds, performing quality evaluation on the spot check data to obtain a quality evaluation result of the spot check data, thereby determining the quality evaluation result of the data to be provided.
2. The method of claim 1, wherein prior to receiving the security mark and the ciphertext spot check data from the second party, the method further comprises:
calculating, according to a preset sampling proportion, an index value corresponding to the spot check data, and sending the index value to the second party;
wherein the receiving of the security mark and the ciphertext spot check data from the second party comprises:
receiving, from the second party, the security mark and the ciphertext spot check data corresponding to the index value.
3. The method of claim 1, wherein the quality evaluation of the spot check data comprises:
acquiring, from a trusted third party, the same data corresponding to the spot check data, wherein the same data comprises ciphertext data;
performing a ciphertext-based comparison operation on the key data items of the data entries in the spot check data and the key data items of the data entries in the same data to obtain a comparison result;
and obtaining a quality evaluation result corresponding to the spot check data according to the comparison result.
4. The method according to claim 1, wherein the performing quality evaluation on the spot check data to obtain a quality evaluation result corresponding to the spot check data comprises:
performing ciphertext-based model training using the spot check data to obtain a target model;
testing the target model on a preset test set to obtain a prediction result of the target model;
and obtaining a quality evaluation result corresponding to the spot check data according to the prediction result.
5. The method according to claim 1, wherein the performing quality evaluation on the spot check data to obtain a quality evaluation result corresponding to the spot check data comprises:
performing any one or more of the following checking operations on the spot check data to obtain a check result: a duplicate value check, a missing value check, a data skew check, and an outlier check;
and obtaining a quality evaluation result corresponding to the spot check data according to the check result.
6. The method according to claim 1, wherein the performing quality evaluation on the spot check data to obtain a quality evaluation result corresponding to the spot check data comprises:
acquiring historical feedback information corresponding to the spot check data;
and obtaining a quality evaluation result corresponding to the spot check data according to the historical feedback information.
7. A data quality assessment apparatus, applied to a data transaction scenario for transacting data usage rights, adapted to a first-party device, the apparatus comprising:
a data receiving module, configured to receive a security mark and ciphertext spot check data from a second party, wherein the security mark is generated by the second party based on the ciphertext of a data entry in data to be provided, the spot check data is part or all of the data to be provided, and the data to be provided comprises ciphertext data;
a mark query module, configured to query the received security marks to obtain a first security mark corresponding to a data entry in the spot check data;
a mark calculation module, configured to generate a second security mark corresponding to the data entry in the spot check data based on the ciphertext of the data entry in the spot check data;
a mark matching module, configured to match the first security mark with the second security mark one by one;
and a quality evaluation module, configured to, if the matching succeeds, perform quality evaluation on the spot check data to obtain a quality evaluation result of the spot check data, thereby determining the quality evaluation result of the data to be provided.
8. An apparatus for data quality assessment, applied to a data transaction scenario for transacting data usage rights, adapted for a first-party device, the apparatus comprising a memory, and one or more programs, wherein the one or more programs are stored in the memory and configured to be executed by one or more processors, the one or more programs including instructions for:
receiving a security mark and ciphertext spot check data from a second party, wherein the security mark is generated by the second party based on the ciphertext of a data entry in data to be provided, the spot check data is part or all of the data to be provided, and the data to be provided comprises ciphertext data;
querying the received security marks to obtain a first security mark corresponding to a data entry in the spot check data;
generating a second security mark corresponding to the data entry in the spot check data based on the ciphertext of the data entry in the spot check data;
matching the first security mark with the second security mark one by one;
and, if the matching succeeds, performing quality evaluation on the spot check data to obtain a quality evaluation result of the spot check data, thereby determining the quality evaluation result of the data to be provided.
9. A data quality evaluation system, characterized in that the system is applied to a data trading scenario for trading data usage rights, and the system comprises a first-party device, a second-party device, and a data trading platform; wherein,
the second-party device comprises a first mark calculation module and a data sending module;
the first mark calculation module is configured to calculate a security mark based on the ciphertext of a data entry in data to be provided by a second party;
the data sending module is configured to send the security mark and ciphertext spot check data to the first-party device, wherein the spot check data is part or all of the data to be provided, and the data to be provided comprises ciphertext data;
the first-party device comprises a data receiving module, a mark query module, a second mark calculation module, a mark matching module, a quality evaluation module, and a result storage module;
the data receiving module is configured to receive the security mark and the ciphertext spot check data sent by the second-party device;
the mark query module is configured to query the received security marks to obtain a first security mark corresponding to a data entry in the spot check data;
the second mark calculation module is configured to generate a second security mark corresponding to the data entry in the spot check data based on the ciphertext of the data entry in the spot check data;
the mark matching module is configured to match the first security mark with the second security mark one by one;
the quality evaluation module is configured to perform quality evaluation on the spot check data when the matching succeeds, to obtain a quality evaluation result of the spot check data, thereby determining the quality evaluation result of the data to be provided;
and the result storage module is configured to store the quality evaluation result of the data to be provided on the data trading platform.
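The second-party side of the system and the result storage step can be sketched as below. This is an illustration under stated assumptions only: the class names `SecondPartyDevice` and `DataTradingPlatform`, the `compute_mark` function (SHA-256 over the ciphertext), the random selection of spot check entries, and the in-memory result store are all hypothetical and are not specified by the claim.

```python
import hashlib
import random
from dataclasses import dataclass, field
from typing import Dict, Tuple


def compute_mark(ciphertext: bytes) -> str:
    """Assumed mark function: SHA-256 digest of one entry's ciphertext."""
    return hashlib.sha256(ciphertext).hexdigest()


@dataclass
class SecondPartyDevice:
    # Entry id -> ciphertext of each entry in the data to be provided.
    data_to_provide: Dict[str, bytes]

    def compute_marks(self) -> Dict[str, str]:
        """First mark calculation module: one mark per data entry."""
        return {eid: compute_mark(ct) for eid, ct in self.data_to_provide.items()}

    def send(self, sample_size: int, seed: int = 0) -> Tuple[Dict[str, str], Dict[str, bytes]]:
        """Data sending module: return all marks plus a spot check subset.

        Assumption: the subset is drawn at random; the claim only says it
        is part or all of the data to be provided.
        """
        rng = random.Random(seed)
        ids = sorted(self.data_to_provide)
        chosen = rng.sample(ids, k=min(sample_size, len(ids)))
        spot_check = {eid: self.data_to_provide[eid] for eid in chosen}
        return self.compute_marks(), spot_check


@dataclass
class DataTradingPlatform:
    # Dataset id -> stored quality evaluation result.
    results: Dict[str, float] = field(default_factory=dict)

    def store_result(self, dataset_id: str, score: float) -> None:
        """Target of the first party's result storage module."""
        self.results[dataset_id] = score
```

Because the marks cover every entry while only a subset of ciphertext is sent, the first-party device can later check any delivered entry against the mark it already holds.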
10. A machine-readable medium having stored thereon instructions, which when executed by one or more processors, cause an apparatus to perform the data quality assessment method of any one of claims 1 to 6.
CN202010220629.2A 2020-03-25 2020-03-25 Data quality evaluation method and device for data quality evaluation Pending CN112685765A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010220629.2A CN112685765A (en) 2020-03-25 2020-03-25 Data quality evaluation method and device for data quality evaluation

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202010220629.2A CN112685765A (en) 2020-03-25 2020-03-25 Data quality evaluation method and device for data quality evaluation

Publications (1)

Publication Number Publication Date
CN112685765A (en) 2021-04-20

Family

ID=75445278

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010220629.2A Pending CN112685765A (en) 2020-03-25 2020-03-25 Data quality evaluation method and device for data quality evaluation

Country Status (1)

Country Link
CN (1) CN112685765A (en)

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN1698111A (en) * 2001-12-05 2005-11-16 皇家飞利浦电子股份有限公司 Method and apparatus for verifying the integrity of system data
CN108009646A (en) * 2017-11-30 2018-05-08 深圳市轱辘车联数据技术有限公司 Vehicle data processing method and server
CN108647966A (en) * 2018-05-09 2018-10-12 深圳市融讯科技有限公司 Data interaction method and device based on blockchain
CN109472596A (en) * 2018-10-16 2019-03-15 中国传媒大学 Consortium chain consensus method and system based on transaction evaluation
CN109948684A (en) * 2019-03-12 2019-06-28 百度在线网络技术(北京)有限公司 Quality inspection method and apparatus for point cloud data annotation quality, and related device

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
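Cao Gang (曹刚) et al.: "Digital Content Security Technology" (《数字内容安全技术》), 31 October 2016, Beijing: Communication University of China Press (中国传媒大学出版社), pages: 221 *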

Similar Documents

Publication Publication Date Title
US11810080B2 (en) Systems and method for tracking enterprise events using hybrid public-private blockchain ledgers
CN113168637B (en) Method and system for secondary fraud detection during transaction verification
EP3207464B1 (en) Method, device, terminal, and server for verifying security of service operation
US11297500B2 (en) Authenticating digital evidence
CA3057396A1 (en) Methods and devices for acquiring and recording tracking information on blockchain
US10643208B2 (en) Digital payment system
CN111489155B (en) Data processing method and device for data processing
CN112184218B (en) Data processing method and device and data processing device
CN111383113A (en) Suspicious client prediction method, device, equipment and readable storage medium
CN110909259A (en) Block chain-based user recommendation method, device, equipment and storage medium
CN114840568B (en) Ciphertext sorting method and device and ciphertext sorting device
US11120160B2 (en) Distributed personal data storage and encrypted personal data service based on secure computation
CN110765490A (en) Method and apparatus for processing information
CN112487415B (en) Method and device for detecting security of computing task
CN111367993A (en) Data service method and device based on end-to-end
CN114266680A (en) Block chain-based electronic contract signing method, device and system
CA2948229A1 (en) Systems and method for tracking enterprise events using hybrid public-private blockchain ledgers
CN106251223A (en) Counterparty's reliability determines method and apparatus
CN112685765A (en) Data quality evaluation method and device for data quality evaluation
CN106408304B (en) Account security management method and device
CN109815393B (en) Information processing method and device, computer equipment and readable storage medium
US20210168129A1 (en) System and method for persistent authentication of a user for issuing virtual tokens
CN115329390B (en) Financial privacy information security auditing method and device based on privacy protection calculation
CN115499254B (en) User data processing method, device and system and readable storage medium
CN112016928B (en) Payment method and device and payment device

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination