CN114742417A - Data quality evaluation method and device, electronic equipment and storage medium - Google Patents

Data quality evaluation method and device, electronic equipment and storage medium

Info

Publication number
CN114742417A
Authority
CN
China
Prior art keywords
preset
data
evaluated
time
score
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202210396311.9A
Other languages
Chinese (zh)
Inventor
赵柯
于洋
高经郡
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Kejie Technology Co ltd
Original Assignee
Beijing Kejie Technology Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Kejie Technology Co ltd
Priority to CN202210396311.9A
Publication of CN114742417A

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q10/00 Administration; Management
    • G06Q10/06 Resources, workflows, human or project management; Enterprise or organisation planning; Enterprise or organisation modelling
    • G06Q10/063 Operations research, analysis or management
    • G06Q10/0639 Performance analysis of employees; Performance analysis of enterprise or organisation operations
    • G06Q10/06393 Score-carding, benchmarking or key performance indicator [KPI] analysis
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q10/00 Administration; Management
    • G06Q10/06 Resources, workflows, human or project management; Enterprise or organisation planning; Enterprise or organisation modelling
    • G06Q10/063 Operations research, analysis or management
    • G06Q10/0639 Performance analysis of employees; Performance analysis of enterprise or organisation operations
    • G06Q10/06395 Quality analysis or management
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q40/00 Finance; Insurance; Tax strategies; Processing of corporate or income taxes

Landscapes

  • Business, Economics & Management (AREA)
  • Human Resources & Organizations (AREA)
  • Engineering & Computer Science (AREA)
  • Strategic Management (AREA)
  • Development Economics (AREA)
  • Economics (AREA)
  • Educational Administration (AREA)
  • Entrepreneurship & Innovation (AREA)
  • General Physics & Mathematics (AREA)
  • Marketing (AREA)
  • Physics & Mathematics (AREA)
  • General Business, Economics & Management (AREA)
  • Theoretical Computer Science (AREA)
  • Operations Research (AREA)
  • Quality & Reliability (AREA)
  • Tourism & Hospitality (AREA)
  • Game Theory and Decision Science (AREA)
  • Accounting & Taxation (AREA)
  • Finance (AREA)
  • Technology Law (AREA)
  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)

Abstract

The invention relates to the technical field of data processing and provides a data quality evaluation method and device, electronic equipment and a storage medium. The data quality evaluation method comprises the following steps: acquiring data to be evaluated, wherein the data to be evaluated is financial data; respectively calculating the initial score of each preset index of the data to be evaluated in each first preset period according to the preset scoring rule of each preset index; respectively calculating the comprehensive score of each preset index according to a preset time attenuation rule; and displaying the comprehensive scores of the preset indexes. The method and the device score data quality along multiple dimensions using multiple preset indexes and present the evaluation of each dimension to the user as a numerical score, so that the user can quickly locate problems in the data and the data cleaning efficiency is improved.

Description

Data quality evaluation method and device, electronic equipment and storage medium
Technical Field
The present invention relates to the field of data processing technologies, and in particular, to a data quality evaluation method and apparatus, an electronic device, and a storage medium.
Background
With the support of current computer technology, many businesses need to extract ever more value from their data. Hidden among the valuable data, however, is irregular and disordered data that can compromise the accuracy of the final result. Data without value therefore usually has to be filtered out and cleaned so that the raw data becomes more regular and its validity and accuracy improve. When a large amount of problematic data is present, however, it is difficult for a user to quickly locate the problems, and data cleaning is inefficient.
Disclosure of Invention
In order to quickly locate problems in data and improve the efficiency of data cleaning, the invention provides, in a first aspect, a data quality assessment method which adopts the following technical scheme:
a data quality assessment method, comprising:
acquiring data to be evaluated; wherein the data to be evaluated is financial data;
respectively calculating initial scores of the preset indexes of the data to be evaluated in each first preset period according to preset scoring rules of the preset indexes;
respectively calculating the comprehensive scores of the preset indexes according to the initial scores of the preset indexes and a preset time attenuation rule;
and displaying the comprehensive scores of the preset indexes.
By adopting the technical scheme, the data quality is scored along multiple dimensions using multiple preset indexes, and the evaluation of each dimension is presented to the user as a score, so that the user can quickly locate problems in the data and the data cleaning efficiency is improved.
Optionally, before the displaying of the comprehensive scores of the preset indexes, the method further includes: performing weighted calculation on the comprehensive scores of the preset indexes according to a preset weight rule to generate a data quality comprehensive score; and
when the comprehensive scores of the preset indexes are displayed, displaying the data quality comprehensive score as well.
By adopting the technical scheme, the data quality comprehensive score is generated by weighting the comprehensive scores of the preset indexes and is displayed together with the evaluation of each dimension, so that the user can grasp the overall data quality and decide whether problems in the data need to be located along specific dimensions, which improves the data cleaning efficiency.
Optionally, the number of the first preset periods is a first preset number, the first preset number is greater than 1, and the first preset periods of the first preset number are continuous on a time axis;
for any preset index, respectively calculating the comprehensive scores of the preset indexes according to the initial scores of the preset indexes and a preset time attenuation rule, and specifically comprising the following steps of:
acquiring a second preset number of initial scores of the preset indexes, and calculating a first average value;
acquiring a third preset number of initial scores of the preset indexes, and calculating a second average value;
according to a preset time attenuation rule, performing time weighting on the first average value and the second average value to generate a preset index comprehensive score;
the sum of the second preset number and the third preset number is equal to the first preset number, and the first preset periods corresponding to the second preset number of preset index initial scores are earlier on the time axis than the first preset periods corresponding to the third preset number of preset index initial scores; the preset time attenuation rule is that the farther a period is from the current time, the smaller its time weight coefficient.
By adopting the technical scheme, a lower weight is given to the first preset periods that are farther from the current time, so that the comprehensive score of the preset index takes time attenuation into account and is therefore more accurate.
Optionally, the preset index includes at least two of integrity, accuracy, stability, repeatability, normalization and timeliness.
By adopting the technical scheme, the preset indexes are specified, so that the data quality evaluation can be conveniently carried out according to the specific indexes.
Optionally, if the preset index is integrity, the preset scoring rule includes:
if the data to be evaluated is hour data or day data, judging whether the data to be evaluated has newly added data in the first preset period; if so, checking a custom rule, and if not, not deducting points;
if the data to be evaluated is week data, month data or year data, obtaining the last execution time of the data to be evaluated by working backwards, and judging whether incremental data exist between the last execution time and the end time of the first preset period; if so, checking the custom rule, and if not, deducting points.
By adopting the technical scheme, the preset scoring rule of the integrity index is defined according to the data type, and the accuracy of the integrity index is improved.
Optionally, the custom rule includes:
counting the total number of null values of the data to be evaluated in the first preset period based on a preset null value rule, and judging whether the total number of null values is greater than a preset total; if so, deducting a preset score, and if not, not deducting points.
By adopting the technical scheme, the self-defined rule of the integrity index is specifically limited based on the preset null value rule, and the accuracy of the integrity index is improved.
Optionally, if the preset index is timeliness, the preset scoring rule includes:
if the data to be evaluated is hour data, calculating the average starting time of historical data at the same time point in a second preset period, and subtracting this average starting time from the starting time of the data to be evaluated in the first preset period to generate an actual delay time; calculating a timeliness initial score according to the actual delay time and a preset deduction standard;
if the data to be evaluated is day data, week data, month data or year data, calculating the average task starting time of the historical data in a third preset period, calculating the task fluctuation rate of the data to be evaluated in the first preset period relative to the average task starting time, and judging whether the task fluctuation rate is greater than a third preset fluctuation rate; if so, deducting a preset score, and if not, not deducting points.
By adopting the technical scheme, the preset scoring rule of the timeliness index is defined according to the data type, and the accuracy of the timeliness index is improved.
In a second aspect, the present invention provides a data quality evaluation apparatus, which adopts the following technical solutions:
a data quality evaluation apparatus comprising:
the acquisition module is used for acquiring data to be evaluated; wherein the data to be evaluated is financial data;
the preset index initial score calculating module is used for calculating each preset index initial score of the data to be evaluated in each first preset period according to a preset score rule of each preset index;
the preset index comprehensive score calculating module is used for calculating each preset index comprehensive score according to the preset index initial score and a preset time attenuation rule;
and the display module is used for displaying the comprehensive scores of the preset indexes.
In a third aspect, the present invention provides an electronic device, which adopts the following technical solutions:
an electronic device comprising a memory and a processor, the memory having stored thereon a computer program which can be loaded by the processor to perform the above method.
In a fourth aspect, the present invention provides a computer-readable storage medium, which adopts the following technical solutions:
a computer-readable storage medium storing a computer program which can be loaded by a processor to perform the above method.
In summary, the invention includes at least one of the following beneficial technical effects:
1. The data quality is scored along multiple dimensions using multiple preset indexes, and the evaluation of each dimension is presented to the user as a score, so that the user can quickly locate problems in the data and the workload of data cleaning is reduced.
2. The data quality comprehensive score is generated by weighting the comprehensive scores of the preset indexes and is displayed together with the evaluation of each dimension, so that the user can grasp the overall data quality and decide whether problems in the data need to be located along specific dimensions, which further reduces the workload of data cleaning.
Drawings
FIG. 1 is a flow chart of a data quality assessment method according to an embodiment of the present invention.
FIG. 2 is a flow chart of a data quality assessment method according to another embodiment of the present invention.
Fig. 3 is a block diagram of a data quality evaluating apparatus according to an embodiment of the present invention.
Fig. 4 is a schematic diagram of an electronic device according to an embodiment of the invention.
Detailed Description
In order to make the objects, technical solutions and advantages of the present invention more apparent, the present invention will be described in further detail with reference to fig. 1-4 and the following embodiments. It should be understood that the specific embodiments described herein are merely illustrative of the invention and are not intended to limit the invention.
The embodiment of the invention discloses a data quality evaluation method, and referring to fig. 1, the data quality evaluation method comprises the following steps:
S11, acquiring the data to be evaluated.
The data to be evaluated is financial data stored in a database and comprises information such as user names, operation types, amounts, dates, mobile phone numbers, ID numbers and IP addresses. The data to be evaluated can be data in a plurality of data tables or data in a single data table, and can be obtained from the database through SQL statements.
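As a minimal sketch of step S11, the data to be evaluated could be read with a parameterized SQL query. The table name `transactions`, its columns and the in-memory SQLite database are all hypothetical choices made for illustration; the text only states that the data is fetched from a database through SQL.

```python
import sqlite3


def fetch_data_to_evaluate(conn, start, end):
    """Return rows whose operation date lies in the half-open range [start, end)."""
    cur = conn.execute(
        "SELECT user_name, operation_type, amount, op_date "
        "FROM transactions WHERE op_date >= ? AND op_date < ? "
        "ORDER BY op_date",
        (start, end),
    )
    return cur.fetchall()


# Hypothetical demo data in an in-memory database.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE transactions "
    "(user_name TEXT, operation_type TEXT, amount REAL, op_date TEXT)"
)
conn.executemany(
    "INSERT INTO transactions VALUES (?, ?, ?, ?)",
    [
        ("alice", "deposit", 100.0, "2022-04-01"),
        ("bob", "withdraw", 40.0, "2022-04-02"),
        ("carol", "deposit", 75.5, "2022-04-05"),
    ],
)
rows = fetch_data_to_evaluate(conn, "2022-04-01", "2022-04-03")
```

Passing the period boundaries as query parameters keeps the same statement reusable for every first preset period.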
S12, respectively calculating the initial scores of the preset indexes of the data to be evaluated in each first preset period according to the preset scoring rules of the preset indexes.
The number of the first preset periods is a first preset number, the first preset number is greater than 1, and the first preset number of first preset periods are continuous on the time axis. The first preset number and the first preset period are preset according to actual requirements, for example, the first preset number is 24 or 30, and the first preset period is 1 hour or 1 day, which is not limited herein. It should be understood by those skilled in the art that the first preset number of first preset periods is the actual time range over which the data to be evaluated is assessed, and may be the time range closest to the current time or a specified time range.
The preset indexes include at least two of integrity, accuracy, stability, repeatability, normalization and timeliness. Different preset indexes have different preset scoring rules.
If the initial score of a preset index needs to be calculated according to data type, the data to be evaluated is first divided, according to task attributes, into five types: hour data, day data, week data, month data and year data. The data to be evaluated may include one or more data types, and each type may contain one or more data items. The task attribute is determined by the inherent attributes of the financial data. For stock market data, for example, data at the hour, day, week, month and year granularities generally relate to stock trends, and the corresponding tasks have task attributes of the same granularity: for day data, an updating task is generally executed once a day, so the corresponding task attribute is daily update.
It should be noted that each preset index is subjected to quality evaluation based on the basic score, and the basic score of each preset index is the same, for example, 100 or 0, and is not limited herein.
In each first preset period, when the initial score of each preset index is calculated according to its preset scoring rule, the deductions for the individual data items are accumulated against the basic score. For example, suppose the data to be evaluated comprises four data items, namely hour data A, hour data B, month data C and year data D, and the basic score is 100. If, for the integrity index, the preset score is deducted for hour data A, for hour data B and for year data D, while no points are deducted for month data C, then with a preset deduction of 5 points the initial integrity score is 85.
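The accumulation in this example can be sketched as follows. The deduction of 5 points per item is an assumption chosen so that the result matches the stated score of 85, and flooring the score at 0 is likewise an assumption rather than something the text specifies.

```python
def initial_score(base_score, deductions):
    """Subtract every item's deduction from the base score (floored at 0, an assumption)."""
    return max(0, base_score - sum(deductions))


# Deductions per data item for the integrity index in one first preset period.
# The value 5 is assumed; month data C is not penalised in the example.
integrity_deductions = {
    "hour data A": 5,
    "hour data B": 5,
    "month data C": 0,
    "year data D": 5,
}
integrity_initial = initial_score(100, integrity_deductions.values())
```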
Specifically, the preset scoring rule of the integrity index includes: if the data to be evaluated is hour data or day data, judging whether the data to be evaluated has newly added data in the first preset period; if so, checking a custom rule, and if not, not deducting points; if the data to be evaluated is week data, month data or year data, obtaining the last execution time of the data to be evaluated by working backwards and judging whether incremental data exist between the last execution time and the end time of the first preset period; if so, checking the custom rule, and if not, deducting points. Here, the last execution time refers to the time at which the task was last executed.
The custom rule comprises: counting the total number of null values of the data to be evaluated in the first preset period based on a preset null value rule, and judging whether the total number of null values is greater than a preset total; if so, deducting a preset score, and if not, not deducting points. The preset null value rule defines which data count as null values, for example data with no content, content consisting only of punctuation marks, or content consisting only of emoticons. The preset total and the preset score are set according to actual requirements, for example, the preset total is 2, 3 or 5 and the preset score is 2, 3 or 5, which is not particularly limited herein.
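The integrity rule and its null value check might be sketched as below. The function names, the default thresholds, and the regular-expression test for "only punctuation or emoticons" (no word character present) are assumptions for illustration. The amount deducted when week, month or year data has no incremental data is also assumed to be the same preset score.

```python
import re


def is_null_value(value):
    """Treat missing, empty, punctuation-only or emoticon-only content as null."""
    if value is None:
        return True
    text = str(value).strip()
    if not text:
        return True
    # No letter, digit or underscore at all: only punctuation/symbols remain.
    return re.search(r"\w", text) is None


def custom_rule_deduction(values, max_nulls, deduction):
    """Deduct the preset score when the null total exceeds the preset total."""
    null_total = sum(1 for v in values if is_null_value(v))
    return deduction if null_total > max_nulls else 0


def integrity_deduction(data_type, has_new_data, values, max_nulls=2, deduction=5):
    """Apply the integrity rule for one data item in one first preset period."""
    if has_new_data:
        return custom_rule_deduction(values, max_nulls, deduction)
    # Without new data, the text penalises week/month/year data but not hour/day data.
    return 0 if data_type in ("hour", "day") else deduction
```

For instance, `integrity_deduction("hour", True, ["", "?", None, "x"])` finds three null values, which exceeds the assumed preset total of 2, so the preset score is deducted.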
In the preset scoring rule of the accuracy index, the scoring modes of the hour data, the day data, the week data, the month data and the year data are the same, and the method specifically comprises the following steps: judging whether the enumeration value of the data to be evaluated in the first preset period is larger than a preset enumeration value or not, if so, deducting a preset value, and if not, not deducting the value; judging whether the maximum value of the data to be evaluated in a first preset period is larger than a preset maximum value or not, if so, deducting a preset score, and if not, not deducting the score; judging whether the minimum value of the data to be evaluated in the first preset period is smaller than a preset minimum value or not, if so, deducting a preset score, and if not, not deducting the score; judging whether the data volume of the data to be evaluated in the first preset period is in a preset range, if so, deducting a preset score, and if not, not deducting the score; and executing the preset SQL statement, judging whether the execution result of the SQL statement is false, if so, deducting the preset score, and if not, not deducting the score.
The preset scoring rule of the stability index comprises the following steps: if the data to be evaluated is hour data or day data, judging whether the fluctuation rate of the size of the data table in the first preset period is smaller than a first preset fluctuation rate; if so, not deducting points, and if not, deducting a preset score; if the data to be evaluated is week data, month data or year data, judging whether the average fluctuation rate over the latest preset number of fluctuations of the data table is smaller than a second preset fluctuation rate; if so, not deducting points, and if not, deducting the preset score. The fluctuation rate of the size of the data table in the first preset period refers to the fluctuation of the table size between the start time and the end time of the first preset period. The first preset fluctuation rate and the second preset fluctuation rate are preset according to actual requirements and may be the same or different; for example, each may be one of 85%, 90% or 95%. The latest preset number refers to a preset number of fluctuations closest to the end time of the first preset period and may be preset according to actual requirements, for example 5, 7 or 30. Further, if the actual number of fluctuations of the data table is smaller than the preset number, the average fluctuation rate is calculated over the actual number of fluctuations.
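A possible sketch of the stability rule follows. The text does not give a formula for the fluctuation rate, so the relative change in table size used here is an assumption, as are the function names and default values. The list of rates is assumed to be non-empty and ordered from oldest to newest.

```python
def fluctuation_rate(size_start, size_end):
    """Relative change in table size between period start and end (assumed formula)."""
    if size_start == 0:
        return 0.0 if size_end == 0 else float("inf")
    return abs(size_end - size_start) / size_start


def stability_deduction(data_type, rates, threshold, recent_count=5, deduction=5):
    """rates: fluctuation rates in chronological order, newest last."""
    if data_type in ("hour", "day"):
        rate = rates[-1]
    else:
        # Average over the latest fluctuations; slicing uses fewer if fewer exist.
        recent = rates[-recent_count:]
        rate = sum(recent) / len(recent)
    return 0 if rate < threshold else deduction
```

The slice `rates[-recent_count:]` covers the case in which the actual number of fluctuations is smaller than the preset number, since it simply returns all available rates.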
In the preset scoring rule of the repeatability index, hour data, day data, week data, month data and year data are scored in the same way, specifically: counting the total amount of repeated data and judging whether it is greater than a preset total amount of repeated data; if so, deducting a preset score, and if not, not deducting points.
In the preset scoring rule of the normalization index, hour data, day data, week data, month data and year data are scored in the same way, specifically: checking the format of each field according to a preset check rule, counting the number of records that fail the format check, and judging whether this number is greater than a preset number of records; if so, deducting a preset score, and if not, not deducting points. The preset check rule covers, for example, whether a character string is garbled, whether the date format is correct, whether the mobile phone number format is correct, whether the ID card number has the correct number of digits, and whether the IP address is valid; it can be configured according to the fields contained in the data table.
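The repeatability rule can be illustrated with a short sketch. Counting each extra copy beyond the first occurrence as one repeated record is an assumption, since the text does not define how the total is counted; records are assumed to be hashable tuples.

```python
from collections import Counter


def duplicate_total(records):
    """Number of extra copies beyond the first occurrence of each record."""
    counts = Counter(records)
    return sum(n - 1 for n in counts.values() if n > 1)


def repeatability_deduction(records, max_duplicates, deduction=5):
    """Deduct the preset score when duplicates exceed the preset total."""
    return deduction if duplicate_total(records) > max_duplicates else 0
```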
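A few of the format checks could look like the sketch below. The field names, the ISO date format and the 11-digit mobile number pattern are assumptions for illustration; real check rules would be configured per data table.

```python
import ipaddress
import re
from datetime import datetime


def valid_date(text, fmt="%Y-%m-%d"):
    """True if text parses as a calendar date in the assumed format."""
    try:
        datetime.strptime(text, fmt)
        return True
    except ValueError:
        return False


def valid_mobile(text):
    # Assumed pattern: 11 ASCII digits starting with 1.
    return re.fullmatch(r"1[0-9]{10}", text) is not None


def valid_ip(text):
    """True for a syntactically valid IPv4 or IPv6 address."""
    try:
        ipaddress.ip_address(text)
        return True
    except ValueError:
        return False


# Hypothetical mapping from field name to its check.
CHECKS = {"date": valid_date, "mobile": valid_mobile, "ip": valid_ip}


def format_error_count(records, checks=CHECKS):
    """Count records in which at least one checked field fails its format check."""
    return sum(
        1
        for rec in records
        if any(not check(rec[field]) for field, check in checks.items() if field in rec)
    )


def normalization_deduction(records, max_bad_records, deduction=5):
    return deduction if format_error_count(records) > max_bad_records else 0
```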
The preset scoring rule of the timeliness index comprises: if the data to be evaluated is hour data, calculating the average starting time of the historical data at the same time point in a second preset period, and subtracting this average starting time from the starting time of the data to be evaluated in the first preset period to generate an actual delay time; calculating a timeliness initial score according to the actual delay time and a preset deduction standard; if the data to be evaluated is day data, week data, month data or year data, calculating the average task starting time of the historical data in a third preset period, calculating the task fluctuation rate of the data to be evaluated in the first preset period relative to the average task starting time, and judging whether the task fluctuation rate is greater than a third preset fluctuation rate; if so, deducting a preset score, and if not, not deducting points. The second preset period and the third preset period are both longer than the first preset period and both earlier than the first preset period on the time axis; preferably, both are adjacent to the first preset period on the time axis. The second preset period and the third preset period may be the same or different, for example, the second preset period is 7 days and the third preset period is 30 days, or both are 30 days.
The preset deduction standard comprises:
if the actual delay time is less than or equal to 1 hour, no points are deducted;
if the actual delay time is more than 1 hour and less than or equal to 2 hours, 10 points are deducted;
if the actual delay time is more than 2 hours and less than or equal to 3 hours, 20 points are deducted;
if the actual delay time is more than 3 hours and less than or equal to 4 hours, 30 points are deducted;
if the actual delay time is more than 4 hours and less than or equal to 5 hours, 40 points are deducted;
if the actual delay time is more than 5 hours and less than or equal to 6 hours, 50 points are deducted;
if the actual delay time is more than 7 hours and less than or equal to 8 hours, 60 points are deducted;
if the actual delay time is more than 8 hours and less than or equal to 9 hours, 70 points are deducted;
if the actual delay time is more than 8 hours, the timeliness score is 0;
if no delay exists, no points are deducted.
It should be noted that, if there are multiple tasks writing data to the data table at the same time, the task that runs the slowest is used as the deduction basis.
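The delay calculation for hour data and a banded deduction lookup can be sketched as follows. The band table in the demonstration is an illustrative assumption that continues the pattern of 10 points per additional hour up to 8 hours and sets the score to 0 beyond that; it is not a verbatim copy of the standard, and in practice the table would be supplied as configuration. Comparing start times by time of day assumes the task does not cross midnight.

```python
from datetime import time


def hours_of_day(t):
    """Time of day expressed in fractional hours."""
    return t.hour + t.minute / 60 + t.second / 3600


def actual_delay_hours(start, historical_starts):
    """Start time minus the average historical start time at the same time point."""
    avg = sum(hours_of_day(t) for t in historical_starts) / len(historical_starts)
    return hours_of_day(start) - avg


# (upper bound in hours, points deducted); illustrative values only.
DEMO_BANDS = [(1, 0), (2, 10), (3, 20), (4, 30), (5, 40), (6, 50), (7, 60), (8, 70)]


def timeliness_score(delay_hours, bands=DEMO_BANDS, base_score=100):
    """Return the timeliness initial score for a given delay."""
    for upper, points in bands:
        if delay_hours <= upper:
            return base_score - points
    return 0  # beyond the last band the score drops to 0


def slowest_task_delay(delays):
    """With several tasks writing to one table, the slowest one is the basis."""
    return max(delays)


delay = actual_delay_hours(time(12, 40), [time(10, 0), time(10, 20)])
score = timeliness_score(delay)
```

Here the historical average start is 10:10 and the evaluated start is 12:40, so the delay is 2.5 hours, which falls in the band above 2 and up to 3 hours.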
S13, respectively calculating the comprehensive scores of the preset indexes according to the initial scores of the preset indexes and a preset time attenuation rule.
Specifically, for any preset index, step S13 specifically includes:
acquiring a second preset number of preset index initial scores, and calculating a first average value;
acquiring a third preset number of preset index initial scores, and calculating a second average value;
and carrying out time weighting on the first average value and the second average value according to a preset time attenuation rule to generate the preset index comprehensive score.
The sum of the second preset quantity and the third preset quantity is equal to the first preset quantity, and the first preset period corresponding to the preset index initial score of the second preset quantity is earlier than the first preset period corresponding to the preset index initial score of the third preset quantity on a time axis. The second preset number and the third preset number may be the same or different. As an example, in the case where the first preset number is 24 and the first preset period is 1 hour, the second preset number is 6, the third preset number is 18, or both the second preset number and the third preset number are 12; in the case where the first preset number is 30 and the first preset period is 1 day, the second preset number is 10, the third preset number is 20, or both the second preset number and the third preset number are 15.
The preset time attenuation rule is a unified rule under which the farther a period is from the current time, the smaller its time weight coefficient. It can be set according to actual requirements, for example, the time weight coefficient of the first average value is 0.3 and the time weight coefficient of the second average value is 0.7.
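Step S13 amounts to a weighted sum of two averages, S = w1 * A1 + w2 * A2, where A1 is the average of the earlier initial scores and A2 the average of the more recent ones. A minimal sketch, using the example weights 0.3 and 0.7 and the example split of 24 hourly periods into 6 and 18, is given below; the function and parameter names are illustrative, and the initial scores themselves are made up.

```python
def composite_score(initial_scores, older_count, older_weight=0.3, recent_weight=0.7):
    """initial_scores: one score per first preset period, oldest first."""
    if not 0 < older_count < len(initial_scores):
        raise ValueError("older_count must split the scores into two non-empty parts")
    if older_weight > recent_weight:
        raise ValueError("time attenuation requires the older weight to be the smaller one")
    older = initial_scores[:older_count]
    recent = initial_scores[older_count:]
    first_average = sum(older) / len(older)
    second_average = sum(recent) / len(recent)
    return older_weight * first_average + recent_weight * second_average


# 24 hourly periods: the 6 earliest scored 80, the 18 most recent scored 100.
hourly_scores = [80] * 6 + [100] * 18
integrity_composite = composite_score(hourly_scores, older_count=6)
```

With these inputs the first average is 80 and the second is 100, so the comprehensive score is 0.3 * 80 + 0.7 * 100 = 94.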
S14, displaying the comprehensive scores of the preset indexes.
In this embodiment, the data quality is scored along multiple dimensions using multiple preset indexes, and the evaluation of each dimension is presented to the user as a score, so that the user can quickly locate problems in the data and the data cleaning efficiency is improved.
As another embodiment of the data quality evaluation method, referring to fig. 2, the data quality evaluation method includes:
S21, acquiring data to be evaluated;
S22, respectively calculating the initial scores of the preset indexes of the data to be evaluated in each first preset period according to the preset scoring rules of the preset indexes;
S23, respectively calculating the comprehensive scores of the preset indexes according to the initial scores of the preset indexes and a preset time attenuation rule;
Steps S21-S23 are the same as steps S11-S13 of the previous embodiment and will not be described again.
S24, performing weighted calculation on the comprehensive scores of the preset indexes according to a preset weight rule to generate the data quality comprehensive score.
In the preset weight rule, the sum of the weight coefficients of the preset indexes is equal to 1, and the weights of the preset indexes may be equal or unequal; for example, the weights of integrity, accuracy, stability, repeatability, normalization and timeliness may take unequal values such as 0.2, 0.3 and 0.1, or may each be 1/6.
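Step S24 can be sketched as a weighted sum over the per-index comprehensive scores. The example below uses the equal weighting of 1/6 per index; the comprehensive scores are made-up illustration values, and the function name is hypothetical.

```python
import math


def overall_quality_score(composite_scores, weights):
    """Weighted sum of per-index comprehensive scores; weights must sum to 1."""
    if composite_scores.keys() != weights.keys():
        raise ValueError("scores and weights must cover the same indexes")
    if not math.isclose(sum(weights.values()), 1.0):
        raise ValueError("weight coefficients must sum to 1")
    return sum(composite_scores[name] * weights[name] for name in composite_scores)


indexes = ["integrity", "accuracy", "stability", "repeatability", "normalization", "timeliness"]
equal_weights = {name: 1 / 6 for name in indexes}
# Made-up comprehensive scores for illustration.
scores = dict(zip(indexes, [90, 84, 96, 100, 78, 92]))
overall = overall_quality_score(scores, equal_weights)
```

The six example scores sum to 540, so with equal weights the data quality comprehensive score is 90.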
S25, displaying the comprehensive score of each preset index and the data quality comprehensive score.
The data quality comprehensive score is displayed together with the comprehensive score of each preset index.
In this embodiment, the data quality comprehensive score is generated by weighting the comprehensive scores of the preset indexes and is displayed together with the evaluation of each dimension, so that the user can grasp the overall data quality and decide whether problems in the data need to be located along specific dimensions, which further improves the data cleaning efficiency. Specifically, when the data quality comprehensive score is high, the data quality is excellent, no data cleaning is needed, and there is no need to locate problems along specific dimensions. When the data quality comprehensive score is low, the data quality is poor and cannot be guaranteed through data cleaning alone; in this case the data can be discarded without cleaning, which likewise removes the need to locate problems along specific dimensions.
An embodiment of the invention also discloses a data quality evaluation device. Referring to fig. 3, the data quality evaluation device comprises:
the acquiring module 31 is used for acquiring data to be evaluated;
a preset index initial score calculating module 32, configured to calculate, according to a preset score rule of each preset index, each preset index initial score of the data to be evaluated in each first preset period;
a preset index comprehensive score calculating module 33, configured to calculate each preset index comprehensive score according to the preset index initial score and a preset time decay rule;
and the display module 34 is configured to display the comprehensive scores of the preset indexes.
As an implementation manner, the data quality assessment device further comprises a data quality comprehensive score calculation module, configured to perform weighted calculation on each preset index comprehensive score according to a preset weight rule, so as to generate a data quality comprehensive score; the display module is also used for displaying the data quality comprehensive scores.
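The cooperation of modules 31 to 34 can be sketched as follows, assuming each module is a plain callable. The class name `DataQualityEvaluator`, its field names and the example scoring function are hypothetical and serve only to show the data flow; the optional data quality comprehensive score calculation module is omitted.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Sequence

Scores = Dict[str, float]


@dataclass
class DataQualityEvaluator:
    # Module 31: returns the data to be evaluated, one batch per first preset period.
    acquire: Callable[[], Sequence[object]]
    # Module 32: initial score of every preset index for one period.
    score_period: Callable[[object], Scores]
    # Module 33: combines one index's initial scores (oldest first) into a
    # comprehensive score, for example by applying a time decay rule.
    combine: Callable[[List[float]], float]
    # Module 34: displays the comprehensive scores.
    display: Callable[[Scores], None]

    def run(self) -> Scores:
        periods = self.acquire()
        if not periods:
            raise ValueError("no data to be evaluated")
        initial = [self.score_period(batch) for batch in periods]
        composite = {
            name: self.combine([scores[name] for scores in initial])
            for name in initial[0]
        }
        self.display(composite)
        return composite


# Hypothetical wiring: integrity is the share of non-null rows, and the
# combining rule is a plain average standing in for the time decay rule.
evaluator = DataQualityEvaluator(
    acquire=lambda: [[1, None], [1, 2]],
    score_period=lambda rows: {
        "integrity": 100.0 * sum(r is not None for r in rows) / len(rows)
    },
    combine=lambda values: sum(values) / len(values),
    display=print,
)
result = evaluator.run()
```

Keeping the four modules as separate callables mirrors the text's remark that the division into modules is only a logical one and that modules may be combined or split in an actual implementation.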
The data quality evaluation device described in this embodiment may be used to implement the above method embodiments; its principle and technical effect are similar and are not described again here.
Based on the same technical concept, the embodiment of the present disclosure also provides an electronic device 400. Referring to fig. 4, the electronic device 400 includes a processor 401, a memory 402, and a bus. The memory 402 is used for storing computer programs and includes an internal memory 4021 and an external memory 4022; the internal memory 4021 is used to temporarily store arithmetic data in the processor 401 and data exchanged with the external memory 4022 such as a hard disk, and the processor 401 exchanges data with the external memory 4022 through the internal memory 4021.
In the embodiment of the present application, the memory 402 is specifically used for storing a computer program for executing the technical solution of the present application, and execution of the program is controlled by the processor 401. That is, when the electronic device 400 is running, the processor 401 and the memory 402 communicate via the bus, so that the processor 401 executes the computer program stored in the memory 402, thereby executing the method described in any of the foregoing embodiments.
The memory 402 may be, but is not limited to, a Random Access Memory (RAM), a Read-Only Memory (ROM), a Programmable Read-Only Memory (PROM), an Erasable Programmable Read-Only Memory (EPROM), and the like.
The processor 401 may be an integrated circuit chip having signal processing capabilities. The processor may be a general-purpose processor, including a Central Processing Unit (CPU), a Network Processor (NP), and the like; it may also be a Digital Signal Processor (DSP), an Application Specific Integrated Circuit (ASIC), a Field Programmable Gate Array (FPGA) or other programmable logic device, a discrete gate or transistor logic device, or a discrete hardware component. The processor may implement or perform the various methods, steps and logic blocks disclosed in the embodiments of the present invention. A general-purpose processor may be a microprocessor or any conventional processor.
It is to be understood that the illustrated structure of the embodiment of the present application does not specifically limit the electronic device 400. In other embodiments of the present application, electronic device 400 may include more or fewer components than shown, or some components may be combined, some components may be split, or a different arrangement of components. The illustrated components may be implemented in hardware, software, or a combination of software and hardware.
The present embodiment also provides a computer-readable storage medium, such as a floppy disk, an optical disk, a hard disk, a flash memory, a USB flash drive, an SD (Secure Digital Memory Card) card, an MMC (Multimedia Card), etc., in which a computer program implementing the above steps is stored, and the computer program can be executed by one or more processors to implement the method in the above embodiments.
In the embodiments provided in the present application, it should be understood that the disclosed apparatus and method may be implemented in other ways. The above-described embodiments of the apparatus are merely illustrative, and for example, the division of the units is only one logical division, and there may be other divisions when actually implemented, and for example, a plurality of units or components may be combined or integrated into another system, or some features may be omitted, or not executed. In addition, the shown or discussed mutual coupling or direct coupling or communication connection may be an indirect coupling or communication connection of devices or units through some communication interfaces, and may be in an electrical, mechanical or other form.
In addition, functional modules in the embodiments of the present application may be integrated together to form an independent part, or each module may exist alone, or two or more modules may be integrated to form an independent part.
The foregoing is a preferred embodiment of the present invention and is not intended to limit the scope of the invention in any way, and any feature disclosed in this specification (including the abstract and drawings) may be replaced by alternative features serving equivalent or similar purposes, unless expressly stated otherwise. That is, unless expressly stated otherwise, each feature is only an example of a generic series of equivalent or similar features.

Claims (10)

1. A data quality assessment method, comprising:
acquiring data to be evaluated; wherein the data to be evaluated is financial data;
respectively calculating the initial scores of the preset indexes of the data to be evaluated in each first preset period according to the preset scoring rules of the preset indexes;
respectively calculating the comprehensive scores of the preset indexes according to the initial scores of the preset indexes and a preset time attenuation rule;
and displaying the comprehensive scores of the preset indexes.
2. The method according to claim 1, further comprising, before said displaying the comprehensive scores of the preset indexes: performing weighted calculation on the comprehensive scores of the preset indexes according to a preset weight rule to generate a data quality comprehensive score; and
displaying the data quality comprehensive score when the comprehensive scores of the preset indexes are displayed.
3. The method according to claim 1 or 2, characterized in that: the number of the first preset periods is a first preset number, the first preset number is greater than 1, and the first preset periods of the first preset number are continuous on a time axis;
for any preset index, respectively calculating the comprehensive scores of the preset indexes according to the initial scores of the preset indexes and a preset time attenuation rule, and specifically comprising the following steps of:
acquiring a second preset number of initial scores of the preset indexes, and calculating a first average value;
acquiring a third preset number of initial scores of the preset indexes, and calculating a second average value;
according to a preset time attenuation rule, performing time weighting on the first average value and the second average value to generate a preset index comprehensive score;
the sum of the second preset number and the third preset number is equal to the first preset number, and the first preset periods corresponding to the second preset number of preset index initial scores are earlier on the time axis than the first preset periods corresponding to the third preset number of preset index initial scores; the preset time attenuation rule is that the closer a first preset period is to the current time, the larger its time weight coefficient is.
4. The method of claim 1 or 2, wherein the preset indexes include at least two of integrity, accuracy, stability, repeatability, normalization and timeliness.
5. The method of claim 4, wherein if the preset index is integrity, the preset scoring rule comprises:
if the data to be evaluated is time data or day data, judging whether the data to be evaluated has newly added data in the first preset period, if so, checking a self-defined rule, and if not, not deducting the score;
if the data to be evaluated is week data, month data or year data, acquiring last execution time of the data to be evaluated in a reverse-pushing mode, and judging whether incremental data exist between the last execution time and the ending time of the first preset period or not, if so, checking a custom rule, and if not, deducting the points.
6. The method of claim 5, wherein the custom rule comprises:
and counting the null value total amount of the data to be evaluated in the first preset period based on a preset null value rule, judging whether the null value total amount is greater than the preset total amount, if so, deducting a preset score, and if not, not deducting the score.
7. The method of claim 4, wherein if the preset index is timeliness, the preset scoring rule comprises:
if the data to be evaluated is time data, calculating the average starting time of the historical data at the same time point in a second preset period, and subtracting the starting time of the data to be evaluated at the first preset period from the average starting time at the same time point to generate actual delay time; calculating a timeliness initial score according to the actual delay time and a preset deduction standard;
if the data to be evaluated is day data, week data, month data or year data, calculating the average task starting time of the historical data in a third preset period, calculating the task fluctuation rate of the data to be evaluated in the first preset period relative to the average task starting time, and judging whether the task fluctuation rate is greater than the third preset fluctuation rate; if so, deducting a preset score, and if not, not deducting the score.
8. A data quality evaluation apparatus, comprising:
the acquisition module is used for acquiring data to be evaluated; wherein the data to be evaluated is financial data;
the preset index initial score calculating module is used for respectively calculating each preset index initial score of the data to be evaluated in each first preset period according to a preset score rule of each preset index;
the preset index comprehensive score calculating module is used for calculating each preset index comprehensive score according to the preset index initial score and a preset time attenuation rule;
and the display module is used for displaying the comprehensive scores of the preset indexes.
9. An electronic device, characterized in that: comprising a memory and a processor, said memory having stored thereon a computer program which can be loaded by said processor and which performs the method according to any of claims 1-7.
10. A computer-readable storage medium, characterized by storing a computer program which can be loaded by a processor and which executes the method according to any of claims 1-7.
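The two-window time weighting recited in claim 3 can be illustrated with a minimal Python sketch. The function name `time_decayed_score`, the default weight 0.7 and the assumption that the two time weight coefficients sum to 1 are illustrative choices, not taken from the claims; the only rule taken from the text is that the window closer to the current time receives the larger time weight coefficient.

```python
def time_decayed_score(initial_scores, older_count, recent_weight=0.7):
    """Comprehensive score of one preset index from its initial scores.

    initial_scores: one initial score per first preset period, ordered from
                    oldest to newest (the first preset number of values).
    older_count:    the second preset number, i.e. how many of the oldest
                    periods form the earlier window; the remaining periods
                    (the third preset number) form the recent window.
    recent_weight:  hypothetical time weight coefficient of the recent
                    window; the earlier window gets 1 - recent_weight.
    """
    total = len(initial_scores)
    if total < 2:
        raise ValueError("the first preset number must be greater than 1")
    if not 0 < older_count < total:
        raise ValueError("both windows must contain at least one period")
    if not 0.5 < recent_weight <= 1.0:
        # Periods closer to the current time must weigh more.
        raise ValueError("recent_weight must be in (0.5, 1.0]")

    older = initial_scores[:older_count]
    recent = initial_scores[older_count:]
    first_average = sum(older) / len(older)      # first average value
    second_average = sum(recent) / len(recent)   # second average value
    return (1.0 - recent_weight) * first_average + recent_weight * second_average


# Hypothetical example: seven daily initial scores, the oldest four forming
# the earlier window and the newest three forming the recent window.
composite = time_decayed_score([80, 80, 80, 80, 100, 100, 100], older_count=4)
```

With these illustrative inputs the earlier window averages 80 and the recent window averages 100, so the result is approximately 0.3 × 80 + 0.7 × 100 = 94, which shows how recent improvements dominate the comprehensive score.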
CN202210396311.9A 2022-04-15 2022-04-15 Data quality evaluation method and device, electronic equipment and storage medium Pending CN114742417A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210396311.9A CN114742417A (en) 2022-04-15 2022-04-15 Data quality evaluation method and device, electronic equipment and storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202210396311.9A CN114742417A (en) 2022-04-15 2022-04-15 Data quality evaluation method and device, electronic equipment and storage medium

Publications (1)

Publication Number Publication Date
CN114742417A true CN114742417A (en) 2022-07-12

Family

ID=82281355

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210396311.9A Pending CN114742417A (en) 2022-04-15 2022-04-15 Data quality evaluation method and device, electronic equipment and storage medium

Country Status (1)

Country Link
CN (1) CN114742417A (en)

Patent Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102262664A (en) * 2011-07-26 2011-11-30 北京百度网讯科技有限公司 Quality estimating method and quality estimating device
CN108334636A (en) * 2018-03-02 2018-07-27 成都康赛信息技术有限公司 Data Quality Assessment Methodology
CN112506904A (en) * 2020-12-02 2021-03-16 深圳市酷开网络科技股份有限公司 Data quality evaluation method and device, terminal equipment and storage medium
CN113379219A (en) * 2021-06-04 2021-09-10 广东省电信规划设计院有限公司 Quality evaluation method and device for emergency management data
CN113570257A (en) * 2021-07-30 2021-10-29 北京房江湖科技有限公司 Index data evaluation method and device based on scoring model, medium and equipment

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
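Wang Jian et al.: "An incentive model for mobile crowd sensing oriented to task cost differences" (面向任务代价差异的移动群智感知激励模型), Journal of Electronics & Information Technology (电子与信息学报) *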

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115719181A (en) * 2022-11-24 2023-02-28 中电金信软件有限公司 Data quality analysis method and device
CN115719181B (en) * 2022-11-24 2023-08-01 中电金信软件有限公司 Data quality analysis method and device

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication

Application publication date: 20220712