CN117271501A - Data quality inspection method and device, electronic equipment and storage medium - Google Patents

Data quality inspection method and device, electronic equipment and storage medium

Info

Publication number
CN117271501A
Authority
CN
China
Prior art keywords
quality inspection
data
data quality
target
inspected
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202311566927.7A
Other languages
Chinese (zh)
Inventor
梁敏
李昕
王蒙
唐磊
王萌
薛秀荣
杨仕勇
陈建辉
蔡周勇
孙效静
高辉
路静
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
South Digital Technology Co ltd
Shandong Institute Of Land And Spatial Data And Remote Sensing Technology Shandong Sea Area Dynamic Monitoring And Monitoring Center
Original Assignee
South Digital Technology Co ltd
Shandong Institute Of Land And Spatial Data And Remote Sensing Technology Shandong Sea Area Dynamic Monitoring And Monitoring Center
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by South Digital Technology Co ltd, Shandong Institute Of Land And Spatial Data And Remote Sensing Technology Shandong Sea Area Dynamic Monitoring And Monitoring Center filed Critical South Digital Technology Co ltd
Priority to CN202311566927.7A priority Critical patent/CN117271501A/en
Publication of CN117271501A publication Critical patent/CN117271501A/en
Pending legal-status Critical Current

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/21Design, administration or maintenance of databases
    • G06F16/215Improving data quality; Data cleansing, e.g. de-duplication, removing invalid entries or correcting typographical errors
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/24Querying
    • G06F16/245Query processing
    • G06F16/2457Query processing with adaptation to user needs
    • G06F16/24578Query processing with adaptation to user needs using ranking

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Databases & Information Systems (AREA)
  • Data Mining & Analysis (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Quality & Reliability (AREA)
  • Computational Linguistics (AREA)
  • Debugging And Monitoring (AREA)

Abstract

An embodiment of the invention discloses a data quality inspection method and device, electronic equipment and a storage medium. The method comprises the following steps: the access module acquires a target data quality inspection scheme; the scheduling module determines a first target execution module based on the complexity coefficient, and the first target execution module performs data quality inspection on the data to be inspected based on the target data quality inspection scheme; when the first target execution module succeeds in the data quality inspection of the data to be inspected, the first target execution module generates a data quality inspection result of the data to be inspected; when the first target execution module fails in the data quality inspection of the data to be inspected and the number of data quality inspection failures is less than or equal to a preset failure count threshold, a second target execution module is determined, and the second target execution module performs data quality inspection on the data to be inspected based on the target data quality inspection scheme; and the second target execution module generates a data quality inspection result of the data to be inspected.

Description

Data quality inspection method and device, electronic equipment and storage medium
Technical Field
The present invention relates to data quality detection technology, and in particular, to a data quality detection method and apparatus, an electronic device, and a storage medium.
Background
With the continuous development of spatial technology, the volume of spatial outcome data it generates keeps increasing. In practical applications, the quality of spatial outcome data is often uneven, so the data cannot be used directly. Therefore, after spatial outcome data is obtained, its quality usually has to be inspected first, and data of unqualified quality is then processed so that it meets the data use requirements.
In the prior art, the quality of spatial outcome data is mostly inspected manually by eye, and manual quality inspection of spatial outcome data is inefficient.
Disclosure of Invention
The embodiments of the invention provide a data quality inspection method and device, electronic equipment and a storage medium, so as to solve the above problem.
In one aspect of the embodiments of the present invention, a data quality inspection method is provided, which is applied to a data quality inspection system, where the data quality inspection system includes an access module, a scheduling module and a plurality of execution modules. The method comprises the following steps: in response to receiving the data to be inspected sent by the client, the access module acquires a target data quality inspection scheme of the data to be inspected; the scheduling module determines a first target execution module from the plurality of execution modules based on the complexity coefficient included in the target data quality inspection scheme, and sends the target data quality inspection scheme to the first target execution module so that the first target execution module performs data quality inspection on the data to be inspected based on the target data quality inspection scheme; in response to the first target execution module succeeding in the data quality inspection of the data to be inspected, the first target execution module generates a data quality inspection result of the data to be inspected and sends the data quality inspection result to the client; in response to the first target execution module failing in the data quality inspection of the data to be inspected, the scheduling module determines the number of data quality inspection failures corresponding to the data to be inspected and updates it to obtain an updated number of data quality inspection failures; in response to the updated number of data quality inspection failures being less than or equal to a preset failure count threshold, the scheduling module determines a second target execution module from the plurality of execution modules and sends the target data quality inspection scheme to the second target execution module, so that the second target execution module performs data quality inspection on the data to be inspected based on the target data quality inspection scheme; and in response to the second target execution module succeeding in the data quality inspection of the data to be inspected, the second target execution module generates a data quality inspection result of the data to be inspected and sends the data quality inspection result to the client.
In another aspect of the embodiments of the present invention, a data quality inspection device is provided, which is applied to a data quality inspection system, where the data quality inspection system includes an access module, a scheduling module and a plurality of execution modules. The device comprises: a scheme determining unit, by which the access module, in response to receiving the data to be inspected sent by the client, acquires a target data quality inspection scheme of the data to be inspected; a first scheduling unit, by which the scheduling module determines a first target execution module from the plurality of execution modules based on the complexity coefficient included in the target data quality inspection scheme and sends the target data quality inspection scheme to the first target execution module, so that the first target execution module performs data quality inspection on the data to be inspected based on the target data quality inspection scheme; a first sending unit, by which the first target execution module, in response to succeeding in the data quality inspection of the data to be inspected, generates a data quality inspection result of the data to be inspected and sends the data quality inspection result to the client; a further unit, by which the scheduling module, in response to the first target execution module failing in the data quality inspection of the data to be inspected, determines the number of data quality inspection failures corresponding to the data to be inspected and updates it to obtain an updated number of data quality inspection failures; a second scheduling unit, by which the scheduling module, in response to the updated number of data quality inspection failures being less than or equal to a preset failure count threshold, determines a second target execution module from the plurality of execution modules and sends the target data quality inspection scheme to the second target execution module, so that the second target execution module performs data quality inspection on the data to be inspected based on the target data quality inspection scheme; and a second sending unit, by which the second target execution module, in response to succeeding in the data quality inspection of the data to be inspected, generates a data quality inspection result of the data to be inspected and sends the data quality inspection result to the client.
In still another aspect of the embodiments of the present invention, an electronic device is provided, including: a memory for storing a computer program; and a processor for executing the computer program stored in the memory, where the data quality inspection method described above is implemented when the computer program is executed.
In yet another aspect of the embodiments of the present invention, a computer-readable storage medium is provided, on which a computer program is stored, where the computer program, when executed by a processor, implements the data quality inspection method described above.
In the implementation of the invention, the data quality inspection system receives the data to be inspected sent by the client, and the access module acquires a target data quality inspection scheme of the data to be inspected; the scheduling module determines a first target execution module from a plurality of execution modules based on complexity coefficients included in a target data quality inspection scheme, and sends the target data quality inspection scheme to the first target execution module so that the first target execution module performs data quality inspection on data to be inspected based on the target data quality inspection scheme; and then when the first target execution module succeeds in data quality inspection of the data to be inspected, the first target execution module generates a data quality inspection result of the data to be inspected and sends the data quality inspection result to the client. In the embodiment of the invention, the access module determines the target quality inspection scheme of the data to be inspected according to the data to be inspected, so that the target data quality inspection scheme suitable for the data to be inspected can be determined according to the data to be inspected, the quality of the data quality inspection is improved, the first target execution module performs quality inspection on the data to be inspected according to the target data quality inspection scheme, the automatic quality inspection is realized, the efficiency of the data quality inspection is effectively improved, and the quality of the data quality inspection is further improved.
In addition, in the embodiment of the invention, the first target execution module is determined from the plurality of execution modules through the complexity coefficient included in the target data quality inspection scheme, so that the corresponding first target execution module is distributed to the data to be inspected according to the complexity of the data quality inspection, and the efficiency of the data quality inspection is further improved.
Meanwhile, in the embodiments of the invention, when the first target execution module fails in the data quality inspection of the data to be inspected, the scheduling module determines the number of data quality inspection failures corresponding to the data to be inspected and updates it to obtain an updated number of data quality inspection failures; when the updated number of data quality inspection failures is less than or equal to a preset failure count threshold, the scheduling module determines a second target execution module from the plurality of execution modules and sends the target data quality inspection scheme to the second target execution module, so that the second target execution module performs data quality inspection on the data to be inspected based on the target data quality inspection scheme; and when the second target execution module succeeds in the data quality inspection of the data to be inspected, the second target execution module generates a data quality inspection result of the data to be inspected and sends the data quality inspection result to the client. Therefore, when a data quality inspection fails, a second target execution module can be efficiently reassigned to the data to be inspected, which improves the user experience and further improves the efficiency of data quality inspection.
The technical scheme of the invention is further described in detail through the drawings and the embodiments.
Drawings
The accompanying drawings, which are incorporated in and constitute a part of this specification, illustrate embodiments of the invention and together with the description, serve to explain the principles of the invention.
The invention may be more clearly understood from the following detailed description taken in conjunction with the accompanying drawings, in which:
FIG. 1 is a schematic diagram of a data quality inspection system according to an exemplary embodiment of the present invention;
FIG. 2 is a flow chart of a data quality inspection method according to an exemplary embodiment of the present invention;
FIG. 3 is a block diagram of a data quality inspection device in accordance with one embodiment of the present invention;
FIG. 4 is a schematic structural diagram of an application embodiment of the electronic device of the present invention.
Detailed Description
Various exemplary embodiments of the present invention will now be described in detail with reference to the accompanying drawings. It should be noted that: the relative arrangement of the components and steps, numerical expressions and numerical values set forth in these embodiments do not limit the scope of the present invention unless it is specifically stated otherwise.
Embodiments of the invention are operational with numerous other general purpose or special purpose computing system environments or configurations with electronic devices, such as terminal devices, computer systems, servers, etc. Examples of well known terminal devices, computing systems, environments, and/or configurations that may be suitable for use with the terminal device, computer system, server, or other electronic device include, but are not limited to: personal computer systems, server computer systems, thin clients, thick clients, hand-held or laptop devices, microprocessor-based systems, set-top boxes, programmable consumer electronics, network personal computers, small computer systems, mainframe computer systems, and distributed cloud computing technology environments that include any of the foregoing, and the like.
Electronic devices such as terminal devices, computer systems, servers, etc. may be described in the general context of computer system-executable instructions, such as program modules, being executed by a computer system. Generally, program modules may include routines, programs, objects, components, logic, data structures, etc., that perform particular tasks or implement particular abstract data types. The computer system/server may be implemented in a distributed cloud computing environment in which tasks are performed by remote processing devices that are linked through a communications network. In a distributed cloud computing environment, program modules may be located in both local and remote computing system storage media including memory storage devices.
In the embodiments of the invention, the following terms are used:
Spatial outcome data generally refers to data acquired or processed by means of spatial techniques (e.g., satellite positioning, remote sensing, geographic information systems, etc.), and such data typically carries spatial location information. The spatial outcome data may include, for example, but is not limited to: spatial remote sensing data, satellite navigation data, and geographic information system data.
Spatial remote sensing data: earth surface information obtained through remote sensing satellites, including data on land use, vegetation coverage, topography, urban expansion and the like. Satellite navigation data: position information obtained through satellite positioning technology, including data such as vehicle tracks, mobile device positioning, and topographic and geomorphological measurements. Geographic information system data: data that organizes, manages, analyzes and presents spatial data based on geographic information technology, including data such as maps, land use maps and traffic road network maps.
Fig. 1 is a schematic structural diagram of a data quality inspection system according to an exemplary embodiment of the present invention, as shown in fig. 1, the data quality inspection system includes: the system comprises an access module, a scheduling module, a plurality of execution modules and a database.
The data quality inspection system may be deployed on an electronic device, which may include, for example, but is not limited to: computers, servers, smartphones, etc.
Fig. 2 is a flow chart of a data quality inspection method according to an exemplary embodiment of the present invention. This embodiment can be applied to the data quality inspection system shown in fig. 1 and, as shown in fig. 2, includes the following steps:
In step S110, in response to receiving the data to be inspected sent by the client, the access module obtains a target data quality inspection scheme of the data to be inspected. In this embodiment, the data quality inspection scheme used for quality inspection of the data to be inspected is referred to as the target data quality inspection scheme.
The client need not be located on the electronic device; the user may send the data to be inspected to the data quality inspection system through the client. The data to be inspected may be spatial outcome data.
In a specific implementation, the access module may further perform data format conversion on the target data quality inspection scheme, so that the converted target data quality inspection scheme can run on the execution modules. Illustratively, the access module may convert the quality inspection scheme to EXE format.
In some alternative embodiments, the target data quality inspection scheme may be obtained by the following methods, specifically including: the access module determines the data type included in the data to be inspected; and then, the access module determines a target data quality inspection scheme according to the data type included in the data to be inspected and the corresponding relation between the preset data type and the data quality inspection scheme.
The data type may be set according to actual requirements, for example, the data type may be topology data, chart data, and the like.
In a specific implementation, a plurality of data quality inspection schemes may be prepared in advance with a data quality inspection tool (for example, the data quality inspection tool smero) according to the data types and the quality inspection rules of each data type. Each data quality inspection scheme is bound to one data type, which yields the preset correspondence between data types and data quality inspection schemes. All the quality inspection schemes are stored in a scheme storage server that is communicatively connected to the data quality inspection system, and the preset correspondence between data types and data quality inspection schemes is stored in the database.
The access module may request the data type included in the data to be inspected from the client, or the client may directly send both the data to be inspected and its data type to the data quality inspection system. The access module then looks up, in the preset correspondence between data types and data quality inspection schemes, the data quality inspection scheme corresponding to the data type included in the data to be inspected, retrieves that scheme from the scheme storage server, and takes it as the target data quality inspection scheme.
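As an illustration only, the lookup described above can be sketched as a dictionary from data type to scheme identifier. The names `SCHEME_BY_DATA_TYPE` and `select_target_scheme`, and the scheme identifiers, are hypothetical and do not come from the patent; retrieval of the scheme itself from the scheme storage server is left out.

```python
# Hypothetical preset correspondence between data types and scheme identifiers.
# The keys follow the example data types in the text; the values are made up.
SCHEME_BY_DATA_TYPE = {
    "topology": "scheme-topology",
    "chart": "scheme-chart",
}


def select_target_scheme(data_type, correspondence=SCHEME_BY_DATA_TYPE):
    """Return the identifier of the scheme bound to data_type.

    Raises LookupError when no scheme is bound to the data type.
    """
    try:
        return correspondence[data_type]
    except KeyError:
        raise LookupError(
            f"no data quality inspection scheme bound to data type {data_type!r}"
        ) from None
```

Keeping the correspondence as plain data, as assumed here, would let new data types be bound to schemes without changing the access module's code.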
In some alternative embodiments, the method further comprises: the access module assigns to the data to be inspected a data identifier that identifies it, and stores the data to be inspected in the database.
The data Identification (ID) may be any type of existing identification or custom code. The access module may assign a data identifier for uniquely identifying the data to be inspected to the data to be inspected, and the access module may add the data identifier to the target data quality inspection scheme, so that the target data quality inspection scheme includes the data identifier of the data to be inspected.
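A minimal sketch of this step, assuming a random UUID as the identifier and an in-memory dictionary standing in for the database; the function name `register_data` and the `data_id` key are hypothetical.

```python
import uuid


def register_data(data, scheme, database):
    """Assign a unique identifier to the data, store the data, and tag the scheme.

    `database` is any mutable mapping (a stand-in for the real database).
    The scheme passed in is not modified; a tagged copy is returned.
    """
    data_id = uuid.uuid4().hex      # assumption: any unique code would do
    database[data_id] = data        # store the data to be inspected
    tagged_scheme = dict(scheme)
    tagged_scheme["data_id"] = data_id
    return tagged_scheme
```

An execution module that later receives the tagged scheme can then fetch the data with `database[tagged_scheme["data_id"]]`.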
In step S120, the scheduling module determines a first target execution module from the multiple execution modules based on the complexity coefficient included in the target data quality inspection scheme, and sends the target data quality inspection scheme to the first target execution module, so that the first target execution module performs data quality inspection on the data to be inspected based on the target data quality inspection scheme.
The target data quality inspection scheme includes a complexity coefficient. The complexity coefficient characterizes the difficulty of executing the target data quality inspection scheme and may be set based on data quality inspection experience. Illustratively, when the target data quality inspection scheme is a data quality inspection scheme for topology data, the complexity coefficient included in the target data quality inspection scheme is high.
In some alternative embodiments, the first target execution module may be determined as follows: the scheduling module determines a first target task queue according to the complexity coefficient included in the target data quality inspection scheme, the preset correspondence between task queues and complexity coefficients, and a preset scheduling algorithm, and sends the target data quality inspection scheme to the first target task queue; the scheduling module then determines the first target execution module based on the execution level of each execution module and the preset correspondence between execution levels and task queues.
Each of the plurality of execution modules corresponds to one execution level. In a specific implementation, each execution module in this embodiment may be deployed on a separate server, cloud, or the like. The level of an execution module may be determined from the configuration of the device on which the execution module is deployed, for example from the CPU parameters and memory parameters of the computing device (e.g., server) on which it is deployed. The correspondence between CPU and memory parameters and the execution level may be preset. For example, when the CPU of the server where the execution module is deployed is dual-core and the memory is below 4 GB, the execution module is determined to be at a low execution level; when the CPU is between dual-core and 8 cores and the memory is between 4 GB and 16 GB, the execution module is determined to be at a medium execution level; when the CPU has 8 or more cores and the memory is 16 GB or more, the execution module is determined to be at a high execution level.
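The example thresholds above can be sketched as a small classification function. The text does not say how mixed configurations (for example many cores but little memory) or exact boundary values are classified, so this sketch assumes that a device receives the lower of its CPU-based and memory-based levels; the function name and that rule are assumptions, not part of the patent.

```python
LEVELS = ("low", "medium", "high")


def execution_level(cpu_cores, memory_gb):
    """Classify a device into an execution level from its CPU and memory.

    Assumption: the device gets the lower of its CPU rank and memory rank.
    """
    # CPU rank: 8 or more cores -> high, more than 2 cores -> medium, else low.
    cpu_rank = 2 if cpu_cores >= 8 else 1 if cpu_cores > 2 else 0
    # Memory rank: 16 GB or more -> high, 4 GB or more -> medium, else low.
    mem_rank = 2 if memory_gb >= 16 else 1 if memory_gb >= 4 else 0
    return LEVELS[min(cpu_rank, mem_rank)]
```

Taking the minimum is a conservative choice: a module is only treated as high level when neither resource is likely to be the bottleneck.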
In one particular implementation, the data quality inspection system may also include message middleware, which may be, for example, RabbitMQ Server.
Multiple groups of task queues may be preset in the message middleware. Each group includes at least one queue and is used for processing target data quality inspection schemes with one particular complexity coefficient. For example, three groups of task queues may be provided, where the first group includes a plurality of task queues with a large data volume, the second group includes a plurality of task queues with a medium data volume, and the third group includes a plurality of task queues with a small data volume. The correspondence between task queues and complexity coefficients may be preset. For example, assuming that the complexity coefficients include level 1 (low), level 2 (medium) and level 3 (high), then in the preset correspondence the task queues with a large data volume correspond to the level 3 complexity coefficient, the task queues with a medium data volume correspond to the level 2 complexity coefficient, and the task queues with a small data volume correspond to the level 1 complexity coefficient.
The preset scheduling algorithm may be an SJF (Shortest Job First) algorithm and/or a combined priority scheduling algorithm, etc. For example, when it is determined that the task queues corresponding to the complexity coefficient included in the target data quality inspection scheme are the task queues with a large data volume, the first target task queue for processing the target data quality inspection scheme may be determined from the plurality of task queues with a large data volume according to the SJF algorithm (the preset scheduling algorithm). Alternatively, the shortest task queue among the plurality of task queues with a large data volume may be selected as the first target task queue. The scheduling module then schedules the target data quality inspection scheme to the first target task queue.
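The queue selection can be sketched as follows, using the "shortest queue" alternative mentioned above rather than SJF proper. In-memory deques stand in for the message middleware queues, and the names `GROUP_BY_COMPLEXITY`, `dispatch`, and the queue names are hypothetical.

```python
from collections import deque

# Preset correspondence between complexity coefficient and queue group,
# following the three-level example in the text.
GROUP_BY_COMPLEXITY = {1: "small", 2: "medium", 3: "large"}


def dispatch(scheme, queue_groups):
    """Append the scheme to the shortest queue of the matching group.

    `queue_groups` maps a group name to a dict of queue name -> deque.
    Returns the name of the chosen queue; ties go to the first queue listed.
    """
    group = queue_groups[GROUP_BY_COMPLEXITY[scheme["complexity"]]]
    target = min(group, key=lambda name: len(group[name]))
    group[target].append(scheme)
    return target
```

With two large-volume queues of lengths 2 and 1, a scheme with complexity coefficient 3 is appended to the second queue, which then also has length 2.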
In a specific implementation manner, a plurality of data interfaces are arranged between the scheme storage server and the data quality inspection system, and different data interfaces are used for transmitting target data quality inspection schemes with different complexity coefficients. The data quality inspection system may determine a complexity factor of the target quality inspection scheme based on the data interface from which the target data quality inspection scheme was received.
In some alternative embodiments, the method further comprises: in response to the first target execution module receiving the target data quality inspection scheme, the first target execution module acquires the data to be inspected from the database according to the data identifier in the target data quality inspection scheme.
When the first target execution module receives the target data quality inspection scheme sent by the scheduling module, the first target execution module acquires the data to be inspected from the database according to the data identification of the data to be inspected included in the target data quality inspection scheme, and then the first target execution module performs data quality inspection on the data to be inspected based on the target data quality inspection scheme.
In one specific implementation, the target data quality inspection scheme may further include quality inspection items, quality inspection parameters, etc. The quality inspection parameters may include: the file path of the data to be inspected, the number of the target data quality inspection scheme, the complexity coefficient of the target data quality inspection scheme, the parameters required by the quality inspection rules of the quality inspection items, the data type of the returned data quality inspection result, and the like. The quality inspection items may include, for the data to be inspected: catalog integrity check, mathematical basis normalization check, data structure normalization check, mandatory attribute integrity check, attribute data value domain normalization check, data topology normalization check, data attribute logic consistency check, result report normalization check, form content consistency check, library form consistency check, and the like.
Step S130, in response to the success of the data quality inspection of the data to be inspected by the first target execution module, the first target execution module generates a data quality inspection result of the data to be inspected and sends the data quality inspection result to the client.
The data quality inspection result may include a problem, quality inspection time, etc. of the data to be inspected.
In step S140, in response to the failure of the first target execution module to quality-check the data of the data to be checked, the scheduling module determines the number of data quality-check failures corresponding to the data to be checked, and updates the number of data quality-check failures to obtain the number of updated data quality-check failures.
In a specific implementation, when the first target execution module does not respond to the scheduling module within a first preset time period, or does not finish the quality inspection of the data to be inspected within a second preset time period, the data quality inspection can be determined to have failed, and the first target execution module feeds back a data quality inspection failure message to the scheduling module.
When the scheduling module receives the data quality inspection failure message, it retrieves the number of data quality inspection failures corresponding to the data to be inspected and adds 1 to it, thereby obtaining the updated number of data quality inspection failures corresponding to the data to be inspected.
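The failure test and the counter update described above can be sketched as follows. This is only an illustration: the two timeout constants stand in for the "first" and "second" preset time periods, whose values the text does not give, and the function names are hypothetical.

```python
from typing import Dict

# Assumed values; the patent only speaks of a first and a second preset time period.
RESPONSE_TIMEOUT_S = 5.0
COMPLETION_TIMEOUT_S = 600.0


def inspection_failed(responded: bool, finished: bool, elapsed_s: float) -> bool:
    """Return True when either preset time period has been exceeded."""
    if finished:
        return False
    if not responded:
        # The execution module has not answered the scheduling module yet.
        return elapsed_s > RESPONSE_TIMEOUT_S
    # The module answered but the inspection is still running.
    return elapsed_s > COMPLETION_TIMEOUT_S


def record_failure(failure_counts: Dict[str, int], data_id: str) -> int:
    """Add 1 to the stored failure count for data_id and return the updated count."""
    failure_counts[data_id] = failure_counts.get(data_id, 0) + 1
    return failure_counts[data_id]
```

Keying the counter by a data identifier is an assumption made here so that each piece of data to be inspected keeps its own failure count.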
It should be noted that there is no fixed order of execution between step S130 and step S140.
In step S150, in response to the updated number of data quality inspection failures being less than or equal to a preset threshold for the number of data quality inspection failures, the scheduling module determines a second target execution module from the plurality of execution modules, and sends the target data quality inspection scheme to the second target execution module, so that the second target execution module performs data quality inspection on the data to be inspected based on the target data quality inspection scheme.
The preset threshold for the number of data quality inspection failures can be set according to actual requirements.
In a specific implementation, when the updated number of data quality inspection failures is greater than the preset threshold, the scheduling module confirms that the data quality inspection has failed and feeds back a data quality inspection failure message to the client.
In some alternative embodiments, the second target execution module may be determined by a method specifically including:
in step S150-1, the scheduling module determines a second target task queue according to the complexity coefficient included in the target data quality inspection scheme, the corresponding relationship between the preset task queue and the complexity coefficient, and the preset scheduling algorithm, and sends the target data quality inspection scheme to the second target task queue.
The second target task queue may be the same as the first target task queue or may be different from the first target task queue. The manner of determining the second target task queue is the same as the manner of determining the first target task queue, and will not be described in detail here.
In step S150-2, the scheduling module determines the second target execution module based on the corresponding execution levels of the execution modules and the corresponding relation between the preset execution level and the task queue.
The second target execution module may be the same as the first target execution module or may be different from the first target execution module. The manner of determining the second target execution module is the same as that of determining the first target execution module, and will not be described in detail here.
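Steps S150-1 and S150-2 can be sketched as two lookups. The sketch below is an assumption-laden illustration: the two preset correspondences are given made-up values, "shortest queue first" stands in for the unspecified preset scheduling algorithm, and the function and variable names are hypothetical.

```python
from typing import Dict, List

# Hypothetical preset correspondences; the patent leaves the concrete values open.
QUEUES_BY_COMPLEXITY: Dict[int, List[str]] = {1: ["low-a", "low-b"], 2: ["high-a"]}
LEVEL_BY_QUEUE: Dict[str, int] = {"low-a": 1, "low-b": 1, "high-a": 2}


def pick_queue(complexity: int, queue_lengths: Dict[str, int]) -> str:
    """Pick the shortest candidate queue (a stand-in for the preset scheduling algorithm)."""
    candidates = QUEUES_BY_COMPLEXITY[complexity]
    return min(candidates, key=lambda q: queue_lengths.get(q, 0))


def pick_module(queue: str, module_levels: Dict[str, int]) -> str:
    """Pick an execution module whose execution level matches the level of the queue."""
    level = LEVEL_BY_QUEUE[queue]
    matching = sorted(m for m, lvl in module_levels.items() if lvl == level)
    if not matching:
        raise LookupError(f"no execution module at level {level}")
    return matching[0]
```

Because the same two lookups serve both the first and the second target execution module, the retry may land on the same module or on a different one, as the text notes.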
In some alternative embodiments, the second target execution module may be determined by another method, including: the scheduling module determines a second target execution module from the other execution modules according to the current processing data quantity of the other execution modules and the execution level corresponding to the first target execution module.
The other execution modules are the execution modules, among the plurality of execution modules, other than the first target execution module. The current processing data quantity of an execution module is the amount of data that the execution module is currently inspecting.
In a specific implementation, the scheduling module selects an execution module with an execution level higher than or equal to the execution level corresponding to the first target execution module from other execution modules as a preparation execution module; and then selecting the execution module with the least current processing data amount from the preparation execution modules as a second target execution module.
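The alternative selection rule above (filter by execution level, then take the least loaded module) can be sketched as below. The `ExecutionModule` record and the function name are hypothetical; returning `None` when no candidate exists is an assumption, since the text does not say what happens in that case.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class ExecutionModule:
    name: str
    level: int          # execution level
    current_load: int   # amount of data currently being inspected


def pick_second_target(
    modules: List[ExecutionModule], first: ExecutionModule
) -> Optional[ExecutionModule]:
    """Select the least-loaded other module whose level is at least that of `first`."""
    # Exclude the first target execution module itself.
    others = [m for m in modules if m.name != first.name]
    # Preparation execution modules: level higher than or equal to the first target's level.
    candidates = [m for m in others if m.level >= first.level]
    if not candidates:
        return None
    return min(candidates, key=lambda m: m.current_load)
```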
Step S160, in response to the success of the data quality inspection of the data to be inspected by the second target execution module, the second target execution module generates a data quality inspection result of the data to be inspected and sends the data quality inspection result to the client.
In one embodiment, when the second target execution module fails the data quality inspection of the data to be inspected, the scheduling module determines the number of data quality inspection failures corresponding to the data to be inspected and updates it to obtain an updated number of data quality inspection failures. When the updated number is less than or equal to the preset threshold for the number of data quality inspection failures, the scheduling module redetermines a second target execution module from the plurality of execution modules and sends the target data quality inspection scheme to it, so that the redetermined second target execution module performs data quality inspection on the data to be inspected based on the target data quality inspection scheme. When the updated number is greater than the preset threshold, the scheduling module confirms that the data quality inspection has failed and feeds back a data quality inspection failure message to the client.
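Taken together, steps S140 to S160 amount to a bounded retry loop. The sketch below illustrates that control flow only; the round-robin re-selection is a simplification chosen for the example and is not the selection method of the patent, and the names `run_with_retries` and `inspect` are hypothetical.

```python
from typing import Callable, List, Optional


def run_with_retries(
    modules: List[str],
    inspect: Callable[[str], Optional[str]],
    max_failures: int,
) -> Optional[str]:
    """Dispatch to execution modules until one succeeds or the threshold is exceeded.

    inspect(module) returns a result on success and None on failure.
    """
    failures = 0
    attempt = 0
    while True:
        module = modules[attempt % len(modules)]  # simplistic re-selection (assumption)
        result = inspect(module)
        if result is not None:
            return result          # the result would be sent to the client
        failures += 1              # updated number of failures
        if failures > max_failures:
            return None            # failure would be reported to the client
        attempt += 1               # updated number <= threshold: dispatch again
```

With a threshold of 2, for example, the data is dispatched at most three times, because the retry is still allowed when the updated count equals the threshold.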
In the implementation of the invention, the data quality inspection system receives the data to be inspected sent by the client, and the access module acquires a target data quality inspection scheme of the data to be inspected; the scheduling module determines a first target execution module from a plurality of execution modules based on complexity coefficients included in a target data quality inspection scheme, and sends the target data quality inspection scheme to the first target execution module so that the first target execution module performs data quality inspection on data to be inspected based on the target data quality inspection scheme; and then when the first target execution module succeeds in data quality inspection of the data to be inspected, the first target execution module generates a data quality inspection result of the data to be inspected and sends the data quality inspection result to the client. In the embodiment of the invention, the access module determines the target quality inspection scheme of the data to be inspected according to the data to be inspected, so that the target data quality inspection scheme suitable for the data to be inspected can be determined according to the data to be inspected, the quality of the data quality inspection is improved, the first target execution module performs quality inspection on the data to be inspected according to the target data quality inspection scheme, the automatic quality inspection is realized, the efficiency of the data quality inspection is effectively improved, and the quality of the data quality inspection is further improved.
In addition, in the embodiment of the invention, the first target execution module is determined from the plurality of execution modules through the complexity coefficient included in the target data quality inspection scheme, so that the corresponding first target execution module is distributed to the data to be inspected according to the complexity of the data quality inspection, and the efficiency of the data quality inspection is further improved.
Meanwhile, in the embodiment of the invention, when the first target execution module fails the data quality inspection of the data to be inspected, the scheduling module determines the number of data quality inspection failures corresponding to the data to be inspected, and updates it to obtain an updated number of data quality inspection failures; when the updated number is less than or equal to the preset threshold for the number of data quality inspection failures, the scheduling module determines a second target execution module from the plurality of execution modules and sends the target data quality inspection scheme to the second target execution module, so that the second target execution module performs data quality inspection on the data to be inspected based on the target data quality inspection scheme; and when the second target execution module succeeds in the data quality inspection of the data to be inspected, the second target execution module generates a data quality inspection result of the data to be inspected and sends the data quality inspection result to the client. Therefore, when a data quality inspection fails, a second target execution module can be efficiently reassigned to the data to be inspected, which improves the user experience and further improves the efficiency of the data quality inspection.
Fig. 3 is a block diagram of a data quality inspection apparatus according to an embodiment of the present invention. As shown in fig. 3, the data quality inspection device is applied to a data quality inspection system, and the data quality inspection system comprises: an access module, a scheduling module, and a plurality of execution modules, the apparatus comprising:
the scheme determining unit 200 is configured to, in response to receiving the to-be-inspected data sent by the client, obtain a target data quality inspection scheme of the to-be-inspected data by using the access module;
a first scheduling unit 210, configured to determine a first target execution module from the plurality of execution modules based on a complexity coefficient included in the target data quality inspection scheme, and send the target data quality inspection scheme to the first target execution module, so that the first target execution module performs data quality inspection on the data to be inspected based on the target data quality inspection scheme;
a first sending unit 220, configured to, in response to the first target execution module succeeding in the data quality inspection of the data to be inspected, generate, by the first target execution module, a data quality inspection result of the data to be inspected, and send the data quality inspection result to the client;
a retrieving unit 230, configured to, in response to the first target execution module failing the data quality inspection of the data to be inspected, determine, by the scheduling module, the number of data quality inspection failures corresponding to the data to be inspected, and update the number of data quality inspection failures to obtain an updated number of data quality inspection failures;
a second scheduling unit 240, configured to, in response to the number of failed quality inspection times of the updated data being less than or equal to a preset number of failed quality inspection times threshold, determine a second target execution module from the plurality of execution modules, and send the target data quality inspection scheme to the second target execution module, so that the second target execution module performs data quality inspection on the data to be inspected based on the target data quality inspection scheme;
and the second sending unit 250 is configured to, in response to successful data quality inspection of the data to be inspected by the second target execution module, generate a data quality inspection result of the data to be inspected, and send the data quality inspection result to the client.
In one embodiment of the present invention, any one of the plurality of execution modules corresponds to an execution level; the first scheduling unit 210 is specifically configured to:
the scheduling module determines a first target task queue according to the complexity coefficient included in the target data quality inspection scheme, the corresponding relation between a preset task queue and the complexity coefficient, and a preset scheduling algorithm, and sends the target data quality inspection scheme to the first target task queue;
the scheduling module determines the first target execution module based on the corresponding execution level of each execution module and the corresponding relation between the preset execution level and the task queue.
In one embodiment of the present invention, the second scheduling unit 240 in the embodiment of the present invention is specifically configured to:
the scheduling module determines a second target task queue according to the complexity coefficient included in the target data quality inspection scheme, the corresponding relation between a preset task queue and the complexity coefficient, and a preset scheduling algorithm, and sends the target data quality inspection scheme to the second target task queue;
the scheduling module determines the second target execution module based on the corresponding execution level of each execution module and the corresponding relation between the preset execution level and the task queue.
In one embodiment of the present invention, the second scheduling unit 240 in the embodiment of the present invention is specifically configured to:
And the scheduling module determines the second target execution module in the other execution modules according to the current processing data quantity of the other execution modules and the execution level corresponding to the first target execution module, wherein the other execution modules are the execution modules except the first target execution module in the plurality of execution modules.
In one embodiment of the present invention, the scheme determining unit 200 in the embodiment of the present invention is specifically configured to:
the access module determines the data type included in the data to be inspected;
and determining the target data quality inspection scheme according to the data type included in the data to be inspected and the corresponding relation between the preset data type and the data quality inspection scheme.
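The scheme lookup performed by the access module can be sketched as a table lookup. This is a hedged illustration: deriving the data type from the file extension is an assumption made for the example, and the mapping `SCHEME_BY_TYPE`, its entries, and the function name are hypothetical.

```python
from pathlib import PurePath
from typing import Dict

# Hypothetical preset correspondence between data type and data quality inspection scheme.
SCHEME_BY_TYPE: Dict[str, str] = {
    "shp": "vector-scheme",
    "tif": "raster-scheme",
    "xlsx": "table-scheme",
}


def determine_scheme(file_name: str) -> str:
    """Return the scheme matching the data type of the data to be inspected."""
    # Assumption: the data type is taken from the file extension.
    data_type = PurePath(file_name).suffix.lstrip(".").lower()
    try:
        return SCHEME_BY_TYPE[data_type]
    except KeyError:
        raise ValueError(f"no quality inspection scheme for data type {data_type!r}") from None
```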
In one embodiment of the present invention, the data quality inspection system in the embodiment of the present invention further includes: a database;
the scheme determining unit 200 is further configured to allocate, to the to-be-inspected data, a data identifier for identifying the to-be-inspected data, and store the to-be-inspected data to the database.
In an embodiment of the present invention, after the scheduling module in the embodiment of the present invention determines, based on the complexity coefficient of the target data quality inspection scheme, a first target execution module from the plurality of execution modules, the method further includes:
And in response to the first target execution module receiving the target data quality inspection scheme, the first target execution module acquires the data to be inspected from the database according to the data identification in the target data quality inspection scheme.
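The hand-off through the database can be sketched as follows: the access module stores the data under a freshly assigned identifier, and the execution module later fetches it using the identifier carried in the scheme. The in-memory class, the use of a UUID as the data identifier, and the dictionary form of the scheme are all assumptions for illustration.

```python
import uuid
from typing import Dict


class InMemoryDatabase:
    """Stand-in for the database of the data quality inspection system."""

    def __init__(self) -> None:
        self._rows: Dict[str, bytes] = {}

    def store(self, payload: bytes) -> str:
        """Assign a data identifier to the payload, store it, and return the identifier."""
        data_id = uuid.uuid4().hex
        self._rows[data_id] = payload
        return data_id

    def fetch(self, data_id: str) -> bytes:
        """Return the payload stored under data_id."""
        return self._rows[data_id]


db = InMemoryDatabase()
data_id = db.store(b"data to be inspected")               # access module side
scheme = {"scheme_number": "QC-001", "data_id": data_id}  # identifier travels with the scheme
payload = db.fetch(scheme["data_id"])                     # execution module side
```

Passing only the identifier through the task queue keeps the scheduling messages small, since the data itself stays in the database until an execution module needs it.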
In addition, the embodiment of the invention also provides electronic equipment, which comprises:
a memory for storing a computer program;
and the processor is used for executing the computer program stored in the memory, and when the computer program is executed, the data quality inspection method according to any one of the embodiments of the invention is realized.
Fig. 4 is a schematic structural diagram of an application embodiment of the electronic device of the present invention. Next, an electronic device according to an embodiment of the present invention is described with reference to fig. 4. The electronic device may be either or both of the first device and the second device, or a stand-alone device independent thereof, which may communicate with the first device and the second device to receive the acquired input signals therefrom.
As shown in fig. 4, the electronic device includes one or more processors and memory.
The processor may be a Central Processing Unit (CPU) or other form of processing unit having data processing and/or instruction execution capabilities, and may control other components in the electronic device to perform the desired functions.
The memory may include one or more computer program products that may include various forms of computer-readable storage media, such as volatile memory and/or non-volatile memory. The volatile memory may include, for example, random access memory (RAM) and/or cache memory (cache), and the like. The non-volatile memory may include, for example, read-only memory (ROM), hard disk, flash memory, and the like. One or more computer program instructions may be stored on the computer readable storage medium that can be executed by a processor to implement the data quality inspection methods and/or other desired functions of the various embodiments of the present invention described above.
In one example, the electronic device may further include: input devices and output devices, which are interconnected by a bus system and/or other forms of connection mechanisms (not shown).
In addition, the input device may include, for example, a keyboard, a mouse, and the like.
The output device may output various information including the determined distance information, direction information, etc., to the outside. The output devices may include, for example, a display, speakers, a printer, and a communication network and remote output devices connected thereto, etc.
Of course, only some of the components of the electronic device that are relevant to the present invention are shown in fig. 4 for simplicity, components such as buses, input/output interfaces, etc. being omitted. In addition, the electronic device may include any other suitable components depending on the particular application.
In addition to the methods and apparatus described above, embodiments of the invention may also be a computer program product comprising computer program instructions which, when executed by a processor, cause the processor to perform the steps in a data quality inspection method according to various embodiments of the invention described in the above section of the specification.
For the computer program product, program code for performing operations of embodiments of the present invention may be written in any combination of one or more programming languages, including an object oriented programming language such as Java, C++ or the like and conventional procedural programming languages, such as the "C" programming language or similar programming languages. The program code may execute entirely on the user's computing device, partly on the user's device, as a stand-alone software package, partly on the user's computing device and partly on a remote computing device, or entirely on the remote computing device or server.
Furthermore, embodiments of the invention may also be a computer-readable storage medium, on which computer program instructions are stored which, when executed by a processor, cause the processor to perform the steps in a data quality inspection method according to various embodiments of the invention described in the above section of the description.
The computer readable storage medium may employ any combination of one or more readable media. The readable medium may be a readable signal medium or a readable storage medium. The readable storage medium may include, for example, but is not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or a combination of any of the foregoing. More specific examples (a non-exhaustive list) of the readable storage medium would include the following: an electrical connection having one or more wires, a portable disk, a hard disk, random access memory (RAM), read-only memory (ROM), erasable programmable read-only memory (EPROM or flash memory), optical fiber, portable compact disk read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing.
Those of ordinary skill in the art will appreciate that: all or part of the steps for implementing the above method embodiments may be implemented by hardware associated with program instructions, where the foregoing program may be stored in a computer readable storage medium, and when executed, the program performs steps including the above method embodiments; and the aforementioned storage medium includes: various media that can store program code, such as ROM, RAM, magnetic or optical disks.

Claims (10)

1. A data quality inspection method, applied to a data quality inspection system, the data quality inspection system comprising: the system comprises an access module, a scheduling module and a plurality of execution modules, wherein the method comprises the following steps:
in response to receiving the data to be inspected sent by the client, the access module acquires a target data quality inspection scheme of the data to be inspected;
the scheduling module determines a first target execution module from the plurality of execution modules based on complexity coefficients included in the target data quality inspection scheme, and sends the target data quality inspection scheme to the first target execution module so that the first target execution module performs data quality inspection on the data to be inspected based on the target data quality inspection scheme;
responding to successful data quality inspection of the data to be inspected by the first target execution module, generating a data quality inspection result of the data to be inspected by the first target execution module, and sending the data quality inspection result to the client;
responding to the failure of the first target execution module to the data quality inspection of the data to be inspected, determining the failure times of the data quality inspection corresponding to the data to be inspected by the dispatching module, and updating the failure times of the data quality inspection to obtain updated data quality inspection failure times;
In response to the update data quality inspection failure times being less than or equal to a preset data quality inspection failure times threshold, the scheduling module determines a second target execution module from the plurality of execution modules, and sends the target data quality inspection scheme to the second target execution module, so that the second target execution module performs data quality inspection on the data to be inspected based on the target data quality inspection scheme;
and responding to the successful data quality inspection of the data to be inspected by the second target execution module, generating a data quality inspection result of the data to be inspected by the second target execution module, and sending the data quality inspection result to the client.
2. The method of claim 1, wherein any of the plurality of execution modules corresponds to an execution level;
the scheduling module determines a first target execution module from the plurality of execution modules based on a complexity coefficient of the target data quality inspection scheme, including:
the scheduling module determines a first target task queue according to a complexity coefficient, a corresponding relation between a preset task queue and the complexity coefficient and a preset scheduling algorithm, which are included in the target data quality inspection scheme, and sends the target data quality inspection scheme to the first target task queue;
The scheduling module determines the first target execution module based on the corresponding execution level of each execution module and the corresponding relation between the preset execution level and the task queue.
3. The method of claim 2, wherein the scheduling module determining a second target execution module from the plurality of execution modules comprises:
the scheduling module determines a second target task queue according to the complexity coefficient, the corresponding relation between a preset task queue and the complexity coefficient and a preset scheduling algorithm, which are included in the target data quality inspection scheme, and sends the target data quality inspection scheme to the second target task queue;
the scheduling module determines the second target execution module based on the corresponding execution level of each execution module and the corresponding relation between the preset execution level and the task queue.
4. The method of claim 2, wherein the scheduling module determining a second target execution module from the plurality of execution modules comprises:
and the scheduling module determines the second target execution module in the other execution modules according to the current processing data quantity of the other execution modules and the execution level corresponding to the first target execution module, wherein the other execution modules are the execution modules except the first target execution module in the plurality of execution modules.
5. The method according to any of claims 1-4, wherein the access module obtaining a target data quality inspection plan for the data to be inspected, comprises:
the access module determines the data type included in the data to be inspected;
and determining the target data quality inspection scheme according to the data type included in the data to be inspected and the corresponding relation between the preset data type and the data quality inspection scheme.
6. The method of claim 1, wherein the data quality inspection system further comprises: a database;
the method further comprises:
the access module distributes data identification for identifying the data to be inspected to the data to be inspected, and stores the data to be inspected to the database.
7. The method of claim 6, wherein the scheduling module, after determining a first target execution module from the plurality of execution modules based on the complexity factor of the target data quality inspection scheme, further comprises:
and in response to the first target execution module receiving the target data quality inspection scheme, the first target execution module acquires the data to be inspected from the database according to the data identification in the target data quality inspection scheme.
8. A data quality inspection device, characterized in that it is applied to a data quality inspection system, said data quality inspection system comprising: an access module, a scheduling module, and a plurality of execution modules, the apparatus comprising:
the scheme determining unit is used for responding to the received data to be inspected sent by the client, and the access module acquires a target data quality inspection scheme of the data to be inspected;
the first scheduling unit is used for determining a first target execution module from the plurality of execution modules based on complexity coefficients included in the target data quality inspection scheme, and sending the target data quality inspection scheme to the first target execution module so that the first target execution module performs data quality inspection on the data to be inspected based on the target data quality inspection scheme;
the first sending unit is used for responding to the success of the first target execution module in checking the data quality of the data to be checked, generating a data quality check result of the data to be checked by the first target execution module, and sending the data quality check result to the client;
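the retrieving unit is used for responding to the failure of the first target execution module in the data quality inspection of the data to be inspected, determining, by the scheduling module, the data quality inspection failure times corresponding to the data to be inspected, and updating the data quality inspection failure times to obtain updated data quality inspection failure times;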
The second scheduling unit is used for responding to the fact that the update data quality inspection failure times are smaller than or equal to a preset data quality inspection failure times threshold, determining a second target execution module from the plurality of execution modules by the scheduling module, and sending the target data quality inspection scheme to the second target execution module so that the second target execution module performs data quality inspection on the data to be inspected based on the target data quality inspection scheme;
the second sending unit is used for responding to the success of the second target execution module in checking the data quality of the data to be checked, generating a data quality check result of the data to be checked by the second target execution module, and sending the data quality check result to the client.
9. An electronic device, comprising:
a memory for storing a computer program;
a processor for executing a computer program stored in said memory, and said computer program, when executed, implementing a data quality inspection method as claimed in any one of the preceding claims 1-7.
10. A computer readable storage medium having stored thereon a computer program, which when executed by a processor, implements a data quality inspection method according to any of the preceding claims 1-7.
CN202311566927.7A 2023-11-23 2023-11-23 Data quality inspection method and device, electronic equipment and storage medium Pending CN117271501A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202311566927.7A CN117271501A (en) 2023-11-23 2023-11-23 Data quality inspection method and device, electronic equipment and storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202311566927.7A CN117271501A (en) 2023-11-23 2023-11-23 Data quality inspection method and device, electronic equipment and storage medium

Publications (1)

Publication Number Publication Date
CN117271501A true CN117271501A (en) 2023-12-22

Family

ID=89209165

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202311566927.7A Pending CN117271501A (en) 2023-11-23 2023-11-23 Data quality inspection method and device, electronic equipment and storage medium

Country Status (1)

Country Link
CN (1) CN117271501A (en)

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20200250074A1 (en) * 2019-02-04 2020-08-06 Oracle International Corporation Test Orchestration Platform
CN112540914A (en) * 2020-11-27 2021-03-23 北京百度网讯科技有限公司 Execution method, execution device, server and storage medium for unit test
CN114925153A (en) * 2022-05-26 2022-08-19 广州城市信息研究所有限公司 Service-based geographic information data quality detection method, device and equipment
CN116483811A (en) * 2022-12-28 2023-07-25 浙江省测绘科学技术研究院 Real-time synchronous quality inspection method and device for geographic information data production process and computer equipment thereof
CN116777284A (en) * 2023-06-27 2023-09-19 上海飞未信息技术有限公司 Space and attribute data integrated quality inspection method

CN115486028B (en) Health checking method and device, electronic equipment and storage medium
CN115460101B (en) Network service management method, device, equipment and storage medium
CN116127342B (en) Information clustering processing method, system and platform based on hotel
CN112632992B (en) Test method, test device, computer equipment and medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination