CA3177209A1 - Data cleaning method - Google Patents
- Publication number
- CA3177209A1
- Authority
- CA
- Canada
- Prior art keywords
- data
- cleaned
- field
- rule
- deleting
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/20—Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
- G06F16/21—Design, administration or maintenance of databases
- G06F16/215—Improving data quality; Data cleansing, e.g. de-duplication, removing invalid entries or correcting typographical errors
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/20—Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
- G06F16/24—Querying
- G06F16/245—Query processing
- G06F16/2455—Query execution
- G06F16/24568—Data stream processing; Continuous queries
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Databases & Information Systems (AREA)
- Data Mining & Analysis (AREA)
- Physics & Mathematics (AREA)
- General Engineering & Computer Science (AREA)
- General Physics & Mathematics (AREA)
- Computational Linguistics (AREA)
- Quality & Reliability (AREA)
- Data Exchanges In Wide-Area Networks (AREA)
- Preliminary Treatment Of Fibers (AREA)
- Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
Abstract
A data cleansing method. The method comprises: acquiring data from a first data source, and establishing an independent data stream by using the acquired data (101); filtering the data in the data stream to obtain data to be cleansed (102); deleting or filling a field comprising a missing value in the data to be cleansed, to obtain preliminary cleansed data (103); detecting whether the preliminary cleansed data conforms to a preset determination rule, and deleting the data not conforming to the determination rule to obtain final cleansed data (104); and outputting the final cleansed data to a second data source (105). By using the above-mentioned method, data security can be improved.
Description
DATA CLEANING METHOD
BACKGROUND OF THE INVENTION
Technical Field
[0001] The present application relates to the field of big data processing technology, and more particularly to a data processing method.
Description of Related Art
[0002] With the advent of the network age, large quantities of information data pour into the network incessantly, and data volumes grow at a rate of 50% each year. Supported by colossal data sources, enterprise decisions are increasingly based on data analysis rather than, as was traditionally the case, on experience and intuition alone. Data cleaning is an indispensable link in the overall data analyzing process, and the quality of its result directly affects model performance and the final analytical conclusion. Data cleaning is a process of rechecking and verifying data; it aims to delete repetitive data, rectify existing errors, and ensure data consistency. In practice, data cleaning usually occupies 50% to 80% of the time of the entire data analyzing process.
[0003] Data cleaning falls into two types: offline data cleaning and real-time data cleaning. Offline data cleaning can clean data at a finer granularity by means of complicated processing, at the expense of performance; such cleaning includes missing value processing, abnormal value processing, repetitive value processing, null value filling, unifying units, and deciding whether to perform standardized processing, whether to delete unnecessary variables, and whether to sort, etc. In comparison, because of its real-time requirement, real-time data cleaning is better suited to missing value filling, filtering, and data legitimacy checking. However, the currently available data cleaning process is usually integrated with the data analyzing process, and the coupling between the two is strong: the data cleaning process is greatly affected by the data analyzing functions of other code, data loss tends to occur, and data security is poor.
Date Reçue/Date Received 2022-09-27
SUMMARY OF THE INVENTION
[0004] In view of the above technical problems, there is an urgent need to propose a data cleaning method capable of enhancing data security.
[0005] There is provided a data cleaning method that comprises:
[0006] obtaining data from a first data source, and creating an independent data stream by employing the obtained data;
[0007] subjecting data in the data stream to a filtering process, and obtaining data to be cleaned;
[0008] deleting or filling in any field containing missing values in the data to be cleaned, and obtaining preliminarily cleaned data;
[0009] detecting whether the preliminarily cleaned data conforms to a preset judging rule, deleting any data that does not conform to the judging rule, and obtaining finally cleaned data; and
[0010] outputting the finally cleaned data to a second data source.
[0011] In one of the embodiments, the step of deleting or filling in any field containing missing values in the data to be cleaned includes:
[0012] calculating to obtain a missing rate of the field according to a proportion of number of pieces of the missing values of the field in a total number of pieces;
[0013] determining an attribute importance degree of the field according to an index required to be analyzed; and
[0014] deleting or filling in the field containing missing values according to the missing rate and the attribute importance degree of the field.
[0015] In one of the embodiments, the step of deleting or filling in the field containing missing values according to the missing rate and the attribute importance degree of the field includes:
[0016] filling in the field when the missing rate of the field is lower than a preset missing rate threshold and the attribute importance degree thereof is lower than a preset importance grading threshold;
[0017] deleting the field when the missing rate of the field is not lower than the preset missing rate threshold and the attribute importance degree thereof is lower than the preset importance grading threshold; and
[0018] complementing the missing values of the field when the missing rate of the field is not lower than the preset missing rate threshold and the attribute importance degree thereof is greater than the preset importance grading threshold.
[0019] In one of the embodiments, the method further comprises:
[0020] probing metadata that describes data attribute of the data in the first data source, analyzing to obtain any quality problem present in the data according to the metadata, and setting a filtering rule according to the quality problem;
[0021] the step of subjecting data in the data stream to a filtering process, and obtaining data to be cleaned includes: subjecting data in the data stream to a filtering process according to the filtering rule, and obtaining data to be cleaned.
[0022] In one of the embodiments, the step of subjecting data in the data stream to a filtering process includes:
[0023] row-grade filtering, whereby any row not required in the data is removed; and
[0024] column-grade filtering, whereby, when one row has plural columns, fields to which any required column corresponds are merely selected and retained.
[0025] In one of the embodiments, the preset judging rule includes a legitimacy rule and a logic rule, and the step of detecting whether the preliminarily cleaned data conforms to a preset judging rule includes:
[0026] setting the preliminarily cleaned data as a maximum value that conforms to the legitimacy rule, or deleting the data, if the preliminarily cleaned data does not conform to the legitimacy rule; and
[0027] deleting the preliminarily cleaned data and generating an alarming instruction, if the preliminarily cleaned data does not conform to the logic rule.
[0028] In one of the embodiments, the first data source and the second data source are of different data types of the same and single distributed messaging system, the distributed messaging system is Kafka, the first data source and the second data source are two different Topics of Kafka, and the data stream is embodied as a data stream based on Spark Streaming.
[0029] There is provided a data cleaning device that comprises:
[0030] a data obtaining module, for obtaining data from a first data source, and creating an independent data stream by employing the obtained data;
[0031] a data filtering module, for subjecting data in the data stream to a filtering process, and obtaining data to be cleaned;
[0032] a preliminarily cleaning module, for deleting or filling in any field containing missing values in the data to be cleaned, and obtaining preliminarily cleaned data;
[0033] a finally cleaning module, for detecting whether the preliminarily cleaned data conforms to a preset judging rule, deleting any data that does not conform to the judging rule, and obtaining finally cleaned data; and
[0034] a data outputting module, for outputting the finally cleaned data to a second data source.
[0035] There is provided a computer equipment that comprises a memory, a processor, and a computer program stored on the memory and operable on the processor, and the following steps are realized when the processor executes the computer program:
[0036] obtaining data from a first data source, and creating an independent data stream by employing the obtained data;
[0037] subjecting data in the data stream to a filtering process, and obtaining data to be cleaned;
[0038] deleting or filling in any field containing missing values in the data to be cleaned, and obtaining preliminarily cleaned data;
[0039] detecting whether the preliminarily cleaned data conforms to a preset judging rule, deleting any data that does not conform to the judging rule, and obtaining finally cleaned data; and
[0040] outputting the finally cleaned data to a second data source.
[0041] There is provided a computer-readable storage medium that stores thereon a computer program, and the following steps are realized when the computer program is executed by a processor:
[0042] obtaining data from a first data source, and creating an independent data stream by employing the obtained data;
[0043] subjecting data in the data stream to a filtering process, and obtaining data to be cleaned;
[0044] deleting or filling in any field containing missing values in the data to be cleaned, and obtaining preliminarily cleaned data;
[0045] detecting whether the preliminarily cleaned data conforms to a preset judging rule, deleting any data that does not conform to the judging rule, and obtaining finally cleaned data; and
[0046] outputting the finally cleaned data to a second data source.
[0047] In comparison with prior-art technology, the present invention achieves the following advantageous effects.
[0048] In the data cleaning method, and corresponding device, computer equipment and storage medium, data cleaning is performed by creating an independent data stream, and data obtained from a first data source is cleaned and thereafter placed in another data source for processing by subsequent businesses, so that the data cleaning process is separated from data analyzing codes, coupling among the codes is reduced, and data security is effectively enhanced.
[0049] Moreover, data filtering is placed as the first step of data cleaning in the present invention, thereby reducing the quantity of data to be subsequently cleaned and enhancing the efficiency of cleaning the data.
BRIEF DESCRIPTION OF THE DRAWINGS
[0050] Fig. 1 is a flowchart schematically illustrating the data cleaning method in an embodiment; and
[0051] Fig. 2 is a block diagram illustrating the structure of the data cleaning device in an embodiment.
DETAILED DESCRIPTION OF THE INVENTION
[0052] In order to make the objectives, technical solutions and advantages of the present application more lucid and clear, the present application is described in greater detail below with reference to accompanying drawings and embodiments. As should be understood, the specific embodiments as described here are merely meant to explain the present application, rather than to restrict the present application.
[0053] In one embodiment, as shown in Fig. 1, the present application provides a data cleaning method that comprises the following steps.
[0054] Step 101 - obtaining data from a first data source, and creating an independent data stream by employing the obtained data.
[0055] The first data source is the source from which data is obtained, and the data stream is an ordered sequence of data with a starting point and an ending point.
[0056] Specifically, by creating an independent data stream for data cleaning, the present invention separates the data cleaning process from data analyzing codes, and reduces coupling among the codes.
[0057] Step 102 - subjecting data in the data stream to a filtering process, and obtaining data to be cleaned.
[0058] Specifically, data filtering is placed as the first step of data cleaning, which can effectively reduce the quantity of data to be subsequently cleaned and greatly enhance the efficiency of cleaning the data.
[0059] Step 103 - deleting or filling in any field containing missing values in the data to be cleaned, and obtaining preliminarily cleaned data.
[0060] The missing values mean information deficient in the data, that is to say, one or some attribute(s) of the data is/are incomplete in value(s).
[0061] Step 104 - detecting whether the preliminarily cleaned data conforms to a preset judging rule, deleting any data that does not conform to the judging rule, and obtaining finally cleaned data.
[0062] Step 105 - outputting the finally cleaned data to a second data source.
[0063] The second data source is another data source that is different from the first data source, and it is employed to store data to be used or processed by subsequent businesses.
[0064] Specifically, the data cleaning process of the present invention is independent of the other processing steps of data analysis and is not affected by other code, so data security is higher.
[0065] In the data cleaning method, data cleaning is performed by creating an independent data stream, and data obtained from a first data source is cleaned and thereafter placed in another data source for processing by subsequent businesses, so that the data cleaning process is separated from data analyzing codes, coupling among the codes is reduced, and data security is effectively enhanced.
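Steps 101 to 105 can be sketched as a small pipeline that runs independently of any analysis code. This is only an illustrative sketch: the names `clean_stream` and `run`, the three callables, and the in-memory lists standing in for the first and second data sources are assumptions, not part of the described method.

```python
from typing import Callable, Iterable, Iterator, Optional

Record = dict  # one piece of data, keyed by field name


def clean_stream(
    source: Iterable[Record],
    keep: Callable[[Record], bool],
    fill_missing: Callable[[Record], Optional[Record]],
    conforms: Callable[[Record], bool],
) -> Iterator[Record]:
    """Steps 101-104 as a generator that is separate from analysis code."""
    for record in source:                    # Step 101: independent stream over the first source
        if not keep(record):                 # Step 102: filtering comes first
            continue
        prelim = fill_missing(dict(record))  # Step 103: delete or fill missing fields
        if prelim is None:                   # the callable may also drop the record
            continue
        if conforms(prelim):                 # Step 104: drop data that breaks the judging rule
            yield prelim


def run(source: Iterable[Record], sink: list, **steps) -> None:
    """Step 105: output the finally cleaned data to the second data source."""
    sink.extend(clean_stream(source, **steps))


# Hypothetical sample data; the placeholder age 0 is chosen only for illustration.
first_source = [
    {"channel": "news", "age": 30},
    {"channel": None, "age": 25},
    {"channel": "sports", "age": None},
    {"channel": "news", "age": 200},
]
second_source = []
run(
    first_source,
    second_source,
    keep=lambda r: r["channel"] is not None,
    fill_missing=lambda r: {**r, "age": 0 if r["age"] is None else r["age"]},
    conforms=lambda r: 0 <= r["age"] <= 120,
)
```

Because the cleaning logic only reads from one source and writes to another, the analysis code can consume `second_source` without sharing any state with the cleaning step.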
[0066] As one of specific modes of execution, the first data source and the second data source are of different data types of the same and single distributed messaging system, for instance, the distributed messaging system is Kafka, the first data source and the second data source are two different Topics of Kafka, and the data stream is embodied as a data stream based on Spark Streaming.
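One way to wire two Kafka Topics together is sketched below. It is a sketch under stated assumptions: it uses Spark Structured Streaming's Kafka source and sink rather than the Spark Streaming API named in the text, it assumes the Spark Kafka connector package is available to the session, and the function names, the `clean` callable, and all argument values are hypothetical.

```python
def source_options(servers: str, topic: str) -> dict:
    """Options for reading the first data source (one Kafka Topic)."""
    return {"kafka.bootstrap.servers": servers, "subscribe": topic}


def sink_options(servers: str, topic: str, checkpoint: str) -> dict:
    """Options for writing the second data source (a different Kafka Topic)."""
    return {
        "kafka.bootstrap.servers": servers,
        "topic": topic,
        "checkpointLocation": checkpoint,
    }


def start_cleaning_job(spark, clean, servers, raw_topic, clean_topic, checkpoint):
    """Start an independent streaming query.

    `spark` is expected to be a pyspark.sql.SparkSession. `clean` is a
    hypothetical DataFrame -> DataFrame function that must return a string
    or binary `value` column, which the Kafka sink requires.
    """
    raw = (
        spark.readStream.format("kafka")
        .options(**source_options(servers, raw_topic))
        .load()
    )
    cleaned = clean(raw.selectExpr("CAST(value AS STRING) AS value"))
    return (
        cleaned.writeStream.format("kafka")
        .options(**sink_options(servers, clean_topic, checkpoint))
        .start()
    )
```

Keeping the raw and cleaned data in separate Topics lets downstream consumers subscribe only to the cleaned Topic.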
[0067] In one of the embodiments, the step of deleting or filling in any field containing missing values in the data to be cleaned includes:
[0068] calculating to obtain a missing rate of the field according to a proportion of number of pieces of the missing values of the field in a total number of pieces;
[0069] determining an attribute importance degree of the field according to an index required to be analyzed; and
[0070] deleting or filling in the field containing missing values according to the missing rate and the attribute importance degree of the field.
[0071] The missing rate of the field is the proportion of the number of pieces of the missing values of the field in the total number of pieces.
[0072] For instance, if a salary field has 100 records in total and 20 of those records are missing values, then the missing rate is 20%.
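The salary example can be reproduced with a few lines of code. This is a minimal sketch that assumes records are dictionaries and that a missing value is represented by `None` or an absent key; the helper name `missing_rate` is hypothetical.

```python
def missing_rate(records, field):
    """Share of records whose `field` is missing (absent or None)."""
    if not records:
        return 0.0
    missing = sum(1 for r in records if r.get(field) is None)
    return missing / len(records)


# 100 salary records, 20 of which are missing values
records = [{"salary": 5000}] * 80 + [{"salary": None}] * 20
rate = missing_rate(records, "salary")  # 20 / 100
```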
[0073] The judging criterion for the attribute importance degree of the field is decided by the index required to be analyzed. For example, if users are to be profiled or labelled so as to supply data for subsequent precision marketing, attribute information of the users must be collected, and such attribute information as the ages and genders of the users are important fields.
[0074] In one of the embodiments, the step of deleting or filling in the field containing missing values according to the missing rate and the attribute importance degree of the field includes:
[0075] filling in the field when the missing rate of the field is lower than a preset missing rate threshold and the attribute importance degree thereof is lower than a preset importance grading threshold.
[0076] Specifically, if the field attribute is numerical-type data, it suffices to fill in the field according to the distribution of the data; more specifically, if the data is uniformly distributed, the field is filled in with the mean value; if the data is distributed in a skewed manner, the field is filled in with the median;
[0077] deleting the field when the missing rate of the field is not lower than the preset missing rate threshold and the attribute importance degree thereof is lower than the preset importance grading threshold; and
[0078] complementing the missing values of the field when the missing rate of the field is not lower than the preset missing rate threshold and the attribute importance degree thereof is greater than the preset importance grading threshold.
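The three cases above, together with the mean and median filling rule, can be sketched as follows. The function names, the numeric importance score, and the `skewed` flag supplied by the caller are assumptions; combinations that the text does not cover are returned as "unspecified" rather than guessed.

```python
from statistics import mean, median


def choose_action(missing_rate, importance, rate_threshold, importance_threshold):
    """Map the cases of paragraphs [0075], [0077] and [0078] to an action."""
    low_rate = missing_rate < rate_threshold
    if low_rate and importance < importance_threshold:
        return "fill"
    if not low_rate and importance < importance_threshold:
        return "delete"
    if not low_rate and importance > importance_threshold:
        return "complement"
    return "unspecified"  # combinations the text does not describe


def fill_numeric(values, skewed):
    """Fill None entries with the median if `skewed`, otherwise with the mean."""
    present = [v for v in values if v is not None]
    filler = median(present) if skewed else mean(present)
    return [filler if v is None else v for v in values]


actions = [
    choose_action(0.20, 1, 0.90, 3),  # low missing rate, low importance
    choose_action(0.95, 1, 0.90, 3),  # high missing rate, low importance
    choose_action(0.95, 5, 0.90, 3),  # high missing rate, high importance
]
by_mean = fill_numeric([1, None, 3], skewed=False)
by_median = fill_numeric([1, None, 2, 100], skewed=True)
```

Note that `fill_numeric` raises `statistics.StatisticsError` when every value is missing, so a caller would need to handle a fully empty field separately.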
[0079] Specifically, the step of complementing the missing values of the field includes:
[0080] complementing through other information, such as using an ID card number to reckon gender, native place, date of birth, and age, etc.;
[0081] complementing through foregoing and following data: for instance, when data is missing in a time sequence, the mean of the foregoing and following values can serve as the complementary value; when there are many missing values, numerical values obtained through a smoothing process can serve as the complementary values;
[0082] where the missing values cannot be complemented, the data is removed, but it should not be deleted, so that it remains available for possible subsequent use.
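Complementing a gap in a time sequence from its foregoing and following values might look like the sketch below. It assumes a missing value is represented by `None` and only handles isolated gaps; longer runs are left untouched, since the text defers those to a smoothing process that is not specified. The name `complement_gaps` is hypothetical.

```python
def complement_gaps(series):
    """Replace each isolated None with the mean of its two neighbours."""
    out = list(series)
    for i in range(1, len(out) - 1):
        before, after = series[i - 1], series[i + 1]
        if out[i] is None and before is not None and after is not None:
            out[i] = (before + after) / 2
    return out


readings = [10.0, None, 14.0, None, None, 20.0]
filled = complement_gaps(readings)
```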
[0083] As one of specific modes of execution, the missing rate threshold can be any numerical value between 90% and 95%.
[0084] In one of the embodiments, before data in the data stream is subjected to a filtering process, metadata that describes data attribute of the data in the first data source is firstly probed, any quality problem present in the data is then analyzed and obtained according to the metadata, a filtering rule is set according to the quality problem, the data in the data stream is subjected to a filtering process according to the filtering rule, and the data to be cleaned is obtained in Step 102.
[0085] Metadata, also referred to as intermediary data or relay data, is data that describes data; it mainly describes information about data attributes and supports such functions as indicating storage locations, historical data, resource searching, and document recording, etc.
[0086] Specifically, the data attribute required to be processed is packaged into metadata, thus enabling the program to possess better expandability. At the same time, a corresponding filtering rule is stipulated with respect to any quality problem of the data, thus facilitating enhancement of data filtering efficiency.
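The idea of deriving a filtering rule from metadata can be sketched as follows. The metadata layout (a per-field `type` and `required` flag), the two quality problems it checks for, and the function name `build_filter` are all assumptions made for illustration; the text does not prescribe a metadata format.

```python
def build_filter(metadata):
    """Turn per-field metadata into a record-level filtering rule."""
    def keep(record):
        for field, attrs in metadata.items():
            value = record.get(field)
            if value is None:
                if attrs.get("required", False):
                    return False   # quality problem: a required field is absent
                continue
            if not isinstance(value, attrs["type"]):
                return False       # quality problem: unexpected data type
        return True
    return keep


# Hypothetical metadata describing two fields of a log record
metadata = {
    "cid": {"type": str, "required": True},
    "duration": {"type": (int, float), "required": False},
}
keep = build_filter(metadata)
results = [
    keep({"cid": "news", "duration": 12}),
    keep({"duration": 12}),
    keep({"cid": "news", "duration": "12s"}),
    keep({"cid": "news"}),
]
```

Supporting a new field then only requires a new metadata entry, which is one way to read the expandability benefit described above.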
[0087] In one of the embodiments, the step of subjecting data in the data stream to a filtering process includes:
[0088] row-grade filtering, whereby any row not required in the data is removed; and
[0089] column-grade filtering, whereby, when one row has plural columns, fields to which any required column corresponds are merely selected and retained.
[0090] Specifically, the combination of row-grade filtering with column-grade filtering makes it possible to effectively quicken the data filtering speed.
[0091] For instance, consider a process that calculates pv/uv for each channel:
[0092] the log data includes approximately 200 such fields as the IP address, browser information, client terminal equipment information, the specific access time, the specific page accessed, the page previously accessed, and the access time duration, etc., the requirement in this embodiment is to count the clicking amount of each channel and the access amount of the independent IP.
[0093] By row-grade filtering, log data relevant to the channels is selected and retained only, so that log data not containing the channels is filtered away;
[0094] by column-grade filtering, cid (channel name), uid (equipment identification), and ip address are selected from the approximately 200 fields contained in the log data relevant to the channels, unnecessary fields are filtered away, and it is then possible to count and obtain pv/uv of each channel;
[0095] pv is an acronym of Page View, namely the page browsing amount: each access by a user to a certain page in a website is recorded once, and repeated accesses by the user to the same page are accumulated into the total number of pv;
[0096] uv is an acronym of unique visitor, and means a natural person who accesses and browses the page through the internet.
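The channel example can be sketched end to end as follows. It assumes log records are dictionaries, counts uv by distinct IP address as in the stated requirement, and uses hypothetical names (`REQUIRED`, `filter_logs`, `pv_uv`) and made-up sample records with only a few of the roughly 200 fields.

```python
from collections import defaultdict

REQUIRED = ("cid", "uid", "ip")  # columns kept by column-grade filtering


def filter_logs(logs):
    """Row-grade filtering followed by column-grade filtering."""
    for row in logs:
        if not row.get("cid"):                   # row-grade: drop rows without a channel
            continue
        yield {k: row.get(k) for k in REQUIRED}  # column-grade: keep required fields only


def pv_uv(rows):
    """Return {channel: (pv, uv)}, counting uv by distinct IP address."""
    pv = defaultdict(int)
    visitors = defaultdict(set)
    for row in rows:
        pv[row["cid"]] += 1
        visitors[row["cid"]].add(row["ip"])
    return {cid: (pv[cid], len(visitors[cid])) for cid in pv}


logs = [
    {"cid": "news", "uid": "d1", "ip": "10.0.0.1", "browser": "x", "duration": 12},
    {"cid": "news", "uid": "d1", "ip": "10.0.0.1", "browser": "x", "duration": 3},
    {"cid": "news", "uid": "d2", "ip": "10.0.0.2", "browser": "y", "duration": 7},
    {"cid": None, "uid": "d3", "ip": "10.0.0.3", "browser": "y", "duration": 1},
    {"cid": "sports", "uid": "d3", "ip": "10.0.0.3", "browser": "y", "duration": 9},
]
stats = pv_uv(filter_logs(logs))
```

Filtering before counting means the aggregation only ever touches three fields per record instead of the full log row.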
[0097] In this embodiment, expandability is also considered: for instance, since a retention rate of users might need to be counted in subsequent data processing, such data as the access time of each ip address can additionally be recorded.
[0098] The retention rate of users is a ratio of old users to the total users.
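As a small worked example of this ratio, the sketch below treats "old users" as users of the current period who were also seen in a previous period, and "total users" as all users of the current period. Both readings, the use of IP addresses as user identifiers, and the name `retention_rate` are assumptions.

```python
def retention_rate(previous_users, current_users):
    """Ratio of old (returning) users to total users in the current period."""
    current = set(current_users)
    if not current:
        return 0.0
    return len(current & set(previous_users)) / len(current)


rate = retention_rate(
    {"10.0.0.1", "10.0.0.2"},
    {"10.0.0.2", "10.0.0.3", "10.0.0.4", "10.0.0.5"},
)  # one returning user out of four
```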
[0099] In one of the embodiments, the preset judging rule includes a legitimacy rule and a logic rule, and the step of detecting whether the preliminarily cleaned data conforms to a preset judging rule includes:
[0100] setting the preliminarily cleaned data as a maximum value that conforms to the legitimacy rule, or deleting the data, if the preliminarily cleaned data does not conform to the legitimacy rule; and
[0101] deleting the preliminarily cleaned data and generating an alarming instruction, if the preliminarily cleaned data does not conform to the logic rule.
[0102] The legitimacy rule is such format requirement rule as numerical values, dates, and field contents, etc.
[0103] Specifically, field-type legitimacy rule: a date field format is "YYYY-MM-DD".
[0104] Field content legitimacy rule: the gender is male, female, or unknown; the date of birth is earlier than or equal to "today".
[0105] The logic rule is a rule of common sense used for judging whether the data conforms to logics, for instance, ages of people usually lie between 0 and 120, and any piece of data is judged as abnormal if the age of 200 appears therein.
[0106] After the data has been cleaned by the legitimacy rule and the logic rule, any data not conforming to format requirements and logics is removed, and valid, finally cleaned data is obtained.
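The example rules above can be sketched as two checks applied in sequence. This sketch takes the deletion branch for legitimacy failures (the text also allows setting the value to a maximum that conforms to the rule), and uses a plain callback as a stand-in for the alarming instruction; the field names and function names are hypothetical.

```python
import re
from datetime import date, datetime

GENDERS = {"male", "female", "unknown"}
DATE_RE = re.compile(r"\d{4}-\d{2}-\d{2}")  # YYYY-MM-DD


def is_legitimate(record, today):
    """Format and content checks from paragraphs [0103] and [0104]."""
    birth = record.get("birth")
    if not isinstance(birth, str) or not DATE_RE.fullmatch(birth):
        return False
    try:
        born = datetime.strptime(birth, "%Y-%m-%d").date()
    except ValueError:  # right shape but not a real date, e.g. month 13
        return False
    return record.get("gender") in GENDERS and born <= today


def is_logical(record):
    """Common-sense check from paragraph [0105]: age between 0 and 120."""
    age = record.get("age")
    return isinstance(age, int) and 0 <= age <= 120


def final_clean(records, today, alarm):
    """Keep conforming records; delete the rest, alarming on logic failures."""
    kept = []
    for record in records:
        if not is_legitimate(record, today):
            continue          # deleted for breaking the legitimacy rule
        if not is_logical(record):
            alarm(record)     # stand-in for the alarming instruction
            continue
        kept.append(record)
    return kept


alarms = []
today = date(2022, 9, 27)
records = [
    {"gender": "female", "birth": "1990-05-01", "age": 32},
    {"gender": "female", "birth": "01/05/1990", "age": 32},  # wrong date format
    {"gender": "male", "birth": "2030-01-01", "age": 5},     # born after "today"
    {"gender": "unknown", "birth": "1980-02-03", "age": 200},  # illogical age
]
valid = final_clean(records, today, alarms.append)
```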
[0107] As should be understood, although the various steps in the flowchart of Fig. 1 are displayed sequentially as indicated by arrows, these steps are not necessarily executed in the sequence indicated by the arrows. Unless otherwise explicitly noted herein, execution of these steps is not restricted to any sequence, and the steps can also be executed in sequences other than those indicated in the drawings. Moreover, at least some of the steps in the flowchart of Fig. 1 may include plural sub-steps or phases; these sub-steps or phases are not necessarily completed at the same time but can be executed at different times, and they are not necessarily performed sequentially but can be performed in turns or alternately with other steps or with at least some of the sub-steps or phases of other steps.
[0108] In one embodiment, as shown in Fig. 2, there is provided a data cleaning device that comprises a data obtaining module, a data filtering module, a preliminarily cleaning module, a finally cleaning module, and a data outputting module, of which:
[0109] the data obtaining module is employed for obtaining data from a first data source, and creating an independent data stream by employing the obtained data;
[0110] the data filtering module is employed for subjecting data in the data stream to a filtering process, and obtaining data to be cleaned;
[0111] the preliminarily cleaning module is employed for deleting or filling in any field containing missing values in the data to be cleaned, and obtaining preliminarily cleaned data;
[0112] the finally cleaning module is employed for detecting whether the preliminarily cleaned data conforms to a preset judging rule, deleting any data that does not conform to the judging rule, and obtaining finally cleaned data; and
[0113] the data outputting module is employed for outputting the finally cleaned data to a second data source.
[0114] During specific implementation, the first data source and the second data source are of different data types of the same and single distributed messaging system.
[0115] In one embodiment, the preliminarily cleaning module includes a missing rate sub-module, an importance degree sub-module, and a missing value processing sub-module, of which:
[0116] the missing rate sub-module is employed for calculating to obtain a missing rate of the field according to a proportion of number of pieces of the missing values of the field in a total number of pieces;
[0117] the importance degree sub-module is employed for determining an attribute importance degree of the field according to an index required to be analyzed; and
[0118] the missing value processing sub-module is employed for deleting or filling in the field containing missing values according to the missing rate and the attribute importance degree of the field.
[0119] Further, the missing value processing sub-module includes a comparing unit and a preliminarily processing unit, of which:
[0120] the comparing unit is employed for comparing the missing rate and the attribute importance degree of the field respectively with a preset missing rate threshold and a preset importance grading threshold, and the preliminarily processing unit is employed for filling in, deleting or complementing the field:
[0121] filling in the field when the missing rate of the field is lower than the preset missing rate threshold and the attribute importance degree thereof is lower than the preset importance grading threshold;
[0122] deleting the field when the missing rate of the field is not lower than the preset missing rate threshold and the attribute importance degree thereof is lower than the preset importance grading threshold; and
[0123] complementing the missing values of the field when the missing rate of the field is not lower than the preset missing rate threshold and the attribute importance degree thereof is greater than the preset importance grading threshold.
[0124] In one embodiment, the data cleaning device further comprises a data probing module for firstly probing metadata that describes data attribute of the data in the first data source before the data in the data stream is subjected to a filtering process, then analyzing to obtain any quality problem present in the data according to the metadata, and setting a filtering rule according to the quality problem.
[0125] In one embodiment, the data filtering module includes a row-grade filtering unit and a column-grade filtering unit, of which:
[0126] the row-grade filtering unit is employed for removing any row not required in the data;
and the column-grade filtering unit is employed for, when one row has plural columns, merely selecting and retaining fields to which any required column corresponds.
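Row-grade and column-grade filtering can be sketched as two small functions over records represented as dictionaries. The function names, the record layout and the example predicate are assumptions for illustration only.

```python
from typing import Any, Callable, Dict, List, Sequence

Record = Dict[str, Any]


def row_filter(rows: List[Record], keep_row: Callable[[Record], bool]) -> List[Record]:
    """Row-grade filtering: remove every row that is not required."""
    return [row for row in rows if keep_row(row)]


def column_filter(rows: List[Record], required_columns: Sequence[str]) -> List[Record]:
    """Column-grade filtering: retain only the fields of required columns."""
    return [{c: row[c] for c in required_columns if c in row} for row in rows]


rows = [
    {"event": "click", "user": "a", "debug": 1},
    {"event": "heartbeat", "user": "b", "debug": 0},
]
kept = row_filter(rows, lambda r: r["event"] != "heartbeat")
print(column_filter(kept, ["event", "user"]))  # [{'event': 'click', 'user': 'a'}]
```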
[0127] In one embodiment, the finally cleaning module includes a legitimacy detecting unit, a logics detecting unit, and a finally processing unit, of which:
[0128] the legitimacy detecting unit is employed for detecting whether the preliminarily cleaned data conforms to a preset legitimacy rule;
[0129] the logics detecting unit is employed for detecting whether the preliminarily cleaned data conforms to a preset logic rule; and
[0130] the finally processing unit is employed for setting the preliminarily cleaned data not conforming to the legitimacy rule as a maximum value that conforms to the legitimacy rule, or deleting the data; and deleting the preliminarily cleaned data not conforming to the logic rule, and generating an alarming instruction.
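A minimal sketch of the final cleaning step follows, under stated assumptions: the legitimacy rule is modeled as an upper bound on one numeric field, the logic rule is an arbitrary predicate over the record, and the alarming instruction is represented by a message appended to a list. None of these choices comes from the text; `final_clean` and its parameters are hypothetical names.

```python
from typing import Any, Callable, Dict, List, Tuple

Record = Dict[str, Any]


def final_clean(records: List[Record], field: str, max_allowed: float,
                logic_rule: Callable[[Record], bool],
                clamp: bool = True) -> Tuple[List[Record], List[str]]:
    """Apply a legitimacy rule (upper bound) and a logic rule to each record."""
    cleaned: List[Record] = []
    alarms: List[str] = []
    for record in records:
        if record[field] > max_allowed:          # legitimacy rule violated
            if not clamp:
                continue                         # option 1: delete the data
            record = {**record, field: max_allowed}  # option 2: set to the maximum legal value
        if not logic_rule(record):               # logic rule violated
            alarms.append(f"logic rule violated: {record}")
            continue                             # delete and raise an alarm
        cleaned.append(record)
    return cleaned, alarms
```

With a hypothetical bound of 150 on an `age` field and a logic rule requiring `start <= end`, a record with `age` 400 is kept with `age` set to 150, while a record with `start` later than `end` is dropped and produces one alarm message.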
[0131] For specific definitions relevant to the data cleaning device, reference may be made to the aforementioned definitions of the data cleaning method, so they are not repeated here. The various modules in the aforementioned data cleaning device can be wholly or partly realized via software, hardware, or a combination of software and hardware.
The various modules can be embedded in the form of hardware in a processor in a computer equipment or independent of any computer equipment, and can also be stored in the form of software in a memory in a computer equipment, so as to facilitate the processor to invoke and perform operations corresponding to the aforementioned various modules.
[0132] In one embodiment, a computer equipment is provided, and the computer equipment can be a terminal. The computer equipment comprises a processor, a memory, a network interface, a display screen and an input means connected to each other via a system bus.
The processor of the computer equipment is employed to provide computing and controlling capabilities. The memory of the computer equipment includes a nonvolatile storage medium and an internal memory. The nonvolatile storage medium stores therein an operating system and a computer program. The internal memory provides an environment for the running of the operating system and the computer program in the nonvolatile storage medium. The network interface of the computer equipment is employed to connect to an external terminal via a network for communication. The computer program realizes a data cleaning method when it is executed by the processor. The display screen of the computer equipment can be a liquid crystal display screen or an electronic ink display screen. The input means of the computer equipment can be a touch layer covering the display screen; a press button, a track ball or a touch pad disposed on the housing of the computer equipment; or an externally connected keyboard, touch pad or mouse, etc.
[0133] In one embodiment, there is provided a computer equipment that comprises a memory, a processor and a computer program stored on the memory and operable on the processor, and the following steps are realized when the processor executes the computer program:
obtaining data from a first data source, and creating an independent data stream by employing the obtained data; subjecting data in the data stream to a filtering process, and obtaining data to be cleaned; deleting or filling in any field containing missing values in the data to be cleaned, and obtaining preliminarily cleaned data; detecting whether the preliminarily cleaned data conforms to a preset judging rule, deleting any data that does not conform to the judging rule, and obtaining finally cleaned data; and outputting the finally cleaned data to a second data source.
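The five steps listed above can be sketched end to end as follows. This is an illustration under assumptions, not the claimed implementation: the two data sources are plain Python lists, the independent data stream is a generator over the obtained data, missing fields are filled from a hypothetical `fill_defaults` mapping or deleted when no default exists, and the judging rule is a single predicate.

```python
from typing import Any, Callable, Dict, Iterable, Iterator, List

Record = Dict[str, Any]


def fill_or_delete(record: Record, fill_defaults: Dict[str, Any]) -> Record:
    """Fill missing fields that have a default; delete the other missing fields."""
    out: Record = {}
    for key, value in record.items():
        if value is not None:
            out[key] = value
        elif key in fill_defaults:
            out[key] = fill_defaults[key]
    return out


def clean_stream(stream: Iterable[Record], keep_row: Callable[[Record], bool],
                 fill_defaults: Dict[str, Any],
                 judge: Callable[[Record], bool]) -> Iterator[Record]:
    """Filter, preliminarily clean, and finally clean each record in turn."""
    for record in stream:
        if not keep_row(record):                         # filtering process
            continue
        prelim = fill_or_delete(record, fill_defaults)   # preliminarily cleaned data
        if judge(prelim):                                # preset judging rule
            yield prelim                                 # finally cleaned data


def run_pipeline(first_source: List[Record], second_source: List[Record],
                 keep_row: Callable[[Record], bool], fill_defaults: Dict[str, Any],
                 judge: Callable[[Record], bool]) -> None:
    """Read from the first source, clean, and output to the second source."""
    stream = iter(first_source)   # independent stream over the obtained data
    second_source.extend(clean_stream(stream, keep_row, fill_defaults, judge))
```

Because each stage only consumes the output of the previous one, the stages can be replaced independently, for example by swapping the predicate used as the judging rule.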
[0134] In one embodiment, when the processor executes the computer program, the following steps are further realized: calculating to obtain a missing rate of the field according to a proportion of number of pieces of the missing values of the field in a total number of pieces; determining an attribute importance degree of the field according to an index required to be analyzed; and deleting or filling in the field containing missing values according to the missing rate and the attribute importance degree of the field.
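The missing rate described here is the number of pieces of data in which the field is missing divided by the total number of pieces. A minimal sketch follows; it assumes that a field counts as missing when it is absent or `None`, and it models the attribute importance degree as a lookup in a hypothetical table of grades prepared for the index to be analyzed, since the text does not say how that degree is computed.

```python
from typing import Any, Dict, List

Record = Dict[str, Any]


def missing_rate(records: List[Record], field: str) -> float:
    """Proportion of records in which `field` is absent or None."""
    total = len(records)
    if total == 0:
        return 0.0  # assumption: an empty batch has no missing values
    missing = sum(1 for record in records if record.get(field) is None)
    return missing / total


def importance_degree(field: str, grades_for_index: Dict[str, int]) -> int:
    """Look up a field's importance grade for the index being analyzed.

    Fields the index does not use default to grade 0 (an assumption).
    """
    return grades_for_index.get(field, 0)


records = [{"a": 1}, {"a": None}, {}, {"a": 4}]
print(missing_rate(records, "a"))  # 0.5
```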
[0135] In one embodiment, when the processor executes the computer program, the following steps are further realized: filling in the field when the missing rate of the field is lower than a preset missing rate threshold and the attribute importance degree thereof is lower than a preset importance grading threshold; deleting the field when the missing rate of the field is not lower than the preset missing rate threshold and the attribute importance degree thereof is lower than the preset importance grading threshold; and complementing the missing values of the field when the missing rate of the field is not lower than the preset missing rate threshold and the attribute importance degree thereof is greater than the preset importance grading threshold.
[0136] In one embodiment, when the processor executes the computer program, the following steps are further realized: probing metadata that describes data attribute of the data in the first data source, analyzing to obtain any quality problem present in the data according to the metadata, and setting a filtering rule according to the quality problem;
and subjecting data in the data stream to a filtering process according to the filtering rule, and obtaining data to be cleaned.
[0137] In one embodiment, when the processor executes the computer program, the following steps are further realized: row-grade filtering, whereby any row not required in the data is removed; and column-grade filtering, whereby, when one row has plural columns, fields to which any required column corresponds are merely selected and retained.
[0138] The preset judging rule includes a legitimacy rule and a logic rule. In one embodiment, when the processor executes the computer program, the following steps are further realized: setting the preliminarily cleaned data to a maximum value that conforms to the legitimacy rule, or deleting the data, if the preliminarily cleaned data does not conform to the legitimacy rule; and deleting the preliminarily cleaned data and generating an alarming instruction, if the preliminarily cleaned data does not conform to the logic rule.
[0139] In one embodiment, there is provided a computer-readable storage medium storing thereon a computer program, and the following steps are realized when the computer program is executed by a processor: obtaining data from a first data source, and creating an independent data stream by employing the obtained data; subjecting data in the data stream to a filtering process, and obtaining data to be cleaned; deleting or filling in any field containing missing values in the data to be cleaned, and obtaining preliminarily cleaned data; detecting whether the preliminarily cleaned data conforms to a preset judging rule, deleting any data that does not conform to the judging rule, and obtaining finally cleaned data; and outputting the finally cleaned data to a second data source.
[0140] In one embodiment, when the computer program is executed by a processor, the following steps are further realized: calculating to obtain a missing rate of the field according to a proportion of number of pieces of the missing values of the field in a total number of pieces; determining an attribute importance degree of the field according to an index required to be analyzed; and deleting or filling in the field containing missing values according to the missing rate and the attribute importance degree of the field.
[0141] In one embodiment, when the computer program is executed by a processor, the following steps are further realized: filling in the field when the missing rate of the field is lower than a preset missing rate threshold and the attribute importance degree thereof is lower than a preset importance grading threshold; deleting the field when the missing rate of the field is not lower than the preset missing rate threshold and the attribute importance degree thereof is lower than the preset importance grading threshold; and complementing the missing values of the field when the missing rate of the field is not lower than the preset missing rate threshold and the attribute importance degree thereof is greater than the preset importance grading threshold.
[0142] In one embodiment, when the computer program is executed by a processor, the following steps are further realized: probing metadata that describes data attribute of the data in the first data source, analyzing to obtain any quality problem present in the data according to the metadata, and setting a filtering rule according to the quality problem;
and subjecting data in the data stream to a filtering process according to the filtering rule, and obtaining data to be cleaned.
[0143] In one embodiment, when the computer program is executed by a processor, the following steps are further realized: row-grade filtering, whereby any row not required in the data is removed; and column-grade filtering, whereby, when one row has plural columns, fields to which any required column corresponds are merely selected and retained.
[0144] The preset judging rule includes a legitimacy rule and a logic rule. In one embodiment, when the computer program is executed by a processor, the following steps are further realized: setting the preliminarily cleaned data to a maximum value that conforms to the legitimacy rule, or deleting the data, if the preliminarily cleaned data does not conform to the legitimacy rule; and deleting the preliminarily cleaned data and generating an alarming instruction, if the preliminarily cleaned data does not conform to the logic rule.
[0145] As comprehensible to persons ordinarily skilled in the art, the entire or partial flows in the methods according to the aforementioned embodiments can be completed via a computer program instructing relevant hardware, the computer program can be stored in a nonvolatile computer-readable storage medium, and the computer program can include the flows as embodied in the aforementioned various methods when executed. Any reference to the memory, storage, database or other media used in the various embodiments provided by the present application can include nonvolatile and/or volatile memory. The nonvolatile memory can include a read-only memory (ROM), a programmable ROM (PROM), an electrically programmable ROM (EPROM), an electrically erasable programmable ROM (EEPROM) or a flash memory. The volatile memory can include a random access memory (RAM) or an external cache memory. By way of explanation rather than restriction, RAM is obtainable in many forms, such as static RAM (SRAM), dynamic RAM (DRAM), synchronous DRAM (SDRAM), double data rate SDRAM (DDR SDRAM), enhanced SDRAM (ESDRAM), Synchlink DRAM (SLDRAM), Rambus direct RAM (RDRAM), direct Rambus dynamic RAM (DRDRAM), and Rambus dynamic RAM (RDRAM), etc.
[0146] Technical features of the aforementioned embodiments can be combined arbitrarily. For the sake of brevity, not all possible combinations of these technical features are described, but all such combinations should be considered to fall within the scope recorded in the Description as long as they are not mutually contradictory.
[0147] The foregoing embodiments are merely directed to several modes of execution of the present application, and their descriptions are relatively specific and detailed, but they should not therefore be understood as restricting the patent scope of the invention. It should be pointed out that persons with ordinary skill in the art may further make various modifications and improvements without departing from the conception of the present application, and all these should pertain to the protection scope of the present application.
Accordingly, the patent protection scope of the present application shall be based on the attached Claims.
Claims (10)
1. A data cleaning method, characterized in that the method comprises:
obtaining data from a first data source, and creating an independent data stream by employing the obtained data;
subjecting data in the data stream to a filtering process, and obtaining data to be cleaned;
deleting or filling in any field containing missing values in the data to be cleaned, and obtaining preliminarily cleaned data;
detecting whether the preliminarily cleaned data conforms to a preset judging rule, deleting any data that does not conform to the judging rule, and obtaining finally cleaned data; and outputting the finally cleaned data to a second data source.
2. The method according to Claim 1, characterized in that the step of deleting or filling in any field containing missing values in the data to be cleaned includes:
calculating to obtain a missing rate of the field according to a proportion of number of pieces of the missing values of the field in a total number of pieces;
determining an attribute importance degree of the field according to an index required to be analyzed; and deleting or filling in the field containing missing values according to the missing rate and the attribute importance degree of the field.
3. The method according to Claim 2, characterized in that the step of deleting or filling in the field containing missing values according to the missing rate and the attribute importance degree of the field includes:
filling in the field when the missing rate of the field is lower than a preset missing rate threshold and the attribute importance degree thereof is lower than a preset importance grading threshold;
deleting the field when the missing rate of the field is not lower than the preset missing rate threshold and the attribute importance degree thereof is lower than the preset importance grading threshold; and complementing the missing values of the field when the missing rate of the field is not lower than the preset missing rate threshold and the attribute importance degree thereof is greater than the preset importance grading threshold.
4. The method according to Claim 1, characterized in that the method further comprises:
probing metadata that describes data attribute of the data in the first data source, analyzing to obtain any quality problem present in the data according to the metadata, and setting a filtering rule according to the quality problem; and that the step of subjecting data in the data stream to a filtering process, and obtaining data to be cleaned includes: subjecting data in the data stream to a filtering process according to the filtering rule, and obtaining data to be cleaned.
5. The method according to any one of Claims 1 to 4, characterized in that the step of subjecting data in the data stream to a filtering process includes:
row-grade filtering, whereby any row not required in the data is removed; and column-grade filtering, whereby, when one row has plural columns, fields to which any required column corresponds are merely selected and retained.
6. The method according to any one of Claims 1 to 4, characterized in that the preset judging rule includes a legitimacy rule and a logic rule, and that the step of detecting whether the preliminarily cleaned data conforms to a preset judging rule includes:
setting the preliminarily cleaned data as a maximum value that conforms to the legitimacy rule, or deleting the data, if the preliminarily cleaned data does not conform to the legitimacy rule; and deleting the preliminarily cleaned data and generating an alarming instruction, if the preliminarily cleaned data does not conform to the logic rule.
7. The method according to Claim 1, characterized in that the first data source and the second data source are of different data types of the same and single distributed messaging system, that the distributed messaging system is Kafka, that the first data source and the second data source are two different Topics of Kafka, and that the data stream is embodied as a data stream based on Spark Streaming.
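The arrangement in Claim 7, with one messaging system holding both the input Topic and the output Topic and the stream processed in small batches, can be illustrated without a running cluster. The sketch below is an in-memory stand-in written for this illustration: `InMemoryBroker` and `run_micro_batches` are hypothetical names, and the code calls neither the Kafka client API nor the Spark Streaming API. It only shows the data flow between two topics of one system.

```python
from collections import defaultdict, deque
from typing import Any, Callable, Deque, Dict, List, Optional

Message = Dict[str, Any]


class InMemoryBroker:
    """Toy stand-in for a single messaging system holding named topics."""

    def __init__(self) -> None:
        self.topics: Dict[str, Deque[Message]] = defaultdict(deque)

    def produce(self, topic: str, message: Message) -> None:
        self.topics[topic].append(message)

    def poll(self, topic: str, max_messages: int) -> List[Message]:
        """Remove and return up to `max_messages` from the topic."""
        queue = self.topics[topic]
        batch: List[Message] = []
        while queue and len(batch) < max_messages:
            batch.append(queue.popleft())
        return batch


def run_micro_batches(broker: InMemoryBroker, in_topic: str, out_topic: str,
                      clean: Callable[[Message], Optional[Message]],
                      batch_size: int) -> int:
    """Drain `in_topic` in batches, writing cleaned messages to `out_topic`.

    `clean` returns None for messages that should be deleted.
    Returns the number of batches processed.
    """
    batches = 0
    while True:
        batch = broker.poll(in_topic, batch_size)
        if not batch:
            return batches
        for message in batch:
            result = clean(message)
            if result is not None:
                broker.produce(out_topic, result)
        batches += 1
```

Keeping the source and the sink in the same system means the cleaned output can be consumed downstream in the same way as the raw input.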
8. A data cleaning device, characterized in that the device comprises:
a data obtaining module, for obtaining data from a first data source, and creating an independent data stream by employing the obtained data;
a data filtering module, for subjecting data in the data stream to a filtering process, and obtaining data to be cleaned;
a preliminarily cleaning module, for deleting or filling in any field containing missing values in the data to be cleaned, and obtaining preliminarily cleaned data;
a finally cleaning module, for detecting whether the preliminarily cleaned data conforms to a preset judging rule, deleting any data that does not conform to the judging rule, and obtaining finally cleaned data; and a data outputting module, for outputting the finally cleaned data to a second data source.
9. A computer equipment, comprising a memory, a processor, and a computer program stored on the memory and operable on the processor, characterized in that steps of the method according to any one of Claims 1 to 7 are realized when the processor executes the computer program.
10. A computer-readable storage medium, storing a computer program thereon, characterized in that steps of the method according to any one of Claims 1 to 7 are realized when the computer program is executed by a processor.
Applications Claiming Priority (3)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201910308949.0A CN110162519A (en) | 2019-04-17 | 2019-04-17 | Data clearing method |
CN201910308949.0 | 2019-04-17 | ||
PCT/CN2019/109121 WO2020211299A1 (en) | 2019-04-17 | 2019-09-29 | Data cleansing method |
Publications (1)
Publication Number | Publication Date |
---|---|
CA3177209A1 (en) | 2020-10-22 |
Family
ID=67639550
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CA3177209A Pending CA3177209A1 (en) | 2019-04-17 | 2019-09-29 | Data cleaning method |
Country Status (3)
Country | Link |
---|---|
CN (1) | CN110162519A (en) |
CA (1) | CA3177209A1 (en) |
WO (1) | WO2020211299A1 (en) |
Cited By (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN114356902A (en) * | 2021-12-14 | 2022-04-15 | 中核武汉核电运行技术股份有限公司 | Industrial data quality management method and device |
CN114385606A (en) * | 2021-12-09 | 2022-04-22 | 湖北省信产通信服务有限公司数字科技分公司 | Big data cleaning method and system, storage medium and electronic equipment |
CN115794795A (en) * | 2022-12-08 | 2023-03-14 | 湖北华中电力科技开发有限责任公司 | Power distribution station power consumption data standardized cleaning method, device and system and storage medium |
Families Citing this family (17)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN110162519A (en) * | 2019-04-17 | 2019-08-23 | 苏宁易购集团股份有限公司 | Data clearing method |
CN110716928A (en) * | 2019-09-09 | 2020-01-21 | 上海凯京信达科技集团有限公司 | Data processing method, device, equipment and storage medium |
CN110704410A (en) * | 2019-09-27 | 2020-01-17 | 中冶赛迪重庆信息技术有限公司 | Data cleaning method, system and equipment |
CN110781176A (en) * | 2019-11-06 | 2020-02-11 | 国网山东省电力公司威海供电公司 | Power grid data quality improvement method based on data correlation |
CN110990447B (en) * | 2019-12-19 | 2023-09-15 | 北京锐安科技有限公司 | Data exploration method, device, equipment and storage medium |
CN111563071A (en) * | 2020-04-03 | 2020-08-21 | 深圳价值在线信息科技股份有限公司 | Data cleaning method and device, terminal equipment and computer readable storage medium |
CN111966735A (en) * | 2020-07-22 | 2020-11-20 | 山东高速信息工程有限公司 | NIFI-based micro-service data interaction method and system |
CN111859814B (en) * | 2020-07-30 | 2023-07-28 | 中国电建集团昆明勘测设计研究院有限公司 | Rock aging deformation prediction method and system based on LSTM deep learning |
CN112287562B (en) * | 2020-11-18 | 2023-03-10 | 国网新疆电力有限公司经济技术研究院 | Power equipment retired data completion method and system |
CN113268476A (en) * | 2021-06-07 | 2021-08-17 | 一汽解放汽车有限公司 | Data cleaning method and device applied to Internet of vehicles and computer equipment |
CN113535697B (en) * | 2021-07-07 | 2024-05-24 | 广州三叠纪元智能科技有限公司 | Climbing frame data cleaning method, climbing frame control device and storage medium |
CN113568811A (en) * | 2021-07-28 | 2021-10-29 | 中国南方电网有限责任公司 | Distributed safety monitoring data processing method |
CN114549052A (en) * | 2022-01-20 | 2022-05-27 | 深圳市宝视佳科技有限公司 | Data-based accurate marketing method, device, equipment and storage medium |
CN116186698A (en) * | 2022-12-16 | 2023-05-30 | 广东技术师范大学 | Machine learning-based secure data processing method, medium and equipment |
CN115809406B (en) * | 2023-02-03 | 2023-05-12 | 佰聆数据股份有限公司 | Fine granularity classification method, device, equipment and storage medium for electric power users |
CN117290315B (en) * | 2023-10-11 | 2024-06-25 | 河南师范大学 | Data classification cleaning method |
CN117540151B (en) * | 2023-12-08 | 2024-06-28 | 深圳市亲邻科技有限公司 | Data preprocessing method of data pushing system |
Family Cites Families (9)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20160179599A1 (en) * | 2012-10-11 | 2016-06-23 | University Of Southern California | Data processing framework for data cleansing |
CN105989163A (en) * | 2015-03-04 | 2016-10-05 | ***通信集团福建有限公司 | Data real-time processing method and system |
CN106294745A (en) * | 2016-08-10 | 2017-01-04 | 东方网力科技股份有限公司 | Big data cleaning method and device |
CN107025301A (en) * | 2017-04-25 | 2017-08-08 | 西安理工大学 | Flight ensures the method for cleaning of data |
CN108596386A (en) * | 2018-04-20 | 2018-09-28 | 上海市司法局 | A kind of prediction convict repeats the method and system of crime probability |
CN109063964A (en) * | 2018-07-02 | 2018-12-21 | 浙江百先得服饰有限公司 | A kind of platform data processing system |
CN109255523B (en) * | 2018-08-16 | 2021-07-20 | 北京奥技异科技发展有限公司 | Analytical index computing platform based on KKS coding rule and big data architecture |
CN109492002B (en) * | 2018-10-19 | 2021-03-23 | 浙江大学华南工业技术研究院 | Smart power grid big data storage and analysis system and processing method |
CN110162519A (en) * | 2019-04-17 | 2019-08-23 | 苏宁易购集团股份有限公司 | Data clearing method |
-
2019
- 2019-04-17 CN CN201910308949.0A patent/CN110162519A/en active Pending
- 2019-09-29 WO PCT/CN2019/109121 patent/WO2020211299A1/en active Application Filing
- 2019-09-29 CA CA3177209A patent/CA3177209A1/en active Pending
Cited By (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN114385606A (en) * | 2021-12-09 | 2022-04-22 | 湖北省信产通信服务有限公司数字科技分公司 | Big data cleaning method and system, storage medium and electronic equipment |
CN114356902A (en) * | 2021-12-14 | 2022-04-15 | 中核武汉核电运行技术股份有限公司 | Industrial data quality management method and device |
CN115794795A (en) * | 2022-12-08 | 2023-03-14 | 湖北华中电力科技开发有限责任公司 | Power distribution station power consumption data standardized cleaning method, device and system and storage medium |
CN115794795B (en) * | 2022-12-08 | 2023-09-22 | 湖北华中电力科技开发有限责任公司 | Power distribution station electricity consumption data standardization cleaning method, device, system and storage medium |
Also Published As
Publication number | Publication date |
---|---|
WO2020211299A1 (en) | 2020-10-22 |
CN110162519A (en) | 2019-08-23 |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
EEER | Examination request |
Effective date: 20220927 |