CN117520324A - Government affair data cleaning method and device, electronic equipment and storage medium - Google Patents

Government affair data cleaning method and device, electronic equipment and storage medium

Info

Publication number
CN117520324A
Authority
CN
China
Prior art keywords
index
data
initial
index data
rule
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202311478018.8A
Other languages
Chinese (zh)
Inventor
华夏扬
张晓栋
黎永昇
马鑫磊
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
China Unicom Guangdong Industrial Internet Co Ltd
Original Assignee
China Unicom Guangdong Industrial Internet Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by China Unicom Guangdong Industrial Internet Co Ltd
Priority to CN202311478018.8A
Publication of CN117520324A
Legal status: Pending

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00: Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20: Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/21: Design, administration or maintenance of databases
    • G06F16/215: Improving data quality; Data cleansing, e.g. de-duplication, removing invalid entries or correcting typographical errors
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00: Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20: Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/22: Indexing; Data structures therefor; Storage structures
    • G06F16/2282: Tablespace storage structures; Management thereof
    • Y: GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02: TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D: CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D10/00: Energy efficient computing, e.g. low power processors, power management or thermal management

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Databases & Information Systems (AREA)
  • Data Mining & Analysis (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Quality & Reliability (AREA)
  • Software Systems (AREA)
  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)

Abstract

The embodiments of the application disclose a method and a device for cleaning government affair data, electronic equipment and a storage medium. The method comprises the following steps: acquiring an initial index table, and acquiring an analysis rule corresponding to the initial index table; performing data cleaning on the index data in the initial index table according to the analysis rule; if first index data that does not conform to the strong rule exists in the initial index table, performing non-warehousing processing on the first index data, that is, not storing it; if second index data that does not conform to the weak rule exists in the initial index table, generating a warning mark corresponding to the second index data, and warehousing (storing) the second index data together with the corresponding warning mark. By implementing the embodiments of the application, index data can be cleaned more efficiently, which in turn improves the efficiency of government affair data management and analysis.

Description

Government affair data cleaning method and device, electronic equipment and storage medium
Technical Field
The application relates to the technical field of data processing, in particular to a method and a device for cleaning government affair data, electronic equipment and a storage medium.
Background
At present, the management of government affair data is an important part of government work. In most cases, each government department manages its own government affair data, and a specific department then integrates it. As a result, the management of government affair data is not uniform across government departments, and the management and analysis of government affair data between peer departments and between superior and subordinate departments lack systematic and holistic planning. Government affair data is therefore difficult to circulate among government departments, and the efficiency of managing and analyzing it is low.
Disclosure of Invention
The embodiment of the application discloses a government affair data cleaning method, a government affair data cleaning device, electronic equipment and a storage medium, which can effectively clean data of an initial index table, and further improve the management and analysis efficiency of government affair data.
The embodiment of the application discloses a method for cleaning government affair data, which comprises the following steps:
acquiring an initial index table, and acquiring an analysis rule corresponding to the initial index table; the initial index table comprises at least one index and index data corresponding to each index, and the analysis rule is determined according to historical government affair data; the analysis rule comprises at least one strong rule and/or at least one weak rule;
performing data cleaning on the index data in the initial index table according to the analysis rule;
if the first index data which does not accord with the strong rule exists in the initial index table, performing non-warehousing processing on the first index data;
if the second index data which does not accord with the weak rule exists in the initial index table, generating a warning mark corresponding to the second index data, and warehousing the second index data and the corresponding warning mark.
As an alternative embodiment, after the acquiring the initial index table and acquiring the analysis rule, the method further includes:
preprocessing the index data in the initial index table to obtain preprocessed index data;
the step of cleaning the data of the index data in the initial index table according to the analysis rule includes:
and cleaning the data of the preprocessed index data according to the analysis rule.
As an optional implementation manner, before the acquiring the initial index table and acquiring the analysis rule corresponding to the initial index table, the method further includes:
configuring at least one index in response to the configuration instruction;
collecting index data corresponding to each index according to the at least one index;
generating an initial index table according to the at least one index and index data corresponding to each index, and storing the initial index table.
As an optional implementation manner, after the performing the non-warehousing processing on the first index data, the method further includes:
determining a first index corresponding to the first index data;
re-acquiring third index data corresponding to the first index, wherein the third index data accords with the strong rule;
and generating a target index table according to the third index data, the second index data and the corresponding warning mark, and the index data in the initial index table that conforms to the analysis rule.
As an optional implementation manner, the re-acquiring the third index data corresponding to the first index includes:
generating a first form to be filled in according to the first index;
transmitting the first form to be filled in to a first terminal; the first terminal is a terminal for uploading the initial index table;
receiving the completed first form sent by the first terminal;
and extracting third index data corresponding to the first index from the filled first form.
As an optional implementation manner, after the generating the warning identifier corresponding to the second index data, the method further includes:
generating warning information according to the second index data and the corresponding warning mark;
the warning information is sent to a first terminal; the first terminal is a terminal for uploading the initial index table; the warning information is used for prompting that the second index data does not accord with the weak rule.
As an optional implementation manner, after the acquiring the initial index table and acquiring the analysis rule corresponding to the initial index table, the method further includes:
checking the at least one index contained in the initial index table and the index data corresponding to each index to determine whether abnormal information exists in the initial index table; the abnormal information comprises an index missing from the initial index table and/or index data missing for an index that is present in the initial index table;
if the initial index table has abnormal information, outputting a check report according to the abnormal information in the initial index table, and sending the check report to a first terminal; the check report is used for instructing the first terminal to upload the missing index and the corresponding index data, and/or for instructing the first terminal to upload the missing index data;
receiving the supplementary index data sent by the first terminal;
the step of cleaning the data of the index data in the initial index table according to the analysis rule includes:
and cleaning the index data in the initial index table and the supplementary index data according to the analysis rule.
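By way of non-limiting illustration, the check for abnormal information described above might be sketched in Python as follows. The function name `check_table`, the dictionary layout of the table, and the treatment of `None` and empty strings as missing data are assumptions made for this sketch and are not specified by the application.

```python
def check_table(table, expected_indices):
    """Build a simple check report for an index table.

    `table` maps each index name to its list of index data (assumed layout).
    A value of None or "" is treated as missing index data (assumption).
    """
    # Indexes that should be present but are absent from the table.
    missing_indices = [name for name in expected_indices if name not in table]

    # For each present index, record the row positions whose data is missing.
    missing_data = {}
    for name, values in table.items():
        rows = [pos for pos, value in enumerate(values) if value in (None, "")]
        if rows:
            missing_data[name] = rows

    return {"missing_indices": missing_indices, "missing_data": missing_data}


report = check_table(
    {"name": ["Zhang San", ""]},
    expected_indices=["name", "id_card_number"],
)
has_abnormal_information = bool(
    report["missing_indices"] or report["missing_data"]
)
```

In this sketch the report would be sent to the first terminal only when `has_abnormal_information` is true.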
The embodiment of the application discloses a device for cleaning government affair data, the device including:
the table acquisition module is used for acquiring an initial index table and acquiring an analysis rule corresponding to the initial index table; the initial index table comprises at least one index and index data corresponding to each index, and the analysis rule is determined according to historical government affair data; the analysis rule comprises at least one strong rule and/or at least one weak rule;
the cleaning module is used for cleaning the data of the index data in the initial index table according to the analysis rule;
the first processing module is used for performing non-warehousing processing on the first index data if the first index data which does not accord with the strong rule exists in the initial index table;
and the second processing module is used for generating a warning mark corresponding to the second index data if the second index data which does not accord with the weak rule exists in the initial index table, and carrying out warehouse entry processing on the second index data and the corresponding warning mark.
The embodiment of the application discloses electronic equipment, which comprises a memory and a processor, wherein a computer program is stored in the memory, and when the computer program is executed by the processor, the processor realizes any one of the government affair data cleaning methods disclosed by the embodiment of the application.
The embodiment of the application discloses a computer readable storage medium which stores a computer program, wherein the computer program realizes any one of the government affair data cleaning methods disclosed in the embodiment of the application when being executed by a processor.
Compared with the related art, the embodiment of the application has the following beneficial effects:
the embodiment of the application provides a government affair data cleaning method, a government affair data cleaning device, electronic equipment and a storage medium, wherein an initial index table is obtained, and an analysis rule corresponding to the initial index table is obtained; the initial index table comprises at least one index and index data corresponding to each index, and the analysis rule is determined according to historical government affair data; the analysis rules include at least one strong rule and/or at least one weak rule; performing data cleaning on index data in the initial index table according to the analysis rule; if the first index data which does not accord with the strong rule exists in the initial index table, the first index data is subjected to non-warehousing processing; if the second index data which does not accord with the weak rule exists in the initial index table, generating a warning mark corresponding to the second index data, and carrying out warehouse entry processing on the second index data and the corresponding warning mark. According to the embodiment of the application, the analysis rules are determined according to historical government affair data, the government affair data can be cleaned in a targeted manner according to the analysis rules, the initial index table comprises a plurality of different indexes, the data cleaning is carried out on the initial index table through the analysis rules corresponding to the initial index table, the non-warehousing processing is carried out on the index data which does not accord with the strong rules, the warehousing processing is carried out on the index data which does not accord with the weak rules, the targeted processing can be carried out on the different index data, the data cleaning can be carried out on the index data more efficiently, and the management and analysis efficiency of the government affair data is further improved.
Drawings
In order to more clearly illustrate the technical solutions of the embodiments of the present application, the drawings that are needed in the embodiments will be briefly described below, and it is obvious that the drawings in the following description are only some embodiments of the present application, and that other drawings may be obtained according to these drawings without inventive effort for a person skilled in the art.
FIG. 1 is a flow chart of a method for cleaning government affair data according to an embodiment of the present application;
FIG. 2 is a flow chart of another method for cleaning government affair data according to an embodiment of the present application;
FIG. 3 is a schematic flow chart of generating a target index table according to an embodiment of the present application;
FIG. 4 is a flow chart of another method for cleaning government affair data according to an embodiment of the present application;
FIG. 5 is a schematic diagram of generating a check report in one embodiment;
FIG. 6 is a schematic structural diagram of a cleaning device for government affair data according to an embodiment of the present application;
FIG. 7 is a schematic structural diagram of an electronic device according to an embodiment of the present application.
Detailed Description
The following description of the technical solutions in the embodiments of the present application will be made clearly and completely with reference to the drawings in the embodiments of the present application, and it is apparent that the described embodiments are only some embodiments of the present application, not all embodiments. All other embodiments, which can be made by one of ordinary skill in the art without undue burden from the present disclosure, are within the scope of the present disclosure.
It should be noted that the terms "comprising" and "having" and any variations thereof in the embodiments and figures herein are intended to cover a non-exclusive inclusion. For example, a process, method, system, article, or apparatus that comprises a list of steps or elements is not limited to only those listed steps or elements but may include other steps or elements not listed or inherent to such process, method, article, or apparatus.
It will be understood that the terms "first," "second," and the like, as used herein, may be used to describe various elements, but these elements are not limited by these terms. These terms are only used to distinguish one element from another element. For example, the first index data may be referred to as second index data, and similarly, the second index data may be referred to as first index data, without departing from the scope of the present application. Both the first index data and the second index data are index data, but they are not the same index data.
Some data cleaning methods and systems already exist, and some mature ones have been applied and popularized in industries such as catering, retail and finance. However, the management of government affair data is special: it places higher requirements on overall planning and security. Current data cleaning methods and systems lack holistic planning and have low security, among other shortcomings, and are therefore not suitable for government affair application scenarios. The shortcomings mainly appear in the following aspects:
The current technology is not efficient in managing government affair data. On a government affair data filling platform, that is, a platform for collecting government affair data, the collected data is often of poor quality because the report formats used for collection differ. Unlike data management in ordinary enterprises, government affair data involves many departments at many levels, and different regions and departments have their own local or departmental report standards. Reports of collected government affair data are therefore difficult to integrate into a unified format, so the format of government affair data is not unified and the data is difficult to manage. If the government affair data collected by each government department is unified manually, data quality is difficult to guarantee and efficiency is low. In addition, the collected government affair data must go through multiple processes such as archiving and backup, during which it is easily mixed up, affecting its accuracy and integrity.
The current technology is ambiguous in handling outliers of government data. The current data cleaning method and system lack an explicit processing strategy aiming at the abnormal value of the collected government affair data, so that the abnormal government affair data has negative influence on statistical analysis and data fusion sharing.
The current technology lacks comprehensive data cleansing rules for government data. Noise and errors often exist in the government affair data in the collection process, and the existing data cleaning rules are often not comprehensive enough, and cannot sufficiently clean the government affair data, so that the follow-up analysis result of the government affair data and the decision accuracy according to the government affair data are affected.
The current technology lacks systematic management of the filled-in indexes. Unlike data management in industries such as catering, retail and finance, government affair data has a more complicated system of index definitions and index dimensions, and more parameters are needed to define and explain each index. The current technology has no function for managing a large number of indexes and therefore does not provide a management platform aimed specifically at government affair data.
In summary, these problems limit the ability to manage and analyze government data, thereby affecting the accuracy and efficiency of government decisions. Therefore, a data cleansing method for government data needs to be further improved and promoted.
The embodiment of the application discloses a government affair data cleaning method, a government affair data cleaning device, electronic equipment and a storage medium, which can effectively clean data of an initial index table, and further improve the efficiency of government affair data management and analysis. The following will describe in detail.
Fig. 1 is a flow chart of a method for cleaning government affair data according to an embodiment of the present application. The data cleaning method described in fig. 1 is applicable to an electronic device, which may include, but is not limited to, a mobile phone, a tablet computer, a wearable device, a notebook computer, a PC (personal computer), and the like, and embodiments of the present application are not limited thereto. As shown in fig. 1, the data cleansing method may include the steps of:
step S102, an initial index table is obtained, and an analysis rule corresponding to the initial index table is obtained; the initial index table comprises at least one index and index data corresponding to each index, and the analysis rule is determined according to historical government affair data; the analysis rules include at least one strong rule and/or at least one weak rule.
The government affair data may be data that government departments collect directly according to law, data that a third party is lawfully authorized to manage, or data resources formed in the course of performing duties by relying on government affair information systems and the like. For example, government departments may collect and manage government affair data through a government affair data filling platform.
Government affair data is data collected during the daily work of government departments, and may be resources such as files, materials, charts and data related to that work. For example, government affair data may be identity information of local residents collected by a government department, which may include the residents' residential addresses, identification card numbers, frequently used telephone numbers, and the like. The formats of government affair data collected by the government affair data filling platform vary. For example, one government department in a given place collects first identity information of local residents, comprising names, residential addresses, identification card numbers and frequently used telephone numbers, while another government department collects second identity information of local residents, comprising names, political status, identification card numbers and places of origin. If the first identity information and the second identity information need to be integrated, indexes can be used to manage the government affair data.
The initial index table may include at least one index and index data corresponding to each index, where an index may be a parameter describing the business meaning of the data. For example, an "identification card number" index may be set in the initial index table, and the index data corresponding to that index is the identification card number of each person whose data is collected. If an initial index table contains two indexes, namely name and identification card number, and both indexes have corresponding index data, the initial index table can be as shown in Table 1:
TABLE 1
Name         Identification card number
Zhang San    120112XXXX05046978
Li Si        220403XXXX09176016
Here, the index data corresponding to the "name" index is [Zhang San, Li Si, …], and the index data corresponding to the "identification card number" index is [120112XXXX05046978, 220403XXXX09176016, …].
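As a minimal sketch of how such an initial index table could be held in memory, the following Python example maps each index to its list of index data, mirroring Table 1. The class name `IndexTable`, the field name `columns` and the English column keys are hypothetical choices for illustration only.

```python
from dataclasses import dataclass, field


@dataclass
class IndexTable:
    """An index table: each index (column name) maps to its index data."""
    columns: dict = field(default_factory=dict)

    def rows(self):
        """Yield one dict per record, pairing each index with its value."""
        names = list(self.columns)
        for values in zip(*(self.columns[name] for name in names)):
            yield dict(zip(names, values))


# The two indexes and their index data from Table 1.
initial_table = IndexTable({
    "name": ["Zhang San", "Li Si"],
    "id_card_number": ["120112XXXX05046978", "220403XXXX09176016"],
})
records = list(initial_table.rows())
```

Storing data per index keeps all values of one index together, which suits rules that are defined per index; the `rows` helper gives a per-record view when one is needed.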
Government affair data collected by government departments usually reflects the nature of government business, so analysis rules can be determined from historical government affair data. The analysis rules are used to clean the index data in the initial index table and are divided into strong rules and weak rules. A strong rule is a rule that index data must satisfy; a weak rule is a rule that index data is encouraged, but not required, to satisfy. For example, a strong rule "the data type cannot be text" may be set and used to clean an initial index table containing the "identification card number" index: if index data under that index appears in text form, it does not conform to the strong rule and must be cleaned according to it. As another example, a weak rule "the data range is 25,000 to 35,000 (in units of 100 million yuan)" may be set according to the GDP (Gross Domestic Product) data of place A over past years, and used to clean an initial index table containing the "2022 GDP of place A" index. Index data under that index may fail the weak rule simply because the 2022 GDP of place A fluctuated more than in previous years, so index data that does not conform to the weak rule may be retained.
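The distinction between strong and weak rules could be modelled, for illustration only, as a small rule object with a flag. The names `Rule`, `classify`, `id_rule` and `gdp_rule` are hypothetical, and "not text" is approximated here as "digits, optionally with X placeholder characters", which is an assumption of this sketch rather than a definition from the application.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class Rule:
    index: str                        # index (column) the rule applies to
    check: Callable[[object], bool]   # returns True when the value conforms
    strong: bool                      # True: mandatory; False: advisory


# Strong rule: the identification card number must not be free text.
id_rule = Rule(
    "id_card_number",
    lambda v: isinstance(v, str) and v.replace("X", "").isdigit(),
    strong=True,
)

# Weak rule: the GDP value is expected to fall in the historical range.
gdp_rule = Rule("gdp_2022_place_a", lambda v: 25000 <= v <= 35000, strong=False)


def classify(value, rule):
    """Return 'ok', 'reject' (strong rule failed) or 'warn' (weak rule failed)."""
    if rule.check(value):
        return "ok"
    return "reject" if rule.strong else "warn"
```

Under these assumptions, `classify("unknown", id_rule)` yields a rejection, while `classify(36000, gdp_rule)` yields only a warning, so the value can still be retained.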
After the initial index table is obtained, the analysis rules corresponding to the initial index table are obtained, and for different initial index tables, the data of the initial index table can be cleaned according to the corresponding analysis rules, and as the initial index table contains a plurality of different indexes, the data can be cleaned according to the characteristics of index data corresponding to the contained indexes by adopting the targeted analysis rules, so that the data cleaning of the index data in the initial index table is more targeted, and the data cleaning efficiency is improved.
As one embodiment, government affair data collected by government departments may be consolidated into an initial index table and stored in the electronic device. The initial index table may be obtained by looking up the table identifier corresponding to it, or may be formed by sorting the collected index data under the corresponding indexes. To obtain the analysis rule corresponding to the initial index table, at least one analysis rule may be set in advance; selecting the analysis rule that corresponds to the initial index table forms an instruction, and the analysis rule is obtained according to that instruction. The analysis rules set in the electronic device can be updated according to government business requirements, so that different initial index data can be cleaned.
In some embodiments, after the initial index table and the analysis rule are obtained, the index data in the initial index table may be preprocessed to obtain preprocessed index data, and the preprocessed index data is then cleaned according to the analysis rule. Preprocessing may include normalization, precision unification or unit unification of the index data. Normalization limits the index data to a preset range (such as [0,1] or [-1,1]). Precision unification limits the index data to the same precision standard; for example, numerical values in a given set of index data can uniformly keep two digits after the decimal point. Unit unification uses the same unit for all numerical values of the same index data; for example, if the values include 1 meter, 1.5 meters and 50 centimeters, 50 centimeters can be converted into 0.5 meters.
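The three preprocessing operations can be sketched as follows. This is one possible reading: min-max scaling is assumed for normalization into [0, 1], and the function names and the `UNIT_TO_METERS` conversion table are hypothetical.

```python
def normalize(values):
    """Min-max scale values into [0, 1]; a constant series maps to 0.0."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


def unify_precision(values, digits=2):
    """Round every value to the same number of decimal places."""
    return [round(v, digits) for v in values]


# Conversion factors to the common unit (meters); assumed for this example.
UNIT_TO_METERS = {"m": 1.0, "cm": 0.01}


def unify_units(measurements):
    """Convert (amount, unit) pairs to meters."""
    return [amount * UNIT_TO_METERS[unit] for amount, unit in measurements]


# The example from the text: 1 meter, 1.5 meters and 50 centimeters.
lengths = unify_units([(1, "m"), (1.5, "m"), (50, "cm")])
scaled = normalize(lengths)
```

Running unit unification before normalization matters here, since scaling mixed units would produce meaningless values.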
Step S104, data cleaning is carried out on the index data in the initial index table according to the analysis rule.
Government authorities collect government data that not only have traditional structured data, but also contain large amounts of unstructured data. Structured data, also called row data, is data logically expressed and implemented by a two-dimensional table structure, strictly following data format and length specifications, and is stored and managed mainly by relational databases. Unstructured data needs to be converted into structured data before data cleansing can be performed. In the process of converting unstructured data into structured data, the unstructured data can be converted into structured data by a method of extracting characteristic values.
As an implementation manner, if the analysis rule includes only strong rules, cleaning the index data in the initial index table according to the analysis rule retains only the index data that conforms to the strong rules. If the analysis rule includes only weak rules, cleaning retains both the index data that conforms to the weak rules and the index data that does not. These two kinds of index data can be distinguished from each other, either by giving each a corresponding mark or by storing them in different tables.
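For the weak-rule-only case, one way to retain every value while still distinguishing the two kinds of index data is to attach a mark to each value. The sketch below is illustrative; `apply_weak_rule` and the `warning` field are hypothetical names, and the sample values are invented for the example.

```python
def apply_weak_rule(values, conforms):
    """Keep every value; mark those that fail the weak rule with a warning."""
    return [{"value": v, "warning": not conforms(v)} for v in values]


# Weak rule: value expected within the historical range 25000 to 35000.
flagged = apply_weak_rule([28000, 36500], lambda v: 25000 <= v <= 35000)

# Alternatively, the two kinds of index data can be split into separate tables.
conforming = [item["value"] for item in flagged if not item["warning"]]
non_conforming = [item["value"] for item in flagged if item["warning"]]
```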
In some embodiments, data cleaning of the index data may further include deleting duplicate index data, correcting abnormal index data, and the like. Abnormal index data may be index data that has a null value or whose data type differs from the preset data type. In addition, the cleaned index data must remain consistent: for index data integrated from multiple data sources, the semantics of the same index data must stay the same. Besides cleaning the index data in the initial index table according to analysis rules, related data cleaning techniques such as mathematical statistics and data mining may also be used. The data cleaning of the index data is described through the following processing flow.
Processing index data with missing values. A missing value is a gap caused by incomplete index data. New data can be filled in manually to replace the missing values. Alternatively, for some missing values, a replacement value can be derived from the data source of the index data, using the average, maximum or minimum of that source or a more complex probability estimate.
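A minimal sketch of replacing missing values with a statistic derived from the available data is given below. Representing a missing value as `None` and the function name `fill_missing` are assumptions for illustration; the statistic is passed in so that the mean, maximum or minimum can be used interchangeably.

```python
from statistics import mean


def fill_missing(values, strategy=mean):
    """Replace None entries with a statistic of the values that are present."""
    present = [v for v in values if v is not None]
    if not present:
        # Nothing to derive a replacement from; leave the data unchanged.
        return list(values)
    fill = strategy(present)
    return [fill if v is None else v for v in values]


filled_with_mean = fill_missing([10.0, None, 14.0])
filled_with_max = fill_missing([3, None, 7], strategy=max)
```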
And processing the index data with the error value, wherein the error value is the data which does not accord with the preset range of the index corresponding to the current index data. For the index data with the error value, the error value can be identified through deviation analysis, regression equation, simple rule base or preset constraint, and the like, and then the error value is corrected or deleted, so that the aim of processing the index data with the error value is fulfilled.
The index data from different data sources are processed, semantic conflict exists in the index data integrated by the data sources, and consistency of the index data from the different data sources can be maintained by defining integrity constraint or analyzing data association and other methods.
And processing the index data with the repeated values, detecting whether the index data are equal by judging whether the attribute values of the index data are equal, if equal values appear in the index data, repeating the equal values, and for the repeated values in the index data, combining the repeated values into one piece or eliminating the repeated values and leaving one piece of the repeated values, so as to achieve the aim of processing the index data with the repeated values.
And (3) carrying out data cleaning on the index data in the initial index table according to the analysis rule to obtain cleaned index data, wherein the cleaned index data can meet the data quality requirement, and can provide guarantee for subsequent management and analysis of the index data.
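The handling of missing values, error values, and duplicate values described above can be sketched as follows. This is a minimal illustration only: the function names, the choice of the mean as the fill value, and the sample data are assumptions and are not taken from the application.

```python
from statistics import mean

def fill_missing(values):
    """Replace None entries with the mean of the known values (one of the options named above)."""
    known = [v for v in values if v is not None]
    fallback = mean(known) if known else None
    return [fallback if v is None else v for v in values]

def drop_out_of_range(values, low, high):
    """Delete error values, i.e. values outside the preset range [low, high]."""
    return [v for v in values if low <= v <= high]

def drop_duplicates(records):
    """Keep the first of any records whose attribute values are all equal."""
    seen = set()
    unique = []
    for record in records:
        key = tuple(sorted(record.items()))
        if key not in seen:
            seen.add(key)
            unique.append(record)
    return unique

# Hypothetical sample data.
filled = fill_missing([10, None, 20])
in_range = drop_out_of_range([5, 150, 42], 0, 100)
deduped = drop_duplicates([
    {"name": "A", "value": 1},
    {"name": "A", "value": 1},
    {"name": "B", "value": 2},
])
```

Correcting an error value instead of deleting it, or merging duplicates instead of dropping them, would follow the same pattern.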
Step S106, if the first index data which does not accord with the strong rule exists in the initial index table, the first index data is not subjected to warehouse entry processing.
As an embodiment, since a strong rule is a rule that the index data in the initial index table must satisfy, if first index data that does not conform to the strong rule exists in the initial index table, that index data is not put in storage. The non-warehousing processing may be to delete the index data that does not conform to the strong rule, or to update it until it conforms to the corresponding strong rule. Index data that conforms to the strong rule may be stored in a database. For example, suppose the strong rule corresponding to the index "identification card number" is "an identification card number cannot contain characters other than digits and English letters". If the index data corresponding to this index in the initial index table contains such characters, it is first index data; the first index data may be deleted, and the index data that conforms to the strong rule is stored in the database.
By carrying out non-warehouse entry processing on the first index data which does not accord with the strong rule, the index data which is favorable for centralized management and data analysis can be screened out, and the management and analysis efficiency of government affair data is improved.
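A strong-rule check of the kind used in the identification card number example might look like the sketch below. The field name `id_number`, the regular expression, and the sample rows are hypothetical; the application does not specify how a strong rule is encoded.

```python
import re

# Hypothetical strong rule: the value may contain only digits and English letters.
ID_PATTERN = re.compile(r"[0-9A-Za-z]+")

def split_by_strong_rule(rows):
    """Return (rows to warehouse, rows kept out of storage)."""
    accepted, rejected = [], []
    for row in rows:
        value = row.get("id_number") or ""
        if ID_PATTERN.fullmatch(value):
            accepted.append(row)
        else:
            rejected.append(row)  # first index data: not put in storage
    return accepted, rejected

rows = [
    {"name": "Zhang San", "id_number": "ABC123456"},
    {"name": "Li Si", "id_number": "12#45-678"},
]
accepted, rejected = split_by_strong_rule(rows)
```

Here the rejected rows are simply set aside; deleting them or sending them back for correction are both consistent with the non-warehousing processing described above.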
Step S108, if the second index data which does not accord with the weak rule exists in the initial index table, generating a warning mark corresponding to the second index data, and warehousing the second index data and the corresponding warning mark.
In some embodiments, index data in the initial index table that does not conform to the weak rule is second index data. A warning identifier corresponding to the second index data may be generated, and the second index data and the corresponding warning identifier are then put in storage. The warning identifier may be a prompt box displayed for the second index data, or a specific color in which the second index data is displayed, so as to distinguish it from the index data in the initial index table that conforms to the weak rule.
As an embodiment, after the warning identifier corresponding to the second index data is generated, warning information may be generated according to the second index data and the corresponding warning identifier and sent to the first terminal. The first terminal is the terminal that uploaded the initial index table, and the warning information is used for prompting that the second index data does not conform to the weak rule. Sending the warning information corresponding to the initial index table to the first terminal lets the operator who uploaded the initial index table know the result of the data cleaning, so that the operator can upload the index data corresponding to important indexes again and obtain index data of higher quality. The operator can also refer to the warning information when uploading an initial index table again and actively avoid introducing second index data, which reduces the amount of index data in uploaded initial index tables that does not conform to the weak rules.
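The weak-rule handling can be illustrated with the sketch below, in which every row is stored and rows that fail the rule carry a warning mark. The boolean `warning` field, the message wording, and the sample rule (a non-negative population) are assumptions made for illustration.

```python
def apply_weak_rule(rows, field, is_valid):
    """Store every row; rows that fail the weak rule get a warning mark and a message."""
    stored, warnings = [], []
    for row in rows:
        flagged = not is_valid(row.get(field))
        stored.append({**row, "warning": flagged})
        if flagged:
            warnings.append(
                f"Index '{field}' value {row.get(field)!r} does not conform to the weak rule"
            )
    return stored, warnings

# Hypothetical sample rows and weak rule.
rows = [{"region": "A", "population": 1200}, {"region": "B", "population": -5}]
stored, warnings = apply_weak_rule(
    rows, "population", lambda v: v is not None and v >= 0
)
```

The `warnings` list stands in for the warning information that would be sent to the first terminal.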
In the embodiment of the application, an initial index table and the analysis rule corresponding to it are obtained, and data cleaning is performed on the index data in the initial index table according to the analysis rule. If first index data that does not conform to the strong rule exists in the initial index table, the first index data is not put in storage; if second index data that does not conform to the weak rule exists, a warning identifier corresponding to the second index data is generated, and the second index data and the corresponding warning identifier are put in storage. For different initial index tables, a corresponding analysis rule can thus be selected according to the characteristics of the index data in the table. Because index data that violates a strong rule is kept out of storage while index data that violates a weak rule is stored together with its warning identifier, the index data in the initial index table is cleaned in a targeted manner, data cleaning is more efficient, and the efficiency of government affair data management and analysis is improved.
Fig. 2 is a flow chart of another method for cleaning government affair data according to an embodiment of the present application. In one embodiment, as shown in FIG. 2, the method includes the steps of:
Step S202, at least one index is configured in response to the configuration instruction.
An index can be used to describe the business meaning of government affair data. An index system is built through standardized definition and standardized development of indexes, and managing government affair data according to a sound index system can eliminate ambiguity in the data and reduce the communication cost between business and technical staff in government departments. With indexes configured in a unified way, index data can be made visible, usable, and manageable.
In one embodiment, at least one index may be configured in response to a configuration instruction. The information in the configuration instruction may include the basic information, caliber information, and storage information of the index.
The basic information may include the index name, the corresponding index catalog, the analysis dimension, the index classification and grading, and the update period. The corresponding index catalog is the superior classification of the index; in index management, indexes with the same business process and attributes are generally placed in the same superior classification. For example, the "nucleic acid detection" index, the "vaccination" index, and the "national epidemic situation" index are all related to epidemic prevention and control and can be placed in the superior classification "epidemic prevention and control". The analysis dimension indicates the statistical method used when the index data corresponding to the index is analyzed. For example, the analysis dimension of the index "regional production total value" may be a first method, which calculates the growth rate of the regional production total value either period over period (ring ratio) or year over year (same ratio). The index classification and grading is a parameter indicating the security level of the index. For example, if the grade is first-level, the index data corresponding to the index may be confidential data that cannot be disclosed externally; if the grade is second-level, the index data may be data that can be disclosed externally. The update period indicates the update frequency of the index data corresponding to the index.
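The two growth-rate calculations mentioned for the analysis dimension, the ring ratio (comparison with the immediately preceding period) and the same ratio (comparison with the same period of the previous year), can be sketched as below. The quarterly figures are invented for illustration and the function name is an assumption.

```python
def growth_rate(current, previous):
    """Relative growth of current over previous, expressed as a fraction."""
    if previous == 0:
        raise ValueError("previous value must be non-zero")
    return (current - previous) / previous

# Hypothetical quarterly values of a regional production total value index,
# keyed by (year, quarter).
series = {(2022, 2): 100.0, (2023, 1): 110.0, (2023, 2): 121.0}

# Ring ratio: 2023 Q2 against 2023 Q1.
period_over_period = growth_rate(series[(2023, 2)], series[(2023, 1)])
# Same ratio: 2023 Q2 against 2022 Q2.
year_over_year = growth_rate(series[(2023, 2)], series[(2022, 2)])
```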
The caliber information may include a business caliber, a technical caliber, and a responsible person. The business caliber is the statistical scope of the index data corresponding to the index. For example, the statistical scope of the index "total production value of area A" is the total value produced by all industries in area A within a preset time period; production value outside area A or outside the preset time period is not within the statistical scope of the index. The technical caliber is the processing specification of the index data corresponding to the index and is determined by the business caliber. The responsible person is the person responsible for recording the index data corresponding to the index.
Before the configuration instruction is responded to, the configuration instruction sent by the terminal corresponding to an administrator may be received; the configuration instruction may be generated from the entered content of the index to be configured. The configured indexes can be managed through configuration instructions, mainly by managing the indexes uniformly in the electronic device. Indexes can be managed by classification, and according to how they are configured they fall into two types: basic indexes and composite indexes. A basic index is obtained by direct configuration according to a configuration instruction. A composite index is a new index obtained by processing or operating on basic indexes: on a visual formula editor page in the electronic device, one or more basic indexes are combined into a composite index through the four arithmetic operations, function formulas, and the like, and corresponding database code is automatically generated for the configured composite index so that the index data stored in the database can be managed. Configuring composite indexes can meet the diverse configuration scenarios of government departments and, to some extent, reduce the technical cost of index development.
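As a rough illustration of a composite index built from basic indexes with one of the four arithmetic operations, consider the sketch below. The index names and values are hypothetical, and the sketch evaluates the formula directly rather than generating database code as the application describes.

```python
import operator

# The four arithmetic operations available to a composite-index formula.
OPS = {
    "+": operator.add,
    "-": operator.sub,
    "*": operator.mul,
    "/": operator.truediv,
}

def composite_index(left, op, right, basic_values):
    """Compute a composite index from two basic indexes and one operation."""
    return OPS[op](basic_values[left], basic_values[right])

# Hypothetical basic index values.
basic_values = {"regional_production_total": 5000.0, "resident_population": 250.0}
per_capita = composite_index(
    "regional_production_total", "/", "resident_population", basic_values
)
```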
Step S204, index data corresponding to each index is collected according to at least one index.
As an implementation manner, the index data corresponding to each index may be derived from government affair data collected by government departments, and the government affair data may come from various types of data sources. The entered government affair data is mapped to the configured indexes according to its business meaning; index data filled in manually for each index can also be obtained. In this way, the index data corresponding to each index is collected according to the at least one index.
In some embodiments, when collecting multi-dimensional, multi-source, and multi-structure government affair data, government departments may perform data preprocessing, data extraction, data filtering, data conversion, and data loading on the government affair data.
Preprocessing the government affair data includes correcting erroneous data, merging and sorting the government affair data, and storing the preprocessed government affair data in a new medium.
Data extraction is the process of extracting government affair data from different data sources, and may be performed as full extraction or incremental extraction. Full extraction copies the required government affair data from the data source. Incremental extraction records the previous extraction and, at the next extraction, extracts only the government affair data that has been added or modified since then.
Data filtering preliminarily filters out invalid data and data in the data source that does not conform to government application rules, so that the processed government affair data is more standardized and unified.
Data conversion converts the format, information code, or numerical value of the government affair data.
Data loading may consist of insert and modify operations on the government affair data, with different government affair data inserted into different data tables so as to classify it. For example, government affair data that is meaningful for use is inserted into a first data table and government affair data that is not is inserted into a second data table, thereby distinguishing the two. Data loading can be done in a single database environment. If the volume of government affair data is very large, for example on the order of ten million or more, the government affair data can be stored in text files and then processed with scripts to complete the data loading.
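The difference between full extraction and incremental extraction described above can be sketched as follows. The `updated_at` field and the numeric timestamps are assumptions; the application does not say how the previous extraction is recorded.

```python
def full_extract(source):
    """Full extraction: copy every required row from the data source."""
    return [dict(row) for row in source]

def incremental_extract(source, last_extracted_at):
    """Incremental extraction: copy only rows added or modified since the last run."""
    return [dict(row) for row in source if row["updated_at"] > last_extracted_at]

# Hypothetical data source; updated_at is an illustrative modification timestamp.
source = [
    {"id": 1, "updated_at": 10},
    {"id": 2, "updated_at": 25},
]
everything = full_extract(source)
changed = incremental_extract(source, last_extracted_at=20)
```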
Step S206, generating an initial index table according to at least one index and index data corresponding to each index, and storing the initial index table.
As an embodiment, an initial index table is generated according to the at least one index and the index data corresponding to each index; that is, the initial index table may include a plurality of indexes. The initial index table may be stored in a database of the electronic device, and when it needs to be acquired, it is looked up in the database.
Step S208, an initial index table is obtained, and an analysis rule corresponding to the initial index table is obtained; the initial index table comprises at least one index and index data corresponding to each index, and the analysis rule is determined according to historical government affair data; the analysis rules include at least one strong rule and/or at least one weak rule.
Step S210, data cleaning is carried out on the index data in the initial index table according to the analysis rule.
In step S212, if the first index data that does not conform to the strong rule exists in the initial index table, the first index data is not put in storage.
Step S214, if the second index data which does not accord with the weak rule exists in the initial index table, generating a warning mark corresponding to the second index data, and warehousing the second index data and the corresponding warning mark.
The descriptions of step S208 to step S214 may refer to the descriptions related to step S102 to step S108 in the above embodiments, and are not repeated here.
In the embodiment of the application, at least one index is configured in response to a configuration instruction, index data corresponding to each index is collected according to the at least one index, and an initial index table is generated from the at least one index and the corresponding index data and then stored. The initial index table and its corresponding analysis rule are then acquired, and the index data in the initial index table is cleaned according to the analysis rule. Because the indexes are configured according to the configuration instruction and the index data is collected per index, government affair data from different data sources can be integrated into the index data corresponding to each index to obtain the initial index table. Government affair data from different data sources can therefore be processed flexibly, which further improves the ability to manage government affair data.
Referring to fig. 3, fig. 3 is a schematic flow chart of generating a target index table according to an embodiment of the present application. In one embodiment, the method for cleaning government affair data further includes the following steps:
Step S302, a first index corresponding to the first index data is determined.
In one embodiment, the first index data is index data that does not conform to the strong rule corresponding to the initial index table, and since the first index data is not put in storage, the index data corresponding to the first index may be missing, and thus the index data of the first index may be collected again according to the first index corresponding to the first index data.
Step S304, third index data corresponding to the first index is acquired again, and the third index data accords with the strong rule.
As an implementation manner, to re-acquire the third index data corresponding to the first index, an index form to be filled in may be generated from the first index and sent to the first terminal. The operator of the first terminal fills in the form to obtain a completed index form. The completed index form sent by the first terminal is then received, and the third index data is obtained from it. The third index data conforms to the strong rule and can be put in storage.
Step S306, a target index table is generated according to the third index data, the second index data and the corresponding warning mark, and the index data conforming to the analysis rule in the initial index table.
In some embodiments, the third index data, the second index data with its corresponding warning identifier, and the index data in the initial index table that conforms to the analysis rule can all be put in storage. The target index table is generated from these data and stored in the database. Because the index data in the target index table has been cleaned according to the analysis rule, it is convenient to manage and can be used for subsequent data analysis.
In the embodiment of the application, the first index corresponding to the first index data is determined, and third index data that corresponds to the first index and conforms to the strong rule is re-acquired. A target index table is then generated from the third index data, the second index data and its corresponding warning identifier, and the index data in the initial index table that conforms to the analysis rule. In this way, first index data that does not conform to the strong rule is replaced by third index data that does, and the index data in the resulting target index table conforms to the analysis rule, which facilitates management and analysis of the index data.
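Assembling the target index table from the three groups of index data could be sketched as below. The row layout and the boolean `warning` field are assumptions used only to show how the groups are combined.

```python
def build_target_table(conforming, warned, reacquired):
    """Combine conforming rows, weak-rule rows with their warning mark, and re-acquired rows."""
    table = [{**row, "warning": False} for row in conforming]
    table += [{**row, "warning": True} for row in warned]       # second index data
    table += [{**row, "warning": False} for row in reacquired]  # third index data
    return table

# Hypothetical rows for each group.
target = build_target_table(
    conforming=[{"index": "population", "value": 1200}],
    warned=[{"index": "population", "value": -5}],
    reacquired=[{"index": "id_number", "value": "ABC123456"}],
)
```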
Fig. 4 is a flow chart of another method for cleaning government affair data according to an embodiment of the present application. In one embodiment, as shown in fig. 4, the method for cleaning government affair data includes the following steps:
Step S402, an initial index table is obtained, and an analysis rule corresponding to the initial index table is obtained; the initial index table comprises at least one index and index data corresponding to each index, and the analysis rule is determined according to historical government affair data; the analysis rules include at least one strong rule and/or at least one weak rule.
The description of step S402 may refer to the description related to step S102 in the above embodiment, which is not repeated here.
Step S404, checking at least one index contained in the initial index table and index data corresponding to each index to judge whether abnormal information exists in the initial index table; the abnormal information includes the absence of the index of the initial index table and/or the absence of the index data corresponding to the index of the initial index table.
In some embodiments, based on the data standard of the index data and in combination with an analysis of the business, systems, and data of the government departments of each city, a data quality check rule may be determined from the analysis result. Data quality checking of key index data is carried out according to the data quality check rule, a check report is output according to the check result, and a data quality correction scheme is formed according to the check report so as to correct the key index data.
In other embodiments, before data cleaning is performed on the index data in the initial index table according to the analysis rule, the at least one index contained in the initial index table and the index data corresponding to each index may be checked. The value domain of the index data corresponding to each index is recorded in a dictionary table. The value domain corresponding to an index is looked up in the dictionary table, and the index data corresponding to that index is checked against it; if the index data falls outside the value domain, it is determined that abnormal information exists in the initial index table.
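The dictionary-table lookup described above can be sketched as follows. The dictionary contents, the index names, and the representation of a value domain as a closed numeric range are all assumptions.

```python
# Hypothetical dictionary table: index name -> (lower bound, upper bound).
VALUE_DOMAINS = {"age": (0, 150), "household_size": (1, 30)}

def find_out_of_domain(table):
    """Return (index name, position, value) for every value outside its recorded domain."""
    anomalies = []
    for index_name, values in table.items():
        low, high = VALUE_DOMAINS[index_name]
        for position, value in enumerate(values):
            if not low <= value <= high:
                anomalies.append((index_name, position, value))
    return anomalies

anomalies = find_out_of_domain({"age": [34, 212], "household_size": [4, 0]})
has_abnormal_information = bool(anomalies)
```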
As an embodiment, an index in the initial index table is missing when the description of its business meaning is incomplete. For example, the business meaning of the index "average height" cannot be determined because a qualifier is lacking; a qualifier such as "Shanghai city" or "Beijing city" may be added to obtain the index "average height of Shanghai city", which is an index without the deficiency. Index data corresponding to an index in the initial index table is missing when the index data corresponding to an important index is abnormal. An important index is an index whose index data cannot be a null value or an error value; if the index data corresponding to an important index is a null value or an error value, that index data is abnormal. An error value may be a value that is not within the preset range, a value whose type does not comply with the rule, or the like. When the data corresponding to an important index is missing, it is determined that abnormal information exists in the initial index table. For example, the important index of the initial index table is "identification card number", and the initial index table is as shown in Table 2:
TABLE 2
Name | Identification card number | Place of household registration
Zhang San | 120112XXXX05046978 | A city
Li Si | - | B city
Here "-" indicates a null value. Because the record [Li Si, -, B city] exists, it is determined that abnormal information exists in the initial index table.
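The null-value check on Table 2 can be sketched as below. The field names and the set of values treated as null are assumptions; the rows mirror Table 2.

```python
# Hypothetical list of important indexes whose data must not be null.
IMPORTANT_INDEXES = ["id_number"]
NULL_MARKERS = (None, "", "-")

def find_missing(rows, important=IMPORTANT_INDEXES):
    """Return the rows in which any important index has a null value."""
    return [
        row for row in rows
        if any(row.get(name) in NULL_MARKERS for name in important)
    ]

rows = [
    {"name": "Zhang San", "id_number": "120112XXXX05046978", "household": "A city"},
    {"name": "Li Si", "id_number": "-", "household": "B city"},
]
missing = find_missing(rows)
has_abnormal_information = bool(missing)
```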
Step S406, if abnormal information exists in the initial index table, outputting a check report according to the abnormal information, and sending the check report to the first terminal; the check report is used for instructing the first terminal to upload the missing index and the corresponding index data and/or to upload the missing index data.
In one embodiment, if the initial index table has abnormal information, a check report may be output according to the abnormal information, and then the check report may be sent to the first terminal, and the operator corresponding to the first terminal may supplement the index data in the initial index table according to the check report, upload the missing index and the corresponding index data according to the check report, and/or upload the missing index data, thereby obtaining a relatively perfect initial index table. In addition, the check report may further include primary key abnormality information, where the primary key abnormality information is that a primary key of the initial index table in the database is not unique, thereby causing an abnormality in the initial index table.
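A primary key uniqueness check of the kind that would produce the primary key abnormality information might look like the following sketch. The key column name `id` and the sample rows are hypothetical.

```python
from collections import Counter

def duplicate_primary_keys(rows, key):
    """Return the primary key values that occur more than once."""
    counts = Counter(row[key] for row in rows)
    return sorted(value for value, n in counts.items() if n > 1)

# Hypothetical rows in which the primary key 1 is repeated.
rows = [{"id": 1, "name": "a"}, {"id": 2, "name": "b"}, {"id": 1, "name": "c"}]
duplicates = duplicate_primary_keys(rows, "id")
```

A non-empty result would be recorded in the check report as primary key abnormality information.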
Fig. 5 is a schematic diagram of generating a check report. As shown in fig. 5, a check operator obtains a check rule configuration template in Excel format, which is used to set check rules; the check operator adds or modifies check rules according to the template. The check rules are stored in a data checking knowledge base, and the index data in the initial index table can be checked according to the check rules in the knowledge base. The data checking knowledge base comprises three modules: a check rule data dictionary, a problem data sample module, and a check report generation module. The check rule data dictionary contains at least one check rule, and a check script executes check operations according to it, including value domain out-of-range checking, null value checking, inter-table relation checking, primary key uniqueness checking, longitudinal data volume comparison checking, and the like. The check outputs a formatted check report and the collected problem data. The problem data sample module backs up the collected problem data, and the check report generation module displays the formatted check report so that the check operator can query and analyze the check report of the initial index table; the check report is also pushed to an interface so that it can be sent to the first terminal.
Step S408, receiving the supplementary index data sent by the first terminal.
In one embodiment, the first terminal may send the supplementary index data to the electronic device by mail, or may generate an instruction containing the supplementary index data. The electronic device receives the mail or instruction, and thereby receives the supplementary index data sent by the first terminal.
Step S410, cleaning the index data and the supplementary index data in the initial index table according to the analysis rule.
As an embodiment, a supplementary index table may be generated from the supplementary index data and the index data in the initial index table, and data cleaning may then be performed on the index data in the supplementary index table according to the analysis rule; alternatively, data cleaning may be performed directly on the supplementary index data and the index data in the initial index table. If index data that does not conform to the strong rule exists among the index data in the initial index table and the supplementary index data, that index data is not put in storage. If index data that does not conform to the weak rule exists among them, a warning identifier corresponding to that index data is generated, and the index data and its warning identifier are put in storage.
In step S412, if the first index data that does not conform to the strong rule exists in the initial index table, the first index data is not put in storage.
The description of step S412 may refer to the related description of step S106 in the above embodiment, which is not described herein.
In step S414, if there is second index data that does not conform to the weak rule in the initial index table, a warning identifier corresponding to the second index data is generated, and the second index data and the corresponding warning identifier are subjected to warehouse entry processing.
The description of step S414 may refer to the related description of step S108 in the above embodiment, and will not be repeated here.
In the embodiment of the application, after the initial index table and its corresponding analysis rule are acquired, the at least one index contained in the initial index table and the index data corresponding to each index are checked to judge whether abnormal information exists in the initial index table. If abnormal information exists, a check report is output according to it and sent to the first terminal, supplementary index data sent by the first terminal is received, and data cleaning is performed on the index data in the initial index table and the supplementary index data according to the analysis rule. With the check report, the index data corresponding to the abnormal information in the initial index table can be handled, yielding a more complete initial index table and providing a good data basis for cleaning the index data in the initial index table and the supplementary index data according to the analysis rule.
Referring to fig. 6, fig. 6 is a schematic structural diagram of a cleaning device for government affair data according to an embodiment of the present application. The device can be applied to the electronic equipment, and is not particularly limited. As shown in fig. 6, the cleaning device 600 for government affair data may include: an acquisition module 601, a cleaning module 602, a first processing module 603, and a second processing module 604.
The acquisition module 601 is configured to obtain an initial index table, and obtain an analysis rule corresponding to the initial index table; the initial index table comprises at least one index and index data corresponding to each index, and the analysis rule is determined according to historical government affair data; the analysis rules include at least one strong rule and/or at least one weak rule;
the cleaning module 602 is configured to perform data cleaning on the index data in the initial index table according to the analysis rule;
the first processing module 603 is configured to perform non-warehousing processing on the first index data if the first index data that does not conform to the strong rule exists in the initial index table;
and the second processing module 604 is configured to generate a warning identifier corresponding to the second index data if the second index data that does not conform to the weak rule exists in the initial index table, and perform a warehouse entry process on the second index data and the corresponding warning identifier.
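The division of labour among the four modules can be illustrated with a minimal Python sketch. It is not the applicant's implementation: the `Rule` and `StoredRecord` types, the example rules, the index names, and the `WEAK_RULE_VIOLATION` label are all hypothetical, and the analysis rules are assumed to be simple per-index predicates.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional, Tuple


@dataclass
class Rule:
    index: str                      # name of the index the rule applies to
    check: Callable[[float], bool]  # returns True when the value conforms
    strong: bool                    # True = strong rule, False = weak rule


@dataclass
class StoredRecord:
    index: str
    value: float
    warning: Optional[str] = None   # warning identifier, set on weak-rule violations


def clean(table: Dict[str, float], rules: List[Rule]) -> Tuple[List[StoredRecord], List[str]]:
    """Return (records to warehouse, indexes whose data violated a strong rule)."""
    stored: List[StoredRecord] = []
    rejected: List[str] = []
    for index, value in table.items():
        applicable = [r for r in rules if r.index == index]
        if any(r.strong and not r.check(value) for r in applicable):
            rejected.append(index)  # "first index data": not warehoused
            continue
        weak_failed = any(not r.strong and not r.check(value) for r in applicable)
        # "second index data": warehoused together with a warning identifier
        stored.append(StoredRecord(index, value, "WEAK_RULE_VIOLATION" if weak_failed else None))
    return stored, rejected


# Hypothetical rules and table, for illustration only.
RULES = [
    Rule("population", lambda v: v >= 0, strong=True),
    Rule("gdp_growth_pct", lambda v: -20 <= v <= 20, strong=False),
]
TABLE = {"population": -5, "gdp_growth_pct": 35.0, "school_count": 120}
STORED, REJECTED = clean(TABLE, RULES)
```

In this sketch a strong-rule failure takes precedence: a value that violates a strong rule is rejected without being checked against weak rules, which is one possible reading of the embodiment.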
As an alternative embodiment, the cleaning device 600 for government affair data further includes a preprocessing module:
the preprocessing module is used for preprocessing the index data in the initial index table to obtain preprocessed index data;
the cleaning module 602 is further configured to perform data cleaning on the preprocessed index data according to the analysis rule.
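This passage does not say what the preprocessing consists of. Purely as an illustration, the Python sketch below assumes it means normalising raw uploaded values (trimming whitespace, removing thousands separators, converting to numbers) before the analysis rules are applied; the function name and sample data are hypothetical.

```python
from typing import Dict, Optional, Union

Raw = Union[str, int, float, None]


def preprocess(table: Dict[str, Raw]) -> Dict[str, Optional[float]]:
    """Normalise index names and convert raw values to floats (None if unparseable)."""
    cleaned: Dict[str, Optional[float]] = {}
    for index, raw in table.items():
        name = index.strip()
        if raw is None:
            cleaned[name] = None
            continue
        text = str(raw).strip().replace(",", "")  # drop thousands separators
        try:
            cleaned[name] = float(text)
        except ValueError:
            cleaned[name] = None                  # leave unparseable data as missing
    return cleaned


# Hypothetical raw upload.
PREPROCESSED = preprocess(
    {" population ": "1,234,567", "gdp_growth_pct": " 5.2 ", "school_count": "n/a"}
)
```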
As an alternative embodiment, the cleaning device 600 for government affair data further includes a response module, a collection module, and a generation module:
the response module is used for responding to the configuration instruction and configuring at least one index;
the collection module is used for collecting index data corresponding to each index according to at least one index;
the generating module is used for generating an initial index table according to at least one index and index data corresponding to each index, and storing the initial index table.
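The configure, collect, and generate steps can be sketched as follows in Python. The shape of the configuration instruction, the in-memory `TABLE_STORE`, and the data source are assumptions made for the example and are not specified by the application.

```python
from typing import Callable, Dict, List

# In-memory stand-in for whatever storage the device actually uses (assumption).
TABLE_STORE: Dict[str, dict] = {}


def configure_indexes(instruction: dict) -> List[str]:
    """Response module: read the index names carried by a configuration instruction."""
    # dict.fromkeys drops duplicate names while keeping their first-seen order
    return list(dict.fromkeys(instruction["indexes"]))


def collect(indexes: List[str], source: Callable[[str], object]) -> Dict[str, object]:
    """Collection module: fetch one value per configured index from a data source."""
    return {name: source(name) for name in indexes}


def generate_and_store(table_id: str, indexes: List[str], data: Dict[str, object]) -> dict:
    """Generating module: assemble the initial index table and store it."""
    table = {"indexes": list(indexes), "data": dict(data)}
    TABLE_STORE[table_id] = table
    return table


# Hypothetical configuration instruction and data source.
INSTRUCTION = {"indexes": ["population", "school_count", "population"]}
SOURCE = {"population": 1200000, "school_count": 120}

INDEXES = configure_indexes(INSTRUCTION)
INITIAL_TABLE = generate_and_store("2023-Q4", INDEXES, collect(INDEXES, SOURCE.get))
```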
As an optional implementation manner, the government affair data cleaning device 600 further includes an index determining module and a data obtaining module:
the index determining module is used for determining a first index corresponding to the first index data;
the data acquisition module is used for re-acquiring third index data corresponding to the first index, wherein the third index data accords with the strong rule;
the generating module is further configured to generate a target index table according to the third index data, the second index data and its corresponding warning identifier, and the index data in the initial index table that conforms to the analysis rule.
As an optional implementation manner, the data acquisition module is further configured to generate a first form to be filled in according to the first index; send the first form to be filled in to a first terminal, the first terminal being the terminal that uploaded the initial index table; receive the completed first form sent by the first terminal; and extract the third index data corresponding to the first index from the completed first form.
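A minimal Python sketch of the re-acquisition loop and the assembly of the target index table is given below. The form is assumed to be a simple mapping from index name to value, and the helper names, the strong-rule predicate, and the sample values are hypothetical; transport to and from the first terminal is left out.

```python
from typing import Callable, Dict, List, Optional, Tuple


def build_form(first_indexes: List[str]) -> Dict[str, Optional[float]]:
    """Generate a blank form with one field per index whose data was rejected."""
    return {name: None for name in first_indexes}


def extract_third_data(
    completed_form: Dict[str, Optional[float]],
    strong_checks: Dict[str, Callable[[float], bool]],
) -> Dict[str, float]:
    """Keep only resubmitted values that are filled in and satisfy the strong rule."""
    third: Dict[str, float] = {}
    for index, value in completed_form.items():
        check = strong_checks.get(index)
        if value is not None and (check is None or check(value)):
            third[index] = value
    return third


def build_target_table(
    conforming: Dict[str, float],
    flagged: Dict[str, Tuple[float, str]],
    third: Dict[str, float],
) -> Dict[str, dict]:
    """Merge conforming data, weak-rule data with warnings, and re-acquired data."""
    target = {i: {"value": v, "warning": None} for i, v in conforming.items()}
    target.update({i: {"value": v, "warning": w} for i, (v, w) in flagged.items()})
    target.update({i: {"value": v, "warning": None} for i, v in third.items()})
    return target


# Hypothetical example: "population" was rejected earlier and is resubmitted.
STRONG_CHECKS = {"population": lambda v: v >= 0}
FORM = build_form(["population"])
COMPLETED = {"population": 1200000.0}
THIRD = extract_third_data(COMPLETED, STRONG_CHECKS)
TARGET = build_target_table(
    conforming={"school_count": 120.0},
    flagged={"gdp_growth_pct": (35.0, "WEAK_RULE_VIOLATION")},
    third=THIRD,
)
```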
As an optional embodiment, the cleaning device 600 for government affair data further includes a warning information generating module and a transmitting module:
the warning information generation module is used for generating warning information according to the second index data and the corresponding warning identifier;
the sending module is used for sending the warning information to the first terminal; the first terminal is a terminal for uploading the initial index table; the warning information is used for prompting that the second index data does not accord with the weak rule.
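As a rough Python illustration, warning information could be a human-readable message built from each flagged value and its warning identifier. The message wording and the `send` callback, which stands in for the channel to the first terminal, are assumptions.

```python
from typing import Callable, Dict, List, Tuple


def build_warning_info(flagged: Dict[str, Tuple[float, str]]) -> List[str]:
    """Generate one warning message per piece of second index data."""
    return [
        f"Index '{index}' has value {value}, which does not conform to a weak rule "
        f"(warning identifier: {identifier}); the data was still warehoused."
        for index, (value, identifier) in flagged.items()
    ]


def send_warning_info(messages: List[str], send: Callable[[str], None]) -> int:
    """Pass each message to the transport callback and return how many were sent."""
    for message in messages:
        send(message)
    return len(messages)


# Hypothetical outbox standing in for the first terminal.
OUTBOX: List[str] = []
SENT = send_warning_info(
    build_warning_info({"gdp_growth_pct": (35.0, "WEAK_RULE_VIOLATION")}),
    OUTBOX.append,
)
```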
As an optional embodiment, the cleaning device 600 for government affair data further includes a checking module and an output module:
the checking module is used for checking the at least one index contained in the initial index table and the index data corresponding to each index, so as to determine whether abnormal information exists in the initial index table; the abnormal information comprises an index missing from the initial index table and/or missing index data for an index that is present in the initial index table;
the output module is used for outputting a check report according to the abnormal information if abnormal information exists in the initial index table, and sending the check report to the first terminal; the check report is used for instructing the first terminal to upload the missing index and its corresponding index data, and/or for instructing the first terminal to upload the missing index data;
and the cleaning module is also used for receiving the supplementary index data sent by the first terminal and cleaning the index data and the supplementary index data in the initial index table according to the analysis rule.
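The completeness check can be sketched in Python as follows. The sketch assumes the device knows which indexes a table is expected to contain (for example from the configured indexes) and represents missing index data as `None`; the report layout and function names are hypothetical.

```python
from typing import Dict, List, Optional


def check_table(expected: List[str], table: Dict[str, Optional[float]]) -> Dict[str, List[str]]:
    """Build a check report listing missing indexes and indexes with missing data."""
    return {
        "missing_indexes": [name for name in expected if name not in table],
        "missing_data": [name for name, value in table.items() if value is None],
    }


def has_abnormal_info(report: Dict[str, List[str]]) -> bool:
    """True when the report lists at least one missing index or missing value."""
    return any(report.values())


def merge_supplement(
    table: Dict[str, Optional[float]], supplement: Dict[str, float]
) -> Dict[str, Optional[float]]:
    """Combine the initial index data with the supplementary data from the terminal."""
    merged = dict(table)
    merged.update(supplement)
    return merged


# Hypothetical expected indexes and an incomplete upload.
EXPECTED = ["population", "school_count", "gdp_growth_pct"]
TABLE = {"population": 1200000.0, "school_count": None}
REPORT = check_table(EXPECTED, TABLE)

# Supplementary index data returned by the first terminal in response to the report.
SUPPLEMENT = {"school_count": 120.0, "gdp_growth_pct": 5.2}
COMPLETE = merge_supplement(TABLE, SUPPLEMENT)
```

The merged table is what would then be handed to the cleaning step under the analysis rule.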
In the embodiment of the application, an initial index table and its corresponding analysis rule are acquired, and the index data in the initial index table is cleaned according to the analysis rule. If first index data that does not conform to the strong rule exists in the initial index table, the first index data is not warehoused; if second index data that does not conform to the weak rule exists in the initial index table, a warning identifier corresponding to the second index data is generated, and the second index data is warehoused together with the corresponding warning identifier. For different initial index tables, the corresponding analysis rule can therefore be selected according to the characteristics of the index data in each table: index data that does not conform to the strong rule is kept out of the warehouse, while index data that does not conform to the weak rule is warehoused with its warning identifier. The index data in the initial index table is thus cleaned in a targeted manner, so that data cleaning can be performed more efficiently and the efficiency of government affair data management and analysis is improved.
Referring to fig. 7, fig. 7 is a schematic structural diagram of an electronic device according to an embodiment of the present disclosure.
As shown in fig. 7, the electronic device 700 may include:
a memory 701 storing executable program code;
a processor 702 coupled with the memory 701;
the processor 702 invokes executable program codes stored in the memory 701 to execute any of the methods for cleaning government affair data disclosed in the embodiments of the present application.
The embodiment of the application discloses a computer readable storage medium which stores a computer program, wherein the computer program, when executed by a processor, causes the processor to realize any one of the government affair data cleaning methods disclosed in the embodiment of the application.
It should be appreciated that reference throughout this specification to "one embodiment" or "an embodiment" means that a particular feature, structure or characteristic described in connection with the embodiment is included in at least one embodiment of the present application. Thus, the appearances of the phrases "in one embodiment" or "in an embodiment" in various places throughout this specification are not necessarily all referring to the same embodiment. Furthermore, the particular features, structures, or characteristics may be combined in any suitable manner in one or more embodiments. Those skilled in the art will also appreciate that the embodiments described in the specification are all alternative embodiments and that the acts and modules referred to are not necessarily required in the present application.
In the various embodiments of the present application, it should be understood that the magnitude of the sequence numbers of the above processes does not imply an order of execution; the order in which the processes are executed should be determined by their functions and internal logic, and should not constitute any limitation on the implementation of the embodiments of the present application.
The units described above as separate components may or may not be physically separate, and components shown as units may or may not be physical units, may be located in one place, or may be distributed over a plurality of network units. Some or all of the units may be selected according to actual needs to achieve the purpose of the embodiment.
In addition, each functional unit in the embodiments of the present application may be integrated in one processing unit, or each unit may exist alone physically, or two or more units may be integrated in one unit. The integrated units may be implemented in hardware or in software functional units.
The integrated units described above, if implemented in the form of software functional units and sold or used as stand-alone products, may be stored in a computer-accessible memory. Based on such understanding, the technical solution of the present application in essence, or the part of it that contributes over the prior art, or all or part of the technical solution, may be embodied in the form of a software product. The software product is stored in a memory and includes several instructions for causing a computer device (which may be a personal computer, a server, a network device, or the like, and in particular may be a processor in the computer device) to perform part or all of the steps of the methods of the various embodiments of the present application.
Those of ordinary skill in the art will appreciate that all or part of the steps of the various methods of the above embodiments may be implemented by a program that instructs associated hardware. The program may be stored in a computer-readable storage medium, including read-only memory (ROM), random access memory (RAM), programmable read-only memory (PROM), erasable programmable read-only memory (EPROM), one-time programmable read-only memory (OTPROM), electrically erasable programmable read-only memory (EEPROM), compact disc read-only memory (CD-ROM) or other optical disk memory, magnetic disk memory, tape memory, or any other computer-readable medium that can be used for carrying or storing data.
The method and device for cleaning government affair data, the electronic device, and the storage medium disclosed in the embodiments of the present application have been described in detail above. Specific examples are used herein to illustrate the principles and implementations of the present application, and the description of the above embodiments is only intended to help understand the method and core ideas of the present application. Meanwhile, those skilled in the art may, in accordance with the ideas of the present application, make changes to the specific embodiments and the scope of application. In view of the above, the content of this description should not be construed as limiting the present application.

Claims (10)

1. A method for cleaning government affair data, characterized by comprising the following steps:
acquiring an initial index table, and acquiring an analysis rule corresponding to the initial index table; the initial index table comprises at least one index and index data corresponding to each index, and the analysis rule is determined according to historical government affair data; the analysis rule comprises at least one strong rule and/or at least one weak rule;
performing data cleaning on the index data in the initial index table according to the analysis rule;
if the first index data which does not accord with the strong rule exists in the initial index table, performing non-warehousing processing on the first index data;
if the second index data which does not accord with the weak rule exists in the initial index table, generating a warning identifier corresponding to the second index data, and warehousing the second index data and the corresponding warning identifier.
2. The method of claim 1, wherein after the acquiring an initial index table and acquiring an analysis rule corresponding to the initial index table, the method further comprises:
preprocessing the index data in the initial index table to obtain preprocessed index data;
the performing data cleaning on the index data in the initial index table according to the analysis rule comprises:
performing data cleaning on the preprocessed index data according to the analysis rule.
3. The method of claim 1, wherein before the acquiring an initial index table and acquiring an analysis rule corresponding to the initial index table, the method further comprises:
configuring at least one index in response to the configuration instruction;
collecting index data corresponding to each index according to the at least one index;
generating an initial index table according to the at least one index and index data corresponding to each index, and storing the initial index table.
4. The method of claim 1, wherein after the performing non-warehousing processing on the first index data, the method further comprises:
determining a first index corresponding to the first index data;
re-acquiring third index data corresponding to the first index, wherein the third index data accords with the strong rule;
and generating a target index table according to the third index data, the second index data and the corresponding warning identifier and the index data conforming to the analysis rule in the initial index table.
5. The method of claim 4, wherein the re-acquiring the third index data corresponding to the first index comprises:
generating a first form to be filled in according to the first index;
transmitting the first form to be filled in to a first terminal; the first terminal is a terminal for uploading the initial index table;
receiving the completed first form sent by the first terminal;
and extracting third index data corresponding to the first index from the filled first form.
6. The method of claim 1, wherein after the generating the warning identifier corresponding to the second index data, the method further comprises:
generating warning information according to the second index data and the corresponding warning identifier;
the warning information is sent to a first terminal; the first terminal is a terminal for uploading the initial index table; the warning information is used for prompting that the second index data does not accord with the weak rule.
7. The method of claim 1, wherein after the obtaining the initial indicator table and obtaining the analysis rule corresponding to the initial indicator table, the method further comprises:
checking the at least one index contained in the initial index table and the index data corresponding to each index, so as to determine whether abnormal information exists in the initial index table; the abnormal information comprises an index missing from the initial index table and/or missing index data for an index that is present in the initial index table;
if the initial index table has abnormal information, outputting a check report according to the abnormal information in the initial index table, and sending the check report to a first terminal; the check report is used for instructing the first terminal to upload the missing index and its corresponding index data, and/or for instructing the first terminal to upload the missing index data;
receiving the supplementary index data sent by the first terminal;
the performing data cleaning on the index data in the initial index table according to the analysis rule comprises:
performing data cleaning on the index data in the initial index table and the supplementary index data according to the analysis rule.
8. A cleaning device for government affair data, characterized by comprising:
The table acquisition module is used for acquiring an initial index table and acquiring an analysis rule corresponding to the initial index table; the initial index table comprises at least one index and index data corresponding to each index, and the analysis rule is determined according to historical government affair data; the analysis rule comprises at least one strong rule and/or at least one weak rule;
the cleaning module is used for cleaning the data of the index data in the initial index table according to the analysis rule;
the first processing module is used for performing non-warehousing processing on the first index data if the first index data which does not accord with the strong rule exists in the initial index table;
and the second processing module is used for generating a warning identifier corresponding to the second index data if the second index data which does not accord with the weak rule exists in the initial index table, and warehousing the second index data and the corresponding warning identifier.
9. An electronic device comprising a memory and a processor, the memory having stored therein a computer program which, when executed by the processor, causes the processor to implement the method of any of claims 1 to 7.
10. A computer readable storage medium, on which a computer program is stored, characterized in that the computer program, when being executed by a processor, implements the method according to any of claims 1 to 7.
CN202311478018.8A 2023-11-07 2023-11-07 Government affair data cleaning method and device, electronic equipment and storage medium Pending CN117520324A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202311478018.8A CN117520324A (en) 2023-11-07 2023-11-07 Government affair data cleaning method and device, electronic equipment and storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202311478018.8A CN117520324A (en) 2023-11-07 2023-11-07 Government affair data cleaning method and device, electronic equipment and storage medium

Publications (1)

Publication Number Publication Date
CN117520324A true CN117520324A (en) 2024-02-06

Family

ID=89743184

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202311478018.8A Pending CN117520324A (en) 2023-11-07 2023-11-07 Government affair data cleaning method and device, electronic equipment and storage medium

Country Status (1)

Country Link
CN (1) CN117520324A (en)

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination