CN117762954A

CN117762954A - Automatic data management method

Info

Publication number: CN117762954A
Application number: CN202311538206.5A
Authority: CN
Inventors: 苏毓腾; 刘妃妃; 龙文武; 张乐志; 郑李梨; 冯钊; 骆斐; 赵欢; 张志中; 杨定; 梁豪辉; 贾俊涛; 余昌应; 周昊
Original assignee: Shenzhen Qhdata Service Co ltd
Current assignee: Shenzhen Qhdata Service Co ltd
Priority date: 2023-11-17
Filing date: 2023-11-17
Publication date: 2024-03-26

Abstract

The invention belongs to the technical field of automated data processing and discloses an automated data governance method based on a statistics bureau database. The method comprises the following steps: data acquisition, data writing and cleaning, alignment of data table fields, data governance, and data display. Only changed data records are updated, and records that need no change are left untouched, which reduces the risk of data inconsistency. The method updates the data in the database in batches by cyclic traversal. This reduces the need for manual intervention, improves efficiency, and lowers the risk of manual errors, particularly when large data versions are processed. The method also applies a data cleaning strategy during the update process, using cyclic traversal to remove duplicate, unimportant, and erroneous content from the data. This improves overall data governance and provides a stable environment for subsequent batch updates.

Description

Automatic data management method

Technical Field

The invention belongs to the technical field of automated data processing, and particularly relates to an automated data governance method.

Background

In the information age, the acquisition, cleaning, integration, version control, and presentation of data are critical for government departments, financial institutions, and other organizations that manage large volumes of statistical report data. Conventional data management methods often require extensive manual intervention and processing, which makes large-scale data management and frequent changes difficult to handle. An automated data governance method is therefore needed to improve the efficiency and quality of data management.

Disclosure of Invention

The invention aims to provide an automated data governance method that solves the problems described in the background.

To achieve the above object, the present invention provides the following technical solution: an automated data governance method based on a statistics bureau database, comprising the following steps:

S1, data acquisition: compare the data volume in the report database with the data volume in the data source. If they differ, update the report database in batches so that it holds all the data in the statistics bureau database;

if the data volumes are the same, compute MD5 digests of multiple columns of data in each report, one report period at a time, and then update the report database in batches;

S2, data writing and cleaning: after data is acquired from the source, write it into the report database as a data stream. Historical data is written into the raw layer of the report database and cleaned with a preset data cleaning strategy, and the cleaned data is written into the warehouse layer;

S3, data table field alignment: using the unified business element dictionary, align fields across years by field-matching similarity with a decision tree algorithm. If the field matching degree between the table structures of the same report in different years exceeds 95%, the fields are treated as the same field by default; otherwise, the field is written into a temporary table for machine operation and maintenance;

S4, data governance: the regional division codes, industry codes, accounting standards, and product classification codes are routinely updated once a year on the statistics bureau's official website. When the system detects an update to a dim (dimension) table, it looks up the tables that need modification in the data lineage table, and those tables update their data automatically;

S5, data display: present the statistical reports on a large-screen cockpit (dashboard) display.
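Step S4 depends on a lineage lookup: given the dim table that changed, find every table that depends on it. The following is a minimal sketch. It assumes that the data lineage table can be read as (upstream, downstream) pairs. The table names are hypothetical, and the actual refresh of each table is left out.

```python
from collections import deque


def tables_to_refresh(lineage_edges, updated_dim_table):
    """Return every table downstream of updated_dim_table, in breadth-first order.

    lineage_edges: iterable of (upstream_table, downstream_table) pairs, a
    hypothetical stand-in for the data lineage table.
    """
    # Adjacency list: upstream table -> its direct downstream tables.
    downstream = {}
    for upstream, child in lineage_edges:
        downstream.setdefault(upstream, []).append(child)

    seen = {updated_dim_table}
    order = []
    queue = deque([updated_dim_table])
    while queue:
        table = queue.popleft()
        for child in downstream.get(table, []):
            if child not in seen:  # guards against cycles and shared descendants
                seen.add(child)
                order.append(child)
                queue.append(child)
    return order


edges = [
    ("dim_region_code", "dwd_enterprise_report"),
    ("dim_industry_code", "dwd_enterprise_report"),
    ("dwd_enterprise_report", "ads_cockpit_summary"),
    ("dim_region_code", "ads_region_ranking"),
]
print(tables_to_refresh(edges, "dim_region_code"))
# ['dwd_enterprise_report', 'ads_region_ranking', 'ads_cockpit_summary']
```

Breadth-first order lists direct dependents before indirect ones. If a table has several upstream tables that all change at once, a topological sort would be a safer refresh order.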

Preferably, the MD5 hashing proceeds as follows. Obtain the first digest data (the per-period MD5 digests of the report database). Compute MD5 digests of multiple columns of data in each report of the data source, one report period at a time, to obtain the second digest data. Compare the first and second digest data by polling or by bisection. If the comparison shows low similarity, a new data report has been added to the data source or an existing report has been updated.
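The S1 check and the digest comparison above can be sketched as follows. This sketch rests on two assumptions. First, MD5 is treated as a hash function that yields one digest per report period, not as an encryption scheme. Second, "bisection" is read as a recursive split over the sorted report periods: the sketch compares a combined digest for a range of periods and subdivides only the ranges that differ. The row layout (a `period` key plus named columns) and all function names are hypothetical.

```python
import hashlib


def period_digests(rows, columns):
    """MD5 digest per report period over the selected columns."""
    buckets = {}
    for row in rows:
        # Join the selected columns with a separator unlikely to occur in values.
        line = "\x1f".join(str(row[c]) for c in columns)
        buckets.setdefault(row["period"], []).append(line)
    digests = {}
    for period, lines in buckets.items():
        # Sort so that row order in either store does not change the digest.
        payload = "\n".join(sorted(lines)).encode("utf-8")
        digests[period] = hashlib.md5(payload).hexdigest()
    return digests


def range_digest(digests, periods):
    """Combined MD5 over a slice of periods; a missing period hashes as empty."""
    h = hashlib.md5()
    for p in periods:
        h.update(f"{p}:{digests.get(p, '')};".encode("utf-8"))
    return h.hexdigest()


def changed_periods_bisect(first, second):
    """Find periods whose digests differ by recursively halving the period range."""
    periods = sorted(set(first) | set(second))
    changed = []

    def search(lo, hi):
        span = periods[lo:hi]
        if not span or range_digest(first, span) == range_digest(second, span):
            return  # identical range: skipped after a single comparison
        if hi - lo == 1:
            changed.append(span[0])
            return
        mid = (lo + hi) // 2
        search(lo, mid)
        search(mid, hi)

    search(0, len(periods))
    return changed


def plan_update(report_rows, source_rows, columns):
    """S1: full batch reload if counts differ, else reload only changed periods."""
    if len(report_rows) != len(source_rows):
        return "full", None
    first = period_digests(report_rows, columns)   # first digest data (report database)
    second = period_digests(source_rows, columns)  # second digest data (data source)
    return "incremental", changed_periods_bisect(first, second)


report = [
    {"period": "2023-01", "item": "revenue", "value": 100},
    {"period": "2023-02", "item": "revenue", "value": 120},
    {"period": "2023-03", "item": "revenue", "value": 90},
]
source = [
    {"period": "2023-01", "item": "revenue", "value": 100},
    {"period": "2023-02", "item": "revenue", "value": 125},  # revised figure
    {"period": "2023-03", "item": "revenue", "value": 90},
]
print(plan_update(report, source, ["item", "value"]))  # ('incremental', ['2023-02'])
```

Polling would compare the per-period digests one by one. Bisection pays off when most periods are unchanged, because each identical range is skipped after one comparison.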

Preferably, before the data table fields are aligned, statistical business terms are integrated into the unified business element dictionary so that synonyms and their meanings can be distinguished. This further comprises the following steps:

organizing the fields in the enterprise statistical reports, together with the data corresponding to each field, into a database report; manually downloading the form structures of the statistical direct-reporting system; automatically organizing the table structures by form number; and outputting consolidated files for the different years.
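The following is a minimal sketch of the dictionary step. It assumes that synonyms are supplied as a mapping from a canonical business term to its variants. It also assumes that form structures arrive as (form number, year, field name) tuples. The form number `B201` and the field names are made up for illustration.

```python
def build_dictionary(synonym_groups):
    """Map each synonym, and the canonical term itself, to the canonical business term."""
    lookup = {}
    for canonical, synonyms in synonym_groups.items():
        lookup[canonical] = canonical
        for synonym in synonyms:
            lookup[synonym] = canonical
    return lookup


def integrate_forms(form_fields, lookup):
    """Group normalized field names by form number, then by year.

    form_fields: iterable of (form_number, year, field_name) tuples.
    Names missing from the dictionary are kept unchanged for later review.
    """
    integrated = {}
    for form_no, year, field in sorted(form_fields):
        name = field.strip()
        term = lookup.get(name, name)
        integrated.setdefault(form_no, {}).setdefault(year, []).append(term)
    return integrated


lookup = build_dictionary({"operating revenue": ["business income", "revenue from operations"]})
forms = [
    ("B201", 2023, "revenue from operations"),
    ("B201", 2022, "business income"),
    ("B201", 2023, "employees"),
]
print(integrate_forms(forms, lookup))
# {'B201': {2022: ['operating revenue'], 2023: ['employees', 'operating revenue']}}
```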

Preferably, the data cleansing strategy is divided into the following steps:

S1, create a data pool, connect to the database source, import the data, and classify it;

S2, check the classified data with a function, remove duplicate data, and merge the results;

S3, search with more than three keywords and remove data with erroneous content;

S4, repeat S2 and S3 twice, then traverse the processed data and screen out data with format errors.
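The four cleaning steps can be sketched as follows. The sketch assumes that records are dictionaries and that S2's duplicate check is a key function. It also assumes that S3's keywords are plain substrings signalling erroneous content. The field names, keywords, and code format pattern are hypothetical. The data pool and database connection from S1 are reduced to an in-memory list.

```python
import re


def classify(records, category_field):
    """S1: group imported records by a category field."""
    groups = {}
    for record in records:
        groups.setdefault(record[category_field], []).append(record)
    return groups


def deduplicate(records, key_fn):
    """S2: check each record with key_fn and keep only the first record per key."""
    seen = set()
    kept = []
    for record in records:
        key = key_fn(record)
        if key not in seen:
            seen.add(key)
            kept.append(record)
    return kept


def drop_by_keywords(records, text_field, keywords):
    """S3: remove records whose text contains any error-indicating keyword."""
    if len(keywords) <= 3:
        raise ValueError("the strategy calls for more than three keywords")
    return [r for r in records if not any(k in str(r[text_field]) for k in keywords)]


def clean(records, category_field, key_fn, text_field, keywords, code_field, code_pattern):
    """Run S1 to S4 and merge the cleaned groups back into one list."""
    cleaned = []
    for group in classify(records, category_field).values():
        for _ in range(2):  # S4: repeat S2 and S3 twice
            group = deduplicate(group, key_fn)
            group = drop_by_keywords(group, text_field, keywords)
        # S4 (cont.): final traversal that screens out badly formatted codes
        cleaned.extend(r for r in group if re.fullmatch(code_pattern, str(r[code_field])))
    return cleaned


records = [
    {"cat": "industry", "code": "C13", "note": "ok"},
    {"cat": "industry", "code": "C13", "note": "ok"},     # duplicate
    {"cat": "industry", "code": "C1X", "note": "ok"},     # malformed code
    {"cat": "region", "code": "440300", "note": "#N/A"},  # erroneous content
    {"cat": "region", "code": "440304", "note": "ok"},
]
result = clean(
    records,
    category_field="cat",
    key_fn=lambda r: (r["cat"], r["code"]),
    text_field="note",
    keywords=["#N/A", "NULL", "ERROR", "??"],
    code_field="code",
    code_pattern=r"[A-Z]?\d{2,6}",
)
print([r["code"] for r in result])  # ['C13', '440304']
```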

Preferably, machine operation and maintenance (O&M) is based on artificial intelligence and machine learning and replaces traditional manual O&M. It uses cloud storage as its medium and automatically collects and organizes the data rejected during data table field alignment. It then analyzes that data with machine learning algorithms to find the causes of the data errors and report them back.

Preferably, the batch update of the data is as follows:

S1, back up all data in the database and upload it to a cloud server at regular intervals;

S2, cyclically traverse all data in the report database, load it into the database, and perform insert, delete, and verify operations on it;

S3, perform like-for-like replacement of data whose volume is unchanged, and upload the replaced and updated portion to the cloud server again;

S4, periodically update the data encryption key and set up a data comparison group.

Preferably, the data comparison group is the portion of data that stays the same across batch updates. When a data update involves only additions and deletions of some portions, only the data outside the comparison group is updated.

Preferably, the data comparison group consists of the data generated after each batch update that takes part in insert, delete, and verify operations.
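The batch update with a comparison group can be sketched as an incremental diff. Records that are identical in the old and new versions form the comparison group and are never touched. After a backup is taken, inserts, deletions, and replacements are applied in fixed-size batches. This is a minimal sketch that assumes both versions fit in memory as key-to-record dictionaries. The cloud upload and key rotation steps are deliberately omitted, and `batch_size` is an arbitrary illustrative value.

```python
import copy


def split_changes(current, incoming):
    """Separate the comparison group (unchanged keys) from the changes to apply."""
    shared = current.keys() & incoming.keys()
    comparison_group = {k for k in shared if current[k] == incoming[k]}
    inserts = [k for k in incoming if k not in current]
    deletes = [k for k in current if k not in incoming]
    replaces = sorted(k for k in shared if k not in comparison_group)
    return comparison_group, inserts, deletes, replaces


def batch_update(current, incoming, batch_size=2):
    """Apply only the changes outside the comparison group, in fixed-size batches.

    Returns the pre-update backup and the comparison group. Uploading the backup
    to a cloud server (S1) and key rotation (S4) are out of scope here.
    """
    backup = copy.deepcopy(current)  # S1: snapshot before anything changes
    group, inserts, deletes, replaces = split_changes(current, incoming)
    operations = ([("insert", k) for k in inserts]
                  + [("delete", k) for k in deletes]
                  + [("replace", k) for k in replaces])
    # S2/S3: traverse the pending operations one batch at a time
    for start in range(0, len(operations), batch_size):
        for op, key in operations[start:start + batch_size]:
            if op == "delete":
                del current[key]
            else:  # insert and replace both write the incoming value
                current[key] = incoming[key]
    return backup, group


current = {"r1": 10, "r2": 20, "r3": 30}
incoming = {"r1": 10, "r2": 25, "r4": 40}
backup, group = batch_update(current, incoming)
print(group, current)  # {'r1'} {'r1': 10, 'r2': 25, 'r4': 40}
```

Because the comparison group is computed before any writes, unchanged records are skipped during the traversal. This skipping is the source of the speed-up the method describes.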

The beneficial effects of the invention are as follows:

The method compares the new and old versions of the data and performs incremental updates, which helps maintain data consistency. Only changed records are updated, and records that need no change are left untouched, which reduces the risk of data inconsistency. Batch updates of the database by cyclic traversal reduce the need for manual intervention, improve efficiency, and lower the risk of manual errors, particularly when large data versions are processed. A data cleaning strategy applied during the update process uses cyclic traversal to remove duplicate, unimportant, and erroneous content from the data. This improves overall data governance and provides a stable environment for subsequent batch updates.

In conclusion, the automated data governance method improves the efficiency and quality of data management and reduces the risk of data inconsistency. It also promotes automated processing and visual display of data, and provides a useful tool and method for organizational decision-making and data management.

Drawings

FIG. 1 is a schematic flow chart of the method of the invention.

Detailed Description

The technical solutions in the embodiments of the present invention are described clearly and completely below with reference to the accompanying drawings. The described embodiments are only some, not all, of the embodiments of the present invention. All other embodiments obtained by those skilled in the art based on these embodiments without inventive effort fall within the scope of the invention.

As shown in FIG. 1, an embodiment of the present invention provides an automated data governance method based on a statistics bureau database, comprising steps S1 to S5 described above.

Built on automated data processing, the method greatly improves data processing efficiency. It organizes the batch updating of the statistics bureau database into successive stages of acquisition, cleaning, and governance, which classify, sort, and filter the data in order. In this way, it overcomes the low efficiency and poor data governance quality caused by manual processing.

In this embodiment, the MD5 hashing proceeds as follows. Obtain the first digest data (the per-period MD5 digests of the report database). Compute MD5 digests of multiple columns of data in each report of the data source, one report period at a time, to obtain the second digest data. Compare the first and second digest data by polling or by bisection. If the comparison shows low similarity, a new data report has been added to the data source or an existing report has been updated.

Data acquisition and MD5 hashing keep the data in the report database consistent with the data source. Any inconsistency is detected and handled, which improves data integrity and consistency. MD5 hashing also protects the data: it makes sensitive information harder to access or tamper with maliciously, improves data security, and prevents the data from being leaked or modified by unauthorized parties.

In this embodiment, statistical business terms are integrated into the unified business element dictionary before the data table fields are aligned, so that synonyms can be distinguished. The further steps are as described above.

A unified business element dictionary must be in place before the data table fields are aligned. It contains the fields in the enterprise statistical reports and the data corresponding to each field. This data is organized and then filled into the data report, and this preparation makes the field alignment more efficient.

In this embodiment, the data cleaning strategy comprises steps S1 to S4 described above.

Data cleansing is the discovery and correction of identifiable errors in data to preserve the integrity, uniqueness, and consistency of the data.
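Step S2 keeps the untouched history in the raw layer and only cleaned data in the warehouse layer. The following is a minimal sketch. It assumes that both layers are in-memory lists and that `clean_record` is a hypothetical per-record rule that returns `None` for rejected records.

```python
def stream_write(source_stream, raw_layer, warehouse_layer, clean_record):
    """Write every incoming record to the raw layer and cleaned records to the warehouse layer."""
    for record in source_stream:
        raw_layer.append(dict(record))  # raw layer keeps an untouched copy
        cleaned = clean_record(record)
        if cleaned is not None:  # None means the cleaning rule rejected the record
            warehouse_layer.append(cleaned)


def clean_record(record):
    """Hypothetical rule: trim the value and reject records with an empty value."""
    value = str(record.get("value", "")).strip()
    if not value:
        return None
    return {**record, "value": value}


raw_layer, warehouse_layer = [], []
stream_write(iter([{"id": 1, "value": " 12 "}, {"id": 2, "value": ""}]),
             raw_layer, warehouse_layer, clean_record)
print(len(raw_layer), warehouse_layer)  # 2 [{'id': 1, 'value': '12'}]
```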

In this embodiment, machine operation and maintenance (O&M) is based on artificial intelligence and machine learning and replaces traditional manual O&M. It uses cloud storage as its medium and automatically collects and organizes the data rejected during data table field alignment. It then analyzes that data with machine learning algorithms to find the causes of the data errors and report them back.

Machine O&M replaces manual O&M and processes data faster. Fields are aligned with those of previous years by matching similarity. If the matching degree does not exceed 95%, the field is written into the temporary table and handed to machine O&M.
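The 95% rule can be sketched as follows. The method names a decision tree algorithm for alignment. This sketch instead uses a plain string similarity, `difflib.SequenceMatcher.ratio()` from the Python standard library, purely to show the threshold logic, so it is not the patented classifier. The field names are hypothetical.

```python
from difflib import SequenceMatcher

THRESHOLD = 0.95  # a matching degree above 95% is treated as the same field


def align_fields(previous_fields, current_fields, threshold=THRESHOLD):
    """Map current-year fields to their best previous-year match.

    Fields whose best similarity does not exceed the threshold are collected in a
    list that stands in for the temporary table handed to machine O&M.
    """
    aligned, temporary = {}, []
    for field in current_fields:
        best, best_score = None, 0.0
        for candidate in previous_fields:
            score = SequenceMatcher(None, field, candidate).ratio()
            if score > best_score:
                best, best_score = candidate, score
        if best_score > threshold:
            aligned[field] = best
        else:
            temporary.append((field, best, round(best_score, 3)))
    return aligned, temporary


previous = ["main business revenue", "number of employees"]
current = ["main business revenue", "total number of employees"]
aligned, temporary = align_fields(previous, current)
print(aligned)    # {'main business revenue': 'main business revenue'}
print(temporary)  # 'total number of employees' lands here (best ratio below 0.95)
```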

In this embodiment, the batch update of data comprises steps S1 to S4 described above.

The batch update uses traversal for higher reliability. Backing up the data and uploading it to the cloud server protects it against emergencies, and updating in batches increases the speed of data updates.

In this embodiment, the data comparison group is the portion of data that stays the same across batch updates. When a data update involves only additions and deletions of some portions, only the data outside the comparison group is updated.

The comparison group marks data in the database that has already been updated or changed. This data is left untouched during the cyclic traversal, which speeds up batch updating.

In this embodiment, the data comparison group consists of the data generated after each batch update that takes part in insert, delete, and verify operations.

It is noted that relational terms such as first and second, and the like, are used solely to distinguish one entity or action from another entity or action without necessarily requiring or implying any actual such relationship or order between such entities or actions. Moreover, the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or apparatus.

Although embodiments of the present invention have been shown and described, it will be understood by those skilled in the art that various changes, modifications, substitutions and alterations may be made therein without departing from the spirit and scope of the invention as defined by the appended claims and their equivalents.

Claims

1. An automated data governance method based on a statistics bureau database, characterized in that the method comprises the following steps:

S4, data governance: the regional division codes, industry codes, accounting standards, and product classification codes are routinely updated once a year on the statistics bureau's official website. When the system detects an update to a dim table, it looks up the tables that need modification in the data lineage table, and those tables update their data automatically;

Data acquisition and MD5 hashing in S1 keep the data in the report database consistent with the data source. Any inconsistency is detected and handled, which improves data integrity and consistency. MD5 hashing also protects the data: it makes sensitive information harder to access or tamper with maliciously, improves data security, and ensures that the data is not leaked or modified by unauthorized parties.

2. The automated data governance method according to claim 1, wherein the MD5 hashing comprises the following specific steps. Obtain the first digest data. Compute MD5 digests of multiple columns of data in each report of the data source, one report period at a time, to obtain the second digest data. Compare the first and second digest data by polling or by bisection. If the comparison shows low similarity, a new data report has been added to the data source or an existing report has been updated.

3. The automated data governance method according to claim 1, wherein statistical business terms are integrated into the unified business element dictionary before the data table fields are aligned, so that synonyms can be distinguished, and the steps further comprise:

4. The automated data governance method according to claim 1, wherein the data cleaning strategy comprises the following steps:

5. The automated data governance method according to claim 1, wherein machine operation and maintenance (O&M) is based on artificial intelligence and machine learning and replaces traditional manual O&M. Machine O&M uses cloud storage as its medium and automatically collects and organizes the data rejected during data table field alignment. It then analyzes that data with machine learning algorithms to find the causes of the data errors and report them back.

6. The automated data governance method according to claim 1, wherein the batch update of the data comprises the following steps:

7. The automated data governance method according to claim 6, wherein the data comparison group is the portion of data that stays the same across batch updates, and when a data update involves only additions and deletions of some portions, only the data outside the comparison group is updated.

8. The automated data governance method according to claim 7, wherein the data comparison group consists of the data generated after each batch update that takes part in insert, delete, and verify operations.