CN117762954A - Automatic data management method - Google Patents

Info

Publication number
CN117762954A
Authority
CN
China
Prior art keywords
data
report
database
updating
statistical
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202311538206.5A
Other languages
Chinese (zh)
Inventor
苏毓腾
刘妃妃
龙文武
张乐志
郑李梨
冯钊
骆斐
赵欢
张志中
杨定
梁豪辉
贾俊涛
余昌应
周昊
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Shenzhen Qhdata Service Co ltd
Original Assignee
Shenzhen Qhdata Service Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Shenzhen Qhdata Service Co ltd
Priority to CN202311538206.5A
Publication of CN117762954A
Current legal status: Pending

Landscapes

  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention belongs to the technical field of automated data processing and discloses an automated data governance method based on a statistical bureau database, comprising the following steps: data acquisition, data writing and cleaning, alignment of data table fields, data governance, and data display. With the invention, only changed data records are updated and records that do not need to change are left untouched, which reduces the risk of data inconsistency. The method updates the data in the database in batches by cyclic traversal, which reduces the need for manual intervention, improves efficiency, and reduces the risk of human error, particularly when large-scale data versions are processed. The method applies a data cleaning strategy during data updating and uses cyclic traversal to remove repeated, unimportant, and erroneous content from the data, which improves overall data management and provides a stable environment for subsequent batch updates.

Description

Automatic data management method
Technical Field
The invention belongs to the technical field of automated data processing, and particularly relates to an automated data governance method.
Background
In the current information age, data acquisition, cleaning, integration, version control, and presentation are critical to government departments, financial institutions, and other organizations that manage large amounts of statistical report data. Conventional data management methods often require extensive manual intervention and processing, which makes large-scale data management and change handling difficult. There is therefore a need for an automated data governance method that improves the efficiency and quality of data management.
Disclosure of Invention
The invention aims to provide an automated data governance method that solves the problems described in the Background.
To achieve the above object, the invention provides the following technical solution: an automated data governance method, based on a statistical bureau database, comprising the following steps:
S1, data acquisition: compare the data quantity in the report database with the data quantity in the data source; if the quantities differ, update the report database in batches so that it obtains all data in the statistical bureau database;
if the quantities are the same, apply MD5 encryption processing to several columns of data in each report, taking the report period as the unit, and then update the report database in batches;
S2, data writing and cleaning: after data is acquired from the source, write it into the report database as a data stream, with historical data written into the original layer of the report database; clean the data with a preset data cleaning strategy, and write the cleaned data into the warehouse layer;
S3, alignment of data table fields: using the unified business element dictionary, align the fields of previous years by field matching similarity with a decision tree algorithm; if, in the table structures of the same report in different years, the field matching degree is greater than 95%, the fields are treated as the same field by default; otherwise, the field is written into a temporary table for machine operation and maintenance;
S4, data governance: routinely update the annually revised rural area division codes, industry codes, accounting standards, and product classification codes according to the official website of the statistical bureau; after the system detects that a dim (dimension) table has been updated, find the corresponding tables to be modified in the data lineage (blood relationship) table, and update the data of those tables automatically;
S5, data display: display the statistical reports on a large-screen cockpit dashboard.
Preferably, the MD5 encryption processing comprises the following specific steps: acquire first encrypted data; apply MD5 encryption processing to several columns of data in each report of the data source, taking the report period as the unit, to acquire second encrypted data; compare the first encrypted data with the second encrypted data by polling or by bisection; if the similarity of the comparison result is low, determine that a new data report has been added to the data source or that an old data report has been updated.
Preferably, before the data table fields are aligned, the unified business element dictionary needs to integrate statistical business terms and distinguish synonyms; the method further comprises the following step:
arrange the fields in the enterprise statistical reports, and the data corresponding to each field, into database reports; manually download the table structures of the statistical direct-reporting system, arrange the table structures automatically by table number, and output integrated files for the different years.
Preferably, the data cleaning strategy comprises the following steps:
S1, create a data pool, link the database source, import the data, and classify it;
S2, check the classified data with a function, eliminate duplicate data, and merge;
S3, select more than three keywords to search with, and remove data whose content is wrong;
S4, repeat steps S2 and S3 twice, traverse the processed data, and screen out data whose format is wrong.
Preferably, the machine operation and maintenance is based on artificial intelligence and machine learning and replaces traditional manual operation and maintenance; it uses cloud storage as its medium, automatically arranges and organizes the data rejected during alignment of the data table fields, analyzes the data with a machine learning algorithm, finds the cause of the data errors, and feeds the result back.
Preferably, the batch update of the data comprises the following steps:
S1, back up all data in the database and upload it to a cloud server at regular intervals;
S2, upload all data in the report database to the database by cyclic traversal, and perform add, delete, modify, and query operations on the data;
S3, perform equivalent replacement on data whose data quantity is the same, and upload the replaced and updated portion of the data to the cloud server again;
S4, update the data encryption key periodically, and set up a data comparison group.
Preferably, the data comparison group is the portion of the data that is identical in each batch update; when a data update involves only additions and deletions to some portions, only the data outside the data comparison group is updated.
Preferably, the data comparison group is composed of the data, generated after each batch update, that takes part in add, delete, modify, and query operations.
The beneficial effects of the invention are as follows:
By comparing the new and old versions of the data and performing incremental updates, the method helps maintain data consistency: only changed data records are updated and records that do not need to change are left untouched, which reduces the risk of data inconsistency. The method updates the data in the database in batches by cyclic traversal, which reduces the need for manual intervention, improves efficiency, and reduces the risk of human error, particularly when large-scale data versions are processed. The method applies a data cleaning strategy during data updating and uses cyclic traversal to remove repeated, unimportant, and erroneous content from the data, which improves overall data management and provides a stable environment for subsequent batch updates.
In summary, the automated data governance method improves the efficiency and quality of data management, reduces the risk of data inconsistency, promotes automated processing and visual display of data, and provides organizations with a useful tool and method for decision making and data management.
Drawings
FIG. 1 is a schematic flow chart of the method of the invention.
Detailed Description
The technical solutions in the embodiments of the invention are described clearly and completely below with reference to the accompanying drawings. The described embodiments are only some, not all, of the embodiments of the invention. All other embodiments obtained by those skilled in the art on the basis of these embodiments without inventive effort fall within the scope of protection of the invention.
As shown in FIG. 1, an embodiment of the invention provides an automated data governance method, based on a statistical bureau database, comprising the following steps:
S1, data acquisition: compare the data quantity in the report database with the data quantity in the data source; if the quantities differ, update the report database in batches so that it obtains all data in the statistical bureau database;
if the quantities are the same, apply MD5 encryption processing to several columns of data in each report, taking the report period as the unit, and then update the report database in batches;
S2, data writing and cleaning: after data is acquired from the source, write it into the report database as a data stream, with historical data written into the original layer of the report database; clean the data with a preset data cleaning strategy, and write the cleaned data into the warehouse layer;
S3, alignment of data table fields: using the unified business element dictionary, align the fields of previous years by field matching similarity with a decision tree algorithm; if, in the table structures of the same report in different years, the field matching degree is greater than 95%, the fields are treated as the same field by default; otherwise, the field is written into a temporary table for machine operation and maintenance;
S4, data governance: routinely update the annually revised rural area division codes, industry codes, accounting standards, and product classification codes according to the official website of the statistical bureau; after the system detects that a dim (dimension) table has been updated, find the corresponding tables to be modified in the data lineage (blood relationship) table, and update the data of those tables automatically;
S5, data display: display the statistical reports on a large-screen cockpit dashboard.
Built on automated data processing, the method greatly improves data processing efficiency. For batch updates of the statistical bureau database it uses a tiered arrangement of acquisition, cleaning, and governance that classifies, sorts, and filters the data in order, overcoming the low efficiency and low governance quality caused by manual processing.
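The lineage lookup in step S4 can be illustrated with a short sketch. It assumes the lineage table can be read as a mapping from each table to the tables derived from it, and that downstream tables are followed transitively; neither detail is stated in the text. The table names and the `tables_to_refresh` helper are hypothetical.

```python
from collections import deque

# Hypothetical lineage table: each key lists the tables built directly from it.
LINEAGE = {
    "dim_industry_code": ["dw_enterprise", "dw_output"],
    "dw_enterprise": ["rpt_summary"],
    "dw_output": ["rpt_summary"],
}

def tables_to_refresh(lineage, updated_dim):
    """Breadth-first walk returning every table downstream of the updated dim table."""
    affected = []
    seen = {updated_dim}
    queue = deque([updated_dim])
    while queue:
        table = queue.popleft()
        for child in lineage.get(table, ()):
            if child not in seen:  # visit each downstream table only once
                seen.add(child)
                affected.append(child)
                queue.append(child)
    return affected

AFFECTED = tables_to_refresh(LINEAGE, "dim_industry_code")
```

A breadth-first order refreshes tables closest to the dim table first, which is one plausible ordering when later tables depend on earlier ones.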
In this embodiment, the MD5 encryption processing comprises the following specific steps: acquire first encrypted data; apply MD5 encryption processing to several columns of data in each report of the data source, taking the report period as the unit, to acquire second encrypted data; compare the first encrypted data with the second encrypted data by polling or by bisection; if the similarity of the comparison result is low, determine that a new data report has been added to the data source or that an old data report has been updated.
Data acquisition and MD5 encryption processing ensure that the data in the report database is consistent with the data source; any inconsistency is detected and handled, which improves the integrity and consistency of the data. The MD5 encryption processing also protects the data, making sensitive information harder to access or tamper with maliciously, which improves data security and keeps the data from being leaked to or modified by unauthorized visitors.
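As a minimal sketch of the comparison in step S1, the following code first compares row counts and, when they match, compares one MD5 digest per report period. Note that MD5 is a one-way hash rather than a cipher, so the sketch uses it only as a change fingerprint. The column names, the equality test standing in for "low similarity", and the simple per-period loop standing in for polling are all assumptions.

```python
import hashlib
from collections import defaultdict

def period_digests(rows, period_key, columns):
    """Return {report_period: MD5 hex digest} computed over the selected columns."""
    grouped = defaultdict(list)
    for row in rows:
        grouped[row[period_key]].append(row)
    digests = {}
    for period, group in grouped.items():
        hasher = hashlib.md5()
        # Sort the serialized rows so the digest does not depend on row order.
        lines = sorted("\x1f".join(str(r[c]) for c in columns) for r in group)
        for line in lines:
            hasher.update(line.encode("utf-8"))
            hasher.update(b"\n")
        digests[period] = hasher.hexdigest()
    return digests

def changed_periods(source_rows, report_rows, period_key="period",
                    columns=("code", "value")):
    """Return None if row counts differ (full batch update needed),
    otherwise the sorted list of report periods whose digests differ."""
    if len(source_rows) != len(report_rows):
        return None
    src = period_digests(source_rows, period_key, columns)
    dst = period_digests(report_rows, period_key, columns)
    return sorted(p for p in src.keys() | dst.keys() if src.get(p) != dst.get(p))

SOURCE = [{"period": "2023Q1", "code": "A", "value": 1},
          {"period": "2023Q2", "code": "A", "value": 2}]
REPORT = [{"period": "2023Q1", "code": "A", "value": 1},
          {"period": "2023Q2", "code": "A", "value": 3}]
```

Hashing per report period keeps the comparison cheap: only periods whose digests differ need to be re-read and updated.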
In this embodiment, before the data table fields are aligned, the unified business element dictionary integrates the statistical business terms and distinguishes synonyms; the steps further comprise:
arrange the fields in the enterprise statistical reports, and the data corresponding to each field, into database reports; manually download the table structures of the statistical direct-reporting system, arrange the table structures automatically by table number, and output integrated files for the different years.
Alignment of the data table fields requires a unified business element dictionary, which comprises the fields in the enterprise statistical reports and the data corresponding to each field. Once arranged, this data is filled into the data reports; this arrangement makes the alignment of the data table fields more efficient.
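One way to picture the unified business element dictionary is as a mapping from a canonical field name to its known synonyms. The sketch below is illustrative only: the entries, the case-insensitive matching, and the `build_lookup` and `normalize_fields` helpers are assumptions, not details from the text.

```python
# Hypothetical dictionary: canonical business element -> known synonyms.
ELEMENT_DICTIONARY = {
    "operating_revenue": {"revenue", "business_revenue"},
    "employee_count": {"staff_number", "headcount"},
}

def build_lookup(dictionary):
    """Invert the dictionary so that any synonym maps to its canonical name."""
    lookup = {}
    for canonical, synonyms in dictionary.items():
        lookup[canonical] = canonical
        for synonym in synonyms:
            lookup[synonym] = canonical
    return lookup

def normalize_fields(fields, lookup):
    """Return (canonical names, fields that are missing from the dictionary)."""
    normalized, unknown = [], []
    for field in fields:
        key = field.strip().lower()
        if key in lookup:
            normalized.append(lookup[key])
        else:
            unknown.append(field)
    return normalized, unknown

LOOKUP = build_lookup(ELEMENT_DICTIONARY)
```

Fields returned in the second list are candidates for review before alignment, since no dictionary entry covers them.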
In this embodiment, the data cleaning strategy comprises the following steps:
S1, create a data pool, link the database source, import the data, and classify it;
S2, check the classified data with a function, eliminate duplicate data, and merge;
S3, select more than three keywords to search with, and remove data whose content is wrong;
S4, repeat steps S2 and S3 twice, traverse the processed data, and screen out data whose format is wrong.
Data cleaning is the discovery and correction of identifiable errors in the data in order to preserve its integrity, uniqueness, and consistency.
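The cleaning steps above can be sketched as follows. The text does not say how the keywords identify wrong content or what counts as a wrong format, so this sketch assumes that a record containing any listed keyword is erroneous and that the report period must match a hypothetical pattern such as `2023` or `2023Q2`.

```python
import re

# Hypothetical format rule: a four-digit year, optionally followed by a quarter.
PERIOD_RE = re.compile(r"^\d{4}(Q[1-4])?$")

def clean(records, error_keywords, passes=2):
    """Deduplicate and keyword-filter twice, then drop badly formatted records."""
    rows = list(records)
    for _ in range(passes):
        # Step S2: remove exact duplicates while keeping the first occurrence.
        seen, deduped = set(), []
        for row in rows:
            key = tuple(sorted(row.items()))
            if key not in seen:
                seen.add(key)
                deduped.append(row)
        # Step S3: drop records whose values contain any error keyword.
        rows = [row for row in deduped
                if not any(k in str(v) for v in row.values() for k in error_keywords)]
    # Step S4: final traversal that screens out records with a malformed period.
    return [row for row in rows if PERIOD_RE.match(str(row.get("period", "")))]

RECORDS = [
    {"period": "2023", "name": "A", "value": 10},
    {"period": "2023", "name": "A", "value": 10},    # exact duplicate
    {"period": "2023", "name": "TEST", "value": 5},  # contains an error keyword
    {"period": "23-01", "name": "B", "value": 7},    # malformed period
    {"period": "2023Q2", "name": "C", "value": 8},
]
CLEANED = clean(RECORDS, ["TEST", "N/A", "#REF!"])
```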
In this embodiment, the machine operation and maintenance is based on artificial intelligence and machine learning and replaces traditional manual operation and maintenance; it uses cloud storage as its medium, automatically arranges and organizes the data rejected during alignment of the data table fields, analyzes the data with a machine learning algorithm, finds the cause of the data errors, and feeds the result back.
Machine operation and maintenance replaces manual operation and maintenance and offers higher processing speed. When fields of previous years are aligned by matching similarity, a field whose matching degree does not exceed 95% is written into the temporary table and handled by machine operation and maintenance.
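The 95% rule can be illustrated with a small sketch. The text specifies a decision tree algorithm for the matching; this example swaps in plain string similarity from Python's `difflib.SequenceMatcher` purely to show the thresholding, and the field names and `align_fields` helper are hypothetical.

```python
from difflib import SequenceMatcher

THRESHOLD = 0.95  # fields scoring above this are treated as the same field

def similarity(a, b):
    """String similarity in [0, 1]; a stand-in for the method's matching degree."""
    return SequenceMatcher(None, a, b).ratio()

def align_fields(prev_fields, curr_fields, threshold=THRESHOLD):
    """Map each current-year field to its best previous-year match.

    Returns (aligned, pending): aligned maps current -> previous field names,
    pending lists fields that would go to the temporary table.
    """
    aligned, pending = {}, []
    for name in curr_fields:
        best = max(prev_fields, key=lambda p: similarity(p, name), default=None)
        score = similarity(best, name) if best is not None else 0.0
        if score > threshold:
            aligned[name] = best
        else:
            pending.append(name)
    return aligned, pending

ALIGNED, PENDING = align_fields(
    ["total_revenue", "employee_count"],
    ["total_revenue", "staff_total"],
)
```

With a threshold this strict, only near-identical names align automatically; everything else is deferred, which matches the role of the temporary table described above.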
In this embodiment, the batch update of the data comprises the following steps:
S1, back up all data in the database and upload it to a cloud server at regular intervals;
S2, upload all data in the report database to the database by cyclic traversal, and perform add, delete, modify, and query operations on the data;
S3, perform equivalent replacement on data whose data quantity is the same, and upload the replaced and updated portion of the data to the cloud server again;
S4, update the data encryption key periodically, and set up a data comparison group.
The batch update of the data uses traversal, which is more reliable; backing up the data and uploading it to the cloud server protects it against unexpected events, and batch updating increases the speed at which the data is updated.
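A back-up-then-update loop of this kind might look like the sketch below. It uses Python's built-in `sqlite3` module with in-memory databases as a stand-in for both the report database and the cloud server; the `report` table, its columns, and the batch size are assumptions made for illustration.

```python
import sqlite3

def batch_upsert(conn, rows, batch_size=2):
    """Snapshot the database, then write rows in fixed-size batches."""
    backup = sqlite3.connect(":memory:")
    conn.backup(backup)  # full copy taken before anything is modified
    for start in range(0, len(rows), batch_size):
        batch = rows[start:start + batch_size]
        with conn:  # each batch commits (or rolls back) as one transaction
            conn.executemany(
                "INSERT OR REPLACE INTO report (id, value) VALUES (?, ?)",
                batch,
            )
    return backup

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE report (id INTEGER PRIMARY KEY, value TEXT)")
conn.execute("INSERT INTO report VALUES (1, 'old')")
conn.commit()

backup = batch_upsert(conn, [(1, "new"), (2, "x"), (3, "y")])
current_rows = conn.execute("SELECT id, value FROM report ORDER BY id").fetchall()
backup_row = backup.execute("SELECT value FROM report WHERE id = 1").fetchone()
```

Committing per batch limits how much work is lost if one batch fails, while the snapshot preserves the pre-update state for recovery.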
In this embodiment, the data comparison group is the portion of the data that is identical in each batch update; when a data update involves only additions and deletions to some portions, only the data outside the data comparison group is updated.
The comparison group marks the data in the database that is unaffected by the update, so that this data is skipped during the cyclic traversal, which increases the speed of the batch update.
In this embodiment, the data comparison group is composed of the data, generated after each batch update, that takes part in add, delete, modify, and query operations.
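The role of the comparison group can be sketched as a diff between the current data and the incoming data, keyed by record identifier. This is one reading of the text, under the assumption that records identical on both sides form the comparison group and are skipped; the `plan_update` and `apply_update` helpers are hypothetical.

```python
def plan_update(current, incoming):
    """Split the work into the unchanged comparison group and the real changes."""
    shared = current.keys() & incoming.keys()
    control = {k for k in shared if current[k] == incoming[k]}
    to_add = {k: incoming[k] for k in incoming.keys() - current.keys()}
    to_modify = {k: incoming[k] for k in shared if k not in control}
    to_delete = set(current.keys() - incoming.keys())
    return control, to_add, to_modify, to_delete

def apply_update(current, incoming):
    """Apply only the changes; records in the comparison group are never rewritten."""
    control, to_add, to_modify, to_delete = plan_update(current, incoming)
    updated = {k: v for k, v in current.items() if k not in to_delete}
    updated.update(to_add)
    updated.update(to_modify)
    return updated, control

CURRENT = {1: "a", 2: "b", 3: "c"}
INCOMING = {1: "a", 2: "B", 4: "d"}
UPDATED, CONTROL = apply_update(CURRENT, INCOMING)
```

Here record 1 is in the comparison group and is left alone, record 2 is modified, record 3 is deleted, and record 4 is added.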
It is noted that relational terms such as first and second, and the like, are used solely to distinguish one entity or action from another entity or action without necessarily requiring or implying any actual such relationship or order between such entities or actions. Moreover, the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or apparatus.
Although embodiments of the present invention have been shown and described, it will be understood by those skilled in the art that various changes, modifications, substitutions and alterations may be made therein without departing from the spirit and scope of the invention as defined by the appended claims and their equivalents.

Claims (8)

1. An automated data governance method based on a statistical bureau database, characterized in that the method comprises the following steps:
S1, data acquisition: compare the data quantity in the report database with the data quantity in the data source; if the quantities differ, update the report database in batches so that it obtains all data in the statistical bureau database;
if the quantities are the same, apply MD5 encryption processing to several columns of data in each report, taking the report period as the unit, and then update the report database in batches;
S2, data writing and cleaning: after data is acquired from the source, write it into the report database as a data stream, with historical data written into the original layer of the report database; clean the data with a preset data cleaning strategy, and write the cleaned data into the warehouse layer;
S3, alignment of data table fields: using the unified business element dictionary, align the fields of previous years by field matching similarity with a decision tree algorithm; if, in the table structures of the same report in different years, the field matching degree is greater than 95%, the fields are treated as the same field by default; otherwise, the field is written into a temporary table for machine operation and maintenance;
S4, data governance: routinely update the annually revised rural area division codes, industry codes, accounting standards, and product classification codes according to the official website of the statistical bureau; after the system detects that a dim (dimension) table has been updated, find the corresponding tables to be modified in the data lineage (blood relationship) table, and update the data of those tables automatically;
S5, data display: display the statistical reports on a large-screen cockpit dashboard.
The data acquisition and MD5 encryption processing in S1 ensure that the data in the report database is consistent with the data source; any inconsistency is detected and handled, which improves the integrity and consistency of the data. The MD5 encryption processing also protects the data, making sensitive information harder to access or tamper with maliciously, which improves data security and keeps the data from being leaked to or modified by unauthorized visitors.
2. The automated data governance method according to claim 1, characterized in that the MD5 encryption processing comprises the following specific steps: acquire first encrypted data; apply MD5 encryption processing to several columns of data in each report of the data source, taking the report period as the unit, to acquire second encrypted data; compare the first encrypted data with the second encrypted data by polling or by bisection; if the similarity of the comparison result is low, determine that a new data report has been added to the data source or that an old data report has been updated.
3. The automated data governance method according to claim 1, characterized in that, before the data table fields are aligned, the unified business element dictionary integrates statistical business terms and distinguishes synonyms, and the steps further comprise:
arrange the fields in the enterprise statistical reports, and the data corresponding to each field, into database reports; manually download the table structures of the statistical direct-reporting system, arrange the table structures automatically by table number, and output integrated files for the different years.
4. The automated data governance method according to claim 1, characterized in that the data cleaning strategy comprises the following steps:
S1, create a data pool, link the database source, import the data, and classify it;
S2, check the classified data with a function, eliminate duplicate data, and merge;
S3, select more than three keywords to search with, and remove data whose content is wrong;
S4, repeat steps S2 and S3 twice, traverse the processed data, and screen out data whose format is wrong.
5. The automated data governance method according to claim 1, characterized in that the machine operation and maintenance is based on artificial intelligence and machine learning and replaces traditional manual operation and maintenance; it uses cloud storage as its medium, automatically arranges and organizes the data rejected during alignment of the data table fields, analyzes the data with a machine learning algorithm, finds the cause of the data errors, and feeds the result back.
6. The automated data governance method according to claim 1, characterized in that the batch update of the data comprises the following steps:
S1, back up all data in the database and upload it to a cloud server at regular intervals;
S2, upload all data in the report database to the database by cyclic traversal, and perform add, delete, modify, and query operations on the data;
S3, perform equivalent replacement on data whose data quantity is the same, and upload the replaced and updated portion of the data to the cloud server again;
S4, update the data encryption key periodically, and set up a data comparison group.
7. The automated data governance method according to claim 6, characterized in that the data comparison group is the portion of the data that is identical in each batch update; when a data update involves only additions and deletions to some portions, only the data outside the data comparison group is updated.
8. The automated data governance method according to claim 8, characterized in that the data comparison group is composed of the data, generated after each batch update, that takes part in add, delete, modify, and query operations.
CN202311538206.5A 2023-11-17 2023-11-17 Automatic data management method Pending CN117762954A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202311538206.5A CN117762954A (en) 2023-11-17 2023-11-17 Automatic data management method

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202311538206.5A CN117762954A (en) 2023-11-17 2023-11-17 Automatic data management method

Publications (1)

Publication Number Publication Date
CN117762954A true CN117762954A (en) 2024-03-26

Family

ID=90318917

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202311538206.5A Pending CN117762954A (en) 2023-11-17 2023-11-17 Automatic data management method

Country Status (1)

Country Link
CN (1) CN117762954A (en)

Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20060248058A1 (en) * 2005-04-28 2006-11-02 Feng Andrew A Method, apparatus, and system for unifying heterogeneous data sources for access from online applications
CN104021132A (en) * 2013-12-08 2014-09-03 郑州正信科技发展股份有限公司 Method and system for verification of consistency of backup data of host database and backup database
CN110569298A (en) * 2019-09-12 2019-12-13 成都中科大旗软件股份有限公司 data docking and visualization method and system
CN112650731A (en) * 2020-12-22 2021-04-13 浪潮云信息技术股份公司 Theme library construction method and system based on data governance
CN112699175A (en) * 2021-01-15 2021-04-23 广州汇智通信技术有限公司 Data management system and method thereof
CN113486008A (en) * 2021-06-30 2021-10-08 平安信托有限责任公司 Data blood margin analysis method, device, equipment and storage medium
WO2022077166A1 (en) * 2020-10-12 2022-04-21 深圳晶泰科技有限公司 Data processing method and system for drug research and development

Similar Documents

Publication Publication Date Title
CN113010506B (en) Multi-source heterogeneous water environment big data management system
CN110019176B (en) Data management control system for improving success rate of data management service
US9201738B2 (en) Method, computer readable storage medium and computer system for obtaining snapshots of data
US7181461B2 (en) System and method for real time statistics collection for use in the automatic management of a database system
US7933932B2 (en) Statistics based database population
US20070299856A1 (en) Data aggregation
CN112199433A (en) Data management system for city-level data middling station
CN110442620B (en) Big data exploration and cognition method, device, equipment and computer storage medium
CN111400354B (en) Machine tool manufacturing BOM (Bill of Material) storage query and tree structure construction method based on MES (manufacturing execution System)
CN112000656A (en) Intelligent data cleaning method and device based on metadata
CN114780370A (en) Data correction method and device based on log, electronic equipment and storage medium
CN113326247A (en) Cloud data migration method and device and electronic equipment
CN115952160B (en) Data checking method
CN109871378A (en) The data acquisition and processing (DAP) method and system of big data platform
CN112783873A (en) Forest resource spatial data management method, device and equipment under one-to-many relationship
CN117762954A (en) Automatic data management method
CN109063063B (en) Data processing method and device based on multi-source data
CN111026760A (en) CDC data acquisition method based on multidimensional service time
CN115098585A (en) Automatic law and regulation data processing method and system based on big data
CN114860690A (en) Data migration method, device, equipment and storage medium
CN113111046A (en) Data management system based on main data drive
US20020178140A1 (en) Method for characterizing and storing data analyses in an analysis database
CN113934712B (en) Method, device and equipment for processing field model of industrial quality inspection data
CN117608536B (en) Gap data online template customization and supplementary recording system and method thereof
CN116501788B (en) Storehouse lake integrated data management and control platform

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination