CN112650745A - Data management system based on unified data resource pool - Google Patents

Publication number
CN112650745A
Authority
CN
China
Prior art keywords
data
module
management
resources
resource pool
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202011643272.5A
Other languages
Chinese (zh)
Inventor
瞿建栋
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Zhongke Huansen Intelligent Technology Suzhou Co Ltd
Original Assignee
Zhongke Huansen Intelligent Technology Suzhou Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/21 Design, administration or maintenance of databases
    • G06F16/215 Improving data quality; Data cleansing, e.g. de-duplication, removing invalid entries or correcting typographical errors
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/23 Updating
    • G06F16/2365 Ensuring data consistency and integrity
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/24 Querying
    • G06F16/245 Query processing
    • G06F16/2455 Query execution
    • G06F16/24553 Query execution of query operations
    • G06F16/24554 Unary operations; Data partitioning operations
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/24 Querying
    • G06F16/245 Query processing
    • G06F16/2457 Query processing with adaptation to user needs
    • G06F16/24573 Query processing with adaptation to user needs using data annotations, e.g. user-defined metadata
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/25 Integrating or interfacing systems involving database management systems
    • G06F16/254 Extract, transform and load [ETL] procedures, e.g. ETL data flows in data warehouses
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q50/00 Information and communication technology [ICT] specially adapted for implementation of business processes of specific business sectors, e.g. utilities or tourism
    • G06Q50/10 Services
    • G06Q50/26 Government or public services

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Databases & Information Systems (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Business, Economics & Management (AREA)
  • Computational Linguistics (AREA)
  • Tourism & Hospitality (AREA)
  • General Health & Medical Sciences (AREA)
  • Health & Medical Sciences (AREA)
  • General Business, Economics & Management (AREA)
  • Primary Health Care (AREA)
  • Marketing (AREA)
  • Human Resources & Organizations (AREA)
  • Economics (AREA)
  • Strategic Management (AREA)
  • Educational Administration (AREA)
  • Quality & Reliability (AREA)
  • Development Economics (AREA)
  • Computer Security & Cryptography (AREA)
  • Library & Information Science (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)

Abstract

The invention discloses a data management system based on a unified data resource pool, comprising: a data access module for complete access to the data resources of data sources; a data processing module for standardizing the data resources through extraction, cleaning and conversion; a data management and control module for full-life-cycle management of the data resources through data standard management, metadata management and resource catalog management; a data resource pool module for managing the data resources by category; a data service module for encapsulating the data resources as services and providing data resource services; and a data sharing and exchange module for sharing information about the data resources across services, applications and departments. The disclosed system converges and governs data resources under unified information resource planning to form a unified data resource pool, and provides a unified data directory service for various data sharing needs.

Description

Data management system based on unified data resource pool
Technical Field
The invention belongs to the field of data processing systems, and particularly relates to a data management system based on a unified data resource pool.
Background
Emergencies are usually hard to predict and highly destructive, and the harm they cause to society is severe. They include sudden natural disasters such as earthquakes, volcanic eruptions, debris flows, tsunamis, typhoons and floods; sudden accidents caused by human activity, such as pollutant leaks, water pollution and soil pollution; and emergencies concerning public safety and public health, which also account for a considerable proportion. Emergency services are therefore diverse and comprehensive in nature.
Faced with multi-source heterogeneous data resources, these resources must be converged and governed to form a unified data resource pool that provides a unified data directory service for internal business systems and for the data sharing needs of external government departments at all levels.
Disclosure of Invention
The invention aims to achieve complete access to multi-source heterogeneous data resources, form a unified data resource pool, and provide a unified data directory service for various data sharing needs.
To achieve this purpose, the invention adopts the following technical solution: a data governance system based on a unified data resource pool, comprising:
a data access module, configured to achieve complete access to the data resources of the data source;
a data processing module, configured to standardize the data resources through extraction, cleaning and conversion;
a data management and control module, configured to manage the data resources across their full life cycle through data standard management, metadata management and resource catalog management;
a data resource pool module, configured to manage the data resources by category;
a data service module, configured to encapsulate the data resources as services and provide data resource services;
and a data sharing and exchange module, configured to share information about the data resources across services, applications and departments.
As a further description of the above technical solution:
the data source comprises external associated department data, internal business department and transcription department data of the emergency management bureau, social internet public data and perception data.
As a further description of the above technical solution:
the access mode of the data source comprises a database access mode, a file access mode, an interface calling mode and a data exchange mode.
As a further description of the above technical solution:
the data access module comprises a data probing sub-module, a data reading sub-module and a data reconciliation sub-module.
As a further description of the above technical solution:
the data probing submodule is used for performing service probing, access mode probing, field probing, data set probing, problem data probing and data pushing on source data.
As a further description of the above technical solution:
the data reading sub-module is used for detecting whether the data extracted from the source system, or read from a designated location, is consistent with the data definition. If it is inconsistent, access stops and the data is probed and defined again; if it is consistent, access continues: the data is decrypted and decompressed, a record ID that applies throughout the full life cycle of the data is generated, the character set of the data is converted, and the data is converted into a format that meets the data processing requirements.
As a further description of the above technical solution:
the data reconciliation sub-module is used for checking and verifying, at a reconciliation node, the integrity, consistency and correctness of the data held by the data provider and the data access party. If the record counts of the data provider and the data access party at a given reconciliation time point are inconsistent, the anomaly is recorded and an alarm is raised.
As a further description of the above technical solution:
the data resource pool module comprises an original library and a resource library. The original library identifies data by source and stores it using different storage mechanisms according to data type; the data stored in the original library becomes standard data after the data processing operations of cleaning, conversion, association and comparison, and is stored in the resource library.
As a further description of the above technical solution:
the cleaning of the data processing operation in the resource library comprises dictionary table mapping, data deduplication processing and data null processing.
In summary, due to the adoption of the technical scheme, the invention has the beneficial effects that:
1. In the invention, faced with data resources such as perception data, government affairs data, industry data and enterprise data, the data access module achieves complete access to multi-source heterogeneous data resources mainly through data probing, data reading, data reconciliation and the like.
2. In the invention, the accessed data resources are classified and built up according to their intended use under unified resource planning. By organizing and mining the data resources with unified standards and standardized processes, an emergency management data resource pool is formed that comprises an original library, a resource library, a subject library, a special subject library and the like. This meets the need of each emergency management unit to build its own business-specific data, and provides data support for comprehensive display, data services and leadership decision-making.
3. In the invention, the data management system uses functional modules such as the data access module, the data processing module, the data management and control module, the data service module and the data sharing and exchange module to converge and govern data resources such as perception data, government affairs data, industry data and enterprise data, forming a unified data resource pool and providing a unified data directory service for internal business systems and for the data sharing needs of external government departments at all levels.
Drawings
FIG. 1 is a general architecture diagram of a unified data resource pool based data governance system.
FIG. 2 is an architecture diagram of a data probing submodule in a data administration system based on a unified data resource pool.
Detailed Description
The technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are only a part of the embodiments of the present invention, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
Referring to fig. 1-2, the present invention provides a technical solution: a data governance system based on a unified data resource pool, comprising:
a data access module, configured to achieve complete access to the data resources of the data source;
a data processing module, configured to standardize the data resources through extraction, cleaning and conversion;
a data management and control module, configured to manage the data resources across their full life cycle through data standard management, metadata management and resource catalog management;
a data resource pool module, configured to manage the data resources by category;
a data service module, configured to encapsulate the data resources as services and provide data resource services;
and a data sharing and exchange module, configured to share information about the data resources across services, applications and departments.
The data source comprises external associated department data, internal business department and transcription department data of the emergency management bureau, social internet public data and perception data. The external associated department data and the emergency management bureau's internal business department and transcription department data can be brought from their business systems into the data buffer area through an access front end, thereby achieving data access.
The access mode of the data source comprises a database access mode, a file access mode, an interface calling mode and a data exchange mode. For systems that have already been built, applied and integrated, data is first migrated by one-time full access, and access then continues through full data extraction plus incremental synchronization; for newly built systems, data access and exchange are planned according to real-time requirements. The data acquisition access strategy supports three modes: interactive access, batch data access and real-time data access.
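The combination of one-time full access and subsequent incremental synchronization described above can be sketched as follows. This is only a minimal illustration under stated assumptions, not the implementation of the invention: the in-memory `source` rows, the `id` key, the `updated_at` watermark column and both function names are hypothetical and do not appear in the original text.

```python
from datetime import datetime


def full_load(source_rows):
    """One-time full access: copy every source row into the buffer area."""
    return [dict(row) for row in source_rows]


def incremental_sync(source_rows, buffer, watermark):
    """Copy only rows changed after `watermark` and return the new watermark.

    Assumes (hypothetically) that each row has an `id` and an `updated_at` value.
    """
    index = {row["id"]: i for i, row in enumerate(buffer)}
    new_watermark = watermark
    for row in source_rows:
        if row["updated_at"] > watermark:
            if row["id"] in index:
                buffer[index[row["id"]]] = dict(row)  # changed row: overwrite
            else:
                index[row["id"]] = len(buffer)
                buffer.append(dict(row))  # new row: append
            new_watermark = max(new_watermark, row["updated_at"])
    return new_watermark


source = [
    {"id": 1, "name": "flood gauge A", "updated_at": datetime(2020, 12, 1)},
    {"id": 2, "name": "flood gauge B", "updated_at": datetime(2020, 12, 2)},
]
buffer = full_load(source)
watermark = max(row["updated_at"] for row in buffer)

# Later, the source changes: one row is updated and one row is added.
source[0] = {"id": 1, "name": "flood gauge A (moved)", "updated_at": datetime(2020, 12, 5)}
source.append({"id": 3, "name": "flood gauge C", "updated_at": datetime(2020, 12, 6)})
watermark = incremental_sync(source, buffer, watermark)
```

A timestamp watermark is only one possible way to detect increments; log-based change capture or a version column would serve the same purpose.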
The data access module comprises a data probing sub-module, a data reading sub-module and a data reconciliation sub-module.
The data probing sub-module is used for performing service probing, access mode probing, field probing, data set probing, problem data probing and data pushing on source data. It probes the source data along multiple dimensions, including storage location, provision mode, total volume and update status, business meaning, field format, semantics and value distribution, data structure, and data quality, in order to understand the data and provide a basis for data definition.
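Field probing of the kind described above, covering value distribution and basic data quality, might look like the following sketch. The function name `probe_fields`, the choice of statistics and the sample rows are assumptions made for illustration; the original text does not specify how probing is computed.

```python
from collections import Counter


def probe_fields(rows):
    """Return per-field statistics: total, null count, distinct count, types, top values."""
    fields = sorted({key for row in rows for key in row})
    report = {}
    for field in fields:
        values = [row.get(field) for row in rows]
        # Treat None and the empty string as null for profiling purposes.
        non_null = [v for v in values if v not in (None, "")]
        report[field] = {
            "total": len(values),
            "nulls": len(values) - len(non_null),
            "distinct": len(set(non_null)),
            "types": sorted({type(v).__name__ for v in non_null}),
            "top_values": Counter(non_null).most_common(3),
        }
    return report


sample = [
    {"event_type": "flood", "level": 2},
    {"event_type": "flood", "level": None},
    {"event_type": "earthquake", "level": 3},
    {"event_type": "", "level": 2},
]
report = probe_fields(sample)
```

A report like this gives the person defining the data a first view of which fields are sparsely populated and which values dominate.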
The data reading sub-module is used for detecting whether the data extracted from the source system, or read from a designated location, is consistent with the data definition. If it is inconsistent, access stops and the data is probed and defined again; if it is consistent, access continues: the data is decrypted and decompressed, a record ID that applies throughout the full life cycle of the data is generated, the character set of the data is converted, and the data is converted into a format that meets the data processing requirements.
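The reading steps can be sketched as below, under several assumptions that are not in the original text: the data definition is modelled as a field-to-type mapping, gzip stands in for the compression format, GBK stands in for the source character set, and a random UUID serves as the record ID. Decryption is omitted because the text does not name a cipher.

```python
import gzip
import uuid

# Hypothetical data definition produced by the probing step: field name -> Python type.
DATA_DEFINITION = {"event_id": int, "event_type": str}


def matches_definition(record, definition):
    """True when the record has exactly the defined fields with the defined types."""
    return record.keys() == definition.keys() and all(
        isinstance(record[name], expected) for name, expected in definition.items()
    )


def read_payload(payload, source_encoding="gbk"):
    """Decompress a gzip payload and convert its character set into a Python str."""
    return gzip.decompress(payload).decode(source_encoding)


def read_records(records, definition):
    """Stop on the first record that violates the definition; otherwise attach a record ID."""
    accepted = []
    for record in records:
        if not matches_definition(record, definition):
            # Inconsistent with the definition: stop access so the data can be re-probed.
            raise ValueError(f"record does not match data definition: {record!r}")
        accepted.append({**record, "record_id": uuid.uuid4().hex})
    return accepted


accepted = read_records([{"event_id": 1, "event_type": "flood"}], DATA_DEFINITION)
text = read_payload(gzip.compress("洪水".encode("gbk")))
```

Raising an exception on the first mismatch mirrors the rule that access stops when data and definition disagree; a production system might quarantine the batch instead.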
The data reconciliation sub-module is used for checking and verifying, at a reconciliation node, the integrity, consistency and correctness of the data held by the data provider and the data access party. If the record counts of the data provider and the data access party at a given reconciliation time point are inconsistent, the anomaly is recorded and an alarm is raised.
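The count comparison at each reconciliation time point can be illustrated with the sketch below. The checkpoint labels, the dictionary layout and the use of a log warning as the "alarm" are assumptions for the example only.

```python
import logging

logger = logging.getLogger("reconciliation")


def reconcile(provider_counts, accessor_counts):
    """Compare record counts per reconciliation time point and return the mismatches."""
    anomalies = []
    for checkpoint in sorted(set(provider_counts) | set(accessor_counts)):
        sent = provider_counts.get(checkpoint)
        received = accessor_counts.get(checkpoint)
        if sent != received:
            # Record the anomaly and raise an alarm (here: a warning log entry).
            anomalies.append({"checkpoint": checkpoint, "provider": sent, "accessor": received})
            logger.warning(
                "count mismatch at %s: provider=%s accessor=%s", checkpoint, sent, received
            )
    return anomalies


provider = {"2020-12-30T01:00": 100, "2020-12-30T02:00": 250}
accessor = {"2020-12-30T01:00": 100, "2020-12-30T02:00": 248}
anomalies = reconcile(provider, accessor)
```

A checkpoint that is missing on one side is reported as a mismatch as well, since its count is `None` on that side.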
The data resource pool module comprises an original library and a resource library. The original library identifies data by source and stores it using different storage mechanisms according to data type; the data stored in the original library becomes standard data after the data processing operations of cleaning, conversion, association and comparison, and is stored in the resource library. The original library serves as a bridge between each data source system and the data resource pool. It can store many data types, is compatible with structured, semi-structured and unstructured data, and can store data separately by source, type and time according to different data domain definitions. For structured data, the data tables in the original library correspond one-to-one with the data tables provided by the source business system, and necessary fields such as extraction time and data source identifier are added during extraction, so that the extraction time and source system can be traced back in the original library and the data lineage of upper-layer data can be represented. Data from source business systems with large storage volumes and large daily increments is extracted and stored in partitions according to specific rules, which speeds up later data reads and reduces the computing resources consumed during data governance.
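Adding the extraction time and data source identifier during extraction, and then storing the rows in partitions, can be sketched as follows. The field names `_source_id` and `_extracted_at`, the per-day partitioning rule and the sample rows are assumptions; the original text only says that such fields are added and that partitioning follows "a specific rule".

```python
from collections import defaultdict
from datetime import datetime, timezone


def tag_for_original_library(rows, source_id, extracted_at=None):
    """Copy source rows unchanged and add the data source identifier and extraction time."""
    extracted_at = extracted_at or datetime.now(timezone.utc)
    return [{**row, "_source_id": source_id, "_extracted_at": extracted_at} for row in rows]


def partition_by_day(rows, time_field):
    """Group rows into partitions keyed by the calendar day of `time_field`."""
    partitions = defaultdict(list)
    for row in rows:
        partitions[row[time_field].date().isoformat()].append(row)
    return dict(partitions)


rows = [
    {"event_id": 1, "reported_at": datetime(2020, 12, 29, 8, 0)},
    {"event_id": 2, "reported_at": datetime(2020, 12, 29, 17, 30)},
    {"event_id": 3, "reported_at": datetime(2020, 12, 30, 9, 15)},
]
tagged = tag_for_original_library(rows, source_id="bureau_business_system")
partitions = partition_by_day(tagged, "reported_at")
```

Because every stored row carries its source identifier and extraction time, a row in an upper-layer library can be traced back to the extraction run that produced it.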
The structural design of the resource library is basically consistent with that of the original library, and data is stored at the finest granularity. In terms of content, the resource library stores both the standard data produced by standardizing the original library and the dirty data produced by cleaning, which makes it easier to give feedback to the source business departments, helps improve data quality, and reduces the risk of erroneous cleaning.
The cleaning of the data processing operation in the resource library comprises dictionary table mapping, data deduplication processing and data null processing. Dictionary table mapping: cleaning rules are generated from the data elements, and the dictionary table mapping is completed in a unified way. Data deduplication: duplicate data is cleaned out according to specified deduplication conditions, leaving only non-duplicate data. Data null processing: null values are filled to ensure that subsequent data processing is correct; how they are filled depends on the business requirements.
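The three cleaning operations, together with the practice of keeping dirty data for feedback to the source department, can be sketched as below. The function signature, the sample dictionary table, the deduplication key and the default value are all hypothetical; treating an unmapped dictionary code as dirty data is likewise an assumption of this sketch.

```python
def clean(rows, dictionary, key_fields, defaults):
    """Apply null filling, dictionary table mapping and deduplication.

    Returns (standard_rows, dirty_rows); dirty rows are kept rather than discarded.
    """
    standard, dirty, seen = [], [], set()
    for row in rows:
        row = dict(row)
        # Data null processing: fill empty values with business-supplied defaults.
        for field, default in defaults.items():
            if row.get(field) in (None, ""):
                row[field] = default
        # Dictionary table mapping: translate source codes into standard values.
        unmapped = False
        for field, mapping in dictionary.items():
            value = row.get(field)
            if value in mapping:
                row[field] = mapping[value]
            else:
                unmapped = True
        # Data deduplication: the first row with a given key wins.
        key = tuple(row.get(field) for field in key_fields)
        if unmapped or key in seen:
            dirty.append(row)
        else:
            seen.add(key)
            standard.append(row)
    return standard, dirty


dictionary = {"event_type": {"01": "flood", "02": "earthquake"}}
rows = [
    {"event_id": 1, "event_type": "01", "district": "A"},
    {"event_id": 1, "event_type": "01", "district": "A"},   # duplicate of the first row
    {"event_id": 2, "event_type": "99", "district": "B"},   # code missing from the dictionary
    {"event_id": 3, "event_type": "02", "district": None},  # null district
]
standard, dirty = clean(rows, dictionary, key_fields=["event_id"], defaults={"district": "unknown"})
```

Returning the dirty rows alongside the standard rows reflects the statement that dirty data is stored so that erroneous cleaning can be detected and corrected.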
The working principle is as follows: the data management system comprises a data access module, a data processing module, a data resource pool, a data management and control module, a data service module and a data sharing and exchange module. The data access module achieves complete access to multi-source heterogeneous data resources mainly through data probing, data reading, data reconciliation and the like. The data processing module standardizes the data resources mainly through extraction, cleaning, conversion and the like. The data management and control module manages the full life cycle of the data resources mainly through data standard management, metadata management, resource catalog management and the like. The data resource pool manages data by level and category mainly by building a resource library, a subject library, a special subject library and the like, further increasing the asset value of the data resources. The data service module encapsulates the data resources as services, providing data resource services externally. The data sharing and exchange module mainly enables information sharing across services, applications and departments.
The above is only a preferred embodiment of the present invention, but the scope of protection of the present invention is not limited thereto. Any equivalent substitution or modification made by a person skilled in the art within the technical scope disclosed by the present invention, according to its technical solutions and inventive concept, shall fall within the scope of protection of the present invention.

Claims (9)

1. A data governance system based on a unified data resource pool, comprising:
a data access module, configured to achieve complete access to the data resources of the data source;
a data processing module, configured to standardize the data resources through extraction, cleaning and conversion;
a data management and control module, configured to manage the data resources across their full life cycle through data standard management, metadata management and resource catalog management;
a data resource pool module, configured to manage the data resources by category;
a data service module, configured to encapsulate the data resources as services and provide data resource services;
and a data sharing and exchange module, configured to share information about the data resources across services, applications and departments.
2. The data governance system based on the unified data resource pool according to claim 1, wherein the data sources comprise external association department data, emergency management bureau internal business department and transcription department data, social internet public data, perception data.
3. The system of claim 1, wherein the data source access modes include a database access mode, a file access mode, an interface call mode, and a data exchange mode.
4. The unified data resource pool based data governance system according to claim 1, wherein the data access module comprises a data probing sub-module, a data reading sub-module and a data reconciliation sub-module.
5. The unified data resource pool based data governance system according to claim 4, wherein the data probing sub-module is configured to perform service probing, access mode probing, field probing, data set probing, problem data probing, and data pushing on source data.
6. The unified data resource pool based data governance system according to claim 4, wherein said data reading sub-module is configured to detect whether data extracted from the source system or read from a designated location is consistent with the data definition; if it is inconsistent, access stops and the data is probed and defined again; if it is consistent, access continues, the data is decrypted and decompressed, a record ID that applies throughout the full life cycle of the data is generated, the character set of the data is converted, and the data is converted into a format that meets the data processing requirements.
7. The unified data resource pool based data governance system according to claim 4, wherein said data reconciliation sub-module is configured to check and verify, at a reconciliation node, the integrity, consistency and correctness of the data held by the data provider and the data access party; and if the record counts of the data provider and the data access party at a given reconciliation time point are inconsistent, the anomaly is recorded and an alarm is raised.
8. The data governance system based on the unified data resource pool according to claim 1, wherein the data resource pool module comprises an original library and a resource library, the original library identifies data according to sources and stores the data by adopting different storage mechanisms according to data types, and the data stored in the original library forms standard data after being subjected to data processing operations of cleaning, conversion, association and comparison and stores the standard data in the resource library.
9. The unified data resource pool based data governance system according to claim 8, wherein the cleaning of the data processing operation in the resource library comprises dictionary table mapping, data deduplication processing, and data null processing.
CN202011643272.5A 2020-12-30 2020-12-30 Data management system based on unified data resource pool Pending CN112650745A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202011643272.5A CN112650745A (en) 2020-12-30 2020-12-30 Data management system based on unified data resource pool

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202011643272.5A CN112650745A (en) 2020-12-30 2020-12-30 Data management system based on unified data resource pool

Publications (1)

Publication Number Publication Date
CN112650745A true CN112650745A (en) 2021-04-13

Family

ID=75367917

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202011643272.5A Pending CN112650745A (en) 2020-12-30 2020-12-30 Data management system based on unified data resource pool

Country Status (1)

Country Link
CN (1) CN112650745A (en)

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107193858A (en) * 2017-03-28 2017-09-22 福州金瑞迪软件技术有限公司 Towards the intelligent Service application platform and method of multi-source heterogeneous data fusion
CN109344133A (en) * 2018-08-27 2019-02-15 成都四方伟业软件股份有限公司 A kind of data administer driving data and share exchange system and its working method
CN110781236A (en) * 2019-10-29 2020-02-11 山西云时代技术有限公司 Method for constructing government affair big data management system
CN111897863A (en) * 2020-07-31 2020-11-06 珠海市新德汇信息技术有限公司 Multi-source heterogeneous data fusion and convergence method
CN112035438A (en) * 2020-09-01 2020-12-04 江苏风云科技服务有限公司 Government affair big data platform system

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113157795A (en) * 2021-05-18 2021-07-23 国网宁夏电力有限公司 Power grid regulation and control operation multi-source data modeling and management system suitable for mobile application
CN113535707A (en) * 2021-08-05 2021-10-22 南京华飞数据技术有限公司 Method for managing personnel information data based on big data
CN116450620A (en) * 2023-06-12 2023-07-18 中国科学院空天信息创新研究院 Database design method and system for multi-source multi-domain space-time reference data
CN116450620B (en) * 2023-06-12 2023-09-12 中国科学院空天信息创新研究院 Database design method and system for multi-source multi-domain space-time reference data

Similar Documents

Publication Publication Date Title
CN112650745A (en) Data management system based on unified data resource pool
CN107301250B (en) Multi-source database collaborative backup method
US20120054174A1 (en) Geospatial database integration using business models
CN105139281A (en) Method and system for processing big data of electric power marketing
CN111984709A (en) Visual big data middle station-resource calling and algorithm
CN111460045A (en) Modeling method, model, computer device and storage medium for data warehouse construction
Qureshi et al. Towards efficient big data and data analytics: a review
CN103838847A (en) Data organization method oriented to sea-cloud collaboration network computing network
CN102929664A (en) Conventional data exchange method based on XSD structure
CN112527774A (en) Data center building method and system and storage medium
CN103678712A (en) Disaster information spatial-temporal database
CN112579563B (en) Power grid big data-based warehouse visualization modeling system and method
CN104252345A (en) Complex object management method and system in cloud environment
CN113360676A (en) Method and device for determining potential relation of enterprise based on knowledge graph
US11068646B2 (en) Merging documents based on document schemas
US8375011B2 (en) Safe multi-stream versioning in a metadata repository
CN115640300A (en) Big data management method, system, electronic equipment and storage medium
CN113094039B (en) Automatic code generation system based on database table
CN112084177B (en) Data pool application method and device based on data acquisition treatment and mining analysis
CN111753000A (en) Water supply network information system
CN111723253A (en) Data blood relationship query method and query system based on graph database
Jiang Investigation on the construction of urban intelligent emergency management system based on data mining technology
Chen et al. Survey on open source frameworks for big data analytics
Mao Construction of Intelligent Vocational Management Information System with R Programming
US20230185786A1 (en) Detect data standardization gaps

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination