CN117076463B - Multi-source data aggregation storage system for smart city - Google Patents

Multi-source data aggregation storage system for smart city

Info

Publication number
CN117076463B
CN117076463B CN202311330365.6A CN202311330365A CN117076463B CN 117076463 B CN117076463 B CN 117076463B CN 202311330365 A CN202311330365 A CN 202311330365A CN 117076463 B CN117076463 B CN 117076463B
Authority
CN
China
Prior art keywords
data
library
entity
conversion
database
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202311330365.6A
Other languages
Chinese (zh)
Other versions
CN117076463A (en)
Inventor
赵凌园
江雨韩
李魏
山鑫
张焰
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Huantian Smart Technology Co ltd
Original Assignee
Huantian Smart Technology Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Huantian Smart Technology Co ltd
Priority to CN202311330365.6A
Publication of CN117076463A
Application granted
Publication of CN117076463B
Legal status: Active
Anticipated expiration

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/22 Indexing; Data structures therefor; Storage structures
    • G06F16/2219 Large Object storage; Management thereof
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/21 Design, administration or maintenance of databases
    • G06F16/215 Improving data quality; Data cleansing, e.g. de-duplication, removing invalid entries or correcting typographical errors
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/36 Creation of semantic tools, e.g. ontology or thesauri
    • G06F16/367 Ontology
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02A TECHNOLOGIES FOR ADAPTATION TO CLIMATE CHANGE
    • Y02A30/00 Adapting or protecting infrastructure or their operation
    • Y02A30/60 Planning or developing urban green infrastructure

Abstract

The invention discloses a smart city multi-source data aggregation storage system, which comprises a data aggregation system and a data storage system. The data aggregation system is used for extracting data from data sources, cleaning the data, correcting errors and converting structures. The data aggregation system comprises a conversion builder module, a task builder module, a data conversion and task processing module, a database connection management module, a server and cluster management module, a resource library management module and a system monitoring module. The data storage system comprises an entity model and a DIKW model. The data aggregation storage method, based on middle platform technology, establishes a solid data foundation for smart city construction and allows the value of the data to be mined and utilized to a greater extent.

Description

Multi-source data aggregation storage system for smart city
Technical Field
The invention belongs to the technical field of data processing, and particularly relates to a smart city multi-source data aggregation storage system.
Background
In recent years, driven by policy guidance and technical development, smart city construction has advanced vigorously in many places. Many cities recognize that a smart city is an important manifestation of a city's capability level and core competitiveness, and the management of the various data involved is regarded as a key problem in smart city construction. New smart city construction supported by data has become an important means of solving urban problems, improving urban management capability, and promoting high-quality urban development.
Many technical solutions and means already exist for obtaining, managing, and applying data. However, at the scale of a city the data types are diverse and the range of applications is wide, so how to aggregate and manage multi-source data more effectively, and thereby provide stronger support for the various business applications in a smart city, remains a difficulty that these technical solutions have yet to overcome.
Existing data management systems based on middle platform technology can mostly be divided into four layers: an infrastructure layer, a data layer, a management layer, and a service layer, wherein:
the infrastructure layer consists of servers, cloud servers, network communication facilities, data acquisition terminals and the like, and provides the running environment, data acquisition and other capabilities of the whole data management system;
the data layer consists of various databases and multi-source data; the databases mainly comprise relational databases, non-relational databases, spatial databases and the like; the multi-source data comprises basic urban spatio-temporal data, data collected by each business system, urban management and operation data, Internet of Things data and the like, so the data sources are very broad and the data formats numerous; in particular, at the scale of a city, how to effectively aggregate the massive data generated by city planning, construction, management, operation and other activities into an organic, unified data layer is a key problem studied in the prior art;
the management layer is embodied as a data management platform and mainly serves daily management, maintenance and data use; functions such as data browsing, data processing, data extraction, data updating and historical version creation are realized through the data management system, which mainly manages and stores the original data in databases;
the service layer embodies the capability of the data middle platform; it provides standard data services through encapsulated interfaces that are called by the upper-layer business applications, forming a unified northbound data capability that facilitates data management and application development.
Patent publication CN113783873A, "a data management platform based on data center technology", describes a platform whose central processing module, data acquisition module, data center module and data security module prevent the security of actual data from being affected by tampering with the system's service time; however, it does not address the complexity of aggregating and managing multi-source data, and cannot provide data support services for upper-layer applications.
Patent publication CN112527774A, "a data center building method, a system and a storage medium", provides a full-life-cycle data management method for applications, aimed at the acquisition, calculation, storage and management of mass data, so that managers of data applications can understand the complex dependency relations among data, applications and systems and thereby manage the data effectively. The method can cope with large data volumes, but does not handle well the situation in which data types are complex and data sources are numerous.
Patent publication CN111209269A, "a smart city big data management system", provides a big data center that can form a city brain database and opens communication interfaces to provide external services and assist decision-making applications. Its advantage is that the steps of data aggregation, management and application are linked in series, but it does not solve well the problem of multi-source heterogeneous data, and the value of the data is not mined deeply.
In smart city construction the data involved is extremely large and complex, and to some extent reflects the development and change of the city. If this multi-source data can be governed, deeply mined, and applied to all aspects of the smart city, its value can be better realized. Existing smart city data management technology is mainly based on computer and network communication technology: it relies on traditional network communication infrastructure for bottom-layer support, stores data in database systems such as MySQL and PostGIS, uses a data middle platform for management and service provision, and builds upper-layer applications on top. Although middle platform technology is adopted and unified data collection, storage, management and standard services for upper-layer applications are realized, huge and complicated data cannot be aggregated quickly and effectively, the relations among multi-source data cannot be deeply mined, and data islands still exist.
1. Existing conventional governance means are mature for the data of a single business, but when facing a whole city they cannot organically combine the various data for unified governance, and so naturally cannot mine and exploit the value of the data to the greatest extent;
2. Existing data convergence schemes usually focus on a single type of data, or on one class of data in a particular business direction, whereas the convergence system of the invention can rapidly and effectively collect and converge urban multi-source big data and cope well with varied acquisition modes, complex data formats, huge data volumes and similar problems;
3. With conventional storage means there are thick barriers between databases, so connections between multi-source data are difficult to realize physically, and the data naturally cannot be managed and applied well. Starting from the data itself, the invention constructs an entity model and knowledge graph modeling, abstracts the entity library and the four DIKW libraries, and forms an urban spatial big data system, optimizing data storage both theoretically and physically and thereby providing strong support for data management and application.
Disclosure of Invention
The invention aims to provide a smart city multi-source data aggregation storage system to solve the following problems in the prior art: various data cannot be organically combined and governed uniformly, so the value of the data cannot be exploited to the greatest extent; existing schemes focus on a single type of data or on one class of data in a particular business direction; and data cannot be managed and applied well.
In order to solve the technical problems, the invention adopts the following technical scheme:
A smart city multi-source data aggregation storage system comprises a data aggregation system and a data storage system; the data aggregation system covers the whole process of extracting data from data sources, cleaning the data, correcting errors, converting structures, and loading data that meets the standard into the database; the data aggregation system comprises a conversion builder module, a task builder module, a data conversion and task processing module, a database connection management module, a server and cluster management module, a resource library management module and a system monitoring module;
the data storage system comprises an entity model and a DIKW model; the entity model comprises basic entities and management entities;
the aggregation storage system operates according to the following steps:
step S1, acquiring a data source;
step S2, classifying and judging the source data through the database connection management module, the server and cluster management module, and the data conversion and task processing module of the data aggregation system; if the data is original data, processing the original data into standard data and then warehousing it; if the data is already standard data, warehousing it directly;
step S3, marking the data that has completed warehousing as the base library;
step S4, after the data has passed through the base library, performing knowledge graph modeling on the data;
step S5, abstracting the data for which knowledge graph modeling is complete into an entity library, an index library and a knowledge library;
and step S6, calling the data in the entity library, the index library and the knowledge library from data applications.
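By way of non-limiting illustration, the flow of steps S1 to S6 can be sketched in Python as follows. The sketch is not taken from the disclosed system: the names Record, Stores, standardize and aggregate, the is_standard flag, and the toy rules used to fill the entity, index and knowledge libraries are all assumptions made for the example.

```python
from dataclasses import dataclass, field


@dataclass
class Record:
    source: str
    payload: dict
    is_standard: bool  # hypothetical flag: True if the record already meets the standard


@dataclass
class Stores:
    base: list = field(default_factory=list)       # base library (S3)
    entity: dict = field(default_factory=dict)     # entity library (S5)
    index: dict = field(default_factory=dict)      # index library (S5)
    knowledge: list = field(default_factory=list)  # knowledge library (S5)


def standardize(record: Record) -> Record:
    """S2, raw branch: normalise keys and trim values as a stand-in for real cleaning."""
    cleaned = {k.strip().lower(): str(v).strip() for k, v in record.payload.items()}
    return Record(record.source, cleaned, True)


def aggregate(records: list) -> Stores:
    stores = Stores()
    for rec in records:                  # S1: records acquired from the data sources
        if not rec.is_standard:          # S2: classify; process raw data into standard data
            rec = standardize(rec)
        stores.base.append(rec)          # S3: warehoused data forms the base library
    for rec in stores.base:              # S4: toy stand-in for knowledge graph modeling
        name = rec.payload["name"]
        kind = rec.payload.get("type", "unknown")
        stores.entity[name] = rec.payload                    # S5: entity library
        stores.index[kind] = stores.index.get(kind, 0) + 1   # S5: index library (counts per type)
        stores.knowledge.append((name, "is_a", kind))        # S5: knowledge library (triples)
    return stores


stores = aggregate([
    Record("survey.csv", {" Name ": " Main Road ", "Type": "road"}, False),
    Record("registry", {"name": "Acme Ltd", "type": "enterprise"}, True),
])
print(stores.index)  # S6: an application reads from the libraries
```

The raw record and the already-standard record take different branches in S2 but end up in the same base library, which is the point of the classification step.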
According to the technical scheme, the conversion builder module is used for performing the corresponding processing and operations on data records in the data extraction, conversion and loading stages; the operations of the conversion builder module specifically include: reading files, filtering output rows, cleaning data, and loading data into a database.
According to the technical scheme, the task builder module comprises a plurality of conversion builder modules and is used for executing complete data aggregation tasks.
According to the technical scheme, the data conversion and task processing module serves the different stages of big data extraction, conversion and loading, and is used for rapidly designing and maintaining complex extraction, conversion and loading workflows; the data conversion and task processing module comprises data input, data output, data processing, data verification and general operation;
the data input is used for accessing different types of data sources and feeding them into the data stream of the convergence system; the data output is used for outputting the data stream of the convergence system to a designated position in a given format; the data processing is used for processing the imported data accordingly, converting it into the form required by the user, and passing it to the next step; the data verification is used for verifying rows or fields on the basis of certain calculations so as to ensure the consistency of the data;
the general operation is mainly used for supporting the extraction and cleaning of the data stream by the natural resource big data intelligent management convergence system, thereby obtaining data that meets the user's expectations.
According to the technical scheme, the database connection management module is used for configuring the databases supported by the convergence system; the configuration comprises the name of the database connection, the type of database to connect to, and the access mode; wherein the database type is selected from a list of connectable databases comprising: Oracle, MySQL, MS Access, MS SQL Server, PostgreSQL, IBM DB2 and Sybase; the corresponding access mode and connection parameters are set according to the selected database type, which includes setting information such as the database name, port, user name and password, and selecting an available access mode from a list.
According to the technical scheme, the server and cluster management module is used for accelerating execution and for providing support when a server goes down;
the server and cluster management module comprises one master server and a plurality of slave servers, which together form a master-slave structure; the master does not process specific tasks and is only responsible for distributing tasks and collecting operation results; after the master node receives a request, it divides the task into several parts and hands them to the slave nodes for execution; after the slaves finish, they return their results to the master for summarization, and the master then returns the final result.
According to the technical scheme, the resource library management module is used for storing all extraction, conversion and loading information in a relational database; a resource library is created by creating a new database connection.
According to the technical scheme, the system monitoring module is used for outputting feedback information of the operation process when the conversion or task is executed.
According to the technical scheme, the entity model is specifically constructed as follows:
step A1, objectifying the things or phenomena of the real world, and classifying and abstracting related things that share common characteristics to form a conceptual cognition, thereby creating a conceptual model of the objective things;
step A2, after the real world has been understood through conceptualization, which provides a basis for understanding the related concepts and analyzing their structure, further abstracting, decomposing and refining the conceptual model, on the basis of spatial cognition and knowledge and in combination with theories of geospatial cognition and modeling, to form a logical data model that reflects a full-space cognition of the world;
and step A3, finally, converting the real world into a machine-recognizable form by recording the data and attributes of the entity model in a specific physical storage model in the computer, so as to reflect, express and recognize the real world.
According to the technical scheme, the DIKW model comprises a base library, an index library, a knowledge library and a model library; the base library is the foundation of the city big data system and comprises entity class data and scene class data; the entity class data comprises objectified data and non-objectified data; the objectified data comprises houses, roads, enterprises and people; the non-objectified data comprises remote sensing images and three-dimensional street views;
the index library is built on the entity library; it portrays the city from different dimensions through various statistical data and index data and expresses the city's vital signs periodically, so that city information can be quantified and expressed objectively;
the knowledge library is a digital expression of business rules and industry experience, embodied as a city business rule library; it expresses the rules by which the city operates and supports dynamic prediction and decision-making in city applications;
the model library is a domain-oriented assembly library of industry models and is a comprehensive application of the entity library, the index library and the knowledge library; for the specific problems and demands of a city, entity and index data can be deeply processed as required in combination with corresponding algorithms, so that the current vital signs of the city are expressed in a more digital way, and prediction and decision results are obtained more objectively in combination with the business judgment logic provided by the knowledge library.
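As a hedged illustration of how the four libraries could relate to one another, the following Python sketch derives per-district indicators from entity data, expresses one business rule, and combines the two into a decision. The field names, the dense_district rule, its threshold of 150.0, and the function names are invented for the example and do not come from the disclosure.

```python
# Base library: a few objectified entity records (illustrative data only).
base_library = [
    {"id": "h1", "kind": "house", "district": "north", "floor_area": 120.0},
    {"id": "h2", "kind": "house", "district": "north", "floor_area": 80.0},
    {"id": "e1", "kind": "enterprise", "district": "south", "floor_area": 400.0},
]


def build_index_library(entities):
    """Index library: per-district statistics derived from the entity data."""
    index = {}
    for e in entities:
        stats = index.setdefault(e["district"], {"count": 0, "floor_area": 0.0})
        stats["count"] += 1
        stats["floor_area"] += e["floor_area"]
    return index


# Knowledge library: business rules as (name, predicate) pairs.
# The rule and its threshold are hypothetical.
knowledge_library = [
    ("dense_district", lambda stats: stats["floor_area"] / stats["count"] < 150.0),
]


def run_model(index, rules):
    """Model library: combine indicators with rules to label each district."""
    return {
        district: [name for name, rule in rules if rule(stats)]
        for district, stats in index.items()
    }


index_library = build_index_library(base_library)
decisions = run_model(index_library, knowledge_library)
print(decisions)
```

Keeping the rules separate from the indicator calculation mirrors the split between the knowledge library and the index library: a rule can change without recomputing any statistics.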
Compared with the prior art, the invention has the following beneficial effects:
1. Regarding the convergence of multi-source data, including access to business systems and to various kinds of original data: existing data management systems cannot cope with a large number of data sources, and differences in acquisition modes, data formats, data storage modes, data volumes and the like all call the robustness of such systems into question. The multi-source convergence system built in the invention solves these problems through conversion, task processing, database connection management, server and cluster configuration and similar means, and greatly optimizes the initial stage of data management, so that data can be converged into the system continuously.
2. Existing conventional databases merely store data and cannot support its deep application. Through the construction of the entity model and knowledge graph modeling, and the construction of the entity library, the index library, the model library and the knowledge library, the invention forms a full-space, multi-level urban spatio-temporal big data system in which basic data is expressed as entities, strategic data is expressed as indicators, operating rules are visualized and cognitive reasoning is made intelligent, so that the value of the data is mined more deeply.
3. The data aggregation storage method, based on middle platform technology, establishes a solid data foundation for smart city construction and allows the value of the data to be mined and utilized to a greater extent.
Drawings
FIG. 1 is a block diagram of a system of the present invention;
FIG. 2 is a block diagram of a data aggregation system according to the present invention;
FIG. 3 is a block diagram of a DIKW system of the present invention;
FIG. 4 is a diagram of the logical structure of the entity model of the present invention.
Detailed Description
The technical solutions in the embodiments of the present invention are described clearly and completely below with reference to the accompanying drawings. The described embodiments are evidently only some, not all, of the embodiments of the present invention. All other embodiments obtained by those skilled in the art on the basis of these embodiments without inventive effort fall within the scope of protection of the invention.
Example 1
As shown in FIG. 1, a smart city multi-source data aggregation storage system comprises a data aggregation system and a data storage system; the data aggregation system covers the whole process of extracting data from data sources, cleaning the data, correcting errors, converting structures, and loading data that meets the standard into the database; the data aggregation system comprises a conversion builder module, a task builder module, a data conversion and task processing module, a database connection management module, a server and cluster management module, a resource library management module and a system monitoring module.
Further, data that meets the standard means data that conforms to the standard after being processed by the multi-source data aggregation storage system.
Further, the data types covered by the database include: files, relational databases, GIS databases and network service platforms.
The data storage system comprises an entity model and a DIKW model; the entity model comprises basic entities and management entities.
Specifically, the DIKW model is a theoretical method of entity modeling, used for constructing the libraries of smart city entities, indexes, models and the like, which in turn support the operation of the smart city.
The entity model is the theoretical method for constructing the smart city entity library. Its function is to associate real-world entities with attribute information, business processes and the like; such entities include physical basic entities that actually exist in urban space, such as rivers, roads, buildings and offices, and abstract management entities closely related to city business, such as companies, departments and legal persons. Meanwhile, entities are connected to one another through the DIKW model, so that the whole city is abstractly modeled and digitally stored.
The aggregation storage system operates according to the following steps:
step S1, acquiring a data source;
step S2, classifying and judging the source data through the database connection management module, the server and cluster management module, and the data conversion and task processing module of the data aggregation system; if the data is original data, processing the original data into standard data and then warehousing it; if the data is already standard data, warehousing it directly;
step S3, marking the data that has completed warehousing as the base library;
step S4, after the data has passed through the base library, performing knowledge graph modeling on the data;
step S5, entering the data for which knowledge graph modeling is complete into the entity library;
and step S6, calling the data in the entity library from data applications.
Example 2
This embodiment is a further refinement of Example 1.
The conversion builder module is used for performing the corresponding processing and operations on data records in the data extraction, conversion and loading stages.
The operations of the conversion builder module specifically include: reading files, filtering output rows, cleaning data, or loading data into a database. The steps in a conversion are connected by node connections, each of which defines a unidirectional path that allows data to flow from one step to the next. The data formats supported by the conversion process include gdb, csv, mdb, shp, xml, excel, txt and the like.
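A minimal sketch of such a conversion, assuming Python generators as the unidirectional connections between steps and an in-memory SQLite database as the load target, is given below. The step functions, the roads table and the sample CSV are hypothetical and only illustrate the read, filter, clean and load sequence; they are not the module's actual implementation.

```python
import csv
import io
import sqlite3


def read_rows(text):
    """Step 1: read a delimited file (here an in-memory CSV string)."""
    yield from csv.DictReader(io.StringIO(text))


def filter_rows(rows):
    """Step 2: filter out rows that have no identifier."""
    return (r for r in rows if r["id"].strip())


def clean_rows(rows):
    """Step 3: cast the identifier and normalise the name field."""
    for r in rows:
        yield {"id": int(r["id"]), "name": r["name"].strip().title()}


def load_rows(rows, conn):
    """Step 4: load the cleaned rows into a database table."""
    conn.execute("CREATE TABLE IF NOT EXISTS roads (id INTEGER PRIMARY KEY, name TEXT)")
    conn.executemany("INSERT INTO roads VALUES (:id, :name)", rows)
    conn.commit()


SAMPLE = "id,name\n1, main street \n,unnamed\n2,RIVER ROAD\n"

conn = sqlite3.connect(":memory:")
# Each nested call is one connection: data flows one way, from reader to loader.
load_rows(clean_rows(filter_rows(read_rows(SAMPLE))), conn)
loaded = conn.execute("SELECT id, name FROM roads ORDER BY id").fetchall()
print(loaded)
```

Because every step consumes an iterator and yields rows lazily, a row can reach the loader before the reader has finished the file, which is one simple way to picture steps that run concurrently along one-way connections.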
The task builder module comprises a plurality of conversion builder modules and is used for executing a complete data aggregation task.
Specifically, a complete data aggregation task is composed of a plurality of task items and conversion items. Because the steps of a conversion execute in parallel, a task that executes serially is needed to sequence these operations; the task items and conversion items of a task are executed in an order determined by the node connections between the items and by the execution result of each task item.
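The serial, result-dependent ordering described above can be sketched as follows. This is an illustrative assumption about one possible representation: the TaskItem class, its on_success and on_failure fields and the item names are hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class TaskItem:
    name: str
    run: Callable[[], bool]             # returns True on success, False on failure
    on_success: Optional[str] = None    # next item to run if this one succeeds
    on_failure: Optional[str] = None    # next item to run if this one fails


def run_task(items: dict, start: str) -> list:
    """Execute items one at a time, following the connection chosen by each result."""
    trace = []
    current = start
    while current is not None:
        item = items[current]
        ok = item.run()
        trace.append((item.name, ok))
        current = item.on_success if ok else item.on_failure
    return trace


items = {
    "check_source": TaskItem("check_source", lambda: True, "convert", "notify"),
    "convert": TaskItem("convert", lambda: False, "archive", "notify"),  # simulated failure
    "archive": TaskItem("archive", lambda: True),
    "notify": TaskItem("notify", lambda: True),
}
trace = run_task(items, "check_source")
print(trace)
```

In this run the simulated failure of the convert item routes execution to notify instead of archive, showing how the result of one item decides which item runs next.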
The data conversion and task processing module serves the different stages of big data extraction, conversion and loading, and is used for rapidly designing and maintaining complex extraction, conversion and loading workflows; the data conversion and task processing module comprises data input, data output, data processing, data verification and general operation;
the conversion builder and task builder modules are the tools that carry out the actual data aggregation, whereas the data conversion and task processing module is an editable graphical interface tool that lets a user operate those two tools visually and presents the whole aggregation task;
in use, the user first selects conversions and tasks that have already been configured, creates a workflow through the graphical interface, and then executes the workflow to complete data aggregation.
The data input is used for accessing different types of data sources and feeding them into the data stream of the convergence system; the data output is used for outputting the data stream of the convergence system to a designated position in a given format, for example CSV, DAT, DBF, MDB or ODB++, typically a format supported by the database; the designated position refers to a designated database;
the data processing is used for processing the imported data accordingly, that is, completing the conversion and task workflows through the graphical interface tool, converting the data into the form required by the user (usually the form required by the user's platform, such as MDB), and passing it to the next step; the data verification is used for verifying rows or fields on the basis of certain calculations (for example, verifying whether spatial data has complete attribute information, whether the data in a compressed package is complete, or whether the header of a database table is complete and correctly formatted) so as to ensure the consistency of the data;
the general operation is mainly used for supporting the extraction and cleaning of the data stream by the natural resource big data intelligent management convergence system, thereby obtaining data that meets the user's expectations.
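Two of the checks mentioned for data verification, a complete table header and complete attribute information per row, might look like the following in Python. The required field list and the function names are assumptions for illustration only.

```python
REQUIRED_FIELDS = ("id", "name", "geometry")  # hypothetical required attributes


def verify_header(header, required=REQUIRED_FIELDS):
    """Return the required columns that are missing from a table header."""
    return [col for col in required if col not in header]


def verify_rows(rows, required=REQUIRED_FIELDS):
    """Return (row_number, missing_fields) for each row with absent or empty attributes."""
    problems = []
    for number, row in enumerate(rows, start=1):
        missing = [f for f in required if not str(row.get(f, "")).strip()]
        if missing:
            problems.append((number, missing))
    return problems


rows = [
    {"id": "1", "name": "Main Road", "geometry": "LINESTRING(0 0, 1 1)"},
    {"id": "2", "name": "", "geometry": "LINESTRING(1 1, 2 2)"},
    {"id": "3", "name": "Depot"},
]
header_gaps = verify_header(["id", "name"])
row_problems = verify_rows(rows)
print(header_gaps, row_problems)
```

Returning the list of problems rather than raising on the first one lets a monitoring step report every inconsistent row in a single pass.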
The database connection management module is used for configuring the databases supported by the convergence system; the configuration comprises the name of the database connection, the type of database to connect to, and the access mode; wherein the database type is selected from a list of connectable databases comprising: Oracle, MySQL, MS Access, MS SQL Server, PostgreSQL, IBM DB2 and Sybase; the corresponding access mode and connection parameters are set according to the selected database type, which includes setting information such as the database name, port, user name and password, and selecting an available access mode from a list.
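One possible shape for such a connection configuration is sketched below as a validated Python dataclass. The list of database types follows the text; the access mode names, the host field and the class itself are assumptions, since the source does not enumerate the access modes or describe the data structure.

```python
from dataclasses import dataclass

# Database types listed in the description.
SUPPORTED_TYPES = (
    "Oracle", "MySQL", "MS Access", "MS SQL Server", "PostgreSQL", "IBM DB2", "Sybase",
)
# Hypothetical access modes; the description only says one is selected from a list.
ACCESS_MODES = ("native", "odbc")


@dataclass(frozen=True)
class DatabaseConnection:
    name: str          # name of the database connection
    db_type: str       # one of SUPPORTED_TYPES
    access_mode: str   # one of ACCESS_MODES
    host: str          # assumed parameter, not named in the description
    port: int
    database: str
    username: str
    password: str

    def __post_init__(self):
        if self.db_type not in SUPPORTED_TYPES:
            raise ValueError(f"unsupported database type: {self.db_type}")
        if self.access_mode not in ACCESS_MODES:
            raise ValueError(f"unsupported access mode: {self.access_mode}")
        if not 0 < self.port < 65536:
            raise ValueError(f"invalid port: {self.port}")

    def describe(self) -> str:
        """Human-readable summary that deliberately omits the password."""
        return (f"{self.name}: {self.db_type} via {self.access_mode} "
                f"at {self.host}:{self.port}/{self.database}")


conn_cfg = DatabaseConnection(
    name="city_base", db_type="PostgreSQL", access_mode="native",
    host="localhost", port=5432, database="city", username="etl", password="secret",
)
print(conn_cfg.describe())
```

Validating the type and access mode at construction time means a misconfigured connection is rejected before any conversion or task tries to use it.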
The server and cluster management module is used for accelerating execution and for providing support when a server goes down;
the server and cluster management module comprises one master server and a plurality of slave servers, which together form a master-slave structure; the master does not process specific tasks and is only responsible for distributing tasks and collecting operation results; after the master node receives a request, it divides the task into several parts and hands them to the slave nodes for execution; after the slaves finish, they return their results to the master for summarization, and the master then returns the final result.
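The split, distribute and summarize pattern can be illustrated with a thread pool standing in for the slave servers. This is a single-process sketch under that assumption; it does not model network communication or failover, and the function names are hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor


def split(task, parts):
    """Master: divide a list of work items into at most `parts` chunks."""
    size = max(1, -(-len(task) // parts))  # ceiling division, at least 1
    return [task[i:i + size] for i in range(0, len(task), size)]


def slave_execute(chunk):
    """Slave: process one chunk; here it counts the non-empty records."""
    return sum(1 for record in chunk if record)


def master_run(task, slaves=3):
    """Master: distribute chunks, collect partial results, and summarize them."""
    chunks = split(task, slaves)
    with ThreadPoolExecutor(max_workers=slaves) as pool:
        partial = list(pool.map(slave_execute, chunks))
    return sum(partial)


records = ["a", "", "b", "c", "", "d", "e"]
total = master_run(records)
print(total)
```

The master function never inspects an individual record; it only partitions the work and adds up the partial counts, matching the division of responsibility described above.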
The resource library management module is used for storing all extraction, conversion and loading information in a relational database; a resource library is created by creating a new database connection.
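A minimal sketch of such a resource library, assuming SQLite as the relational database and JSON as the serialization of each stored definition, is shown below. The etl_object table, its columns and the helper functions are hypothetical.

```python
import json
import sqlite3


def create_repository(conn):
    """Create the table that holds conversion and task definitions."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS etl_object ("
        "name TEXT PRIMARY KEY, kind TEXT NOT NULL, definition TEXT NOT NULL)"
    )


def save_object(conn, name, kind, definition):
    """Store (or overwrite) one definition as a JSON document."""
    conn.execute(
        "INSERT OR REPLACE INTO etl_object VALUES (?, ?, ?)",
        (name, kind, json.dumps(definition)),
    )
    conn.commit()


def load_object(conn, name):
    """Return (kind, definition) for a stored object, or None if it is absent."""
    row = conn.execute(
        "SELECT kind, definition FROM etl_object WHERE name = ?", (name,)
    ).fetchone()
    return None if row is None else (row[0], json.loads(row[1]))


# Creating the resource library starts with a new database connection.
repo = sqlite3.connect(":memory:")
create_repository(repo)
save_object(repo, "roads_to_base", "conversion", {"steps": ["read", "filter", "clean", "load"]})
stored = load_object(repo, "roads_to_base")
print(stored)
```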
The system monitoring module is used for outputting feedback information of the running process when the conversion or task is executed.
The entity model is established by the following specific steps:
Step A1, objectifying things or phenomena in the real world, classifying and abstracting related objects with common characteristics to form conceptual cognition, and establishing a conceptual model for objective things;
Step A2, on the basis of spatial cognition and knowledge, after the real world has been understood through conceptualization, providing a basis for understanding related concepts and analyzing their structure, and, in combination with theories of geospatial cognition and modeling, further abstracting, decomposing and refining the conceptual model to form a logical data model that reflects the world of full-space cognition;
Step A3, finally, converting the real world into a form that can be recognized by a machine, and recording the data and attributes of the entity model in a specific physical storage model in the computer, so as to reflect, express and cognize the real world.
The DIKW model comprises a basic library, an index library, a knowledge library and a model library; the base library is the base of a city big data system and comprises entity class data and scene class data; the entity class data comprises objectified data and non-objectified data; the objectified data comprises houses, roads, enterprises and people; the non-objectified data comprises a remote sensing image and a three-dimensional street view;
The index library is based on the entity library; it draws a portrait of the city from different dimensions through various statistical data and index data, and expresses the city's vital signs periodically, so that city information can be expressed quantitatively and objectively;
the knowledge base is a digital expression of business rules and industry experience, is expressed as a city business rule base, is used for expressing city operation rules and provides support for dynamic prediction decisions of city application;
the model library is an industry model assembly library facing the field and is a comprehensive application to an entity library, an index library and a knowledge library; aiming at specific problems and urban demands of cities, the entity and index data can be deeply processed according to the demands by combining corresponding algorithms, the current urban signs are expressed in a more digital mode, and the prediction decision result is obtained more objectively by combining business judgment logic provided by a knowledge base.
The entities are divided into basic entities and management entities: the basic entity data is entity data (such as mountain, river, building, vehicle and the like) extracted from space-time basic data, urban running state sensing and other source data; the management entity data refers to entity data (such as company, legal person, property and commission office) extracted from urban business domain and demand to meet existing and future business.
In smart city business applications, taking the building entity as an example, a smart city involves business across the whole process of early planning, preliminary design, building design, construction and property management, and the related data include space vector SHP data, three-dimensional SKP white models, engineering BIM data, three-dimensional oblique photography data and the like; the entity model is guided by full life cycle business, and defines and manages the original data related to each entity through the knowledge graph model, so that the original databases are associated and recombined to meet business application requirements.
Every entity should contain an entity identification table and attribute tables for its other aspects. The identification table shares one entity code with the attribute tables, and the entity code is used to construct the internal association relationships of the entity. The association relationship between entities is expressed by establishing associations between entity identification tables. In the entity modeling step, the various entities are first defined according to smart city business and the identification tables are designed; the original data and original tables related to each entity are then determined; finally, through the entity code and the identification table, the corresponding spatial data, file data, structured data and other data are linked according to the model to form an entity model library.
Example III
The invention is characterized in that:
1. Aggregation: data aggregation system
The multi-source data convergence system establishes and updates various data through a metadata-based working method. It supports the whole process of extracting information from data sources, cleaning data, correcting errors, converting structures, and loading data that meets certain standards into files, relational databases, GIS databases or network service platforms. The system builds data gathering and processing tasks through a visual platform tool, and solves problems such as the automated processing of complex and massive data, data format conversion, database loading and migration.
As shown in fig. 2, the convergence system mainly comprises a conversion constructor, a task constructor, conversion and task processing, database connection management, server and cluster configuration, resource library management, system operation and maintenance monitoring and other modules:
1) A conversion construction module; conversion is the basic function of the data convergence system: for different data sources and input modes, the whole process from data input through processing to output is built with the conversion tool sets provided by the system. The conversion builder provides a visual conversion design interface. A conversion processes and operates on data records in each stage of data extraction, conversion and loading, and comprises one or more steps, such as reading files, filtering output rows, cleaning data or loading data into a database. The steps in a conversion are connected by node connections, each of which defines a unidirectional path that allows data to flow from one step to another. The data formats supported by the conversion process include GDB, CSV, MDB, SHP, XML, Excel, TXT, etc.
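As a non-limiting illustration of the conversion described above, the following Python sketch chains three steps (read, filter, clean) so that records flow in one direction from each step to the next. The function names, the sample records and the two-column "name,area" layout are assumptions made for illustration and are not part of the original disclosure.

```python
from typing import Callable, Dict, Iterable, Iterator, List

Row = Dict[str, object]
Step = Callable[[Iterable], Iterator[Row]]


def read_rows(raw_lines: Iterable[str]) -> Iterator[Row]:
    """Input step: parse hypothetical 'name,area' lines into records."""
    for line in raw_lines:
        name, area = line.split(",")
        yield {"name": name.strip(), "area": area.strip()}


def filter_rows(rows: Iterable[Row]) -> Iterator[Row]:
    """Filter step: drop records whose area field is empty."""
    for row in rows:
        if row["area"]:
            yield row


def clean_rows(rows: Iterable[Row]) -> Iterator[Row]:
    """Cleaning step: normalize the name and convert the area to a number."""
    for row in rows:
        yield {"name": str(row["name"]).title(), "area": float(str(row["area"]))}


def run_conversion(source: Iterable[str], steps: List[Step]) -> List[Row]:
    """Connect the steps in order; each connection is a one-way data path."""
    stream: Iterable = source
    for step in steps:
        stream = step(stream)
    return list(stream)


result = run_conversion(
    ["school a, 1200.5", "park b, ", "plant c, 300"],
    [read_rows, filter_rows, clean_rows],
)
```

In this sketch the record with an empty area is removed by the filter step before the cleaning step runs, which mirrors the one-way flow between connected steps.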
2) A task builder module; a task is composed of a plurality of conversions and can execute a complete data convergence task, and the task builder provides a visual task design interface. Because conversions are executed in parallel, a task that executes serially is needed to coordinate these operations. A task comprises a plurality of task items and conversion items, which are executed in an order determined by the node connections between the items and by the execution result of each item. The main components of a task are the task items, the node connections between them (which support multi-path execution, backtracking, parallel execution and other modes), and the task item results.
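The serial execution order described above can be sketched as follows, under stated assumptions: each task item returns a success flag, and the next item is chosen by following either a success connection or a failure connection. The class name TaskItem, the function run_task and the item names are hypothetical and serve only as an example.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional, Tuple


@dataclass
class TaskItem:
    """One item of a task, with hypothetical success/failure connections."""
    name: str
    action: Callable[[], bool]
    on_success: Optional[str] = None
    on_failure: Optional[str] = None


def run_task(items: List[TaskItem], start: str,
             max_steps: int = 100) -> List[Tuple[str, bool]]:
    """Run items one at a time; each result selects the next connection."""
    by_name: Dict[str, TaskItem] = {item.name: item for item in items}
    trace: List[Tuple[str, bool]] = []
    current: Optional[str] = start
    # max_steps guards against connections that accidentally form a loop.
    while current is not None and len(trace) < max_steps:
        item = by_name[current]
        succeeded = item.action()
        trace.append((item.name, succeeded))
        current = item.on_success if succeeded else item.on_failure
    return trace


items = [
    TaskItem("check_source", lambda: True, on_success="run_conversion"),
    TaskItem("run_conversion", lambda: False,
             on_success="load", on_failure="report_error"),
    TaskItem("load", lambda: True),
    TaskItem("report_error", lambda: True),
]
trace = run_task(items, "check_source")
```

Here the failed conversion item routes execution to the error-reporting item instead of the loading item, so the order is decided by both the connections and the results.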
3) A data conversion and task processing tool set; the graphical interface tool is used in the different stages of big data extraction, conversion and loading, to rapidly design and maintain complex data extraction, conversion and loading workflows. The graphical interface tool provides a graphical user interface for creating and editing tasks or conversions, and can also be used to execute and debug them. Divided by function, the tool set mainly comprises data input, data output, data processing, data inspection, general operation and other types.
The data input accesses different types of data sources and feeds them into the data flow of the convergence system; the data output writes the data flow of the convergence system to a designated location in a given format; the data processing processes the imported data accordingly, converts it into the form required by the user and outputs it to the next step; the data inspection verifies rows or fields on the basis of certain calculations so as to ensure the consistency of the data.
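A minimal sketch of the data inspection type is given below, assuming that the check is "every required field is present and non-empty". The function name check_rows, the field names and the sample records are illustrative assumptions rather than details of the disclosed system.

```python
from typing import Dict, List, Tuple

Row = Dict[str, object]


def check_rows(rows: List[Row],
               required_fields: List[str]) -> Tuple[List[Row], List[Tuple[int, List[str]]]]:
    """Split rows into valid rows and (row index, missing fields) errors."""
    valid: List[Row] = []
    errors: List[Tuple[int, List[str]]] = []
    for index, row in enumerate(rows):
        # A field counts as missing when it is absent, None or an empty string.
        missing = [f for f in required_fields if row.get(f) in (None, "")]
        if missing:
            errors.append((index, missing))
        else:
            valid.append(row)
    return valid, errors


sample = [
    {"id": 1, "name": "Road A", "geometry": "LINESTRING(0 0, 1 1)"},
    {"id": 2, "name": "", "geometry": "POINT(2 2)"},
    {"id": 3, "name": "Park"},
]
valid_rows, row_errors = check_rows(sample, ["id", "name", "geometry"])
```

Returning the errors alongside the valid rows lets a later step decide whether to reject the batch or only the failing records.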
The general operation tool set mainly completes the operations of character string replacement, cutting, case conversion, sorting, judging whether files and folders exist, path acquisition, file reading and writing, copying, moving and the like, and the operations of database table query, connection, merging, sorting, filtering, SQL script execution and the like. The part of the tool set can effectively support the extraction and cleaning of the data flow by the natural resource big data intelligent management convergence system, so that the data meeting the user expectation is obtained.
4) A database connection management module; the conversions and tasks of data fusion and convergence can process data in a relational database, or save processing results to a relational database. Connections to databases are saved through database connection management, and the corresponding database connection is selected when a task or conversion needs to use it.
A database connection is mainly used to configure a database supported by the convergence system, and comprises the name of the database connection, the database type of the connection, the access mode and the like. The connection name must be unique within the scope of the task or conversion. The database type is the type of the database to be connected, selected from a database list; the supported databases include Oracle, MySQL, MS Access, MS SQL Server, PostgreSQL, IBM DB2, Sybase, etc. The corresponding access mode and connection parameters are set according to the selected database type: the connection parameters include the database name, port, user name and password, and the access mode is selected from a list of available access modes.
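The connection configuration described above might be modeled as in the following sketch. The list of database types is taken from the text; the access mode values ("JDBC", "ODBC", "JNDI"), the class names and the sample parameters are assumptions introduced only for illustration.

```python
from dataclasses import dataclass
from typing import Dict

SUPPORTED_TYPES = {"Oracle", "MySQL", "MS Access", "MS SQL Server",
                   "PostgreSQL", "IBM DB2", "Sybase"}
ACCESS_MODES = {"JDBC", "ODBC", "JNDI"}  # assumed values, not from the source


@dataclass(frozen=True)
class DbConnection:
    """A named connection with its database type, access mode and parameters."""
    name: str
    db_type: str
    access_mode: str
    database: str
    port: int
    username: str
    password: str

    def __post_init__(self) -> None:
        if self.db_type not in SUPPORTED_TYPES:
            raise ValueError(f"unsupported database type: {self.db_type}")
        if self.access_mode not in ACCESS_MODES:
            raise ValueError(f"unsupported access mode: {self.access_mode}")
        if not 0 < self.port < 65536:
            raise ValueError(f"invalid port: {self.port}")


class ConnectionRegistry:
    """Stores connections and enforces that each connection name is unique."""

    def __init__(self) -> None:
        self._connections: Dict[str, DbConnection] = {}

    def add(self, connection: DbConnection) -> None:
        if connection.name in self._connections:
            raise ValueError(f"connection name already used: {connection.name}")
        self._connections[connection.name] = connection

    def get(self, name: str) -> DbConnection:
        return self._connections[name]


registry = ConnectionRegistry()
city_base = DbConnection("city_base", "PostgreSQL", "JDBC",
                         "smart_city", 5432, "etl_user", "change-me")
registry.add(city_base)
```

A task or conversion would then look a connection up by name with `registry.get("city_base")` instead of repeating the parameters.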
5) A server and cluster management module; a cluster accelerates execution and can continue to operate when some of the servers are down. The cluster environment is composed of a master server and a plurality of slave servers and is similar to a master-slave structure, the difference being that the master does not process specific tasks and is only responsible for distributing tasks and collecting operation results. After receiving a request, the master node divides the task into a plurality of parts and hands them to the slave nodes for execution; after the slaves finish executing, the results are handed to the master for summarization, and the master then returns the result. The cluster has the following advantages: 1. running on multiple servers accelerates processing, which is more noticeable for operations on large data volumes; 2. it avoids a single point of failure, since the other servers can keep running after one server fails.
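The split, execute and summarize flow described above can be illustrated with the following sketch, in which a local thread pool stands in for the slave servers. This substitution, the function names and the summing workload are assumptions for illustration; an actual cluster would dispatch the parts to separate machines.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import List


def split_task(records: List[int], parts: int) -> List[List[int]]:
    """Master side: divide the records into at most `parts` chunks."""
    size = max(1, -(-len(records) // parts))  # ceiling division, at least 1
    return [records[i:i + size] for i in range(0, len(records), size)]


def slave_execute(chunk: List[int]) -> int:
    """Slave side: process one chunk (here, a simple sum)."""
    return sum(chunk)


def master_run(records: List[int], slave_count: int) -> int:
    """Master distributes chunks, collects partial results and summarizes."""
    chunks = split_task(records, slave_count)
    with ThreadPoolExecutor(max_workers=slave_count) as pool:
        partial_results = list(pool.map(slave_execute, chunks))
    return sum(partial_results)


total = master_run(list(range(1, 101)), slave_count=4)
```

Note that the master function itself performs no record-level processing; it only splits the work and combines the partial results, as in the description.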
6) A resource library management module; resource libraries are system operation support libraries of different kinds defined by different personnel in the space-time data aggregation system; their basic elements are the same, including the same user interface and the same stored metadata. Common resource libraries include database resource libraries and file resource libraries. A database resource library stores all extraction, conversion and loading information in a relational database; it can be created simply by creating a database connection, and the database resource library dialog box can be used to create the tables and indexes in the resource library. A file resource library is defined under a file directory, and the file directory may comprise zip files, web services, FTP services and the like.
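As a rough sketch of a database resource library, the example below stores conversion and task definitions as JSON text in a relational table, using an in-memory SQLite database as a stand-in for the relational database. The table name repo_item, its columns and the sample definition are hypothetical and are not taken from the original disclosure.

```python
import json
import sqlite3
from typing import Optional


def create_repository(conn: sqlite3.Connection) -> None:
    """Create the table and index that hold the stored definitions."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS repo_item ("
        "kind TEXT NOT NULL, name TEXT NOT NULL, definition TEXT NOT NULL, "
        "PRIMARY KEY (kind, name))"
    )
    conn.execute("CREATE INDEX IF NOT EXISTS idx_repo_item_name ON repo_item (name)")
    conn.commit()


def save_item(conn: sqlite3.Connection, kind: str, name: str, definition: dict) -> None:
    """Insert or overwrite one conversion or task definition."""
    conn.execute(
        "INSERT OR REPLACE INTO repo_item (kind, name, definition) VALUES (?, ?, ?)",
        (kind, name, json.dumps(definition)),
    )
    conn.commit()


def load_item(conn: sqlite3.Connection, kind: str, name: str) -> Optional[dict]:
    """Return the stored definition, or None when it does not exist."""
    row = conn.execute(
        "SELECT definition FROM repo_item WHERE kind = ? AND name = ?",
        (kind, name),
    ).fetchone()
    return json.loads(row[0]) if row else None


repo = sqlite3.connect(":memory:")
create_repository(repo)
save_item(repo, "conversion", "import_roads",
          {"steps": ["read_file", "filter_rows", "load_table"]})
loaded = load_item(repo, "conversion", "import_roads")
```

Only a database connection is needed to create the repository in this sketch, which corresponds to the statement that a database resource library is created by creating a database connection.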
7) A system monitoring module; the monitoring module mainly provides a log function, which outputs feedback information about the running process while a conversion or task is executing; the log is very useful for monitoring and debugging programs.
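The log function described above could look like the following sketch, which uses the Python standard logging module to report when a step starts, finishes or fails. The logger name "aggregation.monitor", the message layout and the helper names are assumptions chosen for illustration.

```python
import io
import logging


def build_monitor_logger(stream) -> logging.Logger:
    """Create a logger that writes run-time feedback to the given stream."""
    logger = logging.getLogger("aggregation.monitor")  # hypothetical name
    logger.setLevel(logging.INFO)
    logger.handlers.clear()
    logger.propagate = False
    handler = logging.StreamHandler(stream)
    handler.setFormatter(logging.Formatter("%(levelname)s %(message)s"))
    logger.addHandler(handler)
    return logger


def run_step(logger: logging.Logger, step_name: str, func, rows):
    """Execute one step and log its start, end and any failure."""
    logger.info("step %s started, rows_in=%d", step_name, len(rows))
    try:
        output = func(rows)
    except Exception:
        logger.exception("step %s failed", step_name)
        raise
    logger.info("step %s finished, rows_out=%d", step_name, len(output))
    return output


buffer = io.StringIO()
monitor = build_monitor_logger(buffer)
kept = run_step(monitor, "filter_rows",
                lambda rows: [r for r in rows if r], ["a", "", "b"])
log_text = buffer.getvalue()
```

Writing to an in-memory stream keeps the example self-contained; a deployment would more likely attach a file or console handler.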
2. Storage: data storage system
In addition to a conventional original database, the storage system performs entity modeling and entity data extraction on the various types of data, associates the various types of data with the entity as the center, and constructs association relationships between entities to form an entity database. It also adopts knowledge graph modeling and, for the full life cycle business of the smart city, combines the DIKW pyramid hierarchy with a spatial information cognition and decision system to form a "DIKW" model that integrates data aggregation (Data), information analysis (Information), knowledge discovery (Knowledge) and smart service (Wisdom), turning the data into assets that support other subsequent intelligent applications.
Entity model; the entity data is divided into two main categories: basic entities and management entities. A basic entity is an entity object that physically exists in the urban space and can be uniquely identified, and is derived from space-time basic data, management data, urban running state sensing and other source data; basic entities are divided into 9 major entity types: zones, land parcels, buildings, traffic facilities, municipal facilities, public facilities, green spaces and squares, water systems and facilities, and sensing facilities. A management entity mainly refers to basic elements such as people and events, and its data is derived from the business data of the business systems operating in the city.
The design thought of the entity model is as follows: by objectifying things or phenomena in the real world, classifying and abstracting related objects with common characteristics to form conceptual cognition, and establishing a conceptual model for objective things; based on the cognitive experience and knowledge of the space, after the real world of the complex chaos is understood through conceptualization, a basis is provided for the understanding and structural analysis of related concepts, and a conceptual model is further abstracted, decomposed and refined by combining the theory related to the geospatial cognition and modeling to form a logic data model which reflects the world of the whole space cognition; finally, the real world is converted into a form which can be recognized by a machine, and the data and the attributes of the entity model are recorded in a specific physical storage model in the computer, so that the real world is reflected, expressed and perceived. The cognition thought and model of the geographic entity are specifically as follows:
The basic entity data is entity data extracted from space-time basic data, management data, smart city running state sensing and other source data; the management entity data is entity data extracted from urban business domains and demands to meet existing and future business. Every entity should contain an entity identification table and attribute tables for its other aspects; the identification table shares one entity code with the attribute tables, and the entity code is used to construct the internal association relationships of the entity. The association relationship between entities is expressed by establishing associations between entity identification tables.
The data entity is a basic unit of the whole space and geographic world, and the whole space information model is used for abstracting and modeling the real world to form a model for describing, expressing and managing various space-time entity information in the dynamic and complex real world (from micro to macro). Therefore, the design of the data entity model has the following characteristics:
1) Full scale features: breaks through the geographical space category of the traditional GIS with the earth as a reference, and expands the scale of the space information to microscopic and macroscopic spaces.
2) Full-class features: various tangible and intangible types of entities in the real world can be described and expressed, including static and dynamic objects, phenomena, processes, events, and the like.
3) Full dynamic characteristics: each entity in the data world constructed and described by the full spatial information model may be dynamic (including location, attributes, morphology, behavior, etc.) and have a lifecycle of its existence.
4) Full attribute feature: the multi-element characteristics of the entity such as time, space, morphology, property, relationship, cognition, behavior and the like need to be comprehensively described and expressed;
according to the above requirements, the design of the entity model takes the object identification table as its core, establishes association relationships between the entity and the BIM metadata table, the object relationship table, the basic information table, the object space relationship table, the object supplementary information table and the space data table, and establishes association relationships between entity identification tables, thereby realizing full-space information data modeling with full scale, full type, full dynamics and full attributes; the logical structure of the entity model is shown in fig. 4.
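The table structure described above can be sketched as follows, assuming a simplified schema in which an identification table, two attribute tables and a relation table are linked through a shared entity code. The table and column names, the sample codes "B001" and "P001" and the use of an in-memory SQLite database are illustrative assumptions; the actual tables are those shown in fig. 4.

```python
import sqlite3

SCHEMA = """
CREATE TABLE entity_identification (
    entity_code TEXT PRIMARY KEY,
    entity_type TEXT NOT NULL,
    name        TEXT NOT NULL
);
CREATE TABLE basic_info (
    entity_code TEXT NOT NULL REFERENCES entity_identification (entity_code),
    attr_key    TEXT NOT NULL,
    attr_value  TEXT
);
CREATE TABLE spatial_data (
    entity_code  TEXT NOT NULL REFERENCES entity_identification (entity_code),
    wkt_geometry TEXT NOT NULL
);
CREATE TABLE entity_relation (
    source_code TEXT NOT NULL REFERENCES entity_identification (entity_code),
    target_code TEXT NOT NULL REFERENCES entity_identification (entity_code),
    relation    TEXT NOT NULL
);
"""


def build_entity_db() -> sqlite3.Connection:
    """Create the simplified entity model tables in an in-memory database."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    return conn


db = build_entity_db()
db.executemany(
    "INSERT INTO entity_identification VALUES (?, ?, ?)",
    [("B001", "building", "Office Tower"), ("P001", "land_parcel", "Parcel 7")],
)
db.execute("INSERT INTO basic_info VALUES ('B001', 'floors', '24')")
db.execute("INSERT INTO spatial_data VALUES ('B001', 'POINT(104.06 30.67)')")
db.execute("INSERT INTO entity_relation VALUES ('B001', 'P001', 'located_on')")

# Internal association: attribute tables are joined through the entity code.
building = db.execute(
    "SELECT i.name, b.attr_value, s.wkt_geometry "
    "FROM entity_identification i "
    "JOIN basic_info b ON b.entity_code = i.entity_code "
    "JOIN spatial_data s ON s.entity_code = i.entity_code "
    "WHERE i.entity_code = ?", ("B001",)
).fetchone()

# Association between entities: expressed between identification records.
related = db.execute(
    "SELECT t.name, r.relation FROM entity_relation r "
    "JOIN entity_identification t ON t.entity_code = r.target_code "
    "WHERE r.source_code = ?", ("B001",)
).fetchall()
```

The first query shows the internal association of one entity through its code, and the second shows an association between two entities through their identification records.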
A DIKW model; based on a DIKW model, four libraries of entities, indexes, models and knowledge (shown in figure 3) are constructed to form a full-space and multi-level urban big data system, so as to realize the materialization of basic data, the indication of strategic data, the visualization of operation rules and the intellectualization of cognitive reasoning, and the value of platform data is deeply mined, and the method is described in detail as follows:
1) A base library; the basic library is the basis of a city big data system and is composed of entity class data and scene class data, wherein the entity class data comprises objectified data of houses, roads, enterprises, population and the like, and the scene class data covers non-objectified data of remote sensing images, three dimensions, street views and the like.
2) An index library; the index library is based on the entity library; it draws a portrait of the city from different dimensions through various statistical data and index data and periodically expresses the city's vital signs, so that city information can be expressed more quantitatively and objectively.
3) A knowledge base; the knowledge base is a digital expression of business rules and industry experience, is expressed as a city business rule base, can more objectively express city operation rules, and provides support for dynamic prediction decisions of city application.
4) A model library; the model library is an industry model assembly library facing the field and is a comprehensive application to an entity library, an index library and a knowledge library. Aiming at specific problems and urban demands of cities, the entity and index data can be deeply processed according to the demands by combining corresponding algorithms, the current urban signs are expressed in a more digital mode, and the prediction decision result is obtained more objectively by combining business judgment logic provided by a knowledge base.
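To make the interplay of the libraries concrete, the following sketch computes a hypothetical index (population density) from entity data and then applies a hypothetical knowledge base rule to obtain a decision hint. The district records, the threshold of 10000 persons per square kilometre and the advice text are invented for illustration and are not values from the original disclosure.

```python
from typing import Dict, List

# Entity library (assumed sample data): districts with population and area.
ENTITIES = [
    {"code": "D1", "population": 50000, "area_km2": 4.0},
    {"code": "D2", "population": 12000, "area_km2": 6.0},
]

# Knowledge library (assumed rule): a threshold and the associated advice.
RULES = [
    {"name": "high_density", "threshold": 10000.0,
     "advice": "plan additional public facilities"},
]


def compute_density_index(entities: List[dict]) -> Dict[str, float]:
    """Index library: derive one quantitative index per entity."""
    return {e["code"]: e["population"] / e["area_km2"] for e in entities}


def apply_rules(index_values: Dict[str, float],
                rules: List[dict]) -> Dict[str, List[str]]:
    """Model library: combine index values with knowledge base rules."""
    decisions: Dict[str, List[str]] = {}
    for code, value in index_values.items():
        decisions[code] = [r["advice"] for r in rules if value >= r["threshold"]]
    return decisions


density = compute_density_index(ENTITIES)
decisions = apply_rules(density, RULES)
```

The sketch keeps the three inputs separate so that entity data, index calculations and rules can each be updated without changing the others.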
The multi-source data aggregation system provided by the invention can effectively collect and aggregate various data related to smart city construction, and solves the problems of multiple data sources, complex formats, multiple aggregation means, large data volume and the like.
The invention provides entity modeling and entity data extraction, which carries out materialization on various types of data, correlates the various types of data by taking an entity as a center, constructs the association relationship between the entity and the entity to form an entity database, and assets the data to support other follow-up intelligent application.
The knowledge graph modeling method and the four libraries of entity, index, model and knowledge are built, so that a full-space and multi-level urban big data system is formed, the materialization of basic data, the indication of strategic data, the visualization of operation rules and the intellectualization of cognitive reasoning are realized, and the value of data can be deeply mined.
Noun interpretation:
The Knowledge Graph is an important supporting technology for data mining and knowledge reasoning in the era of big data and artificial intelligence. It originated in the library and information science community and is now widely applied in fields such as Internet search engines, intelligent e-commerce recommendation and military intelligence analysis. In order to describe and analyze the city in which we live, it is necessary to construct a knowledge graph of the various objects in the city to better support upper-layer intelligent decision applications. More than 80% of the city's information is related to spatial position, so geographic entities are indispensable in constructing the city knowledge graph, and the end result is a city space-time information knowledge graph based on geographic entities.
The DIKW system is a hierarchy of data, information, knowledge and wisdom. Each layer adds certain characteristics to the layer below it. The data layer is the most basic. The information layer adds content. The knowledge layer adds "how to use", and the wisdom layer adds "when to use". Thus, the DIKW system is a model that helps us understand the limits of analysis, importance and conceptual work. The DIKW system is commonly used in information science and knowledge management.
A geographic entity is a natural geographic unit or artificial facility that occupies a certain spatial position in the real world and has the same attributes or a complete function; geographic entities mainly comprise basic geographic elements such as administrative regions, roads, rivers, green spaces and buildings.
It is noted that relational terms such as first and second, and the like are used solely to distinguish one entity or action from another entity or action without necessarily requiring or implying any actual such relationship or order between such entities or actions. Moreover, the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or apparatus.
Finally, it should be noted that: the foregoing description is only a preferred embodiment of the present invention, and the present invention is not limited thereto, but it is to be understood that modifications and equivalents of some of the technical features described in the foregoing embodiments may be made by those skilled in the art, although the present invention has been described in detail with reference to the foregoing embodiments. Any modification, equivalent replacement, improvement, etc. made within the spirit and principle of the present invention should be included in the protection scope of the present invention.

Claims (6)

1. A smart city multi-source data aggregation storage system, characterized in that: the system comprises a data aggregation system and a data storage system; the data aggregation system is used for the whole process of extracting information from a data source, cleaning the data, correcting errors, converting structures, and loading data that meets the standard into the database; the data aggregation system comprises a conversion constructor module, a task constructor module, a data conversion and task processing module, a database connection management module, a server and cluster management module, a resource library management module and a system monitoring module;
the conversion constructor module is used for carrying out corresponding processing and operation on the data record in the data extraction, conversion and loading stages; the task builder module is used for executing a complete data convergence task; the data conversion and task processing module is used for different stages of big data extraction, conversion and loading, and rapidly designing and maintaining a complex workflow of data extraction, conversion and loading; the database connection management module is used for configuring the convergence system; the server and the cluster management module are used for accelerating the execution speed and providing support when the server is down; the resource library management module is used for storing all the information of extraction, conversion and loading in the relational database and creating by creating a database connection; the system monitoring module is used for outputting feedback information of the running process during conversion or task execution;
The data storage system comprises an entity model and a DIKW model; the entity model comprises a basic entity and a management entity;
the entity model is established by the following specific steps:
step A1, objectifying things or phenomena in the real world, classifying and abstracting related objects with common characteristics to form conceptual cognition, and establishing a conceptual model for objective things;
step A2, on the basis of spatial cognition and knowledge, after the real world has been understood through conceptualization, providing a basis for understanding related concepts and analyzing their structure, and, in combination with theories of geospatial cognition and modeling, further abstracting, decomposing and refining the conceptual model to form a logical data model that reflects the world of full-space cognition;
step A3, finally, converting the real world into a form that can be recognized by a machine, and recording the data and attributes of the entity model in a specific physical storage model in the computer, so as to reflect, express and cognize the real world;
the DIKW model comprises a basic library, an index library, a knowledge library and a model library; the base library is the base of a city big data system and comprises entity class data and scene class data; the entity class data comprises objectified data and non-objectified data; the objectified data comprises houses, roads, enterprises and people; the non-objectified data comprises a remote sensing image and a three-dimensional street view;
The index library is based on the entity library; it draws a portrait of the city from different dimensions through various statistical data and index data, and expresses the city's vital signs periodically, so that city information can be expressed quantitatively and objectively;
the knowledge base is a digital expression of business rules and industry experience, is expressed as a city business rule base, is used for expressing city operation rules and provides support for dynamic prediction decisions of city application;
the model library is an industry model assembly library facing the field and is a comprehensive application to an entity library, an index library and a knowledge library;
the aggregation storage system operates according to the following steps:
step S1, acquiring a data source;
step S2, classifying and judging the source data through the database connection management module, the server and cluster management module and the data conversion and task processing module in the data aggregation system; if the data is original data, processing the original data into standard data and then warehousing the data; if the data is standard data, warehousing the data directly;
step S3, using the warehoused standard data for the base library;
step S4, after passing through the base library, performing knowledge graph modeling on the data;
step S5, abstracting the data for which knowledge graph modeling has been completed into an entity library, an index library and a knowledge library;
step S6, calling the data in the entity library, the index library and the knowledge library for data applications.
2. The smart city multi-source data aggregation storage system according to claim 1, characterized in that: the conversion builder module specifically performs: reading files, filtering output rows, cleaning data, and loading data into a database.
3. The smart city multi-source data aggregation storage system according to claim 2, characterized in that: the task builder module includes a plurality of conversion builder modules.
4. The smart city multi-source data aggregation storage system according to claim 1, characterized in that: the data conversion and task processing module comprises data input, data output, data processing, data inspection and general operation;
the data input is used for accessing different types of data sources and feeding them into the data flow of the convergence system; the data output is used for writing the data flow of the convergence system to a designated location in a given format; the data processing is used for processing the imported data accordingly, converting it into the form required by the user and outputting it to the next step; the data inspection is used for verifying rows or fields on the basis of certain calculations so as to ensure the consistency of the data.
5. The smart city multi-source data aggregation storage system according to claim 1, characterized in that: the database connection management module configures the supported databases, the configuration comprising the name of the database connection, the database type of the connection and the access mode; wherein the database type is the type of the database to be connected, selected from a database list comprising: Oracle, MySQL, MS Access, MS SQL Server, PostgreSQL, IBM DB2, Sybase; and the corresponding access mode and connection parameters are set according to the selected database type, wherein the connection parameters comprise the database name, port, user name and password, and the access mode is selected from a list of available access modes.
6. The smart city multi-source data aggregation storage system according to claim 1, characterized in that:
the server and cluster management module comprises a master server and a plurality of slave servers, which form a master-slave structure; the master does not process specific tasks and is only responsible for distributing tasks and collecting operation results; after the master node receives a request, it divides the task into a plurality of parts and hands them to the slave nodes for execution; after the slaves finish executing, the results are handed to the master for summarization, and the master then returns the result.
CN202311330365.6A 2023-10-16 2023-10-16 Multi-source data aggregation storage system for smart city Active CN117076463B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202311330365.6A CN117076463B (en) 2023-10-16 2023-10-16 Multi-source data aggregation storage system for smart city

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202311330365.6A CN117076463B (en) 2023-10-16 2023-10-16 Multi-source data aggregation storage system for smart city

Publications (2)

Publication Number Publication Date
CN117076463A CN117076463A (en) 2023-11-17
CN117076463B true CN117076463B (en) 2023-12-29

Family

ID=88706400

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202311330365.6A Active CN117076463B (en) 2023-10-16 2023-10-16 Multi-source data aggregation storage system for smart city

Country Status (1)

Country Link
CN (1) CN117076463B (en)

Citations (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106855962A (en) * 2015-12-09 2017-06-16 星际空间(天津)科技发展有限公司 A kind of method for building government affairs big data platform
CN111178742A (en) * 2019-12-25 2020-05-19 北京高诚科技发展有限公司 Comprehensive traffic cooperative operation system and method based on multi-level index system
CN111885643A (en) * 2020-07-11 2020-11-03 佛山市海协科技有限公司 Multi-source heterogeneous data fusion method applied to smart city
CN111935124A (en) * 2020-08-04 2020-11-13 佛山市海协科技有限公司 Multi-source heterogeneous data compression method applied to smart city
CN113220911A (en) * 2021-05-25 2021-08-06 中国农业科学院农业信息研究所 Agricultural multi-source heterogeneous data analysis and mining method and application thereof
CN114780733A (en) * 2021-12-31 2022-07-22 海南大学 DIKW atlas-based intelligent patent modification method, auxiliary response method and system
CN116089625A (en) * 2022-12-14 2023-05-09 海南电网有限责任公司三亚供电局 Cable channel panorama management and control system based on multidimensional information perception
CN116431824A (en) * 2023-03-31 2023-07-14 海口金政信息科技有限公司 River basin water environment problem expert consultation system

Family Cites Families (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US11100483B2 (en) * 2017-09-29 2021-08-24 Intel Corporation Hierarchical data information
US10776337B2 (en) * 2018-07-06 2020-09-15 International Business Machines Corporation Multi-dimensional knowledge index and application thereof

Non-Patent Citations (3)

* Cited by examiner, † Cited by third party
Title
Understanding the potential of emerging digital technologies for improving road safety;Mehran Eskandari Torbaghan et al.;《Accident Analysis & Prevention》;1-23 *
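Research on the spatial cognition model of the pan-spatial information ***; Fang Cheng et al.; 《Geomatics & Spatial Information Technology》; 61-67 *
Design and implementation of an ETL management *** based on a Kettle cluster; Zhang Yi; 《China Master's Theses Full-text Database, Information Science and Technology》; I138-906 *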

Also Published As

Publication number Publication date
CN117076463A (en) 2023-11-17

Similar Documents

Publication Publication Date Title
CN110347719B (en) Enterprise foreign trade risk early warning method and system based on big data
CN110866123B (en) Method for constructing data map based on data model and system for constructing data map
CN110781236A (en) Method for constructing government affair big data management system
Hor et al. A semantic graph database for BIM-GIS integrated information model for an intelligent urban mobility web application
CN108964996B (en) Urban and rural integrated information grid system and information sharing method based on same
CN110059150A (en) Hydraulic engineering Digital Archives System based on BIM+GIS
CN113434623B (en) Fusion method based on multi-source heterogeneous space planning data
CN105469204A (en) Reassembling manufacturing enterprise integrated evaluation system based on deeply integrated big data analysis technology
CN109542967A (en) Smart city data-sharing systems and method based on XBRL standard
Zhang et al. Research on the integration of heterogeneous information resources in university management informatization based on data mining algorithms
WO2023124191A1 (en) Depth map matching-based automatic classification method and system for medical data elements
CN105808853A (en) Engineering application oriented body establishment management and body data automatic obtaining method
CN112181960A (en) Intelligent operation and maintenance framework system based on AIOps
CN112699100A (en) Management and analysis system based on metadata
CN113987626A (en) Extensible building full life BIM modeling method
CN109710667A (en) A kind of shared realization method and system of the multisource data fusion based on big data platform
CN115438199A (en) Knowledge platform system based on smart city scene data middling platform technology
CN110598074A (en) Method and system for organizing and managing uniform resources related to scientific and technological consultation big data
CN113254517A (en) Service providing method based on internet big data
CN110990907B (en) Feature-resource knowledge-based three-level optimization method for manufacturability of marine diesel engine heavy parts
CN117076463B (en) Multi-source data aggregation storage system for smart city
CN111784192A (en) Industrial park emergency plan executable system based on dynamic evolution
Niu Optimization of teaching management system based on association rules algorithm
CN112860653A (en) Government affair information resource catalog management method and system
CN115858498A (en) Five-dimensional space-time distributed database construction method and device

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant