CN113282692A - Big data sharing method and device for smart city - Google Patents

Publication number
CN113282692A
Authority
CN
China
Prior art keywords
data
structured
lake
smart city
space
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Withdrawn
Application number
CN202110561578.4A
Other languages
Chinese (zh)
Inventor
齐维潇
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Individual
Original Assignee
Individual
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Individual
Priority to CN202110561578.4A
Publication of CN113282692A

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/29 Geographical information databases
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/27 Replication, distribution or synchronisation of data between databases or within a distributed database system; Distributed database system architectures therefor
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/28 Databases characterised by their database models, e.g. relational or object models
    • G06F16/284 Relational databases
    • G06F16/285 Clustering or classification
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/33 Querying
    • G06F16/3331 Query processing
    • G06F16/334 Query execution
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/36 Creation of semantic tools, e.g. ontology or thesauri
    • G06F16/367 Ontology
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90 Details of database functions independent of the retrieved data types
    • G06F16/95 Retrieval from the web
    • G06F16/955 Retrieval from the web using information identifiers, e.g. uniform resource locators [URL]
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L67/00 Network arrangements or protocols for supporting network services or applications
    • H04L67/01 Protocols
    • H04L67/10 Protocols in which an application is distributed across nodes in the network
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L67/00 Network arrangements or protocols for supporting network services or applications
    • H04L67/01 Protocols
    • H04L67/12 Protocols specially adapted for proprietary or special-purpose networking environments, e.g. medical networks, sensor networks, networks in vehicles or remote metering networks

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Databases & Information Systems (AREA)
  • Data Mining & Analysis (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Computing Systems (AREA)
  • Computer Networks & Wireless Communication (AREA)
  • Signal Processing (AREA)
  • Computational Linguistics (AREA)
  • Remote Sensing (AREA)
  • Health & Medical Sciences (AREA)
  • General Health & Medical Sciences (AREA)
  • Medical Informatics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Animal Behavior & Ethology (AREA)
  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)

Abstract

The application discloses a big data sharing method for a smart city, comprising: acquiring space-time big data collected by sensors, wherein the space-time big data comprises structured data and unstructured data; describing the space-time big data using the Resource Description Framework (RDF) and a related ontology, so as to associate different space-time big data at the semantic level; marking the associated space-time big data with its data source and storing the marked space-time big data in a data lake, wherein the data lake comprises a service layer and a source data layer; splitting the data lake into a plurality of sub-data lakes based on the different instance requirements of the smart city; performing semi-structured processing on each sub-data lake to generate a semi-structured smart city instance base for each, and storing the instance bases in the service layer of the data lake; and receiving a service-driven request from a government cloud, retrieving the semi-structured smart city instance base associated with the service driver, and sharing the final retrieval result with the government cloud.

Description

Big data sharing method and device for smart city
Technical Field
The application relates to the technical field of data processing, in particular to a big data sharing method and device for a smart city.
Background
City Information Modeling (CIM) is a digital expression and description of the various physical entities and space-time states of a city, above ground and underground, indoors and outdoors. It reflects city planning, construction, development and operation, and can be used for city planning decisions, city construction, city management and other work.
CIM is a concept with a wide span. It involves many industries, including planning, land and resources, transportation, water conservancy, security, civil air defense, environmental protection, cultural relic protection, energy and gas, as well as all fields related to smart cities.
At present, systematic in-depth research on CIM is lacking at home and abroad. Following the view taken in the article "City information model related technology development review under the intelligent city background", the basic characteristics of CIM can be analyzed, as a first attempt, from the three words that form the term. First, City: CIM is to cover the city scale. It can be instantiated as a city or as a city district, a park, a community, a courtyard and so on, but the descriptive capability of CIM for the modeled object should be at city level. Second, Information: the information contained in CIM covers various spatial and temporal dimensions and can support various urban applications. It can describe the physical and human entities of the city and is multi-temporal, multi-type, multi-granularity and multi-source. Finally, Modeling: CIM organizes, simulates, analyzes and expresses the above information as needed, based on certain rules and methods, and further aggregates intelligence by fusing and mining the information and abstracting new knowledge.
Judging from its current development, CIM is closely related to technologies such as BIM (Building Information Modeling), GIS (Geographic Information System) and IoT (Internet of Things), and it inevitably also needs to apply new-generation information technologies such as cloud computing and big data.
In the prior art, because of its large data volume, a CIM system needs the support of a highly concurrent, highly reliable cloud system, and it can share different information with the government clouds of different functional departments. However, because the data is diverse and complex, data sharing tends to send all kinds of redundant data to the government clouds along with the data actually needed. For example, the traffic department only needs the congestion situation in a certain area during commuting peak hours, but the data it receives is global 24-hour GIS data, which easily causes network congestion and wastes data resources.
Disclosure of Invention
The embodiment of the application provides a big data sharing method for a smart city, which is used for solving the problems that network congestion and data resource waste are easily caused by big data sharing in a smart city scene in the prior art.
The embodiment of the invention provides a big data sharing method of a smart city, which comprises the following steps:
a cloud server acquiring space-time big data collected by sensors, wherein the space-time big data comprises structured data and unstructured data;
describing the space-time big data using the Resource Description Framework (RDF) and a related ontology, so as to associate different space-time big data at the semantic level;
marking the associated space-time big data with its data source, and storing the marked space-time big data in a data lake, wherein the data lake comprises a service layer and a source data layer;
splitting the data lake into a plurality of sub-data lakes based on the different instance requirements of the smart city, wherein the sub-data lakes correspond one-to-one to the smart city instances;
performing semi-structured processing on each sub-data lake to generate a semi-structured smart city instance base for each, and storing the semi-structured smart city instance bases in the service layer of the data lake;
and receiving a service-driven request from a government cloud, retrieving the semi-structured smart city instance base associated with the service driver, and sharing the final retrieval result with the government cloud.
Optionally, splitting the data lake into a plurality of sub-data lakes comprises:
splitting the data lake into the plurality of sub-data lakes based on a temporal association, a spatial association and a semantic association, wherein the data sources of different sub-data lakes are of different types.
Optionally, retrieving the semi-structured smart city instance base associated with the service driver comprises:
splitting the semi-structured smart city instance base into structured data and unstructured data;
searching the unstructured data by full-text search and relationship graph to obtain a first retrieval result;
searching the structured data by key-value (K-V) relational search to obtain a second retrieval result;
and merging the first retrieval result and the second retrieval result to obtain a third retrieval result, wherein the third retrieval result is the final retrieval result.
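The retrieval steps above can be sketched as follows. This is only an illustration under stated assumptions, not the implementation of the application: the record layout, the `retrieve` function, the whitespace tokenizer standing in for full-text search and the dictionary standing in for the K-V store are all hypothetical, and the relationship-graph part of the unstructured search is left out.

```python
from collections import defaultdict

def build_inverted_index(docs):
    """Map each lowercase token to the set of document ids containing it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for token in text.lower().split():
            index[token].add(doc_id)
    return index

def full_text_search(index, query):
    """Return ids of documents that contain every token of the query."""
    tokens = query.lower().split()
    if not tokens:
        return set()
    hits = set(index.get(tokens[0], set()))
    for token in tokens[1:]:
        hits &= index.get(token, set())
    return hits

def kv_search(store, key):
    """Exact-match lookup in a key-value store; returns a list of values."""
    return list(store.get(key, []))

def retrieve(instance_base, text_query, key):
    # Step 1: split the instance base into its two parts (hypothetical layout).
    structured = instance_base["structured"]
    unstructured = instance_base["unstructured"]
    # Step 2: full-text search over the unstructured data (first result).
    index = build_inverted_index(unstructured)
    first = sorted(full_text_search(index, text_query))
    # Step 3: K-V search over the structured data (second result).
    second = kv_search(structured, key)
    # Step 4: merge both into the final (third) result.
    return {"unstructured_hits": first, "structured_hits": second}

base = {
    "structured": {"road:R1": [{"hour": 8, "vehicles": 420}]},
    "unstructured": {"cam-1": "congestion on road R1 at peak hour",
                     "cam-2": "clear traffic on road R2"},
}
result = retrieve(base, "congestion R1", "road:R1")
```

Keeping the two searches separate and merging only at the end means each part can use an index suited to its data.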
Optionally, performing semi-structured processing on each sub-data lake comprises:
acquiring the structured data and the unstructured data of each sub-data lake;
decomposing the structured data into a plurality of structured fields of 32 bytes each, setting a serial number for each field together with a mark P and an end character E for the structured data, and forming a structured message from the plurality of structured fields;
decomposing the unstructured data into a plurality of unstructured fields of 64 bytes each, setting a serial number for each field together with a mark M and an end character N for the unstructured data, and forming an unstructured message from the plurality of unstructured fields;
and combining the structured message and the unstructured message to generate a semi-structured message.
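A minimal sketch of this framing is given below. The source fixes only the field sizes (32 and 64 bytes), the serial numbers, and the marks and end characters (P/E and M/N); the byte layout used here, including the 2-byte field count and the per-field length prefix, is an assumption made so that the message can be parsed back unambiguously.

```python
import struct

def pack_message(payload: bytes, field_size: int, mark: bytes, end: bytes) -> bytes:
    """Split payload into fixed-size fields and frame them.

    Assumed layout (not given in the source):
    mark | field count (2 bytes) | [serial (2) | length (2) | field]* | end
    """
    fields = [payload[i:i + field_size] for i in range(0, len(payload), field_size)]
    out = bytearray(mark)
    out += struct.pack(">H", len(fields))
    for serial, field in enumerate(fields):
        out += struct.pack(">HH", serial, len(field)) + field
    out += end
    return bytes(out)

def unpack_message(message: bytes, mark: bytes, end: bytes) -> bytes:
    """Inverse of pack_message: check the framing and rejoin fields by serial number."""
    if not (message.startswith(mark) and message.endswith(end)):
        raise ValueError("bad mark or end character")
    (count,) = struct.unpack_from(">H", message, len(mark))
    offset = len(mark) + 2
    fields = {}
    for _ in range(count):
        serial, length = struct.unpack_from(">HH", message, offset)
        offset += 4
        fields[serial] = message[offset:offset + length]
        offset += length
    return b"".join(fields[i] for i in range(count))

def build_semi_structured(structured: bytes, unstructured: bytes) -> bytes:
    """Structured message (32-byte fields, P...E) followed by unstructured (64-byte fields, M...N)."""
    return (pack_message(structured, 32, b"P", b"E")
            + pack_message(unstructured, 64, b"M", b"N"))

structured = b'{"road": "R1", "hour": 8, "vehicles": 420}'
unstructured = bytes(range(150))  # stand-in for image or video bytes
s_msg = pack_message(structured, 32, b"P", b"E")
u_msg = pack_message(unstructured, 64, b"M", b"N")
semi = build_semi_structured(structured, unstructured)
```

The length prefix is a design choice of this sketch: the last field of a payload is usually shorter than the field size, and raw payload bytes may themselves contain the end character.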
Optionally, describing the space-time big data using the Resource Description Framework (RDF) and a related ontology comprises:
reading the marked data set of the space-time big data from the HDFS file system in the data lake;
splitting the data set into sub data sets, and distributing the sub data sets to different cloud nodes according to a distributed computing rule;
performing the RDF and related ontology description on the sub data sets on all cloud nodes synchronously;
and writing the described result back to HDFS.
Optionally, if the smart city instance requirement is of the Building Information Model (BIM) type, splitting the data lake into a plurality of sub-data lakes comprises:
splitting the data lake according to a data source attribution relation, wherein the data source attribution relation comprises BIM energy consumption attribution, BIM safety attribution and BIM flow attribution.
Optionally, if the smart city instance requirement is of the Geographic Information System (GIS) type, splitting the data lake into a plurality of sub-data lakes comprises:
splitting the data lake according to a geographical attribution relation, wherein the geographical attribution relation comprises province attribution, block attribution and road attribution.
Optionally, if the smart city instance requirement is of the industrial Internet of Things (IoT) type, splitting the data lake into a plurality of sub-data lakes comprises:
splitting the data lake according to the type of IoT acquisition device, wherein the types of IoT acquisition device comprise cameras, temperature and humidity sensors and safety early-warning sensors.
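The three optional splitting rules share one pattern: the instance type decides which attribute of a record drives the split. The sketch below illustrates that pattern only. The record fields (`type`, `attribution`, `region`, `device_type`) and the sample values are hypothetical and do not appear in the source.

```python
from collections import defaultdict

# Which record field drives the split for each instance type (field names are hypothetical).
SPLIT_KEY = {
    "BIM": "attribution",   # e.g. energy consumption, safety, flow
    "GIS": "region",        # e.g. province, block, road
    "IoT": "device_type",   # e.g. camera, temperature/humidity sensor, early-warning sensor
}

def split_data_lake(records, instance_type):
    """Group records of one instance type into sub-data lakes by that type's split key."""
    key = SPLIT_KEY[instance_type]
    sub_lakes = defaultdict(list)
    for record in records:
        if record["type"] == instance_type:
            sub_lakes[record[key]].append(record)
    return dict(sub_lakes)

lake = [
    {"type": "BIM", "attribution": "energy", "value": 1000},
    {"type": "BIM", "attribution": "safety", "value": 3},
    {"type": "IoT", "device_type": "camera", "value": "frame-001"},
    {"type": "IoT", "device_type": "camera", "value": "frame-002"},
    {"type": "GIS", "region": "road", "value": "R1"},
]
bim_lakes = split_data_lake(lake, "BIM")
iot_lakes = split_data_lake(lake, "IoT")
```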
The embodiment of the present invention further provides an apparatus, which is characterized by comprising a memory and a processor, wherein the memory stores computer executable instructions, and the processor implements the method when executing the computer executable instructions on the memory.
According to the method provided by the embodiment of the invention, the space-time big data is stored in a data lake, the data lake is split into a plurality of highly correlated sub-data lakes based on the different smart city instances, and the data in the sub-data lakes undergoes semi-structured processing, so that different types of data are stored in a normalized form and storage space is saved. When a government cloud requires data for a given instance, only the corresponding instance data is sent to it, which reduces the redundancy of data transmission, improves transmission efficiency and saves network resources.
Drawings
In order to more clearly illustrate the technical solutions of the embodiments of the present application, the drawings used in the description of the embodiments will be briefly introduced below.
FIG. 1 is a schematic diagram illustrating a process of big data sharing in a smart city according to an embodiment;
FIG. 2 is a data lake architecture topology diagram of a smart city;
FIG. 3 is a schematic diagram of data lake splitting for a Smart city;
FIG. 4 is a diagram illustrating the hardware components of the apparatus according to one embodiment.
Detailed Description
The technical solutions in the embodiments of the present application will be clearly and completely described below with reference to the drawings in the embodiments of the present application, and it is obvious that the described embodiments are some, but not all, embodiments of the present application. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present application.
It will be understood that the terms "comprises" and/or "comprising," when used in this specification and the appended claims, specify the presence of stated features, integers, steps, operations, elements, and/or components, but do not preclude the presence or addition of one or more other features, integers, steps, operations, elements, components, and/or groups thereof.
It is also to be understood that the terminology used in the description of the present application herein is for the purpose of describing particular embodiments only and is not intended to be limiting of the application. As used in the specification of the present application and the appended claims, the singular forms "a," "an," and "the" are intended to include the plural forms as well, unless the context clearly indicates otherwise.
It should be further understood that the term "and/or" as used in this specification and the appended claims refers to and includes any and all possible combinations of one or more of the associated listed items.
As used in this specification and the appended claims, the term "if" may be interpreted contextually as "when", "upon" or "in response to a determination" or "in response to a detection". Similarly, the phrase "if it is determined" or "if a [ described condition or event ] is detected" may be interpreted contextually to mean "upon determining" or "in response to determining" or "upon detecting [ described condition or event ]" or "in response to detecting [ described condition or event ]".
A smart city can be divided into four layers. The first is the perception layer, in which data is collected through the city's various "nerve endings" (Internet of Things devices) such as cameras, temperature sensors, humidity sensors, water pressure sensors and mobile terminals, which are responsible for acquiring different types of data at different times and places. The second is the communication layer, which is responsible for uploading and aggregating the data acquired at different times and places according to a certain communication protocol. The third is the platform layer, which receives the data. The fourth is the application layer, which, after useful data has been extracted, provides services for every aspect of the city, such as smart transportation, smart buildings, smart healthcare and smart power.
The core of the smart city lies in breaking down information silos so that massive data can be connected, stored and queried. How to build sufficiently large cloud storage, and how to provide a knowledge graph to decision makers within a short time, are key problems.
In the embodiment of the present invention, the network architecture of the smart city cloud service may be divided into four layers. The first layer is the sensing layer, which includes various IoT devices for acquiring different types of data at different times and locations, for example GIS data, BIM data and various types of IoT data (such as temperature, humidity and image data acquired by temperature and humidity sensors, video acquisition terminals and the like). In the sensing layer, each IoT device is in a Multi-mode Heterogeneous Wireless Network (MHWN). A multi-mode heterogeneous network is a network that comprises multiple types of nodes and multiple types of relations. It fuses different, mutually overlapping networks so as to meet the service diversity requirements of future terminals, and the MHWN is configured to select and switch dynamically among multiple communication networks according to the dynamic and differentiated requirements of communication or services. The second layer is the aggregation layer, that is, the access layer. It comprises a plurality of Data Transmission Systems (DTSs), each of which is responsible for aggregating the data reported by all IoT devices in its current cell and transmitting the data to the core layer. The aggregation layer can use software-defined radio (SDR) technology to merge a plurality of communication networks, and a data transmission unit can realize the SDR function by communicating over one or more of those networks, so that a software module can run in the MHWN.
The third layer is the core layer, which is a virtual networking structure based on network slicing; that is, network function virtualization (NFV) is configured to make hierarchical decisions on the reported data, and NFV is the underlying platform architecture of the core layer. An access network of the MHWN network. The fourth layer is the application layer, in which various applications are carried out after the core layer has made its analysis and decisions.
Fig. 1 is a flowchart of a smart city big data sharing method according to an embodiment of the present invention, as shown in fig. 1, the method includes:
S101, a cloud server acquires space-time big data collected by sensors, wherein the space-time big data comprises structured data and unstructured data;
In the embodiment of the invention, space-time big data denotes all types of data associated with time and space. In the smart city field, CIM space-time big data has many different data sources and data types, typically Geographic Information System (GIS) data, Building Information Model (BIM) data and Internet of Things (IoT) data. GIS data, BIM data and IoT data are different types of data in the smart city cloud network and come from different sources. They can be acquired simultaneously or at different times, and each acquired item has a time attribute and a space attribute. Time can be represented by a time T, for example T_0 to T_{n-1} denote n time instants. The region can be represented in different formats such as administrative regions, streets or GPS longitude and latitude, for example: Guangdong Province -> Shenzhen City -> Nanshan District -> Yuehai Subdistrict, which marks the region from large to small. Typical GIS data is mostly used in electronic maps, for example information about districts, streets, buildings, parks and schools. Typical BIM data is data on finer-grained dimensions of a single building or school, such as building energy consumption, ventilation and layout. IoT data is a general term for all Internet of Things data; for smart cities, typical IoT data comes from temperature sensors, humidity sensors, monitoring devices and the like.
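One way to hold the time and space attributes described above is a record with a time index and a region path ordered from large to small. The sketch below is an assumption for illustration: the class name, its fields and the sample region names are not taken from the source.

```python
from dataclasses import dataclass
from typing import Tuple

@dataclass(frozen=True)
class SpatioTemporalRecord:
    source: str               # e.g. "GIS", "BIM", "IoT"
    t: int                    # index of the time instant, 0 .. n-1
    region: Tuple[str, ...]   # region path ordered from largest to smallest
    value: object

def in_region(record, prefix):
    """True if the record's region path starts with the given, coarser path."""
    return record.region[:len(prefix)] == tuple(prefix)

def select(records, prefix, t_start, t_end):
    """Records inside a region whose time index lies in [t_start, t_end]."""
    return [r for r in records if in_region(r, prefix) and t_start <= r.t <= t_end]

records = [
    SpatioTemporalRecord("IoT", 7, ("Guangdong", "Shenzhen", "Nanshan", "Yuehai"), 23.5),
    SpatioTemporalRecord("IoT", 8, ("Guangdong", "Shenzhen", "Longgang"), 24.1),
    SpatioTemporalRecord("GIS", 12, ("Guangdong", "Shenzhen", "Nanshan"), "map tile"),
]
nanshan_morning = select(records, ("Guangdong", "Shenzhen", "Nanshan"), 7, 9)
```

Because the path runs from large to small, a query for a coarse region is a simple prefix match on the path.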
The space-time big data comprises structured data (such as XML, TXT and GPS data) and unstructured data (such as images, audio and video). It is collected and aggregated by the various sensors (data sources) of the sensing layer and transmitted to the cloud. Typical sensors include temperature and humidity sensors, GPS terminals and surveillance cameras.
S102, performing resource description framework RDF and related ontology description on the space-time big data to realize semantic level association among different space-time big data;
The Resource Description Framework (RDF) is a data model, expressed using XML syntax, for describing the characteristics of Web resources and the relationships between resources. RDF became a W3C Recommendation on 22 February 1999. Its main purpose is to provide an infrastructure for metadata applications on the Web, so that metadata can be exchanged between applications on the Web and network resources can be processed automatically.
The RDF data model is a syntax-neutral representation. If two RDF expressions correspond to the same data model, they have the same meaning; conversely, if two RDF expressions have the same meaning, their data models should be the same. The basic RDF data model includes three object types:
Resource: everything described by an RDF expression is called a resource. A resource may be a website, a web page, or only a part of a web page, or even something that does not exist on the Web, such as a paper document, an object or a person. In RDF, resources are named by Uniform Resource Identifiers (URIs); Uniform Resource Locators (URLs) and Uniform Resource Names (URNs) are both subsets of URIs.
Property: a property is a specific feature or relationship used to describe a resource. Each property has a specific meaning that defines its permitted values, the types of resources it can describe, and its relationships to other properties. A property-value pair in RDF is conceptually the same as a conventional attribute-value pair.
Statement: a specific resource together with a named property and the corresponding property value is called an RDF statement. The resource is the subject, the property is the predicate, and the property value is the object. The object of a statement may be a character string, another data form, or a resource.
In the computer field, Ontology (Ontology) can describe knowledge at semantic level, and can be regarded as a general concept model for describing knowledge in a certain subject field.
In the embodiment of the invention, in order to capture the semantic associations between different data, the cloud server associates the space-time big data. Specifically, it describes the data acquired from different data sources in the RDF-ontology description context and from this derives the associations between different kinds of data. Typical associations include the attribution relation (for example, road congestion belongs to traffic) and the juxtaposition relation (for example, Nanshan District and Longgang District of Shenzhen City).
For example, a semantic relation after RDF-ontology description may be: "Building A - 12th floor - air conditioning power consumption - 1000 joules/hour".
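The example relation can be written as a small set of subject-predicate-object statements. The sketch below uses plain Python tuples rather than an RDF library, and the `http://example.org/city/` namespace and the predicate names `partOf` and `powerConsumption` are hypothetical, since the source does not define a vocabulary. It shows how an attribution chain can be derived by following one predicate.

```python
from typing import List, NamedTuple

class Triple(NamedTuple):
    subject: str
    predicate: str
    obj: str

# Hypothetical URIs and predicate names; the source does not define a vocabulary.
EX = "http://example.org/city/"
triples: List[Triple] = [
    Triple(EX + "buildingA/floor12", EX + "partOf", EX + "buildingA"),
    Triple(EX + "buildingA/floor12/ac", EX + "partOf", EX + "buildingA/floor12"),
    Triple(EX + "buildingA/floor12/ac", EX + "powerConsumption", "1000 J/h"),
]

def objects(triples, subject, predicate):
    """All objects o such that (subject, predicate, o) is a statement."""
    return [t.obj for t in triples if t.subject == subject and t.predicate == predicate]

def ancestors(triples, resource):
    """Follow partOf links upward to derive the attribution chain of a resource."""
    chain = []
    current = resource
    while True:
        parents = objects(triples, current, EX + "partOf")
        if not parents or parents[0] in chain:  # stop at the root or on a cycle
            return chain
        current = parents[0]
        chain.append(current)

ac = EX + "buildingA/floor12/ac"
attribution_chain = ancestors(triples, ac)
consumption = objects(triples, ac, EX + "powerConsumption")
```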
In one embodiment, because the data volume is large, the RDF and ontology description needs to be performed according to a distributed computing rule, which may specifically be:
reading the marked data set of the space-time big data from the HDFS file system in the data lake;
splitting the data set into sub data sets, and distributing the sub data sets to different cloud nodes according to the distributed computing rule. Distributed storage is a data storage technology that uses, over the network, the disk space of every machine in an enterprise and combines these scattered storage resources into one virtual storage device, so that the data is stored in a distributed manner across the enterprise;
performing the RDF and related ontology description on the sub data sets on all cloud nodes synchronously;
and writing the described result back to HDFS.
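The split, describe and collect pattern of these steps can be sketched on a single machine. In this illustration worker threads stand in for cloud nodes and an in-memory list stands in for the HDFS read and write-back. The record fields, the `urn:example:` URI scheme and the predicate names are assumptions.

```python
from concurrent.futures import ThreadPoolExecutor

def split_dataset(records, n_parts):
    """Round-robin split of a data set into n_parts sub data sets."""
    return [records[i::n_parts] for i in range(n_parts)]

def describe(sub_dataset):
    """Turn each marked record into (subject, predicate, object) statements."""
    triples = []
    for rec in sub_dataset:
        subject = f"urn:example:{rec['source']}:{rec['id']}"  # hypothetical URI scheme
        triples.append((subject, "observedAt", str(rec["t"])))
        triples.append((subject, "locatedIn", rec["region"]))
    return triples

def describe_distributed(records, n_nodes=3):
    """Describe all sub data sets concurrently and collect the results."""
    parts = split_dataset(records, n_nodes)
    # Worker threads stand in for cloud nodes in this sketch.
    with ThreadPoolExecutor(max_workers=n_nodes) as pool:
        results = list(pool.map(describe, parts))
    return [triple for part in results for triple in part]

# In the described method the records come from, and the result goes back to, HDFS;
# here an in-memory list stands in for both.
dataset = [{"id": i, "source": "camera", "t": i, "region": "Nanshan"} for i in range(10)]
described = describe_distributed(dataset)
```

Each record is described independently of the others, which is what makes the work easy to distribute.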
S103, carrying out data source marking on the associated space-time big data, and storing the marked space-time big data into a data lake, wherein the data lake comprises a service layer and a source data layer;
After the semantic association, the cloud server marks the space-time big data with its data source, that is, different space-time big data is marked according to where it comes from. For example, if an image is shot by a surveillance camera, the data source of the image data is the surveillance camera. The benefit of marking by data source is that combinations of different space-time data often have economic and practical value. For example, combining the GPS data of a GPS terminal with the video data of a surveillance camera makes it possible to locate precisely at what time and in what place an emergency occurred, and this combination depends on both the GPS data source and the surveillance camera data source. Likewise, data combinations for other scenarios require the data sources to be marked in advance within the mass of data, so that different usage scenarios can be anticipated.
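The GPS and camera combination mentioned above can be sketched as a join on the timestamp between records carrying two different source marks. The record fields, the source labels and the exact-timestamp match are illustrative assumptions.

```python
def mark_source(record, source):
    """Return a copy of the record tagged with the data source that produced it."""
    return {**record, "source": source}

def combine_by_time(records, source_a, source_b):
    """Pair records from two sources that share the same timestamp."""
    by_time = {}
    for rec in records:
        if rec["source"] == source_a:
            by_time.setdefault(rec["t"], []).append(rec)
    pairs = []
    for rec in records:
        if rec["source"] == source_b:
            for match in by_time.get(rec["t"], []):
                pairs.append((match, rec))
    return pairs

lake = [
    mark_source({"t": 100, "lat": 22.53, "lon": 113.93}, "gps_terminal"),
    mark_source({"t": 100, "frame": "frame-0100.jpg"}, "surveillance_camera"),
    mark_source({"t": 160, "frame": "frame-0160.jpg"}, "surveillance_camera"),
]
located_events = combine_by_time(lake, "gps_terminal", "surveillance_camera")
```

Without the source mark the join could not tell which records to pair, which is the point the paragraph makes about marking in advance.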
Unlike the concept of the data warehouse, the concept of the data lake was originally proposed by big data vendors, and at first glance it amounts to storing data on inexpensive storage hardware based on the scale-out HDFS (Hadoop Distributed File System). But the larger the data volume, the more different kinds of storage are needed. Eventually all enterprise data may be regarded as big data, but not all enterprise data is suited to storage on an inexpensive HDFS cluster. Part of the value of a data lake is that it gathers different kinds of data together; another part is that data analysis can be performed without a predefined model. Today's big data architectures are scalable and can provide users with increasingly real-time analysis. The data lake architecture is oriented to information storage for multiple data sources, including the Internet of Things. Big data analysis or archiving can be handled, or delivered to the requesting user, by accessing the data lake.
Fig. 2 is a topology diagram of the data lake architecture. As shown in Fig. 2, the data lake mainly includes a source data layer and a service layer, and data storage and control are performed through a management unit and a computing engine. The source data layer contains the various structured and unstructured source data. The service layer serves the smart city applications: source data is stored in the service layer after processing, and through the interaction between the service layer and the different government clouds of the smart city, different data is shared with different government clouds. Management of the data lake is mainly divided into four aspects: task management, access control, metadata management and data governance. The computing engine embodies the processing capability of the data lake and includes batch processing, stream processing, interactive processing and machine learning processing of big data.
S104, dividing the data lake into a plurality of sub-data lakes based on different example requirements of the smart city, wherein each sub-data lake corresponds to each smart city example one by one;
the data lake is divided into a plurality of sub data lakes, and the data lake may be divided into the plurality of sub data lakes based on the time association relationship, the spatial association relationship, and/or the semantic association relationship, wherein the data sources of different sub data lakes are different in type. A temporal relationship, such as all data over a period of time, or data changes of the same data source over successive periods of time; spatial associations, such as a number of buildings in a street; semantic association relations, such as attribution relations of different provinces-cities-districts, or parallel relations, etc.
In the embodiment of the invention, the different instance requirements of the smart city come from the actual needs of different government departments. For example, the traffic department requires the traffic flow of a road between 7:00 and 9:00, the environmental protection department requires the electric energy consumption of a building, and the land planning department requires the grades of all hospitals in a certain area. Because the requirements differ, the required data combinations (data sets) and presentation modes also differ. It is therefore unsuitable to send a conventional, all-purpose data set to every government cloud; instead, the required data should be sent to the corresponding government cloud. Taking the traffic department as an example, the required data includes the vehicle flow captured by cameras, the longitude, latitude and speed of vehicles located by GPS terminals, and the map information provided by a GIS. The data sources for this instance are therefore the cameras, the GPS terminals and the GIS equipment, with the time window specified as 7:00 to 9:00. The data lake can accordingly be split to yield a sub-data lake corresponding to this instance, whose data sources comprise camera A, GPS terminal B and GIS equipment C. Similarly, the other sub-data lakes correspond one-to-one to the other smart city instance requirements.
The splitting process may be as shown in Fig. 3. The data sources of the data lake include sensors A, B, C, D, E, F, etc., and the data lake is split into 5 sub-data lakes according to the smart city instances, wherein the data sources of sub-data lake 1 include A, C and D, and the data sources of sub-data lake 2 include A, D and F. In the embodiment of the invention, a sub-data lake can be split again (secondary splitting) to form secondary sub-data lakes corresponding to smaller smart city instances.
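As a non-limiting illustration, the splitting of a data lake by instance requirement might be sketched as follows. The record layout `(source_id, hour, payload)`, the `INSTANCES` table and the function name `split_lake` are all assumptions made for this sketch and are not defined in the embodiment; the traffic instance simply mirrors the example of sources A, B and C restricted to the hours from 7:00 to 9:00.

```python
# Hypothetical records held in the data lake: (source_id, hour_of_day, payload).
LAKE = [
    ("A", 7, "frame-001"),
    ("B", 8, "gps-fix-17"),
    ("C", 8, "map-tile-3"),
    ("D", 12, "meter-reading-9"),
    ("A", 15, "frame-002"),
]

# Hypothetical instance requirements: the data sources and the hours each
# smart city instance needs. range(7, 9) covers 7:00 up to, not including, 9:00.
INSTANCES = {
    "traffic": {"sources": {"A", "B", "C"}, "hours": range(7, 9)},
    "energy": {"sources": {"D"}, "hours": range(0, 24)},
}


def split_lake(lake, instances):
    """Return one sub-data lake (a list of records) per smart city instance."""
    sub_lakes = {name: [] for name in instances}
    for record in lake:
        source, hour, _payload = record
        for name, requirement in instances.items():
            # A record may belong to several sub-data lakes if several
            # instances require the same source and time window.
            if source in requirement["sources"] and hour in requirement["hours"]:
                sub_lakes[name].append(record)
    return sub_lakes
```

Calling `split_lake(LAKE, INSTANCES)` yields a dictionary with one list per instance; the record from source A at hour 15 falls outside the traffic window and is therefore excluded from the traffic sub-data lake.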
Optionally, if the smart city instance requirement is of the building information model (BIM) type, splitting the data lake into a plurality of sub-data lakes specifically comprises:
splitting the data lake according to a data source attribution relationship, wherein the data source attribution relationship comprises BIM energy consumption attribution (for example, the electric energy consumption of building XX), BIM safety attribution (for example, the fire protection facilities of building XX) and BIM flow attribution (for example, the number of people passing through a gate in building XX).
Optionally, if the smart city instance requirement is of the geographic information system (GIS) type, splitting the data lake into a plurality of sub-data lakes specifically comprises:
splitting the data lake according to a geographical attribution relationship, wherein the geographical attribution relationship comprises provincial attribution, block attribution and road attribution.
Optionally, if the smart city instance requirement is of the industrial Internet of Things (IoT) type, splitting the data lake into a plurality of sub-data lakes specifically comprises:
splitting the data lake according to the type of IoT acquisition equipment, wherein the types of IoT acquisition equipment comprise cameras, temperature and humidity sensors and safety early warning sensors.
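The three optional splitting modes above differ only in the attribute used as the split key. A minimal sketch under that reading follows; the field names `attribution`, `region` and `device_type`, the `SPLIT_KEY` table and the function `split_by_type` are hypothetical and chosen only for illustration.

```python
from collections import defaultdict

# Hypothetical mapping from instance type to the record field used as split key.
SPLIT_KEY = {
    "BIM": "attribution",   # e.g. energy consumption, safety, flow
    "GIS": "region",        # e.g. province, block, road
    "IoT": "device_type",   # e.g. camera, temperature and humidity sensor
}


def split_by_type(records, instance_type):
    """Group records into sub-data lakes keyed by the field chosen for the type."""
    key = SPLIT_KEY[instance_type]
    sub_lakes = defaultdict(list)
    for record in records:
        sub_lakes[record[key]].append(record)
    return dict(sub_lakes)


# Hypothetical records carrying all three attributes.
RECORDS = [
    {"id": 1, "device_type": "camera", "region": "road-1", "attribution": "flow"},
    {"id": 2, "device_type": "temperature_humidity", "region": "block-2", "attribution": "energy"},
    {"id": 3, "device_type": "camera", "region": "block-2", "attribution": "safety"},
]
```

For example, `split_by_type(RECORDS, "IoT")` groups records 1 and 3 under the camera sub-data lake, whereas `split_by_type(RECORDS, "GIS")` groups records 2 and 3 under `block-2`.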
S105, performing semi-structured processing on each sub-data lake to generate a semi-structured smart city instance base for each sub-data lake, and storing the semi-structured smart city instance base to the service layer of the data lake;
To facilitate archiving and maximise storage efficiency, the embodiment of the invention normalises the structured data and the unstructured data and defines a common storage format. This avoids the problem that arises when structured and unstructured data are stored separately: the storage space becomes discontinuous, gaps appear that prevent contiguous storage, and part of the storage resource is wasted.
The semi-structured processing is carried out as follows:
acquiring the structured data and unstructured data of each sub-data lake;
decomposing the structured data into a plurality of structured fields of 32 bytes each, setting a serial number for each field and a mark P and an end character E for the structured data, to form a structured message comprising the plurality of structured fields;
decomposing the unstructured data into a plurality of unstructured fields of 64 bytes each, setting a serial number for each field and a mark M and an end character N for the unstructured data, to form an unstructured message comprising the plurality of unstructured fields;
and combining the structured message and the unstructured message to generate a semi-structured message, and storing the semi-structured message in the service layer.
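The steps above can be illustrated with the following sketch. The embodiment fixes only the field sizes (32 and 64 bytes), the marks (P, M) and the end characters (E, N); the byte layout used here, namely a one-byte mark, then for each field a two-byte big-endian serial number followed by the zero-padded field, then a one-byte end character, is an assumption made for illustration, as are the function names.

```python
import struct


def build_message(data: bytes, field_size: int, mark: bytes, end: bytes) -> bytes:
    """Chop data into fixed-size fields, number them, and frame with mark/end.

    Assumed layout: mark | (serial, field) * n | end, where serial is a
    two-byte big-endian integer starting at 0 (so at most 65536 fields)
    and the last field is padded with zero bytes to field_size.
    """
    fields = [data[i:i + field_size] for i in range(0, len(data), field_size)]
    body = b"".join(
        struct.pack(">H", serial) + field.ljust(field_size, b"\x00")
        for serial, field in enumerate(fields)
    )
    return mark + body + end


def build_semi_structured(structured: bytes, unstructured: bytes) -> bytes:
    """Concatenate the structured and unstructured messages into one message."""
    structured_msg = build_message(structured, 32, b"P", b"E")
    unstructured_msg = build_message(unstructured, 64, b"M", b"N")
    return structured_msg + unstructured_msg
```

With this layout, 40 bytes of structured data occupy two 32-byte fields and 64 bytes of unstructured data occupy one 64-byte field, and the two messages are stored back to back in a single contiguous byte string. A real format would also need a way to distinguish padding and end characters from payload bytes, which this sketch does not address.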
S106, receiving a service driving request from a government cloud, retrieving the semi-structured smart city instance base associated with the service driving request, and sharing the final retrieval result with the government cloud.
In the embodiment of the invention, for unstructured data, the business scenarios of the smart city can be searched by keyword, that is, by a combination of full-text retrieval and a relationship graph. Full-text retrieval is aimed mainly at document retrieval, while the relationship graph is built from the entities, attributes and relationships in the source data and is ultimately used to rank retrieval results by relevance. Full-text retrieval consists of two main processes: creating the index and searching the index. In the embodiment of the invention, a smart city element relationship graph is introduced and used as the index for retrieval. The relationship graph is obtained by extracting entities, attributes and association relationships from the source data of a plurality of sensors and from structured labelled data, and is constructed according to at least one of the attribution associations and spatial associations among the smart city element data.
Structured data is typically stored in a cache in key-value (K-V) form. The K-V store is organised by application scenario, and the scenario currently taking place can be determined from historical data: the scenario classification is the Key, and the specific scenario data is the Value. This model offers good real-time performance and so suits real-time needs. Structured data can therefore be retrieved by K-V lookup.
In summary, in the embodiment of the present invention, the retrieval process comprises:
splitting the semi-structured smart city instance base into structured data and unstructured data;
retrieving the unstructured data by means of full-text retrieval and the relationship graph to obtain a first retrieval result;
retrieving the structured data by means of K-V lookup to obtain a second retrieval result;
and merging the first retrieval result and the second retrieval result to obtain a third retrieval result, wherein the third retrieval result is the final retrieval result.
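By way of a simplified sketch, the retrieval process above might look as follows. A plain inverted index stands in for the full-text retrieval, and relevance ranking by the relationship graph is omitted; the function names, the document and K-V layouts, and the shape of the merged result are assumptions made for this illustration.

```python
from collections import defaultdict


def build_index(documents):
    """Create the index: lower-cased word -> set of document ids."""
    index = defaultdict(set)
    for doc_id, text in documents.items():
        for word in text.lower().split():
            index[word].add(doc_id)
    return index


def full_text_search(index, query):
    """Search the index: sorted ids of documents containing every query word."""
    words = query.lower().split()
    if not words:
        return []
    hits = set.intersection(*(index.get(word, set()) for word in words))
    return sorted(hits)


def retrieve(documents, kv_store, query, scene):
    """Merge the full-text (first) and K-V (second) results into a final result."""
    first = full_text_search(build_index(documents), query)
    second = kv_store.get(scene, [])  # Key: scenario classification
    return {"unstructured": first, "structured": second}
```

Here `documents` maps a document id to its text and `kv_store` maps a scenario classification to its scenario data, so a single call returns both partial results together as the final retrieval result.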
According to the method provided by the embodiment of the invention, the spatiotemporal big data is stored in a data lake, the data lake is split into a plurality of sub-data lakes of highly correlated data based on the different smart city instances, and the data in the sub-data lakes is subjected to semi-structured processing so that different types of data are stored in a normalised form, which saves storage space. When a government cloud requires the data of a given instance, only the corresponding instance data is sent to the government cloud of the relevant department, which reduces redundancy in data transmission, improves transmission efficiency and saves network resources.
The embodiment of the present invention further provides an apparatus comprising a memory and a processor, wherein the memory stores computer-executable instructions and the processor implements the above method when executing the computer-executable instructions stored in the memory.
Embodiments of the present invention also provide a computer-readable storage medium having stored thereon computer-executable instructions for performing the method in the foregoing embodiments.
Fig. 4 is a diagram illustrating the hardware components of the apparatus according to one embodiment. It will be appreciated that Fig. 4 shows only a simplified design of the apparatus. In practical applications, the apparatus may also include other necessary elements, including but not limited to any number of input/output systems, processors, controllers and memories, and all apparatuses that can implement the big data management method of the embodiments of the present application fall within the protection scope of the present application.
The memory includes, but is not limited to, random access memory (RAM), read-only memory (ROM), erasable programmable read-only memory (EPROM) or compact disc read-only memory (CD-ROM), and is used for storing instructions and data.
The input system is for inputting data and/or signals and the output system is for outputting data and/or signals. The output system and the input system may be separate devices or may be an integral device.
The processor may include one or more processors, for example, one or more Central Processing Units (CPUs), and in the case of one CPU, the CPU may be a single-core CPU or a multi-core CPU. The processor may also include one or more special purpose processors, which may include GPUs, FPGAs, etc., for accelerated processing.
The memory is used to store program codes and data of the network device.
The processor is used for calling the program codes and data in the memory and executing the steps in the method embodiment. Specifically, reference may be made to the description of the method embodiment, which is not repeated herein.
In the several embodiments provided in the present application, it should be understood that the disclosed system and method may be implemented in other ways. For example, the division into units is only a division by logical function, and other divisions are possible in an actual implementation; for example, a plurality of units or components may be combined or integrated into another system, or some features may be omitted or not executed. The mutual coupling, direct coupling or communication connection shown or discussed may be an indirect coupling or communication connection through some interfaces, systems or units, and may be electrical, mechanical or in another form.
Units described as separate parts may or may not be physically separate, and parts displayed as units may or may not be physical units, may be located in one place, or may be distributed on a plurality of network units. Some or all of the units can be selected according to actual needs to achieve the purpose of the solution of the embodiment.
The above embodiments may be implemented wholly or partially in software, hardware, firmware, or any combination thereof. When implemented in software, they may be implemented wholly or partially in the form of a computer program product. The computer program product includes one or more computer instructions. When the computer program instructions are loaded and executed on a computer, the procedures or functions according to the embodiments of the present application are wholly or partially generated. The computer may be a general-purpose computer, a special-purpose computer, a computer network, or another programmable system. The computer instructions may be stored in a computer-readable storage medium or transmitted via a computer-readable storage medium. The computer instructions may be transmitted from one website, computer, server or data center to another website, computer, server or data center by wire (e.g., coaxial cable, optical fiber, digital subscriber line (DSL)) or wirelessly (e.g., infrared, radio, microwave). The computer-readable storage medium can be any available medium that a computer can access, or a data storage device, such as a server or data center, that integrates one or more available media. The available medium may be a read-only memory (ROM), a random access memory (RAM), a magnetic medium such as a floppy disk, hard disk, magnetic tape or magnetic disk, an optical medium such as a digital versatile disc (DVD), or a semiconductor medium such as a solid state disk (SSD).
The above is only a specific embodiment of the present application, but the scope of the present application is not limited thereto, and any person skilled in the art can easily think of various equivalent modifications or substitutions within the technical scope of the present application, and these modifications or substitutions should be covered by the scope of the present application. Therefore, the protection scope of the present application shall be subject to the protection scope of the claims.

Claims (9)

1. A big data sharing method for a smart city, characterized by comprising the following steps:
acquiring, by a cloud server, spatiotemporal big data collected by a sensor, wherein the spatiotemporal big data comprises structured data and unstructured data;
describing the spatiotemporal big data using the Resource Description Framework (RDF) and related ontologies to achieve semantic-level association among different spatiotemporal big data;
marking the data sources of the associated spatiotemporal big data, and storing the marked spatiotemporal big data in a data lake, wherein the data lake comprises a service layer and a source data layer;
splitting the data lake into a plurality of sub-data lakes based on the different instance requirements of the smart city, wherein the sub-data lakes correspond one-to-one to the smart city instances;
performing semi-structured processing on each sub-data lake to generate a semi-structured smart city instance base for each sub-data lake, and storing the semi-structured smart city instance base to the service layer of the data lake;
and receiving a service driving request from a government cloud, retrieving the semi-structured smart city instance base associated with the service driving request, and sharing the final retrieval result with the government cloud.
2. The method of claim 1, wherein splitting the data lake into a plurality of sub-data lakes comprises:
splitting the data lake into the plurality of sub-data lakes based on a temporal association relationship, a spatial association relationship and a semantic association relationship, wherein different sub-data lakes have different types of data sources.
3. The method of claim 1 or 2, wherein retrieving the semi-structured smart city instance base associated with the service driving request comprises:
splitting the semi-structured smart city instance base into structured data and unstructured data;
retrieving the unstructured data by means of full-text retrieval and a relationship graph to obtain a first retrieval result;
retrieving the structured data by means of K-V lookup to obtain a second retrieval result;
and merging the first retrieval result and the second retrieval result to obtain a third retrieval result, wherein the third retrieval result is the final retrieval result.
4. The method of claim 1, wherein performing semi-structured processing on each sub-data lake comprises:
acquiring the structured data and unstructured data of each sub-data lake;
decomposing the structured data into a plurality of structured fields of 32 bytes each, setting a serial number for each field and a mark P and an end character E for the structured data, to form a structured message comprising the plurality of structured fields;
decomposing the unstructured data into a plurality of unstructured fields of 64 bytes each, setting a serial number for each field and a mark M and an end character N for the unstructured data, to form an unstructured message comprising the plurality of unstructured fields;
and combining the structured message and the unstructured message to generate a semi-structured message.
5. The method of claim 1, wherein describing the spatiotemporal big data using the Resource Description Framework (RDF) and related ontologies comprises:
reading the marked data set of the spatiotemporal big data from the HDFS file system in the data lake;
splitting the data set into sub-data sets, and distributing the sub-data sets to different cloud nodes according to a distributed computing rule;
performing the RDF and related ontology description on the sub-data sets on each cloud node synchronously;
and writing the described results back to HDFS.
6. The method of claim 1, wherein, if the smart city instance requirement is of the building information model (BIM) type, splitting the data lake into a plurality of sub-data lakes comprises:
splitting the data lake according to a data source attribution relationship, wherein the data source attribution relationship comprises BIM energy consumption attribution, BIM safety attribution and BIM flow attribution.
7. The method of claim 1, wherein, if the smart city instance requirement is of the geographic information system (GIS) type, splitting the data lake into a plurality of sub-data lakes comprises:
splitting the data lake according to a geographical attribution relationship, wherein the geographical attribution relationship comprises provincial attribution, block attribution and road attribution.
8. The method of claim 1, wherein, if the smart city instance requirement is of the industrial Internet of Things (IoT) type, splitting the data lake into a plurality of sub-data lakes comprises:
splitting the data lake according to the type of IoT acquisition equipment, wherein the types of IoT acquisition equipment comprise cameras, temperature and humidity sensors and safety early warning sensors.
9. An apparatus comprising a memory having computer-executable instructions stored thereon and a processor that, when executing the computer-executable instructions on the memory, implements the method of any of claims 1 to 8.
CN202110561578.4A 2021-05-22 2021-05-22 Big data sharing method and device for smart city Withdrawn CN113282692A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110561578.4A CN113282692A (en) 2021-05-22 2021-05-22 Big data sharing method and device for smart city

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202110561578.4A CN113282692A (en) 2021-05-22 2021-05-22 Big data sharing method and device for smart city

Publications (1)

Publication Number Publication Date
CN113282692A true CN113282692A (en) 2021-08-20

Family

ID=77280813

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110561578.4A Withdrawn CN113282692A (en) 2021-05-22 2021-05-22 Big data sharing method and device for smart city

Country Status (1)

Country Link
CN (1) CN113282692A (en)

Cited By (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113961622A (en) * 2021-10-20 2022-01-21 康佳集团股份有限公司 Data fusion method and device for Internet of things equipment, intelligent terminal and storage medium
CN114416792A (en) * 2022-01-11 2022-04-29 中国人民解放军国防科技大学 Probability-based data stream processing method and system
CN114553487A (en) * 2022-01-22 2022-05-27 郑州工程技术学院 Access control method and system based on map
CN115439618A (en) * 2022-04-28 2022-12-06 朱俊丰 Intelligent park information model system
CN115204269A (en) * 2022-06-15 2022-10-18 南通市测绘院有限公司 Urban management data fusion method and system based on space-time reference
CN115204269B (en) * 2022-06-15 2024-03-12 南通市测绘院有限公司 Urban treatment data fusion method and system based on space-time reference
TWI799349B (en) * 2022-09-15 2023-04-11 國立中央大學 Using Ontology to Integrate City Models and IoT Open Standards for Smart City Applications

Similar Documents

Publication Publication Date Title
CN113282692A (en) Big data sharing method and device for smart city
Feng et al. A survey on trajectory data mining: Techniques and applications
Xu et al. Participatory sensing-based semantic and spatial analysis of urban emergency events using mobile social media
Li et al. Geomatics for smart cities-concept, key techniques, and applications
JP7300797B2 (en) Fusion of scalable spatio-temporal density data
Xu et al. Application of a graph convolutional network with visual and semantic features to classify urban scenes
Hu et al. Spatial data infrastructures
CN111708778B (en) Big data management method and system
EP2902913A1 (en) Device management apparatus and device search method
Mazhar Rathore et al. Advanced computing model for geosocial media using big data analytics
Ding et al. SeaCloudDM: a database cluster framework for managing and querying massive heterogeneous sensor sampling data
CN111726403B (en) Cross-cloud-platform big data management method and system
Xiang et al. Flood Markup Language–A standards-based exchange language for flood risk communication
George et al. Real-time spatio-temporal event detection on geotagged social media
CN111209323A (en) Spatial geographic information big data processing system
Silva et al. Applications of geospatial big data in the Internet of Things
Rehman et al. Building socially-enabled event-enriched maps
CN114925043A (en) Application method and device based on space-time grid block data and electronic equipment
Ali et al. Enabling spatial digital twins: Technologies, challenges, and future research directions
Badii et al. A smart city development kit for designing Web and mobile Apps
He et al. Perceiving commerial activeness over satellite images
Gubareva et al. Literature Review on the Smart City Resources Analysis with Big Data Methodologies
Saeed et al. Incorporating big data and IoT in intelligent ecosystems: state-of-the-arts, challenges and opportunities, and future directions
Shrivastava A review of spatial big data platforms, opportunities, and challenges
Bou et al. Streamingcube-based analytical framework for environmental data analysis

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
WW01 Invention patent application withdrawn after publication

Application publication date: 20210820