CN113961662A - Place name address data fusion method based on multi-source data - Google Patents

Info

Publication number
CN113961662A
CN113961662A CN202111288159.4A CN202111288159A CN113961662A CN 113961662 A CN113961662 A CN 113961662A CN 202111288159 A CN202111288159 A CN 202111288159A CN 113961662 A CN113961662 A CN 113961662A
Authority
CN
China
Prior art keywords
data
place name
standard
entities
fusion
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202111288159.4A
Other languages
Chinese (zh)
Inventor
马正
黄勇
闾海荣
肖让
路喜
杨智博
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Guizhou Tuzhi Information Technology Co ltd
Original Assignee
Guizhou Tuzhi Information Technology Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Guizhou Tuzhi Information Technology Co ltd
Priority to CN202111288159.4A
Publication of CN113961662A

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/29 Geographical information databases
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/36 Creation of semantic tools, e.g. ontology or thesauri
    • G06F16/367 Ontology
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90 Details of database functions independent of the retrieved data types
    • G06F16/901 Indexing; Data structures therefor; Storage structures
    • G06F16/9024 Graphs; Linked lists
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90 Details of database functions independent of the retrieved data types
    • G06F16/904 Browsing; Visualisation therefor
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90 Details of database functions independent of the retrieved data types
    • G06F16/907 Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually
    • G06F16/909 Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually using geographical or spatial information, e.g. location
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/10 Text processing
    • G06F40/12 Use of codes for handling textual entities
    • G06F40/126 Character encoding

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Databases & Information Systems (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • Computational Linguistics (AREA)
  • Library & Information Science (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Animal Behavior & Ethology (AREA)
  • Software Systems (AREA)
  • Remote Sensing (AREA)
  • Health & Medical Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • General Health & Medical Sciences (AREA)
  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)

Abstract

The invention relates to a place name address data fusion method based on multi-source data, comprising the steps of collecting data, encoding, constructing a standard library, and fusing the data. By establishing a standardized data fusion process, data from multiple departments can be fused and a unified, standard set of place name addresses established. This addresses the format diversity that arises because social governance elements are scattered across different departments, enables efficient management of massive address data, greatly reduces manual involvement, and shortens data fusion time.

Description

Place name address data fusion method based on multi-source data
Technical Field
The invention relates to the technical field of geographic information, in particular to a place name address data fusion method based on multi-source data.
Background
With the development of government services at all levels, departments have accumulated massive amounts of business data. In the course of promoting the open sharing of government data, differences in business focus, data structures, and standards among departments mean that the resulting business data can be analyzed and used only within the originating department, which creates information silos. With the development of digital cities, building a standard place name address database with accurate positions and wide coverage is the basis and prerequisite for developing other place name information services, such as place name voice query, place name website construction, electronic map development, intelligent traffic information systems, and emergency response linkage systems. Such a database enables information resources to be shared and linked between government departments and across industries, realizes the potential value of social governance data, and raises the level of social governance.
An analysis of the current state of place name and address data shows the following:
(1) Extensive data sources: place name address data involves many functional departments, such as land and resources, civil affairs, real estate, administration, gas, industry and commerce, statistics, quality inspection, and local taxation.
(2) Non-uniform standards and varied formats: because each functional department has a different work focus, the specifications and formats of place name address data also vary, which shows in file naming, field definitions, and inconsistent, varied address descriptions. For example, the place name address data of the land and resources department comes from topographic maps in DWG format, while the data of the industry and commerce department comes from registration data in Excel format.
(3) Lack of spatial data: among the place name address data collected from the functional departments, only the house number and place name data of the civil affairs, administration, and surveying and mapping departments are GIS spatial data. The data of the other departments are non-spatial, contain only simple address descriptions, and still need to be spatialized.
(4) Poor compatibility and no data sharing: because there is no uniform technical standard or database-building specification for place name address libraries, compatibility among different databases is poor and data resources cannot be shared.
Therefore, there is an urgent need for a technology capable of fusing place name address data from multiple departments such as civil affairs, real estate, administration, and land and resources.
At present, a small amount of research exists on place name address data fusion techniques.
The patent with application number CN201911307558.3 discloses a multi-source urban spatio-temporal standard address fusion method based on geospatial portrait mining. It uses ETL technology to extract, clean, and convert address-related data, filters out incomplete, erroneous, and duplicate records, and fuses the data into urban spatio-temporal standard address data with consistent granularity and a uniform format.
The patent with application number CN201710645011.9 discloses a place name address data integration method based on multi-source data. Data in other formats are converted into shape format data, geographic element features are extracted, the data are coded and converted, the topological relations among elements are integrated, polygon elements and grids are constructed, each element is matched and given attribute information, and coordinates are unified to achieve data integration. The steps described in that patent are format conversion, data classification, coding, extraction of point, line, polygon, and feature skeletons, topological relation integration, attribute assignment, and coordinate transformation. Overall, they convert data in other formats into data in a standard format that carries category, attribute, topological relation, and code. That method is a data integration step whose function is to unify data into a standard format; it is not data fusion in the sense of associating multi-source heterogeneous data. Fusion in that sense preserves the structure and attribute values of each heterogeneous data set and associates them with a unique spatial position or standard address, rather than removing or changing the original attribute values. Therefore, that patent cannot achieve data fusion by association of multi-source heterogeneous data.
Most of the place name address data integration methods disclosed in these patents need to remove some attributes and information from the original data in order to unify it into a standard format. This reduces the applicability of the data. For example, when multi-source heterogeneous data from civil affairs, administration, and housing construction departments are unified into one standard format, fields with similar semantics are merged into a single standard field, so the semantics that reflect a specific application scenario are lost. Because different departments refer to the same place by different names, using the data in those scenarios becomes inconvenient and can even lead to errors.
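The distinction drawn above between integration and fusion can be illustrated with a minimal Python sketch. The department names, field names, address strings, and the `ADDR-0001` identifier are hypothetical and do not come from the patent; the sketch only shows that integration discards department-specific fields while fusion keeps each record intact and attaches it to a standard address.

```python
from collections import defaultdict

# Hypothetical source records; department and field names are illustrative only.
SOURCES = [
    ("civil_affairs", {"addr": "Example Road 1", "door_plate": "1", "household": "H-17"}),
    ("real_estate", {"location": "No. 1 Example Rd", "certificate_no": "RE-0001"}),
]

# Hypothetical lookup from each department's own address text to a standard address ID.
ADDRESS_INDEX = {"Example Road 1": "ADDR-0001", "No. 1 Example Rd": "ADDR-0001"}
# Name of the field that holds the address text in each department's records.
ADDRESS_FIELD = {"civil_affairs": "addr", "real_estate": "location"}


def integrate(sources):
    """Integration: force every record into one shared schema; other fields are lost."""
    return [
        {"standard_address": ADDRESS_INDEX[record[ADDRESS_FIELD[dept]]]}
        for dept, record in sources
    ]


def fuse(sources):
    """Fusion: keep each record unchanged and associate it with its standard address."""
    fused = defaultdict(dict)
    for dept, record in sources:
        address_id = ADDRESS_INDEX[record[ADDRESS_FIELD[dept]]]
        fused[address_id][dept] = dict(record)  # original structure and values preserved
    return dict(fused)


integrated = integrate(SOURCES)
fused = fuse(SOURCES)
```

After fusion, the real estate certificate number and the civil affairs household field are both still reachable through the single standard address, whereas the integrated rows no longer carry them.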
Disclosure of Invention
To solve the technical problems in the prior art, the invention provides a place name address data fusion method based on multi-source data, which can fuse place name address data from multiple departments such as civil affairs, real estate, and land and resources, and establish unified, standard place name address data. The method is realized by the following technical scheme:
A place name address data fusion method based on multi-source data comprises the following steps:
(1) Collecting data: grid workers verify and collect place name addresses using a verification and collection system, wherein place name information includes but is not limited to points of interest, and address information includes administrative divisions and streets and lanes;
(2) Encoding: determining a coding specification and, on the basis of administrative division codes, refining management down to the individual room;
(3) Constructing a standard library: storing the coded standard place name address data in a database to form a standard library;
(4) Data fusion: constructing the association relationships of the multi-source data, mainly the association fusion of population, house, event, real estate registration, and other civil affairs data with map data.
Preferably, the data collected in step (1) mainly comprises basic geographic data and business application data. The basic geographic data mainly comprises electronic maps, satellite images, administrative divisions, and the like. The business application data mainly comprises existing buildings, actual houses, administrative standard addresses, POIs, city management grids, and the like.
Preferably, the encoding in step (2) comprises: formulating a place name coding specification according to the place name classification and category code compilation rules and the police geographic information series of specifications, and defining the structure of a standard address as: province + city + county/district + township or community service center + village committee + natural village + street/lane + residential community group number + residential community name + group name + building number + building name + unit + floor + apartment.
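As a rough illustration of the fifteen-level standard address structure defined above, the following Python sketch models the levels as an ordered record and joins the levels that are present. The class name, field names, and sample values are assumptions made for this example; the patent specifies only the order of the levels.

```python
from dataclasses import dataclass, fields


@dataclass
class StandardAddress:
    """Hypothetical container; field order follows the structure given in the text."""
    province: str = ""
    city: str = ""
    county_district: str = ""
    township_or_community_center: str = ""
    village_committee: str = ""
    natural_village: str = ""
    street_lane: str = ""
    community_group_number: str = ""
    community_name: str = ""
    group_name: str = ""
    building_number: str = ""
    building_name: str = ""
    unit: str = ""
    floor: str = ""
    apartment: str = ""

    def levels(self):
        """Return (level name, value) pairs for the levels that are filled in."""
        return [(f.name, getattr(self, f.name)) for f in fields(self) if getattr(self, f.name)]

    def text(self, sep=""):
        """Join the filled-in levels, from coarsest to finest, into one address string."""
        return sep.join(value for _, value in self.levels())


address = StandardAddress(
    province="Province A",
    city="City B",
    building_number="Building 3",
    unit="Unit 2",
    floor="Floor 5",
    apartment="Room 501",
)
address_text = address.text(" ")
```

Leaving unused levels empty lets urban addresses (street, community, building) and rural addresses (village committee, natural village, group) share one structure.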
Preferably, constructing the standard library in step (3) comprises:
(3.1) Basic grid division and coding: dividing basic grids and assigning each basic grid a unique code;
(3.2) Accurate map placement of buildings: constructing building polygons, assigning building names by reference to the cleaned address data, and placing the building data accurately on the map on the basis of spatial coordinates;
(3.3) Coded data warehousing: storing the coded place name data in a database to generate a coding library.
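Steps (3.1) and (3.2) can be sketched as follows. This is a simplified illustration under stated assumptions: the grids are rectangular cells cut from a bounding box, the grid code is an administrative division code followed by a three-digit sequence number, and a building is placed by a single representative coordinate. None of these details are specified in the patent, and real grids would normally be irregular polygons.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Grid:
    code: str
    min_x: float
    min_y: float
    max_x: float
    max_y: float

    def contains(self, x, y):
        """Half-open test so that a point on a shared edge belongs to one cell only."""
        return self.min_x <= x < self.max_x and self.min_y <= y < self.max_y


def divide(admin_code, min_x, min_y, max_x, max_y, rows, cols):
    """Split a bounding box into rows x cols cells, each with a unique code."""
    width = (max_x - min_x) / cols
    height = (max_y - min_y) / rows
    grids = []
    for r in range(rows):
        for c in range(cols):
            sequence = r * cols + c + 1  # 1-based running number within the division
            grids.append(Grid(
                f"{admin_code}{sequence:03d}",
                min_x + c * width, min_y + r * height,
                min_x + (c + 1) * width, min_y + (r + 1) * height,
            ))
    return grids


def locate(grids, x, y):
    """Return the code of the grid containing the coordinate, or None if outside."""
    for grid in grids:
        if grid.contains(x, y):
            return grid.code
    return None


grids = divide("520102", 0.0, 0.0, 2.0, 2.0, rows=2, cols=2)
building_grid = locate(grids, 1.5, 0.5)
```

Prefixing the sequence number with the division code keeps grid codes unique across divisions without a central counter.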
Preferably, the data fusion in step (4) comprises: performing association fusion of population, house, event, real estate registration, and other civil affairs data with map data, the association fusion being carried out by constructing a knowledge graph.
Preferably, the data association fusion by constructing a knowledge graph comprises the following specific steps:
(4.1) Constructing entities: taking the grid codes and the name at each structural level of the standard address as entities, and constructing the entities with the Node function in the Python library py2neo;
(4.2) Constructing relationships: constructing the relationships between entities with the create_relationship function in the Python library py2neo;
(4.3) Adding attributes: associating the other fields of the original data as entity attributes;
(4.4) Constructing the knowledge graph: constructing the knowledge graph from the entities, the relationships among the entities, and the attributes of the entities, using the functions create_graphnodes, create_graphRels, and graph respectively;
(4.5) Visualization: visualizing the knowledge graph with the Neo4j graph database.
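Steps (4.1) to (4.4) can be sketched without a running graph database. The example below is a minimal in-memory stand-in written with the Python standard library only; it does not use py2neo, and the class names, the `CONTAINS` relationship type, and the sample records are assumptions introduced for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class Entity:
    label: str                      # e.g. "Grid", "Street", "Building"
    name: str
    attributes: dict = field(default_factory=dict)


@dataclass
class KnowledgeGraph:
    entities: dict = field(default_factory=dict)   # address path -> Entity
    relations: set = field(default_factory=set)    # (parent path, type, child path)

    def fuse_record(self, levels, source, extra_fields):
        """Attach one source record to the chain of entities named in `levels`."""
        if not levels:
            raise ValueError("a record needs at least one address level")
        path = ()
        for label, name in levels:
            parent, path = path, path + (name,)
            # (4.1) one entity per grid code or address level, created only once
            self.entities.setdefault(path, Entity(label, name))
            # (4.2) relationship between consecutive levels
            if parent:
                self.relations.add((parent, "CONTAINS", path))
        # (4.3) remaining source fields become attributes, grouped by source
        entity = self.entities[path]
        entity.attributes.setdefault(source, {}).update(extra_fields)
        return entity


graph = KnowledgeGraph()
levels = [("Grid", "520102001"), ("Street", "Example Road"), ("Building", "Building 3")]
graph.fuse_record(levels, "real_estate", {"certificate_no": "RE-0001"})
building = graph.fuse_record(levels, "population", {"residents": 4})
```

Keying each entity by its full address path means two buildings with the same name on different streets stay separate, and grouping attributes by source keeps every department's original field names.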
The invention has the beneficial effects that:
1. The invention establishes a standardized data fusion process, which addresses the problem that social governance elements are currently scattered across different departments in diverse formats.
2. The invention uses knowledge graph technology for data fusion, which reduces the workload of manual matching and fusion, and displays the result with the Neo4j graph database for strong visualization. No dedicated manual matching module needs to be written; the relationships among entities and the entity attributes are displayed clearly and intuitively; complex and varied association analyses can be handled; and knowledge reasoning, fast query, and real-time calculation are supported.
3. The data storage mode is restructured from table storage to graph storage, and the resulting knowledge graph makes it more convenient to develop downstream applications such as question answering, search, recommendation, and prediction.
Drawings
Fig. 1 is a diagram of the sources of place name address data.
Fig. 2 is a flowchart of a place name address data fusion method based on multi-source data according to the present invention.
FIG. 3 is a flow chart of the association fusion in the method for fusing place name address data based on multi-source data according to the present invention.
Detailed Description
For the purpose of enhancing the understanding of the present invention, the present invention will be described in further detail with reference to the accompanying drawings and examples, which are provided for the purpose of illustration only and are not intended to limit the scope of the present invention.
Example 1
A place name address data fusion method based on multi-source data comprises the following steps:
(1) Collecting data: 8,322 records with more than 60 field attributes are collected from existing address data, administrative department data, and building data;
(2) Encoding: compiling specifications for standard place name address composition, coding, and place name naming according to the relevant standard documents, and coding the data according to those specifications;
(3) Constructing a standard library: storing the coded standard place name address data in a database to form a standard library;
(4) Data fusion: constructing a knowledge graph to perform data association fusion.
Further, the data collected in step (1) mainly comprises a basic geographic class and a business application class. The basic geographic data comprises electronic maps, satellite images, administrative divisions, and the like, and the business application data comprises existing buildings, actual houses, administrative standard addresses, POIs, city management grids, and the like.
Further, the encoding in step (2) comprises: formulating a place name coding specification according to the relevant standard address specifications, and defining the structure of a standard address as: province + city + county/district + township or community service center + village committee + natural village + street/lane + residential community group number + residential community name + group name + building number + building name + unit + floor + apartment.
Further, the standard address specifications adopted in step (2) comprise: GB/T 2260-2007, Codes for the administrative divisions of the People's Republic of China; GB/T 10114-2003, Administrative division code compilation rules below county level; GB/T 18521-2001, Geographical name classification and category code compilation rules; CH/Z 9002-2007, Classification, description and coding rules of place names and addresses for the digital city geospatial information public platform; GA/T1219-; GA/T XXX-201X, Standard address model for police geographic information (draft submitted for approval).
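As an illustration of how an administrative division code can anchor the finer address levels, the sketch below splits a six-digit GB/T 2260 style code into its province, prefecture, and county levels (two digits each). The function name and the sample code are chosen for this example and are not taken from the patent.

```python
def split_division_code(code):
    """Split a six-digit administrative division code into its three levels.

    Digits 1-2 identify the province level, digits 3-4 the prefecture level,
    and digits 5-6 the county level; coarser levels are padded with zeros.
    """
    if len(code) != 6 or not (code.isascii() and code.isdigit()):
        raise ValueError(f"expected six ASCII digits, got {code!r}")
    return {
        "province": code[:2] + "0000",
        "prefecture": code[:4] + "00",
        "county": code,
    }


levels = split_division_code("520102")
```

Codes for levels below the county, down to the individual room, would be appended to this prefix according to the coding specification the method defines.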
Further, constructing the standard library in step (3) comprises:
(3.1) Basic grid division and coding: dividing urban and rural communities, administrative villages, and other specific spatial regions into grid units that can be aggregated seamlessly, taking the grid units as the management units of basic social governance, and assigning each basic grid a unique code;
(3.2) Accurate map placement of buildings: constructing building polygons, assigning building names by reference to the cleaned address data, and placing the building data accurately on the map on the basis of spatial coordinates;
(3.3) Coded data warehousing: storing the place name data coded in step (2) in a database to generate a coding library.
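Step (3.3) can be sketched with an in-memory SQLite database from the Python standard library. The table name, column names, and sample codes are hypothetical; the patent does not state which database or schema is used.

```python
import sqlite3


def build_code_library(records):
    """Store coded address records; the address code serves as the unique key."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        """
        CREATE TABLE standard_address (
            address_code TEXT PRIMARY KEY,
            grid_code    TEXT NOT NULL,
            full_address TEXT NOT NULL
        )
        """
    )
    # INSERT OR IGNORE skips a record whose address code is already stored.
    conn.executemany(
        "INSERT OR IGNORE INTO standard_address VALUES (?, ?, ?)", records
    )
    conn.commit()
    return conn


records = [
    ("520102001000001", "520102001", "Example Road 1, Building 3, Unit 2, Room 501"),
    ("520102001000002", "520102001", "Example Road 1, Building 3, Unit 2, Room 502"),
    ("520102001000001", "520102001", "Example Road 1, Building 3, Unit 2, Room 501"),
]
conn = build_code_library(records)
stored = conn.execute("SELECT COUNT(*) FROM standard_address").fetchone()[0]
in_grid = conn.execute(
    "SELECT address_code FROM standard_address WHERE grid_code = ? ORDER BY address_code",
    ("520102001",),
).fetchall()
```

Using the address code as the primary key means a repeated record is stored only once, which matches the requirement that each standard address has a unique code.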
Further, the data fusion in step (4) comprises: the association fusion of population, house, event, real estate registration, and other civil affairs data with map data, carried out by constructing a knowledge graph.
Further, the data association fusion by constructing a knowledge graph comprises the following specific steps:
(4.1) Constructing entities: taking the grid codes and the name at each structural level of the standard address as entities, and constructing the entities with the Node function in the Python library py2neo;
(4.2) Constructing relationships: constructing the relationships between entities with the create_relationship function in the Python library py2neo;
(4.3) Adding attributes: associating population, house, event, real estate registration, and other civil affairs data as entity attributes;
(4.4) Constructing the knowledge graph: constructing the knowledge graph from the entities, the relationships, and the attributes, using the functions create_graphnodes, create_graphRels, and graph respectively;
(4.5) Visualization: visualizing the knowledge graph with Neo4j, which completes the association fusion of the data.
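One way to hand the entities and relationships over to Neo4j for step (4.5) is to generate parameterized Cypher `MERGE` statements and send them through whichever driver is in use. The helper names, the `key` property, and the `CONTAINS` relationship type below are assumptions for illustration; the patent itself names py2neo functions rather than raw Cypher.

```python
def merge_node(label, key, properties):
    """Build a Cypher statement that creates the entity once and sets its attributes."""
    if not label.isidentifier():
        raise ValueError(f"unsafe label: {label!r}")  # labels cannot be query parameters
    query = f"MERGE (n:{label} {{key: $key}}) SET n += $properties"
    # Neo4j property values must be flat (primitives or lists), not nested maps.
    return query, {"key": key, "properties": properties}


def merge_relationship(start_key, rel_type, end_key):
    """Build a Cypher statement that links two existing entities exactly once."""
    if not rel_type.isidentifier():
        raise ValueError(f"unsafe relationship type: {rel_type!r}")
    query = (
        "MATCH (a {key: $start}), (b {key: $end}) "
        f"MERGE (a)-[:{rel_type}]->(b)"
    )
    return query, {"start": start_key, "end": end_key}


statements = [
    merge_node("Grid", "520102001", {"name": "520102001"}),
    merge_node("Building", "520102001/b3", {"name": "Building 3", "residents": 4}),
    merge_relationship("520102001", "CONTAINS", "520102001/b3"),
]
```

`MERGE` is used instead of `CREATE` so that loading the same department's data twice does not duplicate entities or relationships.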
It should be noted that the above embodiments are provided only to further illustrate and aid understanding of the technical solutions of the present invention and should not be understood as further limiting them. Inventions made on this basis by those skilled in the art that lack prominent substantive features and notable progress still fall within the protection scope of the present invention.

Claims (9)

1. A place name address data fusion method based on multi-source data, characterized by comprising the following steps:
(1) Collecting data: collecting multi-source place name addresses;
(2) Encoding: determining a coding specification and, on the basis of administrative division codes, refining management down to the individual room;
(3) Constructing a standard library: storing the coded standard place name address data in a database to form a standard library;
(4) Data fusion: constructing the association relationships of the multi-source data.
2. The place name address data fusion method based on multi-source data according to claim 1, wherein the data collected in step (1) comprises basic geographic data and business application data.
3. The place name address data fusion method based on multi-source data according to claim 1, wherein the encoding in step (2) comprises: formulating a place name coding specification according to the relevant standard address specifications, and defining the structure of the standard address.
4. The place name address data fusion method based on multi-source data according to claim 1, wherein constructing the standard library in step (3) comprises:
(3.1) Basic grid division and coding: dividing basic grids and assigning each basic grid a unique code;
(3.2) Accurate map placement of buildings: constructing building polygons, assigning building names by reference to the cleaned address data, and placing the building data accurately on the map on the basis of spatial coordinates;
(3.3) Coded data warehousing: storing the coded place name data in a database to generate a coding library.
5. The place name address data fusion method based on multi-source data according to claim 1, wherein the data fusion in step (4) comprises: the association fusion of population, house, event, real estate registration, and other civil affairs data with map data, carried out by constructing a knowledge graph.
6. The place name address data fusion method based on multi-source data according to claim 5, wherein the data association fusion by constructing a knowledge graph comprises the following specific steps:
(4.1) Constructing entities: taking the grid codes and the name at each structural level of the standard address as entities, and constructing the entities;
(4.2) Constructing relationships: establishing the relationships between entities;
(4.3) Adding attributes: associating the other fields of the original data as entity attributes;
(4.4) Constructing the knowledge graph: constructing the knowledge graph from the entities, the relationships among the entities, and the attributes of the entities;
(4.5) Visualization: visualizing the knowledge graph with the Neo4j graph database.
7. The place name address data fusion method based on multi-source data according to claim 6, characterized in that in step (4.1), the entities are constructed with the Node function in the Python library py2neo.
8. The place name address data fusion method based on multi-source data according to claim 6, wherein in step (4.2), the entity relationships are constructed with the create_relationship function in the Python library py2neo.
9. The place name address data fusion method based on multi-source data according to claim 6, characterized in that in step (4.4), the entities, the relationships among the entities, and the attributes of the entities use the functions create_graphnodes, create_graphRels, and graph respectively.

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202111288159.4A CN113961662A (en) 2021-11-02 2021-11-02 Place name address data fusion method based on multi-source data

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202111288159.4A CN113961662A (en) 2021-11-02 2021-11-02 Place name address data fusion method based on multi-source data

Publications (1)

Publication Number Publication Date
CN113961662A (en) 2022-01-21

Family

ID=79468910

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202111288159.4A Pending CN113961662A (en) 2021-11-02 2021-11-02 Place name address data fusion method based on multi-source data

Country Status (1)

Country Link
CN (1) CN113961662A (en)

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116205584A (en) * 2022-11-21 2023-06-02 中国民航科学技术研究院 Civil aviation event association method based on unified space-time coding
CN116205584B (en) * 2022-11-21 2023-08-22 中国民航科学技术研究院 Civil aviation event association method based on unified space-time coding
CN116680648A (en) * 2023-03-24 2023-09-01 中乾思创(北京)科技有限公司 Service fusion data generation method and system for digital twin city
CN116680648B (en) * 2023-03-24 2024-01-16 中乾思创(北京)科技有限公司 Service fusion data generation method and system for digital twin city

Similar Documents

Publication Publication Date Title
CN107092680B (en) Government affair information resource integration method based on geographic grids
WO2022012285A1 (en) Multi-source integrated multi-platform energy information management system
CN107526786A (en) The method and system that place name address date based on multi-source data is integrated
Yu et al. Multi-criteria satisfaction assessment of the spatial distribution of urban emergency shelters based on high-precision population estimation
Cheng et al. Urban land administration and planning in China: Opportunities and constraints of spatial data models
CN113961662A (en) Place name address data fusion method based on multi-source data
Zheng et al. Exploring both home-based and work-based jobs-housing balance by distance decay effect
Lang et al. Identification of “growth” and “shrinkage” pattern and planning strategies for shrinking cities based on a spatial perspective of the Pearl River Delta Region
CN112988715B (en) Construction method of global network place name database based on open source mode
Lv et al. Polycentric urban development and its determinants in China: A geospatial big data perspective
CN111125285B (en) Animal geographic division method based on species space distribution relation
CN110929984A (en) Urban standard grid informatization management method and system
CN116680648A (en) Service fusion data generation method and system for digital twin city
CN116522272A (en) Multi-source space-time data transparent fusion method based on urban information unit
CN111104449A (en) Multisource city space-time standard address fusion method based on geographic space portrait mining
Yang et al. Urban digital twin applications as a virtual platform of smart city
CN111813819B (en) Space-time big data-based place name and address online matching method
CN114692236A (en) Big data-oriented territorial space planning base map base number processing method
CN114661744B (en) Terrain database updating method and system based on deep learning
CN112508332A (en) Gradual rural settlement renovation partitioning method considering multidimensional characteristics
CN113626408B (en) City information database construction method and map display method
CN117435823A (en) Space-time data service method based on grid coding and industrial Internet platform
CN111382165A (en) Mobile homeland management system
CN114282847A (en) House full life cycle place name address system and construction method thereof
CN114896255A (en) Block data generation and display method and device based on space-time grid

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination