CN113918725A - Construction method of knowledge graph in water affairs field

Info

Publication number: CN113918725A
Application number: CN202111011676.7A
Authority: CN
Inventors: 丛小飞; 左翔; 刘威风; 赵杏杏; 刘修恒
Original assignee: Nanjing Zhongyu Smart Water Conservation Research Institute Co ltd
Current assignee: Nanjing Zhongyu Smart Water Conservation Research Institute Co ltd
Priority date: 2021-08-31
Filing date: 2021-08-31
Publication date: 2022-01-11

Abstract

The invention discloses a method for constructing a river and lake health knowledge graph, comprising the following main steps: based on an analysis of the relevant water conservancy industry standards and the types of river and lake health data resources, defining the river and lake health metadata types and a knowledge service mode based on catalog classification, determining the ontology set of the river and lake health ontology model, determining the attributes, mining and establishing the relationships between ontologies according to the attributes, and modeling the river and lake health ontology library; extracting more entities and association relationships from massive heterogeneous data resources by means such as topic mining, distant supervision, and causal relationship extraction, thereby further supplementing and refining the ontology library model; performing a combined calculation with a concept similarity algorithm based on common attributes and a similarity algorithm based on in-link and out-link sets, so as to reduce entity redundancy and achieve knowledge fusion; and establishing an adaptive updating mechanism to realize semi-automatic updating of the river and lake health knowledge graph.

Description

Construction method of knowledge graph in water affairs field

Technical Field

The invention belongs to the field of knowledge graphs, and particularly relates to a method for constructing a knowledge graph in the water affairs field.

Background

With the continuing advance of urbanization, the demands placed on urban water management are steadily increasing. Water affairs management covers a wide scope, and the interaction mechanisms between water affairs objects and elements are complex. Analyzing and mining the massive heterogeneous data of the water affairs field is therefore needed in order to understand the water regime and water environment of the urban river network scientifically, to manage the water supply and drainage pipe network comprehensively, to predict waterlogging risks effectively, and to formulate reasonable water affairs scheduling decisions. After years of accumulation, water affairs departments have obtained massive real-time and basic data through various sensing devices, have generated large amounts of business and text data in the course of their work, and are faced with a variety of water affairs data published on government and public websites; these data are scattered across different systems and platforms. How to link these data through technical means into a semantic data network that provides decision support for water affairs management is a problem that currently needs to be addressed. For example, mass data can be stored on a distributed storage platform, but such a platform cannot mine the connections between data, so data relevance and interoperability are poor and the sharing capability is insufficient. A knowledge graph can abstract and unify concepts, strengthen the relationships between objects and concepts, and support systematic integration and intensive management of a complex data system.
Constructing a knowledge graph for the water affairs field can serve scientific water affairs management, support smart water affairs construction, and raise the level of intelligence of water affairs work.

Disclosure of Invention

To address the technical problems of the prior art, the invention provides a method for constructing a knowledge graph in the water affairs field. It aims to build a bridge between water affairs data and knowledge, to solve the problem that data in the field are abundant but scattered, fuzzy, and lacking guidance, and to provide knowledge support for the decisions of water affairs managers, so that the prominent water affairs problems of different periods can be targeted precisely and addressed comprehensively, and strategies suited to China's national conditions can continue to be explored.

In order to achieve the purpose, the invention specifically adopts the following technical scheme:

A method for constructing a knowledge graph in the water affairs field, characterized by comprising the following steps:

step 1: before top-level knowledge graph construction and knowledge extraction are carried out on the water affairs data, the data are verified and noise is removed;

step 2: a top-level conceptual model of the water affairs domain knowledge graph is constructed on the neo4j platform and used as the framework of the knowledge graph;

step 3: entity extraction and relation extraction are performed on heterogeneous data sources such as industry standards, various databases, government department websites, hydrological and water environment monitoring websites, public websites, Internet of Things data, and remote sensing images;

step 4: on the basis of the extracted data, triples that refer to the same thing are attached to the same concept, and entity alignment is completed by calculating the similarity between concept entities; the entity triple data comprise entity-attribute-value triples and entity-relationship-entity triples;

step 5: knowledge is stored in the graph database of the neo4j platform.

The construction method of the knowledge graph in the water service field is characterized in that the step 1 specifically comprises the following steps:

(1) cleaning missing values, abnormal values, repeated values, and dirty data in the text data;

(2) processing data recorded in tables and pictures among the non-text data, and converting them into text data by manual extraction or picture-to-text software;

(3) filtering random errors existing in the data;

(4) organizing the text data, sentence by sentence, into a usable corpus.
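The cleaning and corpus-building steps above can be sketched roughly as follows. This is a minimal illustration, not the patented procedure: the function names, the choice to treat `None` and blank strings as missing values, and the sentence-splitting rule are all assumptions, and the filtering of abnormal values and random errors is omitted because the text does not specify how it is done.

```python
import re

def clean_records(records):
    """Drop missing or blank records, collapse whitespace, remove exact duplicates."""
    seen = set()
    cleaned = []
    for text in records:
        if text is None:                      # missing value
            continue
        text = re.sub(r"\s+", " ", text).strip()
        if not text or text in seen:          # blank or repeated value
            continue
        seen.add(text)
        cleaned.append(text)
    return cleaned

def split_sentences(text):
    """Split after Chinese/Western sentence-ending marks.

    A Western full stop only ends a sentence when whitespace follows it,
    so decimals such as "3.5" are kept intact.
    """
    parts = re.split(r"(?<=[。！？!?])|(?<=\.)\s+", text)
    return [p.strip() for p in parts if p.strip()]

def build_corpus(records):
    """Return a flat list of sentences built from the cleaned records."""
    corpus = []
    for text in clean_records(records):
        corpus.extend(split_sentences(text))
    return corpus
```

Each sentence in the resulting list is one unit of the corpus used later for entity and relation extraction.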

The method for constructing the knowledge graph in the water service field is characterized in that the step 2 specifically comprises the following steps:

classifying the water affairs objects hierarchically, with two subclasses, the geographic location concept and the object facility concept, placed under the water affairs domain concept;

the domain classes under the geographic location concept characterize geographic areas qualitatively, and the domain classes under the object facility concept are water affairs objects that exist naturally or are constructed artificially;

for the geographic location concept, the geographic areas it describes are further divided into descriptive places and functional places according to whether the area has an actual function;

for the object facility concept, natural objects and engineering facilities are further distinguished according to whether the object exists naturally or is constructed artificially.
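The hierarchy described above can be written down as a small parent-to-children mapping. The sketch below is illustrative only: the English concept names, the `Concept` label, and the `SUBCLASS_OF` relationship type are assumptions chosen for this example, and the generated strings are ordinary Cypher `MERGE` statements that would still have to be run against a neo4j instance.

```python
# Assumed English names for the concepts described in the text.
HIERARCHY = {
    "WaterAffairsConcept": ["GeographicLocation", "ObjectFacility"],
    "GeographicLocation": ["DescriptivePlace", "FunctionalPlace"],
    "ObjectFacility": ["NaturalObject", "EngineeringFacility"],
}

def ancestors(concept, hierarchy=HIERARCHY):
    """Return the chain of parent concepts, nearest parent first."""
    parent_of = {child: parent
                 for parent, children in hierarchy.items()
                 for child in children}
    chain = []
    while concept in parent_of:
        concept = parent_of[concept]
        chain.append(concept)
    return chain

def subclass_statements(hierarchy=HIERARCHY):
    """Build one Cypher MERGE statement per parent/child pair."""
    statements = []
    for parent, children in hierarchy.items():
        for child in children:
            statements.append(
                f"MERGE (p:Concept {{name: '{parent}'}}) "
                f"MERGE (c:Concept {{name: '{child}'}}) "
                f"MERGE (c)-[:SUBCLASS_OF]->(p)"
            )
    return statements
```

Keeping the hierarchy as plain data makes it easy to extend the top-level model with further subclasses before any entities are attached to it.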

The method for constructing the knowledge graph in the water service field is characterized in that in the step 3, the types of the data sources are divided into the following three types:

(1) structured data, consisting mainly of Excel tables, relational databases (e.g., MySQL, Oracle, Microsoft Access), object-oriented databases (e.g., Db4o), and the like;

(2) semi-structured data, which come mainly from websites such as Baidu Encyclopedia, government department websites, hydrological and water environment monitoring websites, public websites, and Wikipedia, and from data stored in XML files;

(3) unstructured data, which mainly refers to unstructured text such as texts from water-administration-related organizations, documents, and the Internet.

The method for constructing the knowledge graph in the water service field is characterized in that in the step 3, the structured data is extracted mainly in the following way:

(a1) connecting a database;

(a2) carrying out basic data initialization operation;

(a3) constructing SQL statements and carrying out data queries;

(a4) carrying out data type, structure, and attribute conversion;

(a5) determining whether the data already exist in the neo4j database; if so, returning to step (a3); if not, proceeding to step (a6) to store the data (whether two records describe the same node is judged mainly by the labels field in neo4j);

(a6) constructing a neo4j data storage statement, determining the superior-subordinate relation from the information extracted by the SQL statement, and creating a node;

(a7) determining whether all SQL queries are finished; if so, exiting the extraction program; if not, returning to step (a3) to continue constructing SQL statements for data queries.
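The loop (a1) to (a7) can be sketched as follows. This is a simplified stand-in under stated assumptions: `sqlite3` plays the role of the relational source, a plain dictionary keyed by `(label, name)` plays the role of the neo4j database, each query is assumed to return a `name` column, and an already-stored record is simply skipped rather than triggering a new query. None of these choices comes from the source text.

```python
import sqlite3

def extract_structured(conn, queries, graph):
    """Copy rows from a relational source into a node store.

    conn    -- an open DB-API connection (a1)
    queries -- list of (sql, label) pairs, one per entity type (a3)
    graph   -- dict mapping (label, name) -> properties; stands in for neo4j
    Returns the keys of the nodes created in this run.
    """
    created = []
    for sql, label in queries:                    # (a3) next SQL statement
        cur = conn.execute(sql)
        columns = [d[0] for d in cur.description]
        for row in cur:
            props = dict(zip(columns, row))       # (a4) row -> property map
            name = str(props.pop("name"))         # assumed identifying column
            key = (label, name)
            if key in graph:                      # (a5) already stored: skip
                continue
            graph[key] = props                    # (a6) create the node
            created.append(key)
    return created                                # (a7) all queries finished
```

Running the function twice with the same `graph` creates nothing the second time, which mirrors the existence check in step (a5).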

The method for constructing the knowledge graph in the water service field is characterized in that in the step 3, the semi-structured data is extracted mainly in the following way:

(b1) first, the Engine module of Scrapy opens a website, and the first crawl request is sent through the Spider module;

(b2) the Engine module obtains the links to be crawled from the Spider module and schedules them as requests through the Scheduler module;

(b3) the Engine module asks the Scheduler module for the next link to be crawled and, at the same time, passes the task to the Downloader module for downloading;

(b4) after the page is downloaded, the Downloader module returns the downloaded data to the Engine module, which hands them to the Spider module to parse and process the crawled data;

(b5) the parsed data are stored in a file in a specified format;

(b6) steps (b2) to (b5) are repeated until the Scheduler module has no more requests, after which the Engine module shuts down and the crawl ends.
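The control flow of steps (b1) to (b6) can be imitated with a few lines of standard-library Python. To be clear, this is a toy simulation of the request loop, not Scrapy itself and not its API: the `crawl`, `download`, and `parse` names are hypothetical, the downloader and spider are passed in as plain functions, and parsed records are collected in a list instead of being written to a file.

```python
from collections import deque

def crawl(start_url, download, parse):
    """Toy engine loop: scheduler queue -> downloader -> spider.

    download(url)      -- returns the page content for a URL
    parse(url, page)   -- returns (records, links) extracted from the page
    """
    scheduler = deque([start_url])     # (b1) first request enters the queue
    seen = {start_url}
    items = []
    while scheduler:                   # (b6) stop when no requests remain
        url = scheduler.popleft()      # (b3) engine asks for the next link
        page = download(url)           # (b3)/(b4) downloader fetches the page
        records, links = parse(url, page)   # (b4) spider parses the page
        items.extend(records)          # (b5) keep the parsed data
        for link in links:             # (b2) newly found links are scheduled
            if link not in seen:
                seen.add(link)
                scheduler.append(link)
    return items
```

In a real Scrapy project the same roles are filled by the framework's own engine, scheduler, downloader, and a user-written spider class; the sketch only shows why the crawl terminates once the scheduler is empty.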

The method for constructing the knowledge graph in the water service field is characterized in that in the step 3, the unstructured data is extracted mainly in the following way:

(c1) searching the established water affairs knowledge graph for triples that embody the preset relations, and aligning them with the corpus to obtain a training set for water affairs relation extraction;

(c2) obtaining sentence representations with a neural network model and training the model to obtain a classifier for water affairs relation extraction;

(c3) after the model accuracy has been verified, performing named entity recognition on new text to obtain the water affairs entities in each sentence, thereby obtaining new samples, and using the trained model to extract relations from the new samples.
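Step (c1), the alignment of existing triples with the corpus, is the core of distant supervision and can be sketched as below. The function name, the substring-matching rule, and the sample dictionary layout are assumptions made for illustration; the neural sentence encoder and classifier of steps (c2) and (c3) are not shown.

```python
def build_training_set(triples, sentences):
    """Label sentences with the relation of any triple they mention.

    A sentence becomes a training sample for (head, relation, tail) when
    both entity names occur in it. The labels are automatic and therefore
    noisy: co-occurrence does not guarantee the sentence expresses the relation.
    """
    samples = []
    for sentence in sentences:
        for head, relation, tail in triples:
            if head in sentence and tail in sentence:
                samples.append({
                    "sentence": sentence,
                    "head": head,
                    "tail": tail,
                    "relation": relation,
                })
    return samples
```

For example, with the hypothetical triple `("Pump Station A", "located_on", "River B")`, a sentence mentioning both names is labeled `located_on`, while a sentence mentioning only one of them produces no sample.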

The construction method of the knowledge graph in the water service field is characterized in that in the step 4, the specific method is as follows:

(1) because letters may be upper or lower case and database table names sometimes contain special characters, the strings must first be screened and converted: concept words are filtered with regular expressions and converted to lowercase;

(2) for two concepts to be compared, let the source string be a and the target string be b, with lengths t1 and t2 respectively; a matrix m of size [t1+1, t2+1] is built, with its first row set to 0, 1, 2, …, t2 and its first column set to 0, 1, 2, …, t1; the edit cost is denoted cost;

(3) each pair of characters a[x] (x from 1 to t1) and b[y] (y from 1 to t2) is compared;

(4) if a[x] is the same as b[y], cost is 0; if a[x] is different from b[y], cost is 1;

(5) each m[x, y] is set to the minimum of:

A. the value of the cell directly above plus 1, i.e., m[x-1, y] + 1;

B. the value of the cell directly to the left plus 1, i.e., m[x, y-1] + 1;

C. the value of the cell diagonally above and to the left plus cost, i.e., m[x-1, y-1] + cost;

(6) steps (3) to (5) are iterated over all x and y, after which m[t1, t2] is the minimum edit distance needed to make the two concept words identical, and max(t1, t2) is the larger of the two string lengths;

then, the similarity between the two strings a and b is:

$$\mathrm{sim}(a, b) = 1 - \frac{m[t1, t2]}{\max(t1, t2)}$$
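The edit-distance procedure above can be sketched in Python as follows. The recurrence follows steps (2) to (6) directly; the normalization of the distance into a similarity as 1 - m[t1, t2] / max(t1, t2), the character set kept by `normalize`, and the convention that two empty strings have similarity 1 are assumptions of this sketch.

```python
import re

def normalize(name):
    """Lowercase and drop everything except digits, Latin letters, and CJK characters."""
    return re.sub(r"[^0-9a-z\u4e00-\u9fff]", "", name.lower())

def edit_distance(a, b):
    """Minimum number of single-character edits turning a into b."""
    t1, t2 = len(a), len(b)
    m = [[0] * (t2 + 1) for _ in range(t1 + 1)]
    for x in range(t1 + 1):
        m[x][0] = x                       # first column: 0, 1, ..., t1
    for y in range(t2 + 1):
        m[0][y] = y                       # first row: 0, 1, ..., t2
    for x in range(1, t1 + 1):
        for y in range(1, t2 + 1):
            cost = 0 if a[x - 1] == b[y - 1] else 1
            m[x][y] = min(m[x - 1][y] + 1,         # cell above
                          m[x][y - 1] + 1,         # cell to the left
                          m[x - 1][y - 1] + cost)  # diagonal cell
    return m[t1][t2]

def similarity(a, b):
    """Assumed normalization: 1 - distance / length of the longer string."""
    a, b = normalize(a), normalize(b)
    longest = max(len(a), len(b))
    if longest == 0:
        return 1.0
    return 1 - edit_distance(a, b) / longest
```

Two concept names whose similarity exceeds a chosen threshold can then be treated as candidates for the same entity; the threshold itself is not given in the text and would have to be tuned.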

In the above method for constructing a knowledge graph in the water affairs field, in step 5, the nodes of the neo4j graph database used for knowledge storage represent entities in the network and the edges represent relationships; all data of each entity are stored and extended as <Key, Value> pairs, and data are imported using Cypher statements within neo4j.
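The storage step can be illustrated by a small helper that turns entities and triples into parameterized Cypher statements. This is a sketch under assumptions: the function names are hypothetical, every node is assumed to be identified by a `name` property, and the statements are only built here, not executed. They could be passed, together with their parameter dictionaries, to a neo4j client session.

```python
import re

# Labels and relationship types cannot be passed as Cypher parameters,
# so they are validated before being interpolated into the query text.
_IDENT = re.compile(r"^[A-Za-z_][A-Za-z0-9_]*$")

def node_statement(label, name, props):
    """Cypher that creates the entity if absent and merges its key-value data."""
    if not _IDENT.match(label):
        raise ValueError(f"invalid label: {label!r}")
    query = f"MERGE (n:{label} {{name: $name}}) SET n += $props"
    return query, {"name": name, "props": props}

def relation_statement(head, rel_type, tail):
    """Cypher that links two existing entities with a typed edge."""
    if not _IDENT.match(rel_type):
        raise ValueError(f"invalid relationship type: {rel_type!r}")
    query = ("MATCH (h {name: $head}), (t {name: $tail}) "
             f"MERGE (h)-[:{rel_type}]->(t)")
    return query, {"head": head, "tail": tail}
```

Using `MERGE` rather than `CREATE` keeps the import idempotent, and `SET n += $props` adds new key-value pairs to an entity without discarding the ones already stored.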

The invention has the beneficial effects that:

(1) The invention is used for storing and intelligently identifying knowledge in the water affairs field; it can solve problems such as the dispersion, fuzziness, and lack of guidance of that knowledge, and it provides the capability to merge, summarize, and organize knowledge and to offer self-learning services.

(2) Traditional training sets for water affairs entity relation extraction rely on manual labeling, which requires a large amount of labor as well as professional knowledge of the water affairs field, and at present there are almost no training sets for relation extraction in this field. The invention adopts relation extraction based on distant supervision: it automatically constructs a data set of relation instances, trains a relation extraction model on that data set, and uses the model to determine the relations between entities in new sentences.

(3) The method can extract entities and relations from structured water affairs data and unstructured text data more conveniently and efficiently, and link water affairs objects together.

Drawings

FIG. 1 is a schematic view of the present invention.

FIG. 2 is a schematic diagram of a calculation flow.

FIG. 3 is a schematic diagram of the hierarchical levels of water affairs objects.

FIG. 4 is a diagram of the framework of the Scrapy crawler.

FIG. 5 is a schematic diagram of a distant supervision relation extraction framework based on outlier detection.

FIG. 6 is a schematic diagram of a knowledge graph for the water affairs field.

Detailed description of the preferred embodiments

Example one

The construction method of the knowledge graph in the water service field is characterized by comprising the following steps:

Example two

The method for constructing a knowledge graph in the water service field in this embodiment is characterized in that, in the step 1, the following contents are specifically included:

(3) filtering random errors existing in the data;

Example three

The method for constructing a knowledge graph in the water service field in this embodiment is characterized in that, in the step 2, the following contents are specifically included:

Example four

The method for constructing a knowledge graph in the water service field in this embodiment is characterized in that in step 3, the types of data sources are divided into the following three types:

Example five

The method for constructing a knowledge graph in the water service field in this embodiment is characterized in that, in step 3, the structured data is extracted mainly in the following manner:

(a1) connecting a database;

(a2) carrying out basic data initialization operation;

(a3) constructing SQL statements and carrying out data queries;

(a4) carrying out data type, structure and attribute conversion;

Example six

The method for constructing a knowledge graph in the water service field in this embodiment is characterized in that, in step 3, the semi-structured data is extracted mainly in the following manner:

(b5) storing the analyzed data into a file according to a specified format;

Example seven

The method for constructing a knowledge graph in the water service field in this embodiment is characterized in that, in step 3, the unstructured data is extracted mainly in the following manner:

Example eight

The method for constructing the knowledge graph in the water service field in this embodiment is characterized in that, in the step 4, the specific method is as follows:

(5) each m[x, y] is set to the minimum of:

A. the value of the cell directly above plus 1, i.e., m[x-1, y] + 1;

B. the value of the cell directly to the left plus 1, i.e., m[x, y-1] + 1;

then, the similarity between the two strings a and b is:

$$\mathrm{sim}(a, b) = 1 - \frac{m[t1, t2]}{\max(t1, t2)}$$

Example nine

In the method for constructing a knowledge graph in the water affairs field according to this embodiment, in step 5, the nodes of the neo4j graph database used for knowledge storage represent entities in the network and the edges represent relationships; all data of each entity are stored and extended as <Key, Value> pairs, and data are imported using Cypher statements within neo4j.

Based on the above embodiments, the present invention has the following advantages: (1) the invention is used for storing and intelligently identifying knowledge in the water affairs field; it can solve problems such as the dispersion, fuzziness, and lack of guidance of that knowledge, and it provides the capability to merge, summarize, and organize knowledge and to offer self-learning services; (2) traditional training sets for water affairs entity relation extraction rely on manual labeling, which requires a large amount of labor as well as professional knowledge of the water affairs field, and at present there are almost no training sets for relation extraction in this field, whereas the invention adopts relation extraction based on distant supervision, automatically constructs a data set of relation instances, trains a relation extraction model on that data set, and uses the model to determine the relations between entities in new sentences; (3) the method can extract entities and relations from structured water affairs data and unstructured text data more conveniently and efficiently, and link water affairs objects together.

The above description is only for the preferred embodiment of the present invention, but the scope of the present invention is not limited thereto, and any changes or substitutions that can be easily conceived by those skilled in the art within the technical scope of the present invention are also included in the scope of the present invention. Therefore, the protection scope of the present invention shall be subject to the protection scope of the claims.

Claims

1. A construction method of a knowledge graph in the water service field is characterized by comprising the following steps:

2. The method for constructing a knowledge graph in the water service field according to claim 1, wherein the step 1 specifically comprises the following steps:

(3) filtering random errors existing in the data;

3. The method for constructing a knowledge graph in the water service field according to claim 1, wherein the step 2 specifically comprises the following steps:

4. The method for constructing a knowledge graph in the water service field according to claim 1, wherein in the step 3, the types of the data sources are divided into the following three types:

(1) structured data; (2) semi-structured data; (3) unstructured data.

5. The method for constructing a knowledge graph in the water service field according to claim 1, wherein in the step 3, the structured data is extracted mainly by adopting the following method:

(a1) connecting a database;

(a2) carrying out basic data initialization operation;

(a3) constructing SQL statements and carrying out data queries;

(a4) carrying out data type, structure and attribute conversion;

(a5) determining whether the data already exist in the neo4j database; if so, returning to step (a3); otherwise, proceeding to step (a6) to store the data;

6. The method for constructing a knowledge graph in the water service field according to claim 1, wherein in the step 3, the semi-structured data is extracted mainly by adopting the following method:

(b5) storing the analyzed data into a file according to a specified format;

7. The method for constructing a knowledge graph in the water service field according to claim 1, wherein in the step 3, the unstructured data is extracted mainly by the following method:

8. The method for constructing a knowledge graph in the water service field according to claim 1, wherein in the step 4, the specific method is as follows:

(5) each m[x, y] is set to the minimum of:

A. the value of the cell directly above plus 1, i.e., m[x-1, y] + 1;

B. the value of the cell directly to the left plus 1, i.e., m[x, y-1] + 1;

then, the similarity between the two strings a and b is:

$$\mathrm{sim}(a, b) = 1 - \frac{m[t1, t2]}{\max(t1, t2)}$$

9. The method for constructing a knowledge graph in the water service field according to claim 1, wherein in the step 5, the nodes of the neo4j graph database used for knowledge storage represent entities in the network and the edges represent relationships, all data of each entity are stored and extended as <Key, Value> pairs, and data are imported using Cypher statements within neo4j.