CN113918725A - Construction method of knowledge graph in water affairs field - Google Patents

Publication number
CN113918725A
Authority
CN
China
Prior art keywords
data
constructing
water
concept
knowledge graph
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202111011676.7A
Other languages
Chinese (zh)
Inventor
丛小飞
左翔
刘威风
赵杏杏
刘修恒
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Nanjing Zhongyu Smart Water Conservation Research Institute Co ltd
Original Assignee
Nanjing Zhongyu Smart Water Conservation Research Institute Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Nanjing Zhongyu Smart Water Conservation Research Institute Co ltd
Priority to CN202111011676.7A
Publication of CN113918725A
Legal status: Pending

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/36 Creation of semantic tools, e.g. ontology or thesauri
    • G06F16/367 Ontology
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90 Details of database functions independent of the retrieved data types
    • G06F16/901 Indexing; Data structures therefor; Storage structures
    • G06F16/9024 Graphs; Linked lists
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90 Details of database functions independent of the retrieved data types
    • G06F16/906 Clustering; Classification

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Databases & Information Systems (AREA)
  • Data Mining & Analysis (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Software Systems (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Animal Behavior & Ethology (AREA)
  • Computational Linguistics (AREA)
  • Machine Translation (AREA)

Abstract

The invention discloses a method for constructing a river and lake health knowledge graph, comprising the following main steps: on the basis of an analysis of the relevant water conservancy industry standards and the types of river and lake health data resources, defining the river and lake health metadata types and a knowledge service mode based on catalog classification, determining the ontology set of a river and lake health ontology model, determining attributes, mining and establishing relations between ontologies according to those attributes, and modeling the river and lake health ontology library; extracting more entities and association relations from massive heterogeneous data resources by means such as topic mining, distant supervision, and causal relation extraction, thereby further supplementing and refining the ontology library model; performing a combined calculation with a concept similarity algorithm based on common attributes and a similarity algorithm based on in-link and out-link sets, so as to reduce entity redundancy and achieve knowledge fusion; and establishing an adaptive updating mechanism to achieve semi-automatic updating of the river and lake health knowledge graph.

Description

Construction method of knowledge graph in water affairs field
Technical Field
The invention belongs to the field of knowledge graphs, and particularly relates to a construction method of a knowledge graph in the water affairs field.
Background
As urbanization advances, the demands placed on urban water affairs management keep rising. Water affairs management covers a wide scope, and the interaction mechanisms between water affairs objects and elements are complex. Scientifically grasping the water regime and water environment of the urban river network, comprehensively managing the water supply and drainage pipe network, effectively predicting waterlogging risk, and formulating sound water affairs scheduling decisions therefore all depend on analyzing and mining massive heterogeneous data in the water affairs field. Over years of accumulation, water affairs departments have obtained massive real-time and basic data through various sensing devices and have generated large amounts of business and text data in the course of their work, while a variety of water-affairs-themed data has appeared on government and public websites; these data are scattered across different systems and platforms. How to link these data through technical means into a data semantic network that provides decision support for water affairs management is a problem that currently needs to be considered. For example, mass data can be stored on a distributed storage platform, but such a platform cannot mine the connections between data, so data relevance and interoperability are poor and sharing capability is insufficient. A knowledge graph can abstract and unify concepts, strengthen the relations between objects and concepts, and bring systematic integration and intensive management to a complex data system.
Constructing a knowledge graph for the water affairs field can serve scientific water affairs management, support smart water affairs construction, and raise the intelligence level of water affairs work.
Disclosure of Invention
The technical problem to be solved by the invention is to address the shortcomings of the prior art by providing a construction method of a knowledge graph in the water affairs field. The method aims to build a bridge between water affairs data and knowledge, to solve the problem that data in the field are abundant, scattered, fuzzy, and unguided, and to provide knowledge support for the decisions of water affairs managers, so that the prominent water affairs problems of different periods can be tackled in a targeted and comprehensive way and strategies suited to China's national conditions can continue to be explored.
To achieve this purpose, the invention adopts the following technical solution:
a construction method of a knowledge graph in the water affairs field, characterized by comprising the following steps:
step 1: before the top-level knowledge graph is constructed and knowledge is extracted from the water affairs data, verifying the data and removing noise;
step 2: constructing a top-level conceptual model of the water affairs domain knowledge graph based on the neo4j platform, and taking the conceptual model as the framework of the water affairs domain knowledge graph;
step 3: performing entity extraction and relation extraction on heterogeneous data sources such as industry standards, various databases, government department websites, hydrological and water environment monitoring websites, public websites, Internet of Things data, and remote sensing images;
step 4: on the basis of the extracted data, attaching triple data that refer to the same thing to the same concept, and completing entity alignment by calculating the similarity between concept entities; the entity triple data comprise entity-attribute-value triples and entity-relation-entity triples;
step 5: storing the knowledge in the graph database of the neo4j platform.
The construction method of a knowledge graph in the water affairs field is characterized in that step 1 specifically comprises:
(1) cleaning missing values, abnormal values, repeated values, and dirty data in the text data;
(2) processing data recorded as tables and pictures in the non-text data, and converting them into text data by manual extraction or with image-to-text (OCR) software;
(3) filtering out random errors in the data;
(4) organizing the text data into a usable corpus, with single sentences or phrases as units.
The construction method of a knowledge graph in the water affairs field is characterized in that step 2 specifically comprises:
classifying the water affairs objects hierarchically, and dividing the water affairs domain concept into two subclasses: the geographic location concept and the object facility concept;
the domain classes under the geographic location concept are qualitative designations of geographic areas, and the domain classes under the object facility concept are water affairs objects that exist naturally or are built artificially;
for the geographic location concept, the geographic areas it describes are further divided into descriptive places and functional places according to whether they have an actual function;
for the object facility concept, natural objects and engineering facilities are further distinguished according to whether the object exists naturally or is built artificially.
The construction method of a knowledge graph in the water affairs field is characterized in that, in step 3, the data sources are divided into the following three types:
(1) structured data, mainly comprising Excel tables, relational databases (e.g., MySQL, Oracle, Microsoft Access), object-oriented databases (e.g., Db4o), and the like;
(2) semi-structured data, mainly from websites such as Baidu Encyclopedia, government department websites, hydrological and water environment monitoring websites, public websites, and Wikipedia, as well as data stored in XML files;
(3) unstructured data, mainly unstructured text such as the documents of water-affairs-related organizations, literature, and Internet text.
The construction method of a knowledge graph in the water affairs field is characterized in that, in step 3, the structured data are extracted mainly as follows:
(a1) connecting to the database;
(a2) performing the basic data initialization;
(a3) constructing an SQL statement and running the data query;
(a4) converting the data type, structure, and attributes;
(a5) judging whether the data already exist in the neo4j database; if so, returning to step (a3); if not, proceeding to step (a6) to store the data (whether two records describe the same node is judged mainly by the labels field in neo4j);
(a6) constructing a neo4j data storage statement, determining the superior-subordinate relation from the information extracted by the SQL statement, and creating the node;
(a7) judging whether the SQL query is finished; if so, exiting the extraction program; if not, returning to step (a3) to construct further SQL statements for data query.
The construction method of a knowledge graph in the water affairs field is characterized in that, in step 3, the semi-structured data are extracted mainly as follows:
(b1) first, the Engine module of Scrapy opens a website, and the Spider module sends the first crawl request;
(b2) the Engine module obtains the links to crawl from the Spider module and passes them to the Scheduler module, which schedules them as requests;
(b3) the Engine module asks the Scheduler module for the next link to crawl and sends the task to the Downloader module for downloading;
(b4) after the page is downloaded, the Downloader module returns the downloaded data to the Engine module, which hands them to the Spider module for parsing and processing;
(b5) the parsed data are stored in a file in a specified format;
(b6) steps (b2) to (b5) are repeated until the Scheduler module has no more requests, whereupon the Engine module shuts down and the crawl ends.
The construction method of a knowledge graph in the water affairs field is characterized in that, in step 3, the unstructured data are extracted mainly as follows:
(c1) searching the established water affairs domain knowledge graph for triples that embody the preset relations, and aligning them with the corpus to obtain a training set for water affairs relation extraction;
(c2) obtaining sentence representations with a neural network model and training the model to obtain a classifier for water affairs relation extraction;
(c3) after the accuracy of the model has been verified, performing named entity recognition on new text to obtain the water affairs entities in each sentence, thereby forming new samples, and using the trained model to extract relations from the new samples.
The construction method of a knowledge graph in the water affairs field is characterized in that step 4 is carried out as follows:
(1) because letters differ in case and database table names sometimes contain special characters, the character strings must first be screened and converted: regular expressions are formulated to screen the concept words and convert them to lowercase;
(2) for the two concepts to be compared, let the source string be a and the target string be b, with lengths t1 and t2 respectively; build a matrix of the form m[t1+1, t2+1], whose first row is set to 0, 1, 2, …, t2 and whose first column is set to 0, 1, 2, …, t1; let the edit cost be cost;
(3) comparing each pair of characters a[x] (x from 1 to t1) and b[y] (y from 1 to t2);
(4) if a[x] is the same as b[y], cost = 0; if a[x] differs from b[y], cost = 1;
(5) each m[x, y] is set to the minimum of:
A. the cell directly above plus 1, i.e., m[x-1, y] + 1;
B. the cell directly to the left plus 1, i.e., m[x, y-1] + 1;
C. the cell diagonally above and to the left plus the value of cost, i.e., m[x-1, y-1] + cost;
(6) steps (3) to (5) are iterated; the final m[t1, t2] is the minimum edit distance needed to make the two concept words identical, and max(t1, t2) is the larger of the two string lengths;
then, the similarity between the two strings a and b is:
sim(a, b) = 1 - m[t1, t2] / max(t1, t2)
In the above construction method of a knowledge graph in the water affairs field, in step 5, the nodes of the neo4j graph database used for knowledge storage represent the entities in the network and the edges represent relations; all data of each entity are stored and extended as <Key, Value> pairs; and data are imported with neo4j's built-in Cypher statements.
The invention has the following beneficial effects:
(1) The invention is used to store and intelligently identify knowledge in the water affairs field; it can solve problems such as the dispersion, fuzziness, and lack of guidance of that knowledge, and it offers services for merging, summarizing, and collating knowledge and for self-learning.
(2) Traditional relation extraction for water affairs entities relies on manually labeled training sets, which require a great deal of labor as well as professional knowledge of the water affairs field, and at present almost no training set exists for water affairs relation extraction. The invention adopts relation extraction based on distant supervision: it automatically constructs a data set of relation instances, trains a relation extraction model on that data set, and uses the model to judge the relations between entities in new sentences.
(3) The method can extract entities and relations from structured water affairs data and unstructured text data more conveniently and efficiently, and link the water affairs objects together.
Drawings
FIG. 1 is a schematic view of the present invention.
FIG. 2 is a schematic diagram of the calculation flow.
FIG. 3 is a schematic diagram of the hierarchical classification of water affairs objects.
FIG. 4 is a framework diagram of the Scrapy crawler.
FIG. 5 is a schematic diagram of the distant-supervision relation extraction framework based on outlier detection.
FIG. 6 is a schematic diagram of the water affairs domain knowledge graph.
Detailed description of the preferred embodiments
Example one
The construction method of a knowledge graph in the water affairs field comprises the following steps:
step 1: before the top-level knowledge graph is constructed and knowledge is extracted from the water affairs data, verifying the data and removing noise;
step 2: constructing a top-level conceptual model of the water affairs domain knowledge graph based on the neo4j platform, and taking the conceptual model as the framework of the water affairs domain knowledge graph;
step 3: performing entity extraction and relation extraction on heterogeneous data sources such as industry standards, various databases, government department websites, hydrological and water environment monitoring websites, public websites, Internet of Things data, and remote sensing images;
step 4: on the basis of the extracted data, attaching triple data that refer to the same thing to the same concept, and completing entity alignment by calculating the similarity between concept entities; the entity triple data comprise entity-attribute-value triples and entity-relation-entity triples;
step 5: storing the knowledge in the graph database of the neo4j platform.
Example two
In the construction method of this embodiment, step 1 specifically comprises:
(1) cleaning missing values, abnormal values, repeated values, and dirty data in the text data;
(2) processing data recorded as tables and pictures in the non-text data, and converting them into text data by manual extraction or with image-to-text (OCR) software;
(3) filtering out random errors in the data;
(4) organizing the text data into a usable corpus, with single sentences or phrases as units.
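As a non-limiting illustration, the cleaning and corpus-organizing operations of this embodiment might be sketched as follows. The record layout (station, water level), the plausible range of 0 to 100 metres, and the function names are assumptions made for the example and are not specified by the invention.

```python
import math
import re


def clean_records(records):
    """Drop missing, abnormal, and repeated (station, level) readings."""
    seen = set()
    cleaned = []
    for station, level in records:
        # Missing value: None or NaN.
        if level is None or (isinstance(level, float) and math.isnan(level)):
            continue
        # Abnormal value: outside an assumed plausible range in metres.
        if not (0.0 <= level <= 100.0):
            continue
        # Repeated value: exact duplicate of an earlier record.
        if (station, level) in seen:
            continue
        seen.add((station, level))
        cleaned.append((station, level))
    return cleaned


def split_sentences(text):
    """Split text into single-sentence units on Chinese and Western terminators."""
    parts = re.findall(r"[^。！？!?]+[。！？!?]?", text)
    return [p.strip() for p in parts if p.strip()]


readings = [("S1", 2.5), ("S1", 2.5), ("S2", None), ("S3", -4.0), ("S5", 3.1)]
print(clean_records(readings))
print(split_sentences("水位上涨。泵站启动！"))
```

The sentence splitter deliberately ignores the Western full stop so that decimal numbers such as readings are not cut in half; a production corpus would likely need a more careful tokenizer.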
Example three
In the construction method of this embodiment, step 2 specifically comprises:
classifying the water affairs objects hierarchically, and dividing the water affairs domain concept into two subclasses: the geographic location concept and the object facility concept;
the domain classes under the geographic location concept are qualitative designations of geographic areas, and the domain classes under the object facility concept are water affairs objects that exist naturally or are built artificially;
for the geographic location concept, the geographic areas it describes are further divided into descriptive places and functional places according to whether they have an actual function;
for the object facility concept, natural objects and engineering facilities are further distinguished according to whether the object exists naturally or is built artificially.
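By way of example only, the top-level concept hierarchy of this embodiment could be held as a nested structure and turned into Cypher statements for the graph database. The English concept names, the `Concept` label, and the `SUBCLASS_OF` relationship type are illustrative assumptions, not names taken from the invention.

```python
# Top-level concept hierarchy as a nested dict; the names are illustrative.
CONCEPT_TREE = {
    "WaterAffairsDomain": {
        "GeographicLocation": {
            "DescriptivePlace": {},
            "FunctionalPlace": {},
        },
        "ObjectFacility": {
            "NaturalObject": {},
            "EngineeringFacility": {},
        },
    }
}


def subclass_edges(tree, parent=None):
    """Yield (child, parent) pairs for every subclass link in the tree."""
    for name, children in tree.items():
        if parent is not None:
            yield (name, parent)
        yield from subclass_edges(children, name)


def to_cypher(edges):
    """Render each subclass link as one Cypher statement (relationship name assumed)."""
    return [
        f"MERGE (c:Concept {{name: '{child}'}}) "
        f"MERGE (p:Concept {{name: '{parent}'}}) "
        f"MERGE (c)-[:SUBCLASS_OF]->(p)"
        for child, parent in edges
    ]


edges = list(subclass_edges(CONCEPT_TREE))
for statement in to_cypher(edges):
    print(statement)
```

MERGE is used rather than CREATE so that re-running the statements does not duplicate concept nodes.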
Example four
In the construction method of this embodiment, in step 3, the data sources are divided into the following three types:
(1) structured data, mainly comprising Excel tables, relational databases (e.g., MySQL, Oracle, Microsoft Access), object-oriented databases (e.g., Db4o), and the like;
(2) semi-structured data, mainly from websites such as Baidu Encyclopedia, government department websites, hydrological and water environment monitoring websites, public websites, and Wikipedia, as well as data stored in XML files;
(3) unstructured data, mainly unstructured text such as the documents of water-affairs-related organizations, literature, and Internet text.
Example five
In the construction method of this embodiment, in step 3, the structured data are extracted mainly as follows:
(a1) connecting to the database;
(a2) performing the basic data initialization;
(a3) constructing an SQL statement and running the data query;
(a4) converting the data type, structure, and attributes;
(a5) judging whether the data already exist in the neo4j database; if so, returning to step (a3); if not, proceeding to step (a6) to store the data (whether two records describe the same node is judged mainly by the labels field in neo4j);
(a6) constructing a neo4j data storage statement, determining the superior-subordinate relation from the information extracted by the SQL statement, and creating the node;
(a7) judging whether the SQL query is finished; if so, exiting the extraction program; if not, returning to step (a3) to construct further SQL statements for data query.
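The loop of steps (a1) to (a7) may be sketched, purely for illustration, as below. An in-memory SQLite database stands in for the relational source and a plain dictionary stands in for the neo4j store; the table `water_object`, its columns, the `BELONGS_TO` relation, and the sample rows are all hypothetical.

```python
import sqlite3


def extract_to_graph(conn, graph, batch_size=2):
    """Page through a relational table and add unseen rows as graph nodes.

    `graph` is a plain dict standing in for the graph store:
    {"nodes": {name: properties}, "edges": set of (child, relation, parent)}.
    """
    offset = 0
    while True:
        # (a3) build and run the SQL query for the next batch.
        rows = conn.execute(
            "SELECT name, kind, parent FROM water_object "
            "ORDER BY id LIMIT ? OFFSET ?",
            (batch_size, offset),
        ).fetchall()
        # (a7) no more rows: the query is finished, so exit.
        if not rows:
            break
        offset += len(rows)
        for name, kind, parent in rows:
            # (a4) convert the row into node properties.
            props = {"name": name.strip(), "label": kind.strip().lower()}
            # (a5) skip rows whose node already exists.
            if props["name"] in graph["nodes"]:
                continue
            # (a6) create the node and its superior-subordinate link.
            graph["nodes"][props["name"]] = props
            if parent:
                graph["edges"].add((props["name"], "BELONGS_TO", parent))
    return graph


# (a1)-(a2) connect and initialise a small demonstration table.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE water_object "
    "(id INTEGER PRIMARY KEY, name TEXT, kind TEXT, parent TEXT)"
)
conn.executemany(
    "INSERT INTO water_object (name, kind, parent) VALUES (?, ?, ?)",
    [
        ("River A", "River", None),
        ("Pump Station 1", "Facility", "River A"),
        ("River A", "River", None),  # duplicate row, skipped at (a5)
    ],
)
graph = extract_to_graph(conn, {"nodes": {}, "edges": set()})
print(sorted(graph["nodes"]), graph["edges"])
```

In this sketch the existence check uses the node name as the key; the embodiment states that the judgment relies mainly on the labels field in neo4j, so a real implementation would query the database instead.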
Example six
In the construction method of this embodiment, in step 3, the semi-structured data are extracted mainly as follows:
(b1) first, the Engine module of Scrapy opens a website, and the Spider module sends the first crawl request;
(b2) the Engine module obtains the links to crawl from the Spider module and passes them to the Scheduler module, which schedules them as requests;
(b3) the Engine module asks the Scheduler module for the next link to crawl and sends the task to the Downloader module for downloading;
(b4) after the page is downloaded, the Downloader module returns the downloaded data to the Engine module, which hands them to the Spider module for parsing and processing;
(b5) the parsed data are stored in a file in a specified format;
(b6) steps (b2) to (b5) are repeated until the Scheduler module has no more requests, whereupon the Engine module shuts down and the crawl ends.
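To make the division of labour among the modules concrete, the following is a simplified simulation of the crawl loop written with the standard library only. It does not use the crawler framework's own API; the in-memory `FAKE_WEB` pages and the function names are hypothetical stand-ins for the Downloader, Spider, Scheduler, and Engine roles.

```python
from collections import deque

# Hypothetical in-memory "web": URL -> (page text, outgoing links).
FAKE_WEB = {
    "site/index": ("River list", ["site/river-a", "site/river-b"]),
    "site/river-a": ("River A: length 12 km", []),
    "site/river-b": ("River B: length 30 km", ["site/river-a"]),
}


def download(url):
    """Downloader role: fetch a page (here, a dictionary lookup)."""
    return FAKE_WEB[url]


def parse(url, page):
    """Spider role: turn a downloaded page into one item plus follow-up links."""
    text, links = page
    return {"url": url, "text": text}, links


def crawl(start_url):
    """Engine role: move requests between scheduler, downloader, and spider."""
    scheduler = deque([start_url])      # (b1) first crawl request
    seen = {start_url}
    items = []
    while scheduler:                    # (b6) stop when no requests remain
        url = scheduler.popleft()       # (b3) next link to crawl
        page = download(url)            # (b3)-(b4) download and hand back
        item, links = parse(url, page)  # (b4) spider parses the page
        items.append(item)              # (b5) keep the parsed data
        for link in links:              # (b2) schedule newly found links
            if link not in seen:
                seen.add(link)
                scheduler.append(link)
    return items


for item in crawl("site/index"):
    print(item)
```

The `seen` set plays the part of request de-duplication, so a page linked from several places is downloaded only once.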
Example seven
In the construction method of this embodiment, in step 3, the unstructured data are extracted mainly as follows:
(c1) searching the established water affairs domain knowledge graph for triples that embody the preset relations, and aligning them with the corpus to obtain a training set for water affairs relation extraction;
(c2) obtaining sentence representations with a neural network model and training the model to obtain a classifier for water affairs relation extraction;
(c3) after the accuracy of the model has been verified, performing named entity recognition on new text to obtain the water affairs entities in each sentence, thereby forming new samples, and using the trained model to extract relations from the new samples.
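Step (c1), the alignment of known triples with the corpus, can be illustrated with the minimal sketch below. It assumes the simplest alignment rule, namely that a sentence mentioning both entities of a triple is labeled with that triple's relation; the triples, sentences, and relation names are invented for the example, and the neural classifier of step (c2) is not shown.

```python
def build_training_set(triples, sentences):
    """Label each sentence that mentions both entities of a known triple."""
    samples = []
    for sentence in sentences:
        for head, relation, tail in triples:
            # Alignment rule (an assumption): both entity names occur in the sentence.
            if head in sentence and tail in sentence:
                samples.append(
                    {"sentence": sentence, "head": head, "tail": tail, "label": relation}
                )
    return samples


# Hypothetical knowledge graph triples and corpus sentences.
triples = [("Pump Station 1", "located_on", "River A")]
sentences = [
    "Pump Station 1 was built on the east bank of River A.",
    "River A flooded twice last year.",
    "Pump Station 1 discharges into River A during storms.",
]
for sample in build_training_set(triples, sentences):
    print(sample["label"], "|", sample["sentence"])
```

Labels produced this way are noisy: the third sentence mentions both entities but expresses a different relation. This is presumably the kind of mislabeled sample that the outlier detection mentioned for FIG. 5 is meant to filter.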
Example eight
In the construction method of this embodiment, step 4 is carried out as follows:
(1) because letters differ in case and database table names sometimes contain special characters, the character strings must first be screened and converted: regular expressions are formulated to screen the concept words and convert them to lowercase;
(2) for the two concepts to be compared, let the source string be a and the target string be b, with lengths t1 and t2 respectively; build a matrix of the form m[t1+1, t2+1], whose first row is set to 0, 1, 2, …, t2 and whose first column is set to 0, 1, 2, …, t1; let the edit cost be cost;
(3) comparing each pair of characters a[x] (x from 1 to t1) and b[y] (y from 1 to t2);
(4) if a[x] is the same as b[y], cost = 0; if a[x] differs from b[y], cost = 1;
(5) each m[x, y] is set to the minimum of:
A. the cell directly above plus 1, i.e., m[x-1, y] + 1;
B. the cell directly to the left plus 1, i.e., m[x, y-1] + 1;
C. the cell diagonally above and to the left plus the value of cost, i.e., m[x-1, y-1] + cost;
(6) steps (3) to (5) are iterated; the final m[t1, t2] is the minimum edit distance needed to make the two concept words identical, and max(t1, t2) is the larger of the two string lengths;
then, the similarity between the two strings a and b is:
sim(a, b) = 1 - m[t1, t2] / max(t1, t2)
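The edit-distance calculation of this embodiment may be sketched as follows. The sketch assumes that the similarity is the edit distance divided by the longer string length and subtracted from 1, as the definitions of m[t1, t2] and max(t1, t2) suggest; the particular regular expression used for screening (keep digits, lowercase letters, and CJK characters) is likewise an assumption.

```python
import re


def normalize(name):
    """Lowercase a concept word and drop everything but digits, letters, and CJK."""
    return re.sub(r"[^0-9a-z\u4e00-\u9fff]", "", name.lower())


def edit_distance(a, b):
    """Minimum number of single-character edits that turn a into b."""
    t1, t2 = len(a), len(b)
    m = [[0] * (t2 + 1) for _ in range(t1 + 1)]
    for x in range(t1 + 1):
        m[x][0] = x  # first column: 0, 1, 2, ..., t1
    for y in range(t2 + 1):
        m[0][y] = y  # first row: 0, 1, 2, ..., t2
    for x in range(1, t1 + 1):
        for y in range(1, t2 + 1):
            # Strings are 0-indexed in Python, hence the x-1 and y-1.
            cost = 0 if a[x - 1] == b[y - 1] else 1
            m[x][y] = min(
                m[x - 1][y] + 1,         # A. cell directly above
                m[x][y - 1] + 1,         # B. cell directly to the left
                m[x - 1][y - 1] + cost,  # C. cell diagonally above-left
            )
    return m[t1][t2]


def similarity(a, b):
    """Assumed normalization: 1 - distance / length of the longer string."""
    a, b = normalize(a), normalize(b)
    longest = max(len(a), len(b))
    if longest == 0:
        return 1.0  # two empty strings are treated as identical
    return 1.0 - edit_distance(a, b) / longest


print(edit_distance("kitten", "sitting"))
print(similarity("Pump_Station", "pumpstation"))
print(similarity("river", "rover"))
```

Two concept names whose similarity exceeds a chosen threshold could then be treated as candidates for the same entity; the threshold itself is not given in the text.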
Example nine
In the construction method of this embodiment, in step 5, the nodes of the neo4j graph database used for knowledge storage represent the entities in the network and the edges represent relations; all data of each entity are stored and extended as <Key, Value> pairs; and data are imported with neo4j's built-in Cypher statements.
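As an illustrative sketch of this embodiment, the helpers below build parameterized Cypher statements for one entity with its key-value properties and for one relation between two entities. The helper names, the use of `name` as the matching key, and the sample label and relation type are assumptions; the statements would be passed to whichever neo4j client is in use.

```python
def node_statement(label, props):
    """Build a Cypher MERGE for one entity, keyed by name, with <Key, Value> properties."""
    assignments = ", ".join(
        f"n.{key} = ${key}" for key in sorted(props) if key != "name"
    )
    query = f"MERGE (n:{label} {{name: $name}})"
    if assignments:
        query += f" SET {assignments}"
    # The property values travel separately as query parameters.
    return query, dict(props)


def relation_statement(rel_type):
    """Build a Cypher statement linking two existing entities by name."""
    return (
        "MATCH (a {name: $head}), (b {name: $tail}) "
        f"MERGE (a)-[:{rel_type}]->(b)"
    )


query, params = node_statement("River", {"name": "River A", "length_km": 12})
print(query, params)
print(relation_statement("FLOWS_INTO"))
```

Property values are passed as parameters, whereas the label and relationship type are interpolated into the query text, so in this sketch they should come from a fixed vocabulary such as the top-level conceptual model rather than from raw input.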
Based on the above embodiments, the present invention has the following advantages: (1) the invention is used to store and intelligently identify knowledge in the water affairs field; it can solve problems such as the dispersion, fuzziness, and lack of guidance of that knowledge, and it offers services for merging, summarizing, and collating knowledge and for self-learning; (2) traditional relation extraction for water affairs entities relies on manually labeled training sets, which require a great deal of labor as well as professional knowledge of the water affairs field, and at present almost no training set exists for water affairs relation extraction; the invention adopts relation extraction based on distant supervision, automatically constructs a data set of relation instances, trains a relation extraction model on that data set, and uses the model to judge the relations between entities in new sentences; (3) the method can extract entities and relations from structured water affairs data and unstructured text data more conveniently and efficiently, and link the water affairs objects together.
The above description is only for the preferred embodiment of the present invention, but the scope of the present invention is not limited thereto, and any changes or substitutions that can be easily conceived by those skilled in the art within the technical scope of the present invention are also included in the scope of the present invention. Therefore, the protection scope of the present invention shall be subject to the protection scope of the claims.

Claims (9)

1. A construction method of a knowledge graph in the water affairs field, characterized by comprising the following steps:
step 1: before the top-level knowledge graph is constructed and knowledge is extracted from the water affairs data, verifying the data and removing noise;
step 2: constructing a top-level conceptual model of the water affairs domain knowledge graph based on the neo4j platform, and taking the conceptual model as the framework of the water affairs domain knowledge graph;
step 3: performing entity extraction and relation extraction on heterogeneous data sources such as industry standards, various databases, government department websites, hydrological and water environment monitoring websites, public websites, Internet of Things data, and remote sensing images;
step 4: on the basis of the extracted data, attaching triple data that refer to the same thing to the same concept, and completing entity alignment by calculating the similarity between concept entities; the entity triple data comprise entity-attribute-value triples and entity-relation-entity triples;
step 5: storing the knowledge in the graph database of the neo4j platform.
2. The construction method of a knowledge graph in the water affairs field according to claim 1, wherein step 1 specifically comprises:
(1) cleaning missing values, abnormal values, repeated values, and dirty data in the text data;
(2) processing data recorded as tables and pictures in the non-text data, and converting them into text data by manual extraction or with image-to-text (OCR) software;
(3) filtering out random errors in the data;
(4) organizing the text data into a usable corpus, with single sentences or phrases as units.
3. The construction method of a knowledge graph in the water affairs field according to claim 1, wherein step 2 specifically comprises:
classifying the water affairs objects hierarchically, and dividing the water affairs domain concept into two subclasses: the geographic location concept and the object facility concept;
the domain classes under the geographic location concept are qualitative designations of geographic areas, and the domain classes under the object facility concept are water affairs objects that exist naturally or are built artificially;
for the geographic location concept, the geographic areas it describes are further divided into descriptive places and functional places according to whether they have an actual function;
for the object facility concept, natural objects and engineering facilities are further distinguished according to whether the object exists naturally or is built artificially.
4. The construction method of a knowledge graph in the water affairs field according to claim 1, wherein in step 3 the data sources are divided into the following three types:
(1) structured data; (2) semi-structured data; (3) unstructured data.
5. The method for constructing a knowledge graph in the water service field according to claim 1, wherein in the step 3, the structured data is extracted mainly by adopting the following method:
(a1) connecting a database;
(a2) carrying out basic data initialization operation;
(a3) constructing SQL sentences and carrying out data query;
(a4) carrying out data type, structure and attribute conversion;
(a5) judging whether the data already exists in the neo4j database; if so, returning to step (a3); otherwise, proceeding to step (a6) to store the data;
(a6) constructing a neo4j data storage statement, determining a superior-inferior relation by combining information extracted by an SQL statement, and creating a node;
(a7) judging whether the query of the SQL statement is finished; if so, exiting the extraction program; if not, returning to step (a3) to continue constructing SQL statements for data query.
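The control flow of steps (a1) to (a7) can be sketched as a batched query loop. In the sketch below, `sqlite3` stands in for the source database and a plain dictionary stands in for the neo4j store, so that the example runs on its own; the table `facility`, its columns and the function name are hypothetical. A record that already exists is skipped rather than triggering a fresh query, which is a simplification of step (a5).

```python
import sqlite3


def extract_structured(conn, graph, batch_size=2):
    """Copy rows from a relational table into a node store.

    conn  -- an open DB-API connection (step a1 is done by the caller)
    graph -- dict mapping node id -> property dict (stand-in for neo4j)
    Returns the number of nodes created.
    """
    offset = 0        # (a2) basic initialization
    created = 0
    while True:
        # (a3) construct the SQL statement and query one batch
        rows = conn.execute(
            "SELECT id, name, kind, parent_id FROM facility "
            "ORDER BY id LIMIT ? OFFSET ?",
            (batch_size, offset),
        ).fetchall()
        if not rows:  # (a7) query finished, exit
            return created
        offset += len(rows)
        for row_id, name, kind, parent_id in rows:
            # (a4) convert types and structure
            node_id = str(row_id)
            parent = None if parent_id is None else str(parent_id)
            # (a5) skip data that is already stored
            if node_id in graph:
                continue
            # (a6) create the node and record its superior node
            graph[node_id] = {"name": name, "kind": kind, "parent": parent}
            created += 1
```

With a real neo4j instance, the dictionary lookup and assignment would be replaced by Cypher statements issued through a driver.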
6. The method for constructing a knowledge graph in the water service field according to claim 1, wherein in the step 3, the semi-structured data is extracted mainly by adopting the following method:
(b1) firstly, opening a website through the Engine module of Scrapy, and sending a first crawling request through the Spider module;
(b2) the Engine module obtains the links to be crawled from the Spider module, and schedules them as requests through the Scheduler module;
(b3) the Engine module requests the Scheduler module for the next link to be crawled, and simultaneously, the Engine module sends the task to the Downloader module for downloading;
(b4) after the page is downloaded, the Downloader module feeds the downloaded data back to the Engine module and delivers the downloaded data to the Spider module to analyze and process the crawled data;
(b5) storing the analyzed data into a file according to a specified format;
(b6) repeating steps (b2) to (b5) until the Scheduler module has no more requests, whereupon the Engine module closes and the data crawl ends.
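The loop in steps (b1) to (b6) can be modelled without the crawling framework itself. The sketch below is a toy simulation of the Engine, Scheduler, Downloader and Spider roles in plain Python and does not use the Scrapy API; the `fetch` callable, the regular expressions and the item layout are assumptions made for illustration.

```python
import re
from collections import deque


def parse(url, body):
    """Spider role: extract one item and the outgoing links from a page."""
    title = re.search(r"<title>(.*?)</title>", body)
    item = {"url": url, "title": title.group(1) if title else ""}
    links = re.findall(r'href="([^"]+)"', body)
    return item, links


def crawl(start_url, fetch):
    """Engine role: drive the queue until no requests remain.

    fetch -- callable returning the page body for a URL, or None on failure
             (Downloader role, injected so the sketch runs offline).
    """
    scheduler = deque([start_url])  # (b1) first crawling request
    seen = {start_url}
    items = []
    while scheduler:                # (b6) stop when the queue is empty
        url = scheduler.popleft()   # (b3) next link to crawl
        body = fetch(url)           # (b3) download
        if body is None:
            continue
        item, links = parse(url, body)  # (b4) analyze the page
        items.append(item)          # (b5) keep the parsed record
        for link in links:          # (b2) schedule newly found links
            if link not in seen:
                seen.add(link)
                scheduler.append(link)
    return items
```

In this model step (b5) only collects items in a list; writing them to a file in a specified format would be a final serialization step.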
7. The method for constructing a knowledge graph in the water service field according to claim 1, wherein in the step 3, the unstructured data is extracted mainly by the following method:
(c1) searching the established water affairs field knowledge graph for water affairs field triples that express a preset relation, and aligning them with a corpus to obtain a training set for water affairs field relation extraction;
(c2) obtaining the representation of a sentence by using a neural network model, training the model, and obtaining a classifier for extracting the water affairs field relations;
(c3) after the model accuracy is verified, performing named entity recognition on new text to obtain the water affairs entities in each sentence and thereby new samples, and using the obtained model to perform relation extraction on the new samples.
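Step (c1) resembles what is commonly called distant supervision: a sentence that mentions both entities of a known triple is labelled with that triple's relation. The sketch below shows only this alignment step under that assumption; substring matching and the sample layout are illustrative choices, and the neural classifier of steps (c2) and (c3) is not shown.

```python
def build_training_set(triples, sentences):
    """Label each sentence that mentions both entities of a known triple.

    triples   -- iterable of (head, relation, tail) from the existing graph
    sentences -- iterable of corpus sentences
    """
    samples = []
    for sentence in sentences:
        for head, relation, tail in triples:
            # Plain substring matching; real systems would use entity linking.
            if head in sentence and tail in sentence:
                samples.append(
                    {"sentence": sentence, "head": head, "tail": tail, "label": relation}
                )
    return samples
```

Labels produced this way are noisy, because a sentence can mention both entities without expressing the relation, so the accuracy check in step (c3) matters.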
8. The method for constructing a knowledge graph in the water service field according to claim 1, wherein in the step 4, the specific method is as follows:
(1) because letter case varies and special characters are sometimes added to database table names, the character strings are screened and converted in advance: regular expressions are formulated to screen the concept words and convert them into lowercase letters;
(2) for two concepts to be compared, the source string is denoted a and the target string is denoted b, with lengths t1 and t2 respectively; a matrix m[t1+1, t2+1] is constructed, its first row is set to 0, 1, 2 … t2 and its first column is set to 0, 1, 2 … t1, and the editing cost is denoted cost;
(3) comparing each pair of characters a[x] (x takes 1 to t1) and b[y] (y takes 1 to t2);
(4) if a[x] is the same as b[y], cost is 0; if a[x] is different from b[y], cost is 1;
(5) each m[x, y] is equal to the minimum of:
A. the value of the cell directly above plus 1, namely m[x-1, y] + 1;
B. the value of the cell directly to the left plus 1, namely m[x, y-1] + 1;
C. the value of the cell diagonally above and to the left plus the value of cost, namely m[x-1, y-1] + cost;
(6) iterating steps (3) to (5) over all x and y, after which m[t1, t2] is the minimum editing distance needed to make the two concept words identical, and max(t1, t2) is the greater of the lengths of the two character strings;
then, the similarity between the two strings a and b is:
Figure FDA0003239152850000051
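The formula image is not reproduced in this text. A normalization consistent with the quantities defined above, and assumed in the sketch below, is sim(a, b) = 1 - m[t1, t2] / max(t1, t2). The sketch follows steps (1) to (6); the function names and the exact normalization pattern are assumptions.

```python
import re


def normalize(name):
    """Step (1): lowercase and strip special characters (assumed rule)."""
    return re.sub(r"[\W_]+", "", name.lower())


def edit_distance(a, b):
    """Steps (2) to (6): fill the matrix m and return m[t1][t2]."""
    t1, t2 = len(a), len(b)
    m = [[0] * (t2 + 1) for _ in range(t1 + 1)]
    for x in range(t1 + 1):
        m[x][0] = x  # first column: 0, 1, 2 ... t1
    for y in range(t2 + 1):
        m[0][y] = y  # first row: 0, 1, 2 ... t2
    for x in range(1, t1 + 1):
        for y in range(1, t2 + 1):
            # Python strings are 0-indexed, so a[x] in the claim is a[x - 1].
            cost = 0 if a[x - 1] == b[y - 1] else 1
            m[x][y] = min(
                m[x - 1][y] + 1,        # A: cell above
                m[x][y - 1] + 1,        # B: cell to the left
                m[x - 1][y - 1] + cost  # C: diagonal cell
            )
    return m[t1][t2]


def similarity(a, b):
    """Assumed form: 1 - m[t1, t2] / max(t1, t2)."""
    longest = max(len(a), len(b))
    if longest == 0:
        return 1.0  # two empty strings are treated as identical
    return 1 - edit_distance(a, b) / longest
```

For example, `similarity(normalize("Water_Station"), normalize("waterstation"))` compares two names that differ only in case and an underscore, and evaluates to 1.0 under this normalization.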
9. The method for constructing a knowledge graph in the water service field according to claim 1, wherein in the step 5, knowledge is stored in a graph database based on the neo4j platform, in which nodes represent entities in the network and edges represent relationships; all data of each entity is stored and extended as <Key, Value> pairs, and data is imported by using the Cypher statements built into neo4j.
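One way to picture the storage model of claim 9 is to generate parameterized Cypher statements, with each entity's <Key, Value> data passed as a property map. The helper names below are hypothetical, and the sketch only builds the statement text and parameters; executing them would require a neo4j driver and a running database, which are outside this example.

```python
def _check_identifier(name):
    """Labels and relationship types are interpolated into the query text,
    so they are restricted to plain identifiers here."""
    if not name.isidentifier():
        raise ValueError(f"unsafe identifier: {name!r}")
    return name


def node_statement(label, key, props):
    """Return (cypher, params) that upserts one entity node."""
    label = _check_identifier(label)
    cypher = f"MERGE (n:{label} {{id: $id}}) SET n += $props"
    return cypher, {"id": key, "props": dict(props)}


def edge_statement(rel_type, src, dst):
    """Return (cypher, params) that upserts one relationship edge."""
    rel_type = _check_identifier(rel_type)
    cypher = (
        "MATCH (a {id: $src}), (b {id: $dst}) "
        f"MERGE (a)-[:{rel_type}]->(b)"
    )
    return cypher, {"src": src, "dst": dst}
```

`MERGE` is used instead of `CREATE` so that re-running an import does not duplicate nodes or edges; that choice is a design assumption and is not stated in the claim.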
CN202111011676.7A 2021-08-31 2021-08-31 Construction method of knowledge graph in water affairs field Pending CN113918725A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202111011676.7A CN113918725A (en) 2021-08-31 2021-08-31 Construction method of knowledge graph in water affairs field

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202111011676.7A CN113918725A (en) 2021-08-31 2021-08-31 Construction method of knowledge graph in water affairs field

Publications (1)

Publication Number Publication Date
CN113918725A true CN113918725A (en) 2022-01-11

Family

ID=79233634

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202111011676.7A Pending CN113918725A (en) 2021-08-31 2021-08-31 Construction method of knowledge graph in water affairs field

Country Status (1)

Country Link
CN (1) CN113918725A (en)

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114386422A (en) * 2022-01-14 2022-04-22 淮安市创新创业科技服务中心 Intelligent aid decision-making method and device based on enterprise pollution public opinion extraction
CN114780742A (en) * 2022-04-19 2022-07-22 中国水利水电科学研究院 Construction and use method of flow scheduling knowledge-graph question-answering system of irrigation area
CN116542124A (en) * 2023-03-13 2023-08-04 广东省科学院广州地理研究所 Auxiliary modeling method for distributed hydrologic model
CN117009452A (en) * 2023-07-25 2023-11-07 浪潮智慧科技有限公司 Hydrologic service data acquisition method, equipment and medium

Cited By (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114386422A (en) * 2022-01-14 2022-04-22 淮安市创新创业科技服务中心 Intelligent aid decision-making method and device based on enterprise pollution public opinion extraction
CN114386422B (en) * 2022-01-14 2023-09-15 淮安市创新创业科技服务中心 Intelligent auxiliary decision-making method and device based on enterprise pollution public opinion extraction
CN114780742A (en) * 2022-04-19 2022-07-22 中国水利水电科学研究院 Construction and use method of flow scheduling knowledge-graph question-answering system of irrigation area
CN116542124A (en) * 2023-03-13 2023-08-04 广东省科学院广州地理研究所 Auxiliary modeling method for distributed hydrologic model
CN116542124B (en) * 2023-03-13 2024-04-09 广东省科学院广州地理研究所 Auxiliary modeling method for distributed hydrologic model
CN117009452A (en) * 2023-07-25 2023-11-07 浪潮智慧科技有限公司 Hydrologic service data acquisition method, equipment and medium

Similar Documents

Publication Publication Date Title
CN111428053B (en) Construction method of tax field-oriented knowledge graph
CN111708773B (en) Multi-source scientific and creative resource data fusion method
CN113918725A (en) Construction method of knowledge graph in water affairs field
CN104933164B (en) In internet mass data name entity between relationship extracting method and its system
Caldarola et al. An approach to ontology integration for ontology reuse
CN110555568B (en) Road traffic running state real-time perception method based on social network information
Zhou et al. Real world city event extraction from Twitter data streams
Schulz et al. Crisis information management in the Web 3.0 age.
Kellou-Menouer et al. Schema discovery in RDF data sources
CN104318340A (en) Information visualization method and intelligent visual analysis system based on text curriculum vitae information
CN111899089A (en) Enterprise risk early warning method and system based on knowledge graph
CN111767725A (en) Data processing method and device based on emotion polarity analysis model
CN115794798B (en) Market supervision informatization standard management and dynamic maintenance system and method
CN115905563A (en) Method and device for constructing ship field supervision knowledge graph and electronic equipment
CN111061679A (en) Method and system for rapid configuration of technological innovation policy based on rete and drools rules
CN115982329A (en) Intelligent generation method and system for engineering construction scheme compilation basis
Leskinen et al. Reconciling and using historical person registers as linked open data in the AcademySampo portal and data service
CN117151659B (en) Ecological restoration engineering full life cycle tracing method based on large language model
CN110889632B (en) Data monitoring and analyzing system of company image lifting system
CN112905746A (en) System archive knowledge mining processing method based on knowledge graph technology
Guermazi et al. Address validation in transportation and logistics: A machine learning based entity matching approach
Katz et al. Data system design alters meaning in ecological data: Salmon habitat restoration across the US Pacific Northwest
CN115204393A (en) Smart city knowledge ontology base construction method and device based on knowledge graph
Xu Research on enterprise knowledge unified retrieval based on industrial big data
Wei et al. Design and construction of geographic knowledge graph

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
CB03 Change of inventor or designer information

Inventor after: Zuo Xiang

Inventor after: Cong Xiaofei

Inventor after: Liu Weifeng

Inventor after: Zhao Xingxing

Inventor after: Liu Xiuheng

Inventor before: Cong Xiaofei

Inventor before: Zuo Xiang

Inventor before: Liu Weifeng

Inventor before: Zhao Xingxing

Inventor before: Liu Xiuheng