CN113918725A - Construction method of knowledge graph in water affairs field
- Publication number
- CN113918725A (application CN202111011676.7A)
- Authority
- CN
- China
- Prior art keywords
- data
- constructing
- water
- concept
- knowledge graph
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/30—Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
- G06F16/36—Creation of semantic tools, e.g. ontology or thesauri
- G06F16/367—Ontology
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/90—Details of database functions independent of the retrieved data types
- G06F16/901—Indexing; Data structures therefor; Storage structures
- G06F16/9024—Graphs; Linked lists
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/90—Details of database functions independent of the retrieved data types
- G06F16/906—Clustering; Classification
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Databases & Information Systems (AREA)
- Data Mining & Analysis (AREA)
- Physics & Mathematics (AREA)
- General Engineering & Computer Science (AREA)
- General Physics & Mathematics (AREA)
- Software Systems (AREA)
- Life Sciences & Earth Sciences (AREA)
- Animal Behavior & Ethology (AREA)
- Computational Linguistics (AREA)
- Machine Translation (AREA)
Abstract
The invention discloses a method for constructing a river and lake health knowledge graph, comprising the following main steps: on the basis of an analysis of the relevant water conservancy industry standards and of the types of river and lake health data resources, defining the river and lake health metadata types and a catalog-classification-based knowledge service mode, determining the ontology set of the river and lake health ontology model and the attributes of each ontology, mining and establishing relations between ontologies according to those attributes, and modeling the river and lake health ontology library; extracting further entities and association relations from massive heterogeneous data resources by means such as topic mining, distant supervision, and causal relation extraction, so as to supplement and refine the ontology library model; combining a concept similarity algorithm based on common attributes with a similarity algorithm based on in-link and out-link sets, so as to reduce entity redundancy and achieve knowledge fusion; and establishing an adaptive updating mechanism to achieve semi-automatic updating of the river and lake health knowledge graph.
Description
Technical Field
The invention belongs to the field of knowledge graphs, and particularly relates to a method for constructing a knowledge graph in the water affairs field.
Background
As urbanization continues to advance, the demands placed on urban water affairs management keep rising. Water affairs management covers a wide scope, and the interaction mechanisms between water affairs objects and elements are complex. Analyzing and mining the massive heterogeneous data of the water affairs field is therefore needed to grasp the hydrological regime and water environment of the urban river network scientifically, to manage the water supply and drainage pipe network comprehensively, to predict waterlogging risks effectively, and to formulate water affairs scheduling decisions reasonably. After years of accumulation, water-affairs-related departments have obtained massive real-time data and basic data through various sensing devices, have generated large amounts of business data and text data in the course of their work, and face a variety of water-affairs-themed data published on government and public websites; these data are scattered across different systems and platforms. How to link these data through technical means into a semantic data network that provides decision support for water affairs management is a problem that needs to be addressed. For example, massive data can be stored on a distributed storage platform, but such a platform cannot mine the connections between data, so data relevance and interoperability are poor and sharing capability is insufficient. A knowledge graph can abstract and unify concepts, strengthen the relations between objects and concepts, and integrate and intensively manage a complex data system.
Constructing a knowledge graph oriented to the water affairs field can therefore serve the scientific management of water affairs, support smart water affairs construction, and raise the level of intelligence of water affairs work.
Disclosure of Invention
The technical problem to be solved by the invention is to overcome the deficiencies of the prior art by providing a method for constructing a knowledge graph in the water affairs field. The method aims to build a bridge between water affairs data and knowledge, to address the problem that data in the field are abundant but scattered, fuzzy, and unguided, and to provide knowledge support for the decisions of water affairs managers, so that the prominent water affairs problems of different periods can be tackled in a targeted and comprehensive way and strategies suited to China's national conditions can continue to be explored.
To achieve this purpose, the invention adopts the following technical scheme:
a method for constructing a knowledge graph in the water affairs field, characterized by comprising the following steps:
step 1: before the top-level knowledge graph is constructed and knowledge is extracted from the water affairs data, the data are verified and noise is removed;
step 2: a top-level conceptual model of the water affairs domain knowledge graph is constructed on the Neo4j platform and used as the framework of the water affairs domain knowledge graph;
step 3: entity extraction and relation extraction are performed on heterogeneous data sources such as industry standards, various databases, websites of government functional departments, hydrological and water environment monitoring websites, public websites, Internet of Things data, and remote sensing images;
step 4: on the basis of the extracted data, triple data referring to the same thing are attached under the same concept, and entity alignment is completed by calculating the similarity between concept entities; the entity triple data comprise entity-attribute-value triples and entity-relationship-entity triples;
step 5: knowledge is stored in the graph database of the Neo4j platform.
The method for constructing the knowledge graph in the water affairs field is characterized in that step 1 specifically comprises the following:
(1) cleaning missing values, abnormal values, duplicate values, and dirty data in the text-type data;
(2) processing data recorded in tables and pictures in the non-text data, and converting them into text data by manual extraction or image-to-text software;
(3) filtering out random errors present in the data;
(4) organizing the sentences in the text data into a usable corpus, with single sentences as the unit.
The method for constructing the knowledge graph in the water affairs field is characterized in that step 2 specifically comprises the following:
classifying the water affairs objects hierarchically, and dividing the water affairs domain concept into two subclasses, a geographic location concept and an object facility concept;
the domain classes under the geographic location concept are qualitative characterizations of a geographic area, and the domain classes under the object facility concept are water affairs objects that exist naturally or are built artificially;
for the geographic location concept, the geographic areas it describes are further divided into descriptive places and functional places according to whether the area has an actual function;
for the object facility concept, natural objects and engineering facilities are further distinguished according to whether the object exists naturally or is built artificially.
The method for constructing the knowledge graph in the water affairs field is characterized in that in step 3, the data sources are divided into the following three types:
(1) structured data, mainly comprising Excel tables, relational databases (e.g., MySQL, Oracle, Microsoft Access), object-oriented databases (e.g., db4o), and the like;
(2) semi-structured data, mainly from Baidu Encyclopedia, websites of government functional departments, hydrological and water environment monitoring websites, public websites, Wikipedia, and other websites, as well as data stored in XML files;
(3) unstructured data, mainly unstructured text such as texts from water-affairs-related organizations, literature, and the Internet.
The method for constructing the knowledge graph in the water affairs field is characterized in that in step 3, the structured data are extracted mainly in the following way:
(a1) connecting to the database;
(a2) performing basic data initialization;
(a3) constructing an SQL statement and querying the data;
(a4) converting data types, structures, and attributes;
(a5) judging whether the data already exist in the Neo4j database; if so, returning to step (a3); if not, proceeding to step (a6) to store the data (whether two records describe the same node is judged mainly by the labels field in Neo4j);
(a6) constructing a Neo4j data storage statement, determining the superior-subordinate relation from the information extracted by the SQL statement, and creating the node;
(a7) judging whether the SQL query is finished; if so, exiting the extraction program; if not, returning to step (a3) to construct the next SQL statement and continue querying.
The method for constructing the knowledge graph in the water affairs field is characterized in that in step 3, the semi-structured data are extracted mainly in the following way:
(b1) first, a website is opened through the Engine module of Scrapy, and the first crawling request is sent through the Spider module;
(b2) the Engine module obtains a link to crawl from the Spider module and schedules it as a request through the Scheduler module;
(b3) the Engine module asks the Scheduler module for the next link to crawl and passes the task to the Downloader module for downloading;
(b4) after the page is downloaded, the Downloader module returns the downloaded data to the Engine module, which hands them to the Spider module for parsing and processing;
(b5) the parsed data are stored in a file in a specified format;
(b6) steps (b2) to (b5) are repeated until the Scheduler module has no more requests, whereupon the Engine module closes and the crawl ends.
The method for constructing the knowledge graph in the water affairs field is characterized in that in step 3, the unstructured data are extracted mainly in the following way:
(c1) searching the established water affairs domain knowledge graph for triples that embody a preset relation, and aligning them with the corpus to obtain a training set for water affairs relation extraction;
(c2) obtaining sentence representations with a neural network model and training the model to obtain a classifier for water affairs relation extraction;
(c3) after the accuracy of the model has been verified, performing named entity recognition on new text to obtain the water affairs entities in each sentence and thereby new samples, and using the trained model to perform relation extraction on these new samples.
The method for constructing the knowledge graph in the water affairs field is characterized in that in step 4, the specific method is as follows:
(1) because letters may be upper or lower case and database table names sometimes contain special characters, the strings are first screened and converted: regular expressions are formulated to filter the concept words and convert them to lowercase;
(2) for two concepts to be compared, let the source string be a and the target string be b, with lengths t1 and t2 respectively; a matrix m of size [t1+1, t2+1] is built, its first row is set to 0, 1, 2 … t2 and its first column to 0, 1, 2 … t1, and the editing cost is denoted cost;
(3) each pair of characters a[x] (x from 1 to t1) and b[y] (y from 1 to t2) is compared;
(4) if a[x] is the same as b[y], cost is 0; if a[x] is different from b[y], cost is 1;
(5) each m[x, y] is set to the minimum of:
A. the value of the cell directly above plus 1, i.e., m[x-1, y] + 1;
B. the value of the cell directly to the left plus 1, i.e., m[x, y-1] + 1;
C. the value of the cell diagonally above and to the left plus cost, i.e., m[x-1, y-1] + cost;
(6) steps (3) to (5) are iterated over all x and y; m[t1, t2] is then the minimum edit distance needed to make the two concept words identical, and max(t1, t2) is the larger of the two string lengths;
then, the similarity between the two strings a and b is:
$$\mathrm{sim}(a, b) = 1 - \frac{m[t1, t2]}{\max(t1, t2)}$$
The method for constructing the knowledge graph in the water affairs field is characterized in that in step 5, in the Neo4j graph database used for knowledge storage, nodes represent entities in the network and edges represent relationships; all data of each entity are stored and extended as <Key, Value> pairs; and data are imported using Neo4j's built-in Cypher statements.
The invention has the beneficial effects that:
(1) The invention stores and intelligently identifies knowledge in the water affairs field, can solve problems such as the dispersion, fuzziness, and lack of guidance of that knowledge, and provides services for merging, summarizing, and organizing knowledge as well as self-learning.
(2) Building a training set for water affairs entity relation extraction by manual labeling requires a large amount of manpower and professional knowledge of the water affairs field, and at present almost no training set for water affairs relation extraction exists. The invention adopts relation extraction based on distant supervision: it automatically constructs a data set of relation instances, trains a relation extraction model on that data set, and uses the model to judge the relations between entities in new sentences.
(3) The method can extract water affairs structured data, unstructured text data, and the relations among them more conveniently and efficiently, and can link water affairs objects.
Drawings
FIG. 1 is a schematic view of the present invention.
FIG. 2 is a schematic diagram of the calculation flow.
FIG. 3 is a schematic diagram of the hierarchical classification of water affairs objects.
FIG. 4 is a diagram of the framework of the Scrapy crawler.
FIG. 5 is a schematic diagram of a distant supervision relation extraction framework based on outlier detection.
FIG. 6 is a schematic diagram of a water affairs domain knowledge graph.
Detailed description of the preferred embodiments
Example one
The method for constructing a knowledge graph in the water affairs field is characterized by comprising the following steps:
step 1: before the top-level knowledge graph is constructed and knowledge is extracted from the water affairs data, the data are verified and noise is removed;
step 2: a top-level conceptual model of the water affairs domain knowledge graph is constructed on the Neo4j platform and used as the framework of the water affairs domain knowledge graph;
step 3: entity extraction and relation extraction are performed on heterogeneous data sources such as industry standards, various databases, websites of government functional departments, hydrological and water environment monitoring websites, public websites, Internet of Things data, and remote sensing images;
step 4: on the basis of the extracted data, triple data referring to the same thing are attached under the same concept, and entity alignment is completed by calculating the similarity between concept entities; the entity triple data comprise entity-attribute-value triples and entity-relationship-entity triples;
step 5: knowledge is stored in the graph database of the Neo4j platform.
Example two
The method for constructing a knowledge graph in the water affairs field in this embodiment is characterized in that step 1 specifically comprises the following:
(1) cleaning missing values, abnormal values, duplicate values, and dirty data in the text-type data;
(2) processing data recorded in tables and pictures in the non-text data, and converting them into text data by manual extraction or image-to-text software;
(3) filtering out random errors present in the data;
(4) organizing the sentences in the text data into a usable corpus, with single sentences as the unit.
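Items (1) and (4) above can be illustrated with a minimal Python sketch. It is not the patent's implementation: the function names, the whitespace normalization, and the choice of sentence-ending punctuation are assumptions made for illustration, and items (2) and (3) are not covered.

```python
import re


def clean_records(records):
    """Drop missing, empty, and duplicate text records, preserving order."""
    seen = set()
    cleaned = []
    for text in records:
        if text is None:  # missing value
            continue
        text = re.sub(r"\s+", " ", text).strip()  # collapse stray whitespace
        if not text or text in seen:  # empty or duplicate value
            continue
        seen.add(text)
        cleaned.append(text)
    return cleaned


def split_sentences(text):
    """Split text into single sentences.

    Assumption: a sentence ends at Chinese or Western end punctuation; a
    Western full stop only counts when followed by whitespace, so that
    decimals such as 3.5 stay intact.
    """
    parts = re.split(r"(?<=[。！？!?])\s*|(?<=\.)\s+", text)
    return [p.strip() for p in parts if p.strip()]
```

A corpus would then be built by applying `split_sentences` to every record returned by `clean_records`.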
Example three
The method for constructing a knowledge graph in the water affairs field in this embodiment is characterized in that step 2 specifically comprises the following:
classifying the water affairs objects hierarchically, and dividing the water affairs domain concept into two subclasses, a geographic location concept and an object facility concept;
the domain classes under the geographic location concept are qualitative characterizations of a geographic area, and the domain classes under the object facility concept are water affairs objects that exist naturally or are built artificially;
for the geographic location concept, the geographic areas it describes are further divided into descriptive places and functional places according to whether the area has an actual function;
for the object facility concept, natural objects and engineering facilities are further distinguished according to whether the object exists naturally or is built artificially.
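The top-level hierarchy described above can be sketched as a small parent map in Python. The English concept labels and helper names below are hypothetical; the text does not specify identifiers.

```python
# Child concept -> parent concept, following the hierarchy in the text.
# All labels are illustrative names, not identifiers from the source.
CONCEPT_PARENT = {
    "GeographicLocation": "WaterAffairsDomain",
    "ObjectFacility": "WaterAffairsDomain",
    "DescriptivePlace": "GeographicLocation",
    "FunctionalPlace": "GeographicLocation",
    "NaturalObject": "ObjectFacility",
    "EngineeringFacility": "ObjectFacility",
}


def ancestors(concept):
    """Return the chain of parent concepts, nearest parent first."""
    path = []
    while concept in CONCEPT_PARENT:
        concept = CONCEPT_PARENT[concept]
        path.append(concept)
    return path


def classify_place(has_actual_function):
    """Geographic areas split on whether they have an actual function."""
    return "FunctionalPlace" if has_actual_function else "DescriptivePlace"


def classify_object(is_natural):
    """Object facilities split on natural existence vs. artificial construction."""
    return "NaturalObject" if is_natural else "EngineeringFacility"
```

For example, `ancestors(classify_object(False))` walks from the engineering facility concept up to the domain root.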
Example four
The method for constructing a knowledge graph in the water affairs field in this embodiment is characterized in that in step 3, the data sources are divided into the following three types:
(1) structured data, mainly comprising Excel tables, relational databases (e.g., MySQL, Oracle, Microsoft Access), object-oriented databases (e.g., db4o), and the like;
(2) semi-structured data, mainly from Baidu Encyclopedia, websites of government functional departments, hydrological and water environment monitoring websites, public websites, Wikipedia, and other websites, as well as data stored in XML files;
(3) unstructured data, mainly unstructured text such as texts from water-affairs-related organizations, literature, and the Internet.
Example five
The method for constructing a knowledge graph in the water affairs field in this embodiment is characterized in that in step 3, the structured data are extracted mainly in the following way:
(a1) connecting to the database;
(a2) performing basic data initialization;
(a3) constructing an SQL statement and querying the data;
(a4) converting data types, structures, and attributes;
(a5) judging whether the data already exist in the Neo4j database; if so, returning to step (a3); if not, proceeding to step (a6) to store the data (whether two records describe the same node is judged mainly by the labels field in Neo4j);
(a6) constructing a Neo4j data storage statement, determining the superior-subordinate relation from the information extracted by the SQL statement, and creating the node;
(a7) judging whether the SQL query is finished; if so, exiting the extraction program; if not, returning to step (a3) to construct the next SQL statement and continue querying.
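The loop in steps (a1) to (a7) can be sketched as follows. This is an illustration under stated assumptions, not the patent's program: an in-memory SQLite database stands in for the relational source, a Python set stands in for the existence check against Neo4j, the column names `name` and `parent` and the relationship type `BELONGS_TO` are hypothetical, and the function only returns parameterized Cypher statements instead of executing them.

```python
import sqlite3

# Cypher templates; the label is filled in by str.format, values stay as
# query parameters ($name, $parent).
NODE_QUERY = "MERGE (n:{label} {{name: $name}})"
EDGE_QUERY = (
    "MATCH (c:{label} {{name: $name}}) "
    "MERGE (p:{label} {{name: $parent}}) "
    "MERGE (c)-[:BELONGS_TO]->(p)"
)


def extract_structured(conn, table, label, known_nodes):
    """Turn rows of `table` into Cypher statements for nodes not yet known.

    `table` and `label` are assumed to be trusted identifiers chosen by the
    operator; `known_nodes` is a set of (label, name) pairs standing in for
    a lookup in the graph database.
    """
    statements = []
    rows = conn.execute(f"SELECT name, parent FROM {table} ORDER BY rowid")  # (a3)
    for name, parent in rows:  # (a7): loop until the query is exhausted
        name = name.strip()  # (a4): minimal type/format conversion
        parent = parent.strip() if parent else None
        if (label, name) in known_nodes:  # (a5): already stored, skip
            continue
        known_nodes.add((label, name))
        statements.append((NODE_QUERY.format(label=label), {"name": name}))  # (a6)
        if parent:  # superior-subordinate relation taken from the SQL row
            statements.append(
                (EDGE_QUERY.format(label=label), {"name": name, "parent": parent})
            )
    return statements
```

Each returned pair is a query string plus a parameter map, which is the shape most graph database drivers accept for parameterized execution.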
Example six
The method for constructing a knowledge graph in the water affairs field in this embodiment is characterized in that in step 3, the semi-structured data are extracted mainly in the following way:
(b1) first, a website is opened through the Engine module of Scrapy, and the first crawling request is sent through the Spider module;
(b2) the Engine module obtains a link to crawl from the Spider module and schedules it as a request through the Scheduler module;
(b3) the Engine module asks the Scheduler module for the next link to crawl and passes the task to the Downloader module for downloading;
(b4) after the page is downloaded, the Downloader module returns the downloaded data to the Engine module, which hands them to the Spider module for parsing and processing;
(b5) the parsed data are stored in a file in a specified format;
(b6) steps (b2) to (b5) are repeated until the Scheduler module has no more requests, whereupon the Engine module closes and the crawl ends.
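The control flow of steps (b1) to (b6) can be illustrated with a simplified, pure-Python simulation. It deliberately does not use Scrapy's API: the `Scheduler` class, the `download` and `parse` functions, and the in-memory `web` dictionary are hypothetical stand-ins that only mirror the roles of the Scheduler, Downloader, and Spider modules, with `crawl` playing the Engine.

```python
from collections import deque


class Scheduler:
    """Queue of pending requests that ignores URLs it has already seen."""

    def __init__(self):
        self._queue = deque()
        self._seen = set()

    def enqueue(self, url):
        if url not in self._seen:
            self._seen.add(url)
            self._queue.append(url)

    def next_request(self):
        return self._queue.popleft() if self._queue else None


def download(url, web):
    """Downloader stand-in: look the page up in an in-memory 'web'."""
    return web.get(url)


def parse(url, page):
    """Spider stand-in: return one parsed item and the links to follow."""
    text, links = page
    return {"url": url, "text": text}, links


def crawl(start_url, web):
    """Engine stand-in: drive the loop until the scheduler is empty."""
    scheduler = Scheduler()
    scheduler.enqueue(start_url)  # (b1)/(b2): first request is scheduled
    items = []
    while True:
        url = scheduler.next_request()  # (b3): ask for the next link
        if url is None:  # (b6): no more requests, stop
            break
        page = download(url, web)  # (b3)/(b4): download the page
        if page is None:  # unreachable page, nothing to parse
            continue
        item, links = parse(url, page)  # (b4): spider parses the response
        items.append(item)  # (b5): a real crawler would write this to a file
        for link in links:
            scheduler.enqueue(link)  # (b2): newly found links are scheduled
    return items
```

In an actual Scrapy project the framework supplies the engine, scheduler, and downloader, and only the spider's parsing logic is written by hand.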
Example seven
The method for constructing a knowledge graph in the water affairs field in this embodiment is characterized in that in step 3, the unstructured data are extracted mainly in the following way:
(c1) searching the established water affairs domain knowledge graph for triples that embody a preset relation, and aligning them with the corpus to obtain a training set for water affairs relation extraction;
(c2) obtaining sentence representations with a neural network model and training the model to obtain a classifier for water affairs relation extraction;
(c3) after the accuracy of the model has been verified, performing named entity recognition on new text to obtain the water affairs entities in each sentence and thereby new samples, and using the trained model to perform relation extraction on these new samples.
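Step (c1), the corpus alignment that produces the training set, can be sketched in a few lines of Python. The sketch assumes the usual distant supervision heuristic (a sentence that mentions both entities of a known triple is labeled with that triple's relation) and uses plain substring matching; the function name and the sample fields are hypothetical, and the neural classifier of step (c2) is not shown.

```python
def build_training_set(triples, sentences):
    """Label sentences by aligning them with known (head, relation, tail) triples.

    A sentence containing both the head and the tail entity of a triple is
    emitted as one training sample for that relation. The labels are noisy
    by construction, since co-occurrence does not guarantee the relation.
    """
    samples = []
    for sentence in sentences:
        for head, relation, tail in triples:
            if head in sentence and tail in sentence:
                samples.append(
                    {
                        "sentence": sentence,
                        "head": head,
                        "tail": tail,
                        "relation": relation,
                    }
                )
    return samples
```

Because the resulting labels are noisy, a filtering stage is normally placed between this alignment and model training.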
Example eight
The method for constructing a knowledge graph in the water affairs field in this embodiment is characterized in that in step 4, the specific method is as follows:
(1) because letters may be upper or lower case and database table names sometimes contain special characters, the strings are first screened and converted: regular expressions are formulated to filter the concept words and convert them to lowercase;
(2) for two concepts to be compared, let the source string be a and the target string be b, with lengths t1 and t2 respectively; a matrix m of size [t1+1, t2+1] is built, its first row is set to 0, 1, 2 … t2 and its first column to 0, 1, 2 … t1, and the editing cost is denoted cost;
(3) each pair of characters a[x] (x from 1 to t1) and b[y] (y from 1 to t2) is compared;
(4) if a[x] is the same as b[y], cost is 0; if a[x] is different from b[y], cost is 1;
(5) each m[x, y] is set to the minimum of:
A. the value of the cell directly above plus 1, i.e., m[x-1, y] + 1;
B. the value of the cell directly to the left plus 1, i.e., m[x, y-1] + 1;
C. the value of the cell diagonally above and to the left plus cost, i.e., m[x-1, y-1] + cost;
(6) steps (3) to (5) are iterated over all x and y; m[t1, t2] is then the minimum edit distance needed to make the two concept words identical, and max(t1, t2) is the larger of the two string lengths;
then, the similarity between the two strings a and b is:
$$\mathrm{sim}(a, b) = 1 - \frac{m[t1, t2]}{\max(t1, t2)}$$
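The edit-distance procedure of this embodiment can be written out as a short Python sketch. Two points are assumptions rather than statements from the text: the normalization regular expression (lowercase, then keep only ASCII letters, digits, and common CJK characters), and the similarity being computed as 1 minus the edit distance divided by max(t1, t2), which is the usual normalization for the quantities defined in step (6).

```python
import re


def normalize(name):
    """Step (1): lowercase the name and strip special characters."""
    return re.sub(r"[^0-9a-z\u4e00-\u9fff]", "", name.lower())


def edit_distance(a, b):
    """Steps (2) to (6): fill the (t1+1) x (t2+1) matrix m row by row."""
    t1, t2 = len(a), len(b)
    m = [[0] * (t2 + 1) for _ in range(t1 + 1)]
    for x in range(t1 + 1):
        m[x][0] = x  # first column: 0, 1, 2 ... t1
    for y in range(t2 + 1):
        m[0][y] = y  # first row: 0, 1, 2 ... t2
    for x in range(1, t1 + 1):
        for y in range(1, t2 + 1):
            cost = 0 if a[x - 1] == b[y - 1] else 1  # step (4)
            m[x][y] = min(
                m[x - 1][y] + 1,  # A: cell above plus 1
                m[x][y - 1] + 1,  # B: cell to the left plus 1
                m[x - 1][y - 1] + cost,  # C: diagonal cell plus cost
            )
    return m[t1][t2]


def similarity(a, b):
    """Assumed form: 1 - m[t1, t2] / max(t1, t2), after normalization."""
    a, b = normalize(a), normalize(b)
    longest = max(len(a), len(b))
    if longest == 0:
        return 1.0  # two empty names are treated as identical
    return 1.0 - edit_distance(a, b) / longest
```

With this normalization, `similarity("Pump_Station", "pumpstation")` evaluates to 1.0, so the two table names would be aligned to the same concept.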
Example nine
The method for constructing a knowledge graph in the water affairs field in this embodiment is characterized in that in step 5, in the Neo4j graph database used for knowledge storage, nodes represent entities in the network and edges represent relationships; all data of each entity are stored and extended as <Key, Value> pairs; and data are imported using Neo4j's built-in Cypher statements.
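As a rough illustration of how the two kinds of triples could be turned into Cypher import statements, the Python sketch below builds parameterized queries. The node label `Entity`, the `name` property, and both function names are hypothetical choices; the source does not give its actual Cypher statements.

```python
import re

# Property keys and relationship types cannot be passed as Cypher
# parameters, so they are validated before being placed in the query text.
IDENTIFIER = re.compile(r"^[A-Za-z_][A-Za-z0-9_]*$")


def attribute_statement(entity, key, value):
    """Entity-attribute-value triple -> one <Key, Value> property on a node."""
    if not IDENTIFIER.match(key):
        raise ValueError(f"invalid property key: {key!r}")
    query = f"MERGE (e:Entity {{name: $name}}) SET e.{key} = $value"
    return query, {"name": entity, "value": value}


def relation_statement(head, relation, tail):
    """Entity-relationship-entity triple -> two nodes joined by one edge."""
    if not IDENTIFIER.match(relation):
        raise ValueError(f"invalid relationship type: {relation!r}")
    query = (
        "MERGE (h:Entity {name: $head}) "
        "MERGE (t:Entity {name: $tail}) "
        f"MERGE (h)-[:{relation}]->(t)"
    )
    return query, {"head": head, "tail": tail}
```

Using `MERGE` rather than `CREATE` keeps repeated imports from duplicating nodes or edges, which fits the entity alignment goal of step 4.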
Based on the above embodiments, the present invention has the following advantages: (1) the invention stores and intelligently identifies knowledge in the water affairs field, can solve problems such as the dispersion, fuzziness, and lack of guidance of that knowledge, and provides services for merging, summarizing, and organizing knowledge as well as self-learning; (2) building a training set for water affairs entity relation extraction by manual labeling requires a large amount of manpower and professional knowledge of the water affairs field, and at present almost no training set for water affairs relation extraction exists, so the invention adopts relation extraction based on distant supervision: it automatically constructs a data set of relation instances, trains a relation extraction model on that data set, and uses the model to judge the relations between entities in new sentences; (3) the method can extract water affairs structured data, unstructured text data, and the relations among them more conveniently and efficiently, and can link water affairs objects.
The above describes only the preferred embodiments of the present invention, but the scope of protection of the present invention is not limited thereto; any change or substitution that a person skilled in the art could readily conceive within the technical scope disclosed by the present invention also falls within its scope of protection. The scope of protection of the present invention shall therefore be determined by the scope of protection of the claims.
Claims (9)
1. A method for constructing a knowledge graph in the water affairs field, characterized by comprising the following steps:
step 1: before the top-level knowledge graph is constructed and knowledge is extracted from the water affairs data, the data are verified and noise is removed;
step 2: a top-level conceptual model of the water affairs domain knowledge graph is constructed on the Neo4j platform and used as the framework of the water affairs domain knowledge graph;
step 3: entity extraction and relation extraction are performed on heterogeneous data sources such as industry standards, various databases, websites of government functional departments, hydrological and water environment monitoring websites, public websites, Internet of Things data, and remote sensing images;
step 4: on the basis of the extracted data, triple data referring to the same thing are attached under the same concept, and entity alignment is completed by calculating the similarity between concept entities; the entity triple data comprise entity-attribute-value triples and entity-relationship-entity triples;
step 5: knowledge is stored in the graph database of the Neo4j platform.
2. The method for constructing a knowledge graph in the water affairs field according to claim 1, wherein step 1 specifically comprises the following:
(1) cleaning missing values, abnormal values, duplicate values, and dirty data in the text-type data;
(2) processing data recorded in tables and pictures in the non-text data, and converting them into text data by manual extraction or image-to-text software;
(3) filtering out random errors present in the data;
(4) organizing the sentences in the text data into a usable corpus, with single sentences as the unit.
3. The method for constructing a knowledge graph in the water affairs field according to claim 1, wherein step 2 specifically comprises the following:
classifying the water affairs objects hierarchically, and dividing the water affairs domain concept into two subclasses, a geographic location concept and an object facility concept;
the domain classes under the geographic location concept are qualitative characterizations of a geographic area, and the domain classes under the object facility concept are water affairs objects that exist naturally or are built artificially;
for the geographic location concept, the geographic areas it describes are further divided into descriptive places and functional places according to whether the area has an actual function;
for the object facility concept, natural objects and engineering facilities are further distinguished according to whether the object exists naturally or is built artificially.
4. The method for constructing a knowledge graph in the water affairs field according to claim 1, wherein in step 3, the data sources are divided into the following three types:
(1) structured data; (2) semi-structured data; (3) unstructured data.
5. The method for constructing a knowledge graph in the water affairs field according to claim 1, wherein in step 3, the structured data are extracted mainly in the following way:
(a1) connecting to the database;
(a2) performing basic data initialization;
(a3) constructing an SQL statement and querying the data;
(a4) converting data types, structures, and attributes;
(a5) judging whether the data already exist in the Neo4j database; if so, returning to step (a3); otherwise, proceeding to step (a6) to store the data;
(a6) constructing a Neo4j data storage statement, determining the superior-subordinate relation from the information extracted by the SQL statement, and creating the node;
(a7) judging whether the SQL query is finished; if so, exiting the extraction program; if not, returning to step (a3) to construct the next SQL statement and continue querying.
6. The method for constructing a knowledge graph in the water service field according to claim 1, wherein in the step 3, the semi-structured data is extracted mainly by the following method:
(b1) first, the Engine module of Scrapy opens a website, and the Spider module sends the first crawling request;
(b2) the Engine module obtains the links to be crawled from the Spider module and schedules them as requests through the Scheduler module;
(b3) the Engine module requests the next link to be crawled from the Scheduler module and sends the task to the Downloader module for downloading;
(b4) after the page is downloaded, the Downloader module returns the downloaded data to the Engine module, which passes it to the Spider module to parse and process the crawled data;
(b5) the parsed data is stored in a file in a specified format;
(b6) steps (b2) to (b5) are repeated until the Scheduler module has no more requests, whereupon the Engine module closes and the data crawl ends.
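The control flow of steps (b1) to (b6) can be sketched as a toy model in plain Python. This is not the crawler framework's own API: `FAKE_SITE`, `download`, `parse` and `crawl` are hypothetical stand-ins for the website, the Downloader, the Spider and the Engine, and parsed items are kept in a list rather than written to a file.

```python
from collections import deque

# Hypothetical in-memory "website": URL -> (page text, outgoing links).
FAKE_SITE = {
    "/index": ("Water bureau home", ["/news", "/stations"]),
    "/news": ("Flood warning issued", ["/index"]),
    "/stations": ("Station list", []),
}


def download(url):
    """Downloader stand-in: fetch a page from the fake site."""
    return FAKE_SITE[url]


def parse(url, page):
    """Spider stand-in: return one parsed item and the links to follow."""
    text, links = page
    return {"url": url, "text": text}, links


def crawl(start_url):
    """Engine stand-in: loop until the scheduler queue is empty."""
    scheduler = deque([start_url])      # (b1) first request from the spider
    seen = {start_url}
    items = []
    while scheduler:                    # (b6) stop when no requests remain
        url = scheduler.popleft()       # (b3) ask the scheduler for the next link
        page = download(url)            # (b3)-(b4) download and hand the page back
        item, links = parse(url, page)  # (b4) the spider parses the response
        items.append(item)              # (b5) keep the parsed record
        for link in links:              # (b2) newly found links go to the scheduler
            if link not in seen:
                seen.add(link)
                scheduler.append(link)
    return items


items = crawl("/index")
print([item["url"] for item in items])
```

The `seen` set is a design choice of this sketch: it prevents the link from `/news` back to `/index` from being scheduled twice, so the loop terminates.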
7. The method for constructing a knowledge graph in the water service field according to claim 1, wherein in the step 3, the unstructured data is extracted mainly by the following method:
(c1) searching the established water affairs knowledge graph for triples that express a preset relation and, after aligning them with a corpus, obtaining a training set for water affairs relation extraction;
(c2) obtaining the representation of each sentence with a neural network model, training the model, and obtaining a classifier for water affairs relation extraction;
(c3) after the model accuracy is verified, performing named entity recognition on new text to obtain the water affairs entities in each sentence and thereby new training samples, and using the trained model to perform relation extraction on the new training samples.
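Step (c1) resembles distant supervision, and the alignment it describes can be sketched as below. The sketch covers only the construction of the training set; the neural model of (c2) and the entity recognition of (c3) are not shown. The triples, sentences and the function `build_training_set` are hypothetical examples, and simple substring matching is an assumed alignment rule.

```python
def build_training_set(triples, corpus):
    """Label each sentence that mentions both entities of a known triple."""
    samples = []
    for sentence in corpus:
        for head, relation, tail in triples:
            # Assumed alignment rule: both entity names occur in the sentence.
            if head in sentence and tail in sentence:
                samples.append(
                    {"sentence": sentence, "head": head, "tail": tail, "label": relation}
                )
    return samples


# Hypothetical triples taken from an existing knowledge graph.
triples = [
    ("Reservoir A", "located_on", "River B"),
    ("Station C", "monitors", "River B"),
]
# Hypothetical corpus sentences.
corpus = [
    "Reservoir A was built on River B in the last century.",
    "Station C reports the water level of River B every hour.",
    "River B flooded last year.",
]
samples = build_training_set(triples, corpus)
for sample in samples:
    print(sample["label"], "<-", sample["sentence"])
```

The third sentence mentions only one entity, so it yields no sample. Sentences labelled this way can be noisy, which is one reason the claim verifies model accuracy before applying the model to new text.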
8. The method for constructing a knowledge graph in the water service field according to claim 1, wherein in the step 4, the specific method is as follows:
(1) because database table names mix uppercase and lowercase letters and sometimes contain special characters, the character strings are first screened and converted: concept words are filtered with regular expressions and converted to lowercase letters;
(2) for two concepts to be compared, the source string is denoted a and the target string b, with lengths t1 and t2 respectively; a matrix m of size [t1+1, t2+1] is constructed, with its first row set to 0, 1, 2, …, t2 and its first column set to 0, 1, 2, …, t1; the editing cost is denoted cost;
(3) comparing each pair of characters a[x] (x from 1 to t1) and b[y] (y from 1 to t2);
(4) if a[x] is the same as b[y], cost is 0; if a[x] is different from b[y], cost is 1;
(5) each m[x, y] is equal to the minimum of:
A. the value of the cell directly above plus 1, i.e., m[x-1, y] + 1;
B. the value of the cell directly to the left plus 1, i.e., m[x, y-1] + 1;
C. the value of the cell diagonally above and to the left plus cost, i.e., m[x-1, y-1] + cost;
(6) iterating steps (3) to (5) over all x and y, after which m[t1, t2] is the minimum edit distance needed to make the two concept words identical, and max(t1, t2) is the larger of the two string lengths;
then, the similarity between the two strings a and b is:
sim(a, b) = 1 - m[t1, t2] / max(t1, t2)
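The procedure of this claim can be sketched as follows, assuming the similarity is the edit distance normalized by the longer string length, 1 - m[t1, t2] / max(t1, t2). The function names are hypothetical, and the regular expression, which keeps only ASCII letters and digits, is an assumed filtering rule for database table names.

```python
import re


def normalize(name):
    """Step (1): lowercase the name and drop everything but letters and digits."""
    return re.sub(r"[^a-z0-9]", "", name.lower())


def edit_distance(a, b):
    """Steps (2)-(6): fill the (t1+1) x (t2+1) matrix m and return m[t1][t2]."""
    t1, t2 = len(a), len(b)
    m = [[0] * (t2 + 1) for _ in range(t1 + 1)]
    for x in range(t1 + 1):
        m[x][0] = x                      # first column: 0, 1, ..., t1
    for y in range(t2 + 1):
        m[0][y] = y                      # first row: 0, 1, ..., t2
    for x in range(1, t1 + 1):
        for y in range(1, t2 + 1):
            # Python strings are 0-indexed, so a[x] in the claim is a[x - 1] here.
            cost = 0 if a[x - 1] == b[y - 1] else 1
            m[x][y] = min(
                m[x - 1][y] + 1,         # A: cell directly above
                m[x][y - 1] + 1,         # B: cell directly to the left
                m[x - 1][y - 1] + cost,  # C: diagonal cell plus cost
            )
    return m[t1][t2]


def similarity(name_a, name_b):
    """Normalized similarity in [0, 1]; two empty names count as identical."""
    a, b = normalize(name_a), normalize(name_b)
    longest = max(len(a), len(b))
    if longest == 0:
        return 1.0
    return 1 - edit_distance(a, b) / longest


print(edit_distance("kitten", "sitting"))
print(similarity("River_Info", "riverinfo"))
print(similarity("abcd", "abce"))
```

With this normalization, two names that differ only in case or special characters, such as `River_Info` and `riverinfo`, receive a similarity of 1.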
9. The method for constructing a knowledge graph in the water service field according to claim 1, wherein in the step 5, knowledge is stored in a graph database based on the neo4j platform, in which nodes represent entities in the network and edges represent relationships; all data of each entity is stored and extended as <Key, Value> pairs, and data is imported using neo4j's built-in Cypher statements.
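A minimal sketch of how such Cypher import statements might be generated is given below. It only builds query strings and parameter dicts and does not connect to a database. The helper names, the `name` key used to identify a node, and the sample entities are assumptions made for illustration.

```python
def node_statement(label, properties):
    """Return a Cypher MERGE statement and parameters for one entity node."""
    # The label is interpolated into the query text, so it is restricted
    # to identifier-like strings here.
    if not label.isidentifier():
        raise ValueError(f"unsafe label: {label!r}")
    query = f"MERGE (n:{label} {{name: $name}}) SET n += $props"
    return query, {"name": properties["name"], "props": properties}


def edge_statement(rel_type, head, tail):
    """Return a Cypher statement and parameters linking two existing nodes."""
    if not rel_type.isidentifier():
        raise ValueError(f"unsafe relationship type: {rel_type!r}")
    query = (
        "MATCH (a {name: $head}), (b {name: $tail}) "
        f"MERGE (a)-[:{rel_type}]->(b)"
    )
    return query, {"head": head, "tail": tail}


# Hypothetical entities; each entity's data is a set of <Key, Value> pairs.
node_query, node_params = node_statement("Reservoir", {"name": "Reservoir A", "type": "large"})
edge_query, edge_params = edge_statement("LOCATED_ON", "Reservoir A", "River B")
print(node_query)
print(edge_query)
```

Using `MERGE` rather than `CREATE` means that re-running the import does not duplicate nodes or edges. With the official neo4j Python driver, each query and parameter dict could be passed to `session.run(query, params)`.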
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202111011676.7A CN113918725A (en) | 2021-08-31 | 2021-08-31 | Construction method of knowledge graph in water affairs field |
Publications (1)
Publication Number | Publication Date |
---|---|
CN113918725A (en) | 2022-01-11 |
Family
ID=79233634
Family Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202111011676.7A Pending CN113918725A (en) | 2021-08-31 | 2021-08-31 | Construction method of knowledge graph in water affairs field |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN113918725A (en) |
Cited By (6)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN114386422A (en) * | 2022-01-14 | 2022-04-22 | 淮安市创新创业科技服务中心 | Intelligent aid decision-making method and device based on enterprise pollution public opinion extraction |
CN114386422B (en) * | 2022-01-14 | 2023-09-15 | 淮安市创新创业科技服务中心 | Intelligent auxiliary decision-making method and device based on enterprise pollution public opinion extraction |
CN114780742A (en) * | 2022-04-19 | 2022-07-22 | 中国水利水电科学研究院 | Construction and use method of flow scheduling knowledge-graph question-answering system of irrigation area |
CN116542124A (en) * | 2023-03-13 | 2023-08-04 | 广东省科学院广州地理研究所 | Auxiliary modeling method for distributed hydrologic model |
CN116542124B (en) * | 2023-03-13 | 2024-04-09 | 广东省科学院广州地理研究所 | Auxiliary modeling method for distributed hydrologic model |
CN117009452A (en) * | 2023-07-25 | 2023-11-07 | 浪潮智慧科技有限公司 | Hydrologic service data acquisition method, equipment and medium |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
CB03 | Change of inventor or designer information | ||
Inventors after: Zuo Xiang, Xiao Fei Tong, Liu Weifeng, Zhao Xingxing, Liu Xiuheng | Inventors before: Xiao Fei Tong, Zuo Xiang, Liu Weifeng, Zhao Xingxing, Liu Xiuheng |