CN110347843B - Knowledge map-based Chinese tourism field knowledge service platform construction method
- Publication number
- CN110347843B (application CN201910621399.8A)
- Authority
- CN
- China
- Prior art keywords
- knowledge
- entity
- travel
- attribute
- tourism
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Active
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/30—Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
- G06F16/31—Indexing; Data structures therefor; Storage structures
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/30—Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
- G06F16/36—Creation of semantic tools, e.g. ontology or thesauri
- G06F16/367—Ontology
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/90—Details of database functions independent of the retrieved data types
- G06F16/95—Retrieval from the web
- G06F16/951—Indexing; Web crawling techniques
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F40/00—Handling natural language data
- G06F40/30—Semantic analysis
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06Q—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
- G06Q50/00—Information and communication technology [ICT] specially adapted for implementation of business processes of specific business sectors, e.g. utilities or tourism
- G06Q50/10—Services
- G06Q50/14—Travel agencies
Abstract
A method for constructing a Chinese tourism domain knowledge service platform based on a knowledge graph comprises: obtaining structured tourism knowledge from existing Chinese encyclopedia knowledge bases; performing knowledge fusion; crawling tourism website page data and completing missing entity Infobox attributes through custom attribute matching rules; constructing a tourism domain ontology with Protégé, the ontology modeling tool from Stanford; converting the data into RDF triples using D2RQ together with the constructed tourism ontology to obtain a tourism domain knowledge graph; and storing the tourism knowledge base in a Neo4j graph database. The knowledge fusion task comprises three subtasks: entity alignment, in which an improved deep learning knowledge representation model, BERT, is used to calculate the semantic similarity between entities; attribute fusion based on principles and statistical methods; and triple fusion using a majority voting algorithm. The invention makes it convenient for tourists to obtain one-stop comprehensive service.
Description
Technical Field
The invention belongs to the field of computer information processing, and particularly relates to a method for constructing a Chinese tourism domain knowledge service platform based on a knowledge graph.
Background
The knowledge graph describes concepts, entities and relations in an objective world in a structured mode, expresses information of the internet into a mode closer to the human cognitive world, and provides the capability of better organizing, managing and understanding mass information of the internet.
The ontology is the knowledge representation basis of the knowledge graph and can be written formally as O = {C, H, P, A, I}, where C is the set of concepts, such as thing-type concepts and event-type concepts; H is the set of hypernym-hyponym (superclass-subclass) relations between concepts, also called taxonomy knowledge; P is the set of properties, which describes the features of concepts; A is the set of rules, which describes domain rules; and I is the set of instances, which describes entity-attribute-value data. With the development of representation learning, typified by deep learning, significant progress has been made in representation learning for the entities and relations in knowledge graphs. Knowledge representation learning represents entities and relations as dense low-dimensional vectors. This distributed representation allows entities and relations to be computed on efficiently, alleviates knowledge sparsity and facilitates knowledge fusion, and it has become an important method for knowledge fusion and knowledge completion in knowledge graphs. Knowledge graphs are divided into general knowledge graphs and domain knowledge graphs. General knowledge graphs include WordNet, which describes semantic relations at the level of English vocabulary; DBpedia, which organizes knowledge entries in the form of a constructed ontology; YAGO, which fuses the concept hierarchy of WordNet with the large volume of entity data in Wikipedia; and Freebase, which was built through collective intelligence. Research on Chinese general knowledge graphs can be traced back to the HowNet project, which was built by manual editing. In industry there are OpenKG.CN, Baidu, Sogou's Knowledge Cube and others; in academia there are the large-scale knowledge graphs Zhishi.me and CN-DBpedia, built from Baidu Baike, Hudong Baike and Chinese Wikipedia.
Google released its Knowledge Graph project in May 2012 and built a next-generation intelligent search engine on top of it, marking the successful application of large-scale knowledge to semantic search on the Internet.
Compared with general knowledge graphs, there has been relatively little research on constructing domain knowledge graphs. A domain knowledge graph, also called an industry knowledge graph or vertical knowledge graph, is oriented to a specific field and can be regarded as an industry knowledge base built on semantic technology. Because it is built from industry data with a strict and rich data schema, it places higher demands on the depth and accuracy of domain knowledge. The British Museum uses semantic technology to organize the data resources of its collections semantically and provides knowledge services through semantic refinement, multimedia resource annotation and similar means. The British Broadcasting Corporation (BBC) [Kobilarov et al., 2009] defines ontologies for sections such as music, sports and wildlife, and converts news into a machine-readable information source for content management and automatic report generation. Among domestic domain knowledge graphs, the Shanghai Library uses the BIBFRAME framework of the US Library of Congress [Kroeger et al., 2013] to build a knowledge system for genealogies, notable figures, manuscripts and other resources, and has created a genealogy service platform that provides evidence-based ancient-book services for researchers. The Chinese Academy of Agricultural Sciences focuses on the rice subfield, integrates industry resources such as papers, patents and news, has constructed a rice knowledge graph, and provides a professional knowledge service platform for researchers.
The informatization of China's tourism industry has a history of more than 30 years, but Chinese knowledge graphs dedicated to the tourism domain are very scarce, which seriously hinders the development and inheritance of China's tourism culture. In addition, existing Chinese domain knowledge graphs target different fields, with different data schemas and different application requirements, and there is no general set of standards and specifications to guide their construction.
In summary, there is an urgent need to build a knowledge graph-based Chinese tourism knowledge service platform to organize, manage and use the mass of tourism knowledge data on food, lodging, transport, sightseeing, shopping and entertainment, so that tourists can conveniently obtain one-stop comprehensive service, tourism culture can be spread more effectively, and the tourism industry can ultimately move from tourism information services to tourism knowledge services.
Disclosure of Invention
The invention aims to solve the above problems in the prior art by providing a method for constructing a Chinese tourism domain knowledge service platform based on a knowledge graph. The method comprises obtaining structured tourism knowledge from existing Chinese encyclopedia knowledge bases, performing knowledge fusion, crawling tourism website page data, completing missing entity Infobox attributes through custom attribute matching rules, constructing a tourism domain ontology with the Stanford ontology modeling tool Protégé, converting the data into RDF (Resource Description Framework) format using D2RQ together with the constructed tourism ontology to obtain a tourism domain knowledge graph, and storing the tourism domain knowledge base in a Neo4j graph database.
In order to achieve the purpose, the technical scheme adopted by the invention comprises the following steps:
S1, knowledge acquisition: acquiring structured tourism knowledge from existing Chinese encyclopedia knowledge bases;
S2, knowledge fusion: first calculating the semantic similarity between entities using the deep learning knowledge representation model BERT to complete entity alignment, then performing attribute fusion based on principles and statistical methods, and finally performing triple fusion using a majority voting algorithm;
S3, knowledge completion: crawling tourism website page data and completing the entity Infobox attributes through attribute matching rules;
S4, ontology construction: constructing a tourism domain ontology with the Stanford ontology modeling tool Protégé;
S5, data conversion: converting the data into RDF triples using D2RQ together with the tourism domain ontology to obtain a tourism domain knowledge graph;
S6, data storage: storing the tourism domain knowledge graph in a Neo4j graph database;
S7, platform construction: constructing the tourism knowledge service platform.
Step S1 is completed as follows: structured entity knowledge is obtained from the categories of existing Chinese encyclopedia knowledge bases. The knowledge bases include Zhishi.me and CN-DBpedia; the categories include tourism, sightseeing and recreation; the structured entity knowledge covers scenic spots, historic sites, cities, people and cultural relics; and the triple data in the structured knowledge include entity names, entity abstracts, entity Infobox attributes and entity pictures.
The attributes finally defined for tourism domain entities include Chinese name, opening hours, foreign-language name, ticket price, geographic location, year, cultural relic protection level, suggested visit duration, suitable visiting season, city, value, name, birth time, time of death, nationality, alias, achievement, work, year, nationality, and native place.
The three sub-steps of step S2 are implemented as follows:
1) Entity alignment by calculating the semantic similarity between entities with the deep learning knowledge representation model BERT: first, the Chinese BERT language model released by Google is used, and a parameter is set in the fine-tuning stage so that the second-to-last layer is taken as the output, yielding an entity word vector for each entity; then the cosine similarity between the vectors of different entities is calculated and taken as their semantic similarity; finally, entities whose semantic similarity reaches a set threshold are aligned.
2) Attribute fusion based on principles and statistical methods, for which either of two methods may be chosen. In the first, the Infobox attributes of tourism entities are obtained from the existing Chinese encyclopedia knowledge bases, rules are written in Python, the different names used for the same attribute in different knowledge bases are counted, and the content of the entity Infobox attributes is then determined. In the second, an entity and its attribute are treated as a triple relation, the task is cast as a relation extraction problem, and attribute fusion is performed with a support vector machine and text mining algorithms.
3) Triple fusion with the majority voting algorithm: after entity alignment and attribute fusion, the triples that share the same entity and attribute are fused, and a unique attribute value is determined for each attribute by majority voting.
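The thresholded similarity step in 1) can be sketched as follows. This is a minimal illustration, not the patented implementation: the vectors are toy three-dimensional placeholders standing in for BERT entity vectors, and the function names, entity names and the 0.95 threshold are assumptions made for the example.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0  # a zero vector carries no direction
    return dot / (norm_u * norm_v)


def align_entities(vectors_a, vectors_b, threshold=0.9):
    """Pair each entity of source A with its most similar entity of source B.

    A pair is kept only if its similarity reaches the threshold.
    """
    aligned = []
    for name_a, vec_a in vectors_a.items():
        best_name, best_score = None, -1.0
        for name_b, vec_b in vectors_b.items():
            score = cosine_similarity(vec_a, vec_b)
            if score > best_score:
                best_name, best_score = name_b, score
        if best_name is not None and best_score >= threshold:
            aligned.append((name_a, best_name, best_score))
    return aligned


# Toy vectors (assumed values); real vectors would come from the BERT model.
source_a = {"Linggu Temple": [0.9, 0.1, 0.2], "Bell Tower": [0.1, 0.8, 0.3]}
source_b = {"Linggu Temple (Nanjing)": [0.88, 0.12, 0.21], "West Lake": [0.2, 0.1, 0.9]}
pairs = align_entities(source_a, source_b, threshold=0.95)
```

With these toy values only the two Linggu Temple entries are paired; "Bell Tower" has no counterpart above the threshold and is left unaligned.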
Step S3 is completed as follows: tourism website pages and data from Baidu Baike, Hudong Baike and Chinese Wikipedia are crawled, and the attribute knowledge missing from entities is completed through attribute matching rules.
Step S4 is completed as follows: the entities, attributes and relations in the tourism domain data are summarized, the hierarchy of tourism-related concepts and categories is determined, and the attributes of entities and their value ranges are defined. A tourism graph schema model is built from this knowledge, and the tourism domain ontology is then constructed with the ontology modeling tool Protégé, combining a top-down ontology construction approach with the ontology construction method of Stanford University.
Step S5 is completed as follows: following the R2RML standard formulated by the W3C RDB2RDF working group, the data in the database are mapped to the custom tourism domain ontology by editing and setting mapping rules, and the tourism data in the relational database are converted into RDF format with the D2RQ tool to obtain the tourism domain knowledge graph.
Converting the data into RDF triples using D2RQ together with the constructed tourism domain ontology is realized through the following process: first, the acquired structured tourism knowledge is stored as triples in a relational database by designing a corresponding database table structure; second, the D2RQ tool is run to generate a default mapping file, the mapping file is modified according to the defined tourism ontology, and the database tables are mapped to the corresponding classes of the constructed tourism domain ontology; finally, the D2RQ tool is run to convert the data into RDF format, yielding the tourism domain knowledge graph.
Step S6 is completed as follows: the extension jar package for importing RDF into the Neo4j graph database is downloaded, the Neo4j configuration file is modified, a namespace prefix is created, and the tourism domain knowledge graph in RDF format is imported into the Neo4j graph database from the command line.
Step S7 is completed as follows: once the tourism domain knowledge graph has been stored, the back end uses the Java programming language and the Spring MVC architecture, and the front end uses JSP dynamic web page technology and the D3.js data-driven visualization component, to build the tourism knowledge service platform.
Compared with the prior art, the invention calculates the semantic similarity between entities with an improved deep learning knowledge representation model, BERT, to complete entity alignment; in comparison tests against other knowledge representation models, it achieved the highest entity alignment accuracy. The invention summarizes the entities (concepts), attributes and relations in tourism domain data, determines the hierarchy of tourism-related concepts and categories, defines entity attributes and their value ranges, models a tourism graph schema from this knowledge, and defines new category relations based on the characteristics of the tourism industry; it then combines a top-down ontology construction approach with the seven-step ontology construction method of Stanford University and uses the ontology modeling tool Protégé to build the tourism domain ontology. The invention stores the RDF triples of the tourism domain knowledge base in a Neo4j graph database, which makes full use of the more mature graph query language and algorithm support provided by a native graph store (Neo4j). A tourism knowledge service platform built in this way can combine graph mining computation with knowledge reasoning, moving the tourism industry from information services to knowledge services.
Drawings
FIG. 1 is a schematic flow diagram of a construction method of the present invention;
FIG. 2 is a schematic diagram illustrating a process for implementing entity alignment in a knowledge fusion phase according to the present invention;
FIG. 3 is a schematic diagram of the schema model used for knowledge modeling of the tourism knowledge graph.
Detailed Description
The present invention will be described in further detail with reference to the accompanying drawings and examples.
Referring to FIG. 1, the method for constructing a Chinese tourism domain knowledge service platform based on a knowledge graph comprises the following steps:
S1: Knowledge acquisition: acquiring structured tourism knowledge from existing Chinese encyclopedia knowledge bases.
Structured entity knowledge about scenic spots, historic sites, cities, people, cultural relics and the like is obtained from categories such as "tourism", "sightseeing" and "recreation" in the existing Chinese encyclopedia knowledge bases Zhishi.me and CN-DBpedia (the official websites of these knowledge bases offer free downloads of data in RDF triple format, covering knowledge from Baidu Baike, Hudong Baike and Chinese Wikipedia). The structured entity knowledge includes entity names, entity abstracts, entity Infobox attributes, entity pictures and so on.
The attributes finally defined for tourism domain entities include: Chinese name, opening hours, foreign-language name, ticket price, geographic location, year, cultural relic protection level, suggested visit duration, suitable visiting season, city, value, name, birth time, death time, nationality, alias, achievement, work, year, nationality and native place.
S2: Knowledge fusion: the knowledge fusion process comprises three parts: entity alignment, in which an improved deep learning knowledge representation model, BERT, is used to calculate the semantic similarity between entities; attribute fusion, using a method based on principles and statistics; and triple fusion, using a method based on the majority voting algorithm.
1. Calculating the semantic similarity between entities with the improved deep learning knowledge representation model BERT to complete entity alignment;
The entity alignment process of the invention is shown in FIG. 2. First, the entities obtained in S1 are collected in a text document to form a data set. The Chinese BERT language model released by Google is run as a server in a TensorFlow environment on a Linux platform, a parameter is set in the fine-tuning stage so that the second-to-last layer is taken as the output, and the entity word vectors are obtained from a client on a Windows platform. Second, the cosine similarity between the vectors of different entities is calculated and taken as their semantic similarity. Finally, entities are aligned according to this semantic similarity.
2. Performing attribute fusion based on principles and statistical methods;
The Infobox attributes of tourism entities are obtained from the existing Chinese encyclopedia knowledge bases. Rules (regular expressions) are written in Python, the different names used for the same attribute in different knowledge bases (such as "birth date" and "birth time") are counted, and the content of each Infobox attribute is then determined. For example, the geographic location of Linggu Temple has three candidate attribute values: "about 1.5 km east of Zhongshan Ling", "1.5 km east of Zhongshan Ling in Nanjing City" and "Nanjing City, Jiangsu Province". The second is selected as the attribute value according to the accuracy principle and the majority principle.
3. Performing triple fusion with the majority voting algorithm;
After entity alignment and attribute fusion, the triples that share the same entity and attribute are fused, and a unique attribute value is determined for each attribute by the majority voting algorithm. For example, the value of the "construction era" attribute of the "Xi'an Bell Tower" entity is given as "the seventeenth year of Hongwu (1384)", "the seventeenth year of Hongwu (1384)" and "the Hongwu period of the Ming dynasty" in Baidu Baike, Hudong Baike and Chinese Wikipedia respectively. The majority voting algorithm therefore yields the unique triple (Xi'an Bell Tower, construction era, the seventeenth year of Hongwu (1384)).
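Attribute-name unification followed by majority voting can be sketched in a few lines of Python. This is an illustrative sketch under stated assumptions: the synonym table, the function names and the sample triples are hypothetical, and the patent's actual rules are not given in the text.

```python
from collections import Counter

# Hypothetical synonym table mapping variant attribute names to one canonical name.
ATTRIBUTE_SYNONYMS = {
    "birth date": "birth time",
    "date of birth": "birth time",
    "built in": "construction era",
}


def normalize_attribute(name):
    """Map a raw attribute name to its canonical form (identity if unknown)."""
    key = name.strip().lower()
    return ATTRIBUTE_SYNONYMS.get(key, key)


def fuse_triples(triples):
    """Keep one value per (entity, attribute), chosen by majority vote.

    `triples` is an iterable of (entity, attribute, value), one per source.
    Ties go to whichever tied value Counter lists first.
    """
    votes = {}
    for entity, attribute, value in triples:
        key = (entity, normalize_attribute(attribute))
        votes.setdefault(key, Counter())[value] += 1
    fused = []
    for (entity, attribute), counter in votes.items():
        value, _count = counter.most_common(1)[0]
        fused.append((entity, attribute, value))
    return fused


# Three sources describe the same attribute of the same (already aligned) entity.
observations = [
    ("Bell Tower", "construction era", "17th year of Hongwu (1384)"),
    ("Bell Tower", "built in", "17th year of Hongwu (1384)"),
    ("Bell Tower", "construction era", "Hongwu period of the Ming dynasty"),
]
fused = fuse_triples(observations)
```

Two of the three sources agree, so the fused result keeps the single triple with the value "17th year of Hongwu (1384)".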
S3: Knowledge completion: crawling tourism website page data and completing the entity Infobox attributes through custom attribute matching rules. Tourism website pages and text data from Baidu Baike, Hudong Baike and Chinese Wikipedia are crawled, and the attribute knowledge missing from entities is completed through custom attribute matching rules (regular expressions). For example, the regular-expression template used to complete the "geographic location" attribute of a scenic spot matches a clause introduced by a location cue word such as "located in" or "situated at", and the template used to complete an attribute of a person matches a clause introduced by a cue word such as "commonly called", "original name", "also named", "pen name" or "assumed name".
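Cue-word matching of this kind might look like the sketch below. The patterns here are assumptions written for English text; the patent's own templates operate on Chinese text and are not reproduced. The attribute names, function name and sample sentence are likewise hypothetical.

```python
import re

# Hypothetical cue-word patterns: capture the clause that follows the cue,
# stopping at the next comma, period or semicolon.
LOCATION_PATTERN = re.compile(r"is (?:located|situated) (?:in|at|on) ([^,.;]+)")
ALIAS_PATTERN = re.compile(
    r"(?:also known as|commonly called|originally named|pen name) ([^,.;]+)"
)

RULES = (
    ("geographic location", LOCATION_PATTERN),
    ("alias", ALIAS_PATTERN),
)


def complete_attributes(infobox, text):
    """Fill attributes that are missing or empty; never overwrite existing values."""
    completed = dict(infobox)
    for attribute, pattern in RULES:
        if completed.get(attribute):
            continue  # the knowledge base already has a value
        match = pattern.search(text)
        if match:
            completed[attribute] = match.group(1).strip()
    return completed


infobox = {"Chinese name": "Linggu Temple", "geographic location": ""}
page_text = "Linggu Temple is located in Nanjing City, about 1.5 km east of Zhongshan Ling."
result = complete_attributes(infobox, page_text)
```

Only the empty "geographic location" slot is filled; no alias cue occurs in the sample sentence, so no alias is added.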
S4: Ontology construction: constructing the tourism domain ontology with the Stanford ontology modeling tool Protégé.
The invention summarizes the entities (concepts), attributes and relations in the tourism domain data, determines the hierarchy of tourism-related concepts and categories, and defines the attributes of entities and their value ranges. A tourism graph schema model is built from this knowledge; a top-down ontology construction approach is then combined with the seven-step ontology construction method of Stanford University, and the ontology modeling tool Protégé is used to construct the tourism domain ontology. The top level is tourism, under which three second-level categories are defined: scenic spot, city and person. The second-level categories in turn include knowledge, means of transport, scenic spot, food, accommodation, entertainment, sightseeing and learning. The attribute value types include integer (int), string, date and so on. In addition to the four basic relations, namely part-of (part and whole), kind-of (superclass and subclass), instance-of (class and instance) and attribute-of (attributes of a class, including object properties and data properties), further relations between entities are defined according to the needs of the task and the specific characteristics of the tourism domain ontology, as follows:
1. birth-of: defines the birth date of a person and is used for reasoning about and answering questions on the person's age;
2. time-of: defines the suggested visit duration, which is one of the questions tourists care about most;
3. specialties-of: defines recommended local specialty foods and can be used for question answering and semantic search on food;
4. accprice-of: defines the price of accommodation, which is also one of the most important questions for tourists.
The tourism graph schema model of the invention is shown in FIG. 3. It identifies the three second-level categories of the tourism graph schema (scenic spot, city and person) and the relations among them, and shows some attributes together with the data types of their values.
S5: Data conversion: converting the data into RDF triples using D2RQ together with the constructed tourism ontology to obtain the tourism knowledge graph.
The tourism domain knowledge graph is obtained as follows:
Following the R2RML standard established by the W3C RDB2RDF working group, the data in the database are mapped to the custom tourism domain ontology by editing and setting mapping rules. Table names in the database correspond to concepts in the knowledge graph, column names correspond to attributes, column values correspond to attribute values, and constraints between tables correspond to relations. Specifically, the D2RQ tool is used to convert the tourism data in the relational database into RDF format, which yields the tourism domain knowledge graph. The conversion proceeds in three steps:
First, the acquired structured tourism knowledge is stored in a relational database as (entity, attribute, attribute value) triples by designing a corresponding database table structure;
Second, the D2RQ tool is run to generate a default mapping file, the mapping file is modified according to the defined tourism ontology, and the database tables are mapped to the corresponding classes of the constructed tourism domain ontology;
Finally, the D2RQ tool is run to convert the data into RDF format, yielding the tourism domain knowledge graph.
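The table-to-RDF correspondence described above (table name to class, column name to property, column value to literal) can be illustrated without D2RQ. The following Python sketch is not D2RQ and does not use its mapping language; the namespace, table name, column names and function names are hypothetical, and identifiers are assumed to be safe for use in IRIs.

```python
BASE = "http://example.org/tourism/"  # hypothetical namespace
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"


def escape_literal(value):
    """Escape backslashes, double quotes and newlines for an N-Triples literal."""
    return str(value).replace("\\", "\\\\").replace('"', '\\"').replace("\n", "\\n")


def rows_to_ntriples(table_name, key_column, rows):
    """Turn relational rows into N-Triples lines.

    The table name becomes the class, the key column identifies the subject,
    every other non-null column becomes a property with a plain literal value.
    """
    lines = []
    for row in rows:
        subject = f"<{BASE}{table_name}/{row[key_column]}>"
        lines.append(f"{subject} <{RDF_TYPE}> <{BASE}{table_name}> .")
        for column, value in row.items():
            if column == key_column or value is None:
                continue  # skip the key itself and missing values
            lines.append(f'{subject} <{BASE}{column}> "{escape_literal(value)}" .')
    return lines


rows = [
    {"id": "linggu_temple", "name": "Linggu Temple", "city": "Nanjing", "ticket_price": None},
]
ntriples = rows_to_ntriples("ScenicSpot", "id", rows)
```

The single sample row produces one type triple and two property triples; the null ticket price produces nothing, mirroring an attribute still waiting for knowledge completion.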
S6: Data storage: storing the tourism knowledge base in a Neo4j database.
The tourism knowledge base is stored in the Neo4j database through the following specific steps:
The extension jar package for importing RDF into the Neo4j database is downloaded, the Neo4j configuration file is modified, a namespace prefix is created, and the tourism domain knowledge base in RDF format is imported into the Neo4j database by running a command in the Neo4j console, which completes the storage of the tourism knowledge base in Neo4j.
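Conceptually, moving RDF into a property graph means deciding which triples become node properties and which become relationships. The sketch below shows one common convention, assumed here for illustration: triples whose object is a literal become properties of the subject node, and triples whose object is another entity become relationships. It is a plain-Python illustration, not the code of the Neo4j import extension, and all names and values are hypothetical.

```python
def triples_to_property_graph(triples, entity_ids):
    """Split triples into node properties and relationships.

    `entity_ids` is the set of identifiers that denote entities; any object
    outside this set is treated as a literal value.
    """
    nodes = {entity: {} for entity in entity_ids}
    edges = []
    for subject, predicate, obj in triples:
        nodes.setdefault(subject, {})
        if obj in entity_ids:
            edges.append((subject, predicate, obj))  # entity-to-entity link
        else:
            nodes[subject][predicate] = obj  # literal becomes a node property
    return nodes, edges


# Toy data (assumed values).
entity_ids = {"Linggu Temple", "Nanjing"}
triples = [
    ("Linggu Temple", "located_in", "Nanjing"),
    ("Linggu Temple", "suggested_visit_duration", "2 hours"),
    ("Nanjing", "name", "Nanjing City"),
]
nodes, edges = triples_to_property_graph(triples, entity_ids)
```

The result has two nodes carrying their literal properties and a single `located_in` relationship between them, which is the shape a graph query language can then traverse.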
S7: Platform construction: constructing the tourism knowledge service platform once the tourism knowledge base has been stored.
The tourism knowledge service platform is constructed as follows:
Once the tourism knowledge base has been stored, the back end uses the Java programming language and the Spring MVC architecture, and the front end uses JSP dynamic web page technology and the D3.js data-driven visualization component, to build the tourism knowledge service platform.
This completes the method for constructing a Chinese tourism knowledge service platform based on a knowledge graph.
By adding semantics (knowledge) to the mass of Internet tourism data, the data become a source of intelligence: the conversion from data to information to knowledge, and finally to an intelligent application platform, is completed, achieving the goals of moving from information services to knowledge services and of spreading tourism culture.
It should be noted that, the method for constructing a chinese travel knowledge service platform based on a knowledge graph provided in the foregoing embodiment is only exemplified with respect to the above functional steps, and in practical applications, the above steps may be rearranged and combined as needed to complete corresponding functions. Any modification, equivalent replacement, or improvement made by those skilled in the art of the present invention without departing from the spirit and principle of the present invention shall fall within the protection scope of the present invention.
Claims (6)
1. A knowledge-graph-based method for constructing a knowledge service platform for the Chinese tourism domain, characterized by comprising the following steps:
S1, knowledge acquisition: acquiring structured tourism knowledge from existing Chinese encyclopedia knowledge bases;
S2, knowledge fusion: first, computing the semantic similarity between entities using the deep learning knowledge representation model BERT to complete entity alignment; then performing attribute fusion based on rules and statistical methods; and finally performing triple fusion using a majority voting algorithm;
the specific implementation process is as follows:
1) computing the semantic similarity between entities using the deep learning knowledge representation model BERT to complete entity alignment comprises: first, using the BERT Chinese language model released by Google and setting parameters at the fine-tuning stage of the model so that the penultimate layer before the output layer is taken as the entity word vector; then, computing the cosine distance between different entities from the obtained entity word vectors, which serves as the semantic similarity; finally, achieving entity alignment by applying a threshold to the semantic similarity;
2) attribute fusion based on rules and statistical methods can be done in either of two ways: one is to acquire the Infobox attributes of tourism entities from the existing Chinese encyclopedia knowledge bases, count the different names used for the same attribute across the knowledge bases, and write rules in Python to determine the final Infobox attribute content of each entity; the other is to treat an entity and its attribute as a triple relation, cast the task as a relation extraction problem, and perform attribute fusion using a support vector machine and text mining algorithms;
3) when triple fusion is performed using a majority voting algorithm, after entity alignment and attribute fusion, the entity triples that contain the same entity and attribute are fused, and a unique attribute value is determined for each attribute by majority voting;
S3, crawling tourism website pages and completing the knowledge of entity Infobox attributes through attribute matching rules;
S4, ontology construction: constructing a tourism domain ontology using Protégé, the ontology modeling tool from Stanford;
S5, converting the data into RDF triple format using D2RQ together with the tourism domain ontology to obtain a tourism domain knowledge graph;
this step is completed as follows: according to the R2RML standard defined by the W3C RDB2RDF Working Group, the data in the database are mapped to the custom tourism domain ontology by editing mapping rules, and the D2RQ tool is used to convert the tourism data in the relational database into RDF format to obtain the tourism domain knowledge graph;
converting the data into RDF triple format using D2RQ together with the constructed tourism domain ontology to obtain the tourism domain knowledge graph is realized as follows: first, storing the acquired structured tourism knowledge in triple form in a relational database by designing a corresponding database table structure; second, running the D2RQ tool to generate a default mapping file, modifying the mapping file according to the defined tourism ontology, and mapping the database tables to the corresponding classes of the constructed tourism domain ontology; finally, running the D2RQ tool to convert the data into RDF format to obtain the tourism domain knowledge graph;
S6, data storage: storing the tourism domain knowledge graph in a Neo4j graph database;
and S7, constructing a tourism knowledge service platform.
2. The knowledge-graph-based method for constructing a knowledge service platform for the Chinese tourism domain according to claim 1, wherein step S1 is specifically completed as follows: entity structured knowledge is obtained from the categories of existing Chinese encyclopedia knowledge bases, wherein the Chinese encyclopedia knowledge bases comprise Zhishi.me and CN-DBpedia; the categories comprise tourism, sightseeing and playing; the entity structured knowledge covers scenic spots, historic sites, cities, people and cultural relics; and the triple data in the structured knowledge comprise entity names, entity introductions, entity Infobox attributes and entity pictures;
the attributes finally defined for tourism domain entities include Chinese name, opening hours, foreign-language name, ticket price, geographic location, year, cultural relic protection level, suggested visit duration, suitable season for visiting, city of interest, value, name, time of birth, time of death, nationality, alias, achievement, work, year, nationality, and native place.
3. The knowledge-graph-based method for constructing a knowledge service platform for the Chinese tourism domain according to claim 1, wherein step S3 is completed as follows: tourism website pages and data from encyclopedia, interactive encyclopedia and Chinese Wikipedia are crawled, and knowledge completion is performed, through attribute matching rules, for the entities whose attribute knowledge is missing.
4. The knowledge-graph-based method for constructing a knowledge service platform for the Chinese tourism domain according to claim 1, wherein step S4 is completed as follows: the entities, attributes and relations in the tourism domain data are summarized; the hierarchy of concepts and categories relevant to the tourism domain is determined; the attributes of entities and their value ranges are defined; the schema of the tourism graph is modeled from this knowledge; and the tourism domain ontology is constructed with the ontology modeling tool Protégé, using a top-down ontology construction approach combined with the ontology construction method of Stanford University.
5. The knowledge-graph-based method for constructing a knowledge service platform for the Chinese tourism domain according to claim 1, wherein step S6 is completed as follows: an extension jar package for importing RDF into the Neo4j graph database is downloaded, the Neo4j configuration file is modified, a namespace prefix is created, and the tourism domain knowledge graph in RDF format is imported into the Neo4j graph database from the command line, which completes the storage of the tourism domain knowledge graph in the Neo4j graph database.
6. The knowledge-graph-based method for constructing a knowledge service platform for the Chinese tourism domain according to claim 1, wherein in step S7, once the tourism domain knowledge graph has been stored, the back end uses the Java programming language and the SpringMVC framework, and the front end uses JSP dynamic web page technology and D3.js data-driven visualization components to construct the tourism knowledge service platform.
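The knowledge fusion step S2 of claim 1 combines threshold-based entity alignment with majority voting over attribute values. The Python sketch below illustrates those two operations on toy data, as a minimal sketch under stated assumptions: the entity vectors are hand-written stand-ins for the BERT word vectors (no BERT model is loaded), the threshold of 0.9 is arbitrary, and the function names are hypothetical. It computes cosine similarity directly, which is what the claim uses as the semantic similarity.

```python
import math
from collections import Counter, defaultdict


def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def align_entities(vectors_a, vectors_b, threshold=0.9):
    """Map each entity of knowledge base A to its most similar entity in B.

    A pair is kept only if its similarity reaches `threshold` (an assumed value).
    """
    alignment = {}
    for name_a, vec_a in vectors_a.items():
        best_name, best_score = None, threshold
        for name_b, vec_b in vectors_b.items():
            score = cosine_similarity(vec_a, vec_b)
            if score >= best_score:
                best_name, best_score = name_b, score
        if best_name is not None:
            alignment[name_a] = best_name
    return alignment


def majority_vote(triples):
    """Keep one value per (entity, attribute): the most frequent one."""
    votes = defaultdict(Counter)
    for entity, attribute, value in triples:
        votes[(entity, attribute)][value] += 1
    return {key: counter.most_common(1)[0][0] for key, counter in votes.items()}


# Toy vectors standing in for BERT entity word vectors.
kb_a = {"Palace Museum": [1.0, 0.0, 0.2]}
kb_b = {"Forbidden City": [0.9, 0.1, 0.2], "West Lake": [0.0, 1.0, 0.0]}
aligned = align_entities(kb_a, kb_b)

# Conflicting attribute values gathered from several sources after alignment.
fused = majority_vote([
    ("Palace Museum", "ticket price", "60"),
    ("Palace Museum", "ticket price", "60"),
    ("Palace Museum", "ticket price", "40"),
])
```

Here `aligned` maps "Palace Museum" to "Forbidden City", and `fused` keeps "60" as the single ticket price. When two values receive the same number of votes, `Counter.most_common` returns the one seen first, so the order of the sources then decides the outcome.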
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201910621399.8A CN110347843B (en) | 2019-07-10 | 2019-07-10 | Knowledge map-based Chinese tourism field knowledge service platform construction method |
Publications (2)
Publication Number | Publication Date |
---|---|
CN110347843A CN110347843A (en) | 2019-10-18 |
CN110347843B CN110347843B (en) | 2022-04-15 |
Family
ID=68175783
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201910621399.8A Active CN110347843B (en) | 2019-07-10 | 2019-07-10 | Knowledge map-based Chinese tourism field knowledge service platform construction method |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN110347843B (en) |
Families Citing this family (31)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN110826316B (en) * | 2019-11-06 | 2021-08-10 | 北京交通大学 | Method for identifying sensitive information applied to referee document |
CN111241835B (en) * | 2019-11-15 | 2021-12-14 | 上海景域文化传播股份有限公司 | Tourist map-based one-player scenic spot tourist knowledge embedding method and device |
CN110928963B (en) * | 2019-11-28 | 2023-10-24 | 西安理工大学 | Column-level authority knowledge graph construction method for operation and maintenance service data table |
CN110990417B (en) * | 2019-12-13 | 2023-04-21 | 陕西师范大学 | Knowledge base updating method for knowledge service platform in Chinese tourism field based on crowdsourcing |
CN111191050B (en) * | 2020-01-03 | 2023-07-04 | 中国建设银行股份有限公司 | Knowledge graph ontology model construction method and device |
CN111324691A (en) * | 2020-01-06 | 2020-06-23 | 大连民族大学 | Intelligent question-answering method for minority nationality field based on knowledge graph |
CN111291132B (en) * | 2020-01-14 | 2024-04-02 | 常州大学 | Cultural relic field ontology construction and analysis method for intelligent travel |
CN111538847A (en) * | 2020-04-16 | 2020-08-14 | 北方民族大学 | Ningxia rice knowledge graph construction method |
CN111753099B (en) * | 2020-06-28 | 2023-11-21 | 中国农业科学院农业信息研究所 | Method and system for enhancing relevance of archive entity based on knowledge graph |
CN111753100A (en) * | 2020-06-30 | 2020-10-09 | 广州小鹏车联网科技有限公司 | Knowledge graph generation method and server for vehicle-mounted application |
CN111832282B (en) * | 2020-07-16 | 2023-04-14 | 平安科技(深圳)有限公司 | External knowledge fused BERT model fine adjustment method and device and computer equipment |
CN112100395B (en) * | 2020-08-11 | 2024-03-29 | 淮阴工学院 | Expert cooperation feasibility analysis method |
CN112182241A (en) * | 2020-09-24 | 2021-01-05 | 四川大学 | Automatic construction method of knowledge graph in field of air traffic control |
CN112149423B (en) * | 2020-10-16 | 2024-01-26 | 中国农业科学院农业信息研究所 | Corpus labeling method and system for domain entity relation joint extraction |
CN113392220B (en) * | 2020-10-23 | 2024-03-26 | 腾讯科技(深圳)有限公司 | Knowledge graph generation method and device, computer equipment and storage medium |
CN112199515B (en) * | 2020-11-17 | 2023-08-15 | 西安交通大学 | Knowledge service innovation method driven by polymorphic knowledge graph |
CN112612902B (en) * | 2020-12-23 | 2023-07-14 | 国网浙江省电力有限公司电力科学研究院 | Knowledge graph construction method and device for power grid main equipment |
CN112699248B (en) * | 2020-12-24 | 2022-09-16 | 厦门市美亚柏科信息股份有限公司 | Knowledge ontology construction method, terminal equipment and storage medium |
CN112650855B (en) * | 2020-12-26 | 2022-09-13 | 曙光信息产业股份有限公司 | Knowledge graph engineering construction method and device, computer equipment and storage medium |
CN112650821A (en) * | 2021-01-20 | 2021-04-13 | 济南浪潮高新科技投资发展有限公司 | Entity alignment method fusing Wikidata |
CN113065003B (en) * | 2021-04-22 | 2023-05-26 | 国际关系学院 | Knowledge graph generation method based on multiple indexes |
CN113190689B (en) * | 2021-05-25 | 2023-04-18 | 广东电网有限责任公司广州供电局 | Construction method, device, equipment and medium of electric power safety knowledge graph |
CN113407688B (en) * | 2021-06-15 | 2022-09-16 | 西安理工大学 | Method for establishing knowledge graph-based survey standard intelligent question-answering system |
CN113468255B (en) * | 2021-06-25 | 2023-04-07 | 西安电子科技大学 | Knowledge graph-based data fusion method in social security comprehensive treatment field |
CN113204652B (en) * | 2021-07-05 | 2021-09-07 | 北京邮电大学 | Knowledge representation learning method and device |
CN113535986B (en) * | 2021-09-02 | 2023-05-05 | 中国医学科学院医学信息研究所 | Data fusion method and device applied to medical knowledge graph |
CN113821647B (en) * | 2021-11-22 | 2022-02-22 | 山东捷瑞数字科技股份有限公司 | Construction method and system of knowledge graph in engineering machinery industry |
CN113901238B (en) * | 2021-12-07 | 2022-02-18 | 武大吉奥信息技术有限公司 | City physical examination index knowledge graph construction method and system |
CN114238653B (en) * | 2021-12-08 | 2024-05-24 | 华东师范大学 | Method for constructing programming education knowledge graph, completing and intelligently asking and answering |
CN114328980A (en) * | 2022-03-14 | 2022-04-12 | 来也科技(北京)有限公司 | Knowledge graph construction method and device combining RPA and AI, terminal and storage medium |
CN115269931B (en) * | 2022-09-28 | 2022-11-29 | 深圳技术大学 | Rail transit station data map system based on service drive and construction method thereof |
Citations (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN106777274A (en) * | 2016-06-16 | 2017-05-31 | 北京理工大学 | A kind of Chinese tour field knowledge mapping construction method and system |
CN109284394A (en) * | 2018-09-12 | 2019-01-29 | 青岛大学 | A method of Company Knowledge map is constructed from multi-source data integration visual angle |
Non-Patent Citations (1)
Title |
---|
Scenic spot recommendation based on tourism knowledge graph feature learning; Jia Zhonghao et al.; 《智能***学报》; 2019-04-22 (No. 03); full text * |
Also Published As
Publication number | Publication date |
---|---|
CN110347843A (en) | 2019-10-18 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN110347843B (en) | Knowledge map-based Chinese tourism field knowledge service platform construction method | |
Calvanese et al. | Ontology-based data integration in EPNet: Production and distribution of food during the Roman Empire | |
Dsouza et al. | Worldkg: A world-scale geographic knowledge graph | |
Punjani et al. | Template-based question answering over linked geospatial data | |
Morsey et al. | Dbpedia and the live extraction of structured data from wikipedia | |
CN109657068B (en) | Cultural relic knowledge graph generation and visualization method for intelligent museum | |
CN104205092A (en) | Building an ontology by transforming complex triples | |
CN110941612A (en) | Autonomous data lake construction system and method based on associated data | |
CN101566988A (en) | Method, system and device for searching fuzzy semantics | |
CN107992608B (en) | SPARQL query statement automatic generation method based on keyword context | |
US20120239677A1 (en) | Collaborative knowledge management | |
CN116050429B (en) | Geographic environment entity construction system and method based on multi-mode data association | |
CN111694968A (en) | Raw and fresh food supply chain knowledge graph construction method based on semi-structured data | |
Tachmazidis et al. | A Hypercat-enabled semantic Internet of Things data hub | |
Feng et al. | Geoqamap-geographic question answering with maps leveraging LLM and open knowledge base (short paper) | |
Ding et al. | Integrating 3D city data through knowledge graphs | |
CN114880483A (en) | Metadata knowledge graph construction method, storage medium and system | |
Zhang et al. | Semantic web and geospatial unique features based geospatial data integration | |
Laddha et al. | Semantic tourism information retrieval interface | |
Wang et al. | NALMO: Transforming queries in natural language for moving objects databases | |
Zhang et al. | A comprehensive overview of RDF for spatial and spatiotemporal data management | |
Zhang et al. | Research on the construction of geographic knowledge graph integrating natural disaster information | |
Hu et al. | Intelligent Question-Answering System for Famous Towns and Villages Based on Knowledge Graph | |
Wu | The semantic retrieval system for learning resources based on subject knowledge ontology | |
Zhang et al. | Semantic-Based geospatial data integration with unique |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||