CN110347843B - Knowledge map-based Chinese tourism field knowledge service platform construction method - Google Patents


Info

Publication number
CN110347843B
Authority
CN
China
Prior art keywords
knowledge
entity
travel
attribute
tourism
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201910621399.8A
Other languages
Chinese (zh)
Other versions
CN110347843A (en
Inventor
曹菡
张威震
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Shaanxi Normal University
Original Assignee
Shaanxi Normal University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Shaanxi Normal University filed Critical Shaanxi Normal University
Priority to CN201910621399.8A priority Critical patent/CN110347843B/en
Publication of CN110347843A publication Critical patent/CN110347843A/en
Application granted granted Critical
Publication of CN110347843B publication Critical patent/CN110347843B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/31 Indexing; Data structures therefor; Storage structures
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/36 Creation of semantic tools, e.g. ontology or thesauri
    • G06F16/367 Ontology
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90 Details of database functions independent of the retrieved data types
    • G06F16/95 Retrieval from the web
    • G06F16/951 Indexing; Web crawling techniques
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/30 Semantic analysis
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q50/00 Information and communication technology [ICT] specially adapted for implementation of business processes of specific business sectors, e.g. utilities or tourism
    • G06Q50/10 Services
    • G06Q50/14 Travel agencies

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Business, Economics & Management (AREA)
  • Databases & Information Systems (AREA)
  • Tourism & Hospitality (AREA)
  • Data Mining & Analysis (AREA)
  • Health & Medical Sciences (AREA)
  • Computational Linguistics (AREA)
  • General Health & Medical Sciences (AREA)
  • Marketing (AREA)
  • General Business, Economics & Management (AREA)
  • Software Systems (AREA)
  • Strategic Management (AREA)
  • Primary Health Care (AREA)
  • Human Resources & Organizations (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Animal Behavior & Ethology (AREA)
  • Economics (AREA)
  • Artificial Intelligence (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
  • Machine Translation (AREA)

Abstract

A method for constructing a Chinese tourism-domain knowledge service platform based on a knowledge graph comprises: obtaining structured tourism knowledge from existing Chinese encyclopedia knowledge bases; performing knowledge fusion; crawling tourism website page data and completing entity Infobox attributes through custom attribute matching rules; constructing a tourism domain ontology with the Stanford ontology modeling tool Protégé; converting the data into RDF triples using D2RQ together with the constructed tourism ontology to obtain a tourism domain knowledge graph; and storing the tourism knowledge base in a Neo4j graph database. The knowledge fusion task comprises completing entity alignment by computing the semantic similarity between entities with the improved deep-learning knowledge representation model BERT, performing attribute fusion based on principles and statistical methods, and performing the triple fusion subtask with a majority voting algorithm. The invention makes it convenient for tourists to obtain one-stop comprehensive service.

Description

Knowledge map-based Chinese tourism field knowledge service platform construction method
Technical Field
The invention belongs to the field of computer information processing, and particularly relates to a method for constructing a Chinese tourism-domain knowledge service platform based on a knowledge graph.
Background
A knowledge graph describes the concepts, entities and relations of the objective world in a structured form, expresses Internet information in a form closer to the way humans understand the world, and provides a better means of organizing, managing and understanding the massive information on the Internet.
The ontology is the knowledge representation basis of a knowledge graph and can be expressed formally as O = {C, H, P, A, I}, where C is the set of concepts, such as concepts of things and concepts of events; H is the set of superordinate-subordinate relations between concepts, also called taxonomy knowledge; P is the set of attributes, which describe the features of the concepts; A is the set of rules, which describe domain rules; and I is the set of instances, which describe entity-attribute-value data.
With the development of representation learning, typified by deep learning, significant progress has been made in representation learning for the entities and relations of knowledge graphs. Knowledge representation learning represents entities and relations as dense low-dimensional vectors. This distributed representation allows entities and relations to be computed on efficiently, alleviates knowledge sparsity and facilitates knowledge fusion, and it has become an important method for knowledge fusion and knowledge completion in knowledge graphs.
Knowledge graphs are divided into general knowledge graphs and domain knowledge graphs. General knowledge graphs include WordNet, which describes semantic relations at the level of English vocabulary; DBpedia, which organizes knowledge entries by constructing an ontology; YAGO, which fuses the concept hierarchy of WordNet with the large volume of entity data in Wikipedia; and Freebase, which was built by collective intelligence. Research on Chinese general knowledge graphs can be traced back to the HowNet project, which was built by manual editing. In industry there are OpenKG.CN, Baidu, Sogou's Knowledge Cube and others; in academia there are large-scale knowledge graphs such as Zhishi.me and CN-DBpedia, built from Baidu Baike, Hudong Baike and Chinese Wikipedia.
Google released its Knowledge Graph project in May 2012 and built a next-generation intelligent search engine on it, which marked the successful application of large-scale knowledge to semantic search on the Internet.
Compared with general knowledge graphs, the construction of domain knowledge graphs has been studied relatively little. A domain knowledge graph, also called an industry knowledge graph or vertical knowledge graph, targets a specific field and can be regarded as an industry knowledge base built on semantic technology. Because it is constructed from industry data with a strict and rich data schema, it places higher demands on the depth and accuracy of the domain knowledge. The British Museum uses semantic technology to organize the various data resources in its collection semantically, and provides knowledge services through semantic refinement, multimedia resource annotation and similar means. The British Broadcasting Corporation (BBC) [Kobilarov et al., 2009] defines ontologies for its music, sports, wildlife and other sections, and converts news into a machine-readable information source for content management and automatic report generation. Among domestic domain knowledge graphs, the Shanghai Library used the U.S. Library of Congress bibliographic framework BIBFRAME [Kroeger et al., 2013] to build a knowledge system for genealogies, celebrities, manuscripts and other resources, and created a genealogy service platform that provides evidence-based ancient-book services for researchers. The Chinese Academy of Agricultural Sciences focused on the rice subfield, integrated industry resources such as papers, patents and news, constructed a rice knowledge graph, and provides an industry-specific knowledge service platform for researchers.
The informatization of China's tourism industry has a history of more than 30 years, but Chinese knowledge graphs dedicated to the tourism domain are severely lacking, which seriously hinders the development and inheritance of China's tourism culture. In addition, existing Chinese domain knowledge graphs target different fields, use different data schemas and serve different application requirements, and there is no general set of standards and specifications to guide their construction.
In summary, there is an urgent need to construct a knowledge-graph-based Chinese tourism knowledge service platform to organize, manage and use massive tourism knowledge data covering food, accommodation, transport, sightseeing, shopping and entertainment, so that tourists can conveniently obtain one-stop comprehensive service, tourism culture can be spread more effectively, and the tourism industry can ultimately move from tourism information services to tourism knowledge services.
Disclosure of Invention
The invention aims to provide a knowledge-graph-based method for constructing a Chinese tourism-domain knowledge service platform that addresses the problems in the prior art. The method comprises obtaining structured tourism knowledge from existing Chinese encyclopedia knowledge bases, performing knowledge fusion, crawling tourism website page data, completing entity Infobox attributes through custom attribute matching rules, constructing a tourism domain ontology with the Stanford ontology modeling tool Protégé, converting the data into RDF (Resource Description Framework) format using D2RQ together with the constructed tourism ontology to obtain a tourism domain knowledge graph, and storing the tourism domain knowledge base in a Neo4j graph database.
In order to achieve the purpose, the technical scheme adopted by the invention comprises the following steps:
S1, knowledge acquisition: acquiring structured tourism knowledge from existing Chinese encyclopedia knowledge bases;
S2, knowledge fusion: first computing the semantic similarity between entities with the deep-learning knowledge representation model BERT to complete entity alignment, then performing attribute fusion based on principles and statistical methods, and finally performing triple fusion with a majority voting algorithm;
S3, knowledge completion: crawling tourism website page data and completing entity Infobox attributes through attribute matching rules;
S4, ontology construction: constructing a tourism domain ontology with the Stanford ontology modeling tool Protégé;
S5, RDF conversion: converting the data into RDF triples using D2RQ together with the tourism domain ontology to obtain a tourism domain knowledge graph;
S6, data storage: storing the tourism domain knowledge graph in a Neo4j graph database;
S7, platform construction: constructing the tourism knowledge service platform.
Step S1 is carried out as follows: structured entity knowledge is obtained from the categories of existing Chinese encyclopedia knowledge bases. The knowledge bases include Zhishi.me and CN-DBpedia; the categories include tourism, sightseeing and playing; the structured entity knowledge covers scenic spots, historic sites, cities, persons and cultural relics; and the triple data in the structured knowledge include entity names, entity abstracts, entity Infobox attributes and entity pictures.
The attributes finally defined for tourism domain entities include Chinese name, opening time, foreign language name, ticket price, geographic location, year, cultural relic protection level, suggested play duration, suitable play season, city, value, name, birth time, death time, nationality, alias, achievement, work, year, nationality and native place.
The three parts of step S2 are implemented as follows:
1) Computing the semantic similarity between entities with the deep-learning knowledge representation model BERT to complete entity alignment: first, the BERT Chinese language model released by Google is used, with a parameter set at the fine-tuning stage so that the penultimate layer is taken as the output, yielding entity word vectors; then the cosine distance between different entities, that is, their semantic similarity, is computed from these vectors; finally, entities are aligned according to the semantic similarity by setting a threshold;
2) Attribute fusion based on principles and statistical methods can follow either of two approaches. In the first, the Infobox attributes of tourism entities are obtained from existing Chinese encyclopedia knowledge bases, and the final Infobox attribute content is determined by writing rules in Python and counting the different names used for the same attribute in different knowledge bases. In the second, an entity and its attribute are treated as a triple relation, the task is cast as a relation extraction problem, and attribute fusion is performed with a support vector machine and text mining algorithms;
3) Triple fusion with the majority voting algorithm: after entity alignment and attribute fusion, entity triples that share the same entity and attribute are fused, and a unique attribute value is determined for each attribute by majority voting.
Step S3 is carried out as follows: tourism website pages and data from Baidu Baike, Hudong Baike and Chinese Wikipedia are crawled, and entities with missing attribute knowledge are completed through attribute matching rules.
Step S4 is carried out as follows: the entities, attributes and relations in tourism domain data are summarized, the hierarchy of tourism-related concepts and categories is determined, the attributes of entities and their value ranges are defined, and a tourism graph schema model is derived from this knowledge; the tourism domain ontology is then built with the ontology modeling tool Protégé, using a top-down ontology construction approach combined with the ontology construction method of Stanford University.
Step S5 is carried out as follows: following the R2RML standard formulated by the W3C RDB2RDF working group, the data in the database are mapped to the custom tourism domain ontology by editing mapping rules, and the D2RQ tool converts the tourism data in the relational database into RDF to obtain the tourism domain knowledge graph.
Converting the data into RDF triples using D2RQ together with the constructed tourism domain ontology proceeds as follows: first, a database table structure is designed and the acquired structured tourism knowledge is stored in a relational database as triples; second, a D2RQ command is run to generate a default mapping file, the mapping file is modified according to the defined tourism ontology, and the database tables are mapped to the corresponding classes of the tourism domain ontology; finally, a D2RQ command is run to convert the data into RDF, yielding the tourism domain knowledge graph.
Step S6 is carried out as follows: the extension jar package for importing RDF into Neo4j is downloaded, the Neo4j configuration file is modified, a namespace prefix is created, and the tourism domain knowledge graph in RDF format is imported into the Neo4j graph database from the command line.
In step S7, once the tourism domain knowledge graph has been stored, the tourism knowledge service platform is built with the Java programming language and the SpringMVC architecture on the back end, and with JSP dynamic web page technology and the D3.js data-driven visualization component on the front end.
Compared with the prior art, the invention has the following advantages. Semantic similarity between entities is computed with the improved deep-learning knowledge representation model BERT to complete entity alignment; in comparison tests against other knowledge representation models, it achieved the highest entity alignment accuracy. The entities (concepts), attributes and relations in tourism domain data are summarized, the hierarchy of tourism-related concepts and categories is determined, entity attributes and their value ranges are defined, a tourism graph schema model is derived from this knowledge, and new category relations are defined according to the characteristics of the tourism industry; the tourism domain ontology is then built with the ontology modeling tool Protégé, using a top-down ontology construction approach combined with the Stanford University seven-step ontology construction method. The invention stores the RDF triples of the tourism domain knowledge base in a Neo4j graph database, and can therefore make full use of the more mature graph query language and algorithm support provided by a native graph database (Neo4j). A tourism knowledge service platform built with this method can combine graph mining computation with knowledge reasoning, moving the tourism industry from information services to knowledge services.
Drawings
FIG. 1 is a schematic flow diagram of a construction method of the present invention;
FIG. 2 is a schematic diagram illustrating a process for implementing entity alignment in a knowledge fusion phase according to the present invention;
FIG. 3 is a schematic diagram of a schema model for knowledge modeling of a travel knowledge map.
Detailed Description
The present invention will be described in further detail with reference to the accompanying drawings and examples.
Referring to FIG. 1, the knowledge-graph-based method for constructing a Chinese tourism-domain knowledge service platform comprises the following steps:
S1: Knowledge acquisition: structured tourism knowledge is acquired from existing Chinese encyclopedia knowledge bases.
Structured entity knowledge about scenic spots, historic sites, cities, persons, cultural relics and the like is obtained from categories such as "tourism", "sightseeing" and "playing" in the existing Chinese encyclopedia knowledge bases Zhishi.me and CN-DBpedia (the official websites of these knowledge bases offer free downloads of data in RDF triple format, covering knowledge from Baidu Baike, Hudong Baike and Chinese Wikipedia). The structured entity knowledge includes entity names, entity abstracts, entity Infobox attributes, entity pictures and so on.
The attributes finally defined for tourism domain entities include: Chinese name, opening time, foreign language name, ticket price, geographic location, year, cultural relic protection level, suggested play duration, suitable play season, city, value, name, birth time, death time, nationality, alias, achievement, work, year, nationality and native place.
S2: Knowledge fusion: the knowledge fusion process has three parts: computing the semantic similarity between entities with the improved deep-learning knowledge representation model BERT to complete entity alignment, performing attribute fusion based on principles and statistical methods, and performing triple fusion with an algorithm based on majority voting.
1. Computing the semantic similarity between entities with the improved deep-learning knowledge representation model BERT to complete entity alignment;
The entity alignment process of the invention is shown in FIG. 2. First, the entities obtained in S1 are collected in a text document to form a data set. The BERT Chinese language model released by Google runs as the server in a TensorFlow environment on the Linux platform, with a parameter set at the fine-tuning stage so that the penultimate layer is taken as the output, and the entity word vectors are obtained at the client on a Windows platform. Second, the cosine distance between different entities, that is, their semantic similarity, is computed from the entity word vectors. Finally, entities are aligned according to the semantic similarity.
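The similarity and threshold step described above can be sketched in a few lines of Python. This is a minimal illustration, not the patented implementation: the three-dimensional vectors are toy stand-ins for BERT penultimate-layer embeddings, and the function names and the 0.95 threshold are assumptions chosen for the example.

```python
import math
from itertools import combinations


def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0  # treat a zero vector as dissimilar to everything
    return dot / (norm_u * norm_v)


def align_entities(vectors, threshold):
    """Return name pairs whose similarity reaches the threshold."""
    pairs = []
    for (name_a, vec_a), (name_b, vec_b) in combinations(sorted(vectors.items()), 2):
        if cosine_similarity(vec_a, vec_b) >= threshold:
            pairs.append((name_a, name_b))
    return pairs


# Toy vectors (assumption); real vectors would come from the BERT model.
toy_vectors = {
    "Bell Tower": [0.90, 0.10, 0.20],
    "Bell Tower of Xi'an": [0.88, 0.12, 0.21],
    "Linggu Temple": [0.10, 0.90, 0.30],
}
aligned = align_entities(toy_vectors, threshold=0.95)
print(aligned)
```

With real embeddings the threshold would have to be tuned on labelled entity pairs; the value here only separates the toy vectors.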
2. Performing attribute fusion based on principles and statistical methods;
The Infobox attributes of tourism entities are obtained from existing Chinese encyclopedia knowledge bases, and the final Infobox attribute content is determined by writing rules (regular expressions) in Python and counting the different names used for the same attribute in different knowledge bases (for example, "birth date" and "birth time"). For example, for the "geographic location" attribute of Linggu Temple, the candidate values are "about 1.5 km east of Zhongshan Ling", "about 1.5 km east of Zhongshan Ling, Nanjing City" and "Nanjing City, Jiangsu Province"; the second is selected as the attribute value according to the accuracy principle and the majority principle.
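As a rough illustration of rule-based attribute-name fusion, the sketch below maps variant attribute names to one canonical name with regular expressions and counts how often each variant occurs across sources. The rule table, the English attribute names and the sample values are all assumptions made for the example; the rules actually used in the invention are not given in the text.

```python
import re
from collections import Counter

# Hypothetical synonym rules: each pattern maps name variants to one canonical name.
ATTRIBUTE_RULES = [
    (re.compile(r"^(birth date|birth time|date of birth)$"), "birth time"),
    (re.compile(r"^(alias|also known as|other names)$"), "alias"),
]


def clean(name):
    """Lower-case a name and collapse runs of whitespace."""
    return " ".join(name.lower().split())


def normalize_attribute(name):
    """Return the canonical attribute name, or the cleaned name if no rule matches."""
    cleaned = clean(name)
    for pattern, canonical in ATTRIBUTE_RULES:
        if pattern.match(cleaned):
            return canonical
    return cleaned


def count_name_variants(infoboxes):
    """Count (canonical name, variant) pairs across all source Infoboxes."""
    counts = Counter()
    for infobox in infoboxes:
        for name in infobox:
            counts[(normalize_attribute(name), clean(name))] += 1
    return counts


def fuse_attributes(infoboxes):
    """Collect the candidate values of each canonical attribute."""
    fused = {}
    for infobox in infoboxes:
        for name, value in infobox.items():
            fused.setdefault(normalize_attribute(name), []).append(value)
    return fused


# Made-up Infoboxes for one entity from three sources.
sources = [
    {"Birth date": "1900-01-01", "Also known as": "Example"},
    {"birth time": "1900-01-01"},
    {"Date of Birth": "1 January 1900", "Alias": "Example"},
]
fused = fuse_attributes(sources)
variants = count_name_variants(sources)
print(fused)
```

The candidate values collected per attribute are then the input to the value-selection step, whether by principle or by voting.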
3. Performing triple fusion with an algorithm based on majority voting;
After entity alignment and attribute fusion, entity triples that share the same entity and attribute are fused, and a unique attribute value is determined for each attribute by the majority voting algorithm. For example, the value of the "construction period" attribute of the "Xi'an Bell Tower" entity is given as "17th year of the Ming Hongwu reign (1384)" in Baidu Baike, "17th year of the Ming Hongwu reign (1384)" in Hudong Baike and "Ming Hongwu period" in Chinese Wikipedia; by majority voting, the unique triple (Xi'an Bell Tower, construction period, 17th year of the Ming Hongwu reign (1384)) is finally determined.
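Majority voting over candidate triples can be sketched as below. This is a minimal version under stated assumptions: triples are plain tuples, the sample values are modelled on the bell tower example, and ties are resolved in favour of the value seen first, which the text does not specify.

```python
from collections import Counter, defaultdict


def majority_vote(triples):
    """Keep one value per (entity, attribute): the most frequent one.

    Ties go to the value encountered first (an assumption).
    """
    candidates = defaultdict(list)
    for entity, attribute, value in triples:
        candidates[(entity, attribute)].append(value)

    fused = []
    for (entity, attribute), values in candidates.items():
        winner, _count = Counter(values).most_common(1)[0]
        fused.append((entity, attribute, winner))
    return fused


# One triple per source knowledge base for the same entity and attribute.
candidate_triples = [
    ("Bell Tower", "construction period", "Hongwu 17 (1384)"),
    ("Bell Tower", "construction period", "Hongwu 17 (1384)"),
    ("Bell Tower", "construction period", "Hongwu period"),
]
fused_triples = majority_vote(candidate_triples)
print(fused_triples)
```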
S3: Crawl tourism website page data and complete entity Infobox attributes through custom attribute matching rules. Tourism website pages and the text of Baidu Baike, Hudong Baike and Chinese Wikipedia are crawled, and entities with missing attribute knowledge are completed through custom attribute matching rules (regular expressions). For example, the regular-expression template used when completing the "geographic location" attribute of a scenic spot is [ ^, | ^ located | on |. The regular-expression template used when completing an attribute of a person is [ + ], | ^ where the person is commonly called | original name | and also named | pen name | and chemical name [ + ]. ]+".
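The idea of completing a missing attribute from crawled text with a matching rule can be sketched as follows. The two patterns are hypothetical English-language stand-ins, not the templates of the invention, and the sample sentences are invented for the example.

```python
import re

# Hypothetical English stand-ins for the attribute matching templates.
PATTERNS = {
    "geographic location": re.compile(
        r"(?:is located (?:in|at|on)|lies (?:in|at|on))\s+([^.,;]+)"
    ),
    "alias": re.compile(
        r"(?:commonly called|originally named|also named|pen name)\s+([^.,;]+)"
    ),
}


def complete_attributes(entity, text, patterns=PATTERNS):
    """Fill attributes that are missing or empty; never overwrite existing values."""
    filled = dict(entity)
    for attribute, pattern in patterns.items():
        if filled.get(attribute):
            continue
        match = pattern.search(text)
        if match:
            filled[attribute] = match.group(1).strip()
    return filled


spot = {"name": "Example Temple", "geographic location": None}
page_text = "Example Temple is located at the foot of Example Mountain, and it is open all year."
completed = complete_attributes(spot, page_text)
print(completed["geographic location"])
```

Only empty attributes are filled, so values already fused from the knowledge bases take precedence over text extracted from crawled pages.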
S4: Ontology construction: a tourism domain ontology is constructed with the Stanford ontology modeling tool Protégé.
The invention summarizes the entities (concepts), attributes and relations in tourism domain data, determines the hierarchy of tourism-related concepts and categories, defines the attributes of entities and their value ranges, and derives a tourism graph schema model from this knowledge; it then builds the tourism domain ontology with the ontology modeling tool Protégé, using a top-down ontology construction approach combined with the Stanford University seven-step ontology construction method. The top-level category is tourism, under which three second-level categories are determined: scenic spot, city and person. The second-level categories in turn include: knowledge, transport mode, scenic spot, diet, accommodation, entertainment, sightseeing and study. Attribute value types include integer (int), string, date and so on. In addition to the four basic relations (part-of: part and whole; kind-of: parent class and subclass; instance-of: class and instance; attribute-of: attribute of a class, including object properties and data properties), further relations between entities are defined according to the task requirements and the specific characteristics of the tourism domain ontology, as follows:
1. birth-of: defines the birth date of a person, and is used for age reasoning and question answering about the person;
2. time-of: defines the suggested play duration, one of the questions tourists care about most;
3. specialties-of: defines local specialty food recommendations, and can be used for question answering and semantic search on diet;
4. accprice-of: defines the price of accommodation, also one of the questions tourists care about most.
The tourism graph schema model of the invention is shown in FIG. 3. It determines the three second-level categories of the tourism graph schema (scenic spot, city and person) and the relations among them, and shows some attributes and the data types of their values.
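A schema of this kind can be written down as plain data before it is modelled in an ontology editor. The sketch below is illustrative only: the class names, the assignment of attributes to classes and the value type chosen for each attribute are assumptions, while the four custom relation names and their descriptions follow the text.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class OntologyClass:
    name: str
    parent: Optional[str] = None
    # attribute name -> value type ("int", "string" or "date")
    data_properties: Dict[str, str] = field(default_factory=dict)


# Attribute placement and value types below are assumptions for illustration.
SCHEMA: List[OntologyClass] = [
    OntologyClass("Tourism"),
    OntologyClass("ScenicSpot", "Tourism",
                  {"Chinese name": "string", "ticket price": "string"}),
    OntologyClass("City", "Tourism", {"Chinese name": "string"}),
    OntologyClass("Person", "Tourism", {"name": "string", "birth time": "date"}),
]

# Custom relations named in the text, with their stated purpose.
CUSTOM_RELATIONS = {
    "birth-of": "birth date of a person",
    "time-of": "suggested play duration",
    "specialties-of": "local specialty food recommendations",
    "accprice-of": "price of accommodation",
}


def subclasses_of(schema, parent_name):
    """Names of the classes whose direct parent is parent_name."""
    return [cls.name for cls in schema if cls.parent == parent_name]


def property_type(schema, class_name, attribute):
    """Declared value type of an attribute, or None if it is not declared."""
    for cls in schema:
        if cls.name == class_name:
            return cls.data_properties.get(attribute)
    return None


second_level = subclasses_of(SCHEMA, "Tourism")
print(second_level)
```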
S5: RDF conversion: the data are converted into RDF triples using D2RQ together with the constructed tourism ontology to obtain the tourism knowledge graph.
The tourism domain knowledge graph is obtained as follows:
Following the R2RML standard formulated by the W3C RDB2RDF working group, the data in the database are mapped to the custom tourism domain ontology by editing mapping rules. Table names in the database correspond to concepts in the knowledge graph, column names correspond to attributes, column values correspond to attribute values, and constraints between tables correspond to relations. The D2RQ tool then converts the tourism data in the relational database into RDF, yielding the tourism domain knowledge graph. The conversion proceeds in three steps:
first, a database table structure is designed and the acquired structured tourism knowledge is stored in a relational database as (entity, attribute, attribute value) triples;
second, a D2RQ command is run to generate a default mapping file, the mapping file is modified according to the defined tourism ontology, and the database tables are mapped to the corresponding classes of the constructed tourism domain ontology;
finally, a D2RQ command is run to convert the data into RDF, yielding the tourism domain knowledge graph.
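The table-to-class and column-to-property correspondence can be illustrated without D2RQ itself. The sketch below hand-rolls the conversion of database rows to N-Triples lines; it is not D2RQ's algorithm (in D2RQ this job is done by its `generate-mapping` and `dump-rdf` scripts), and the `http://example.org/tourism/` namespace, the URI pattern and the table and column names are assumptions.

```python
from urllib.parse import quote

BASE = "http://example.org/tourism/"  # hypothetical namespace
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"


def escape_literal(value):
    """Escape backslashes, quotes and newlines for an N-Triples literal."""
    return str(value).replace("\\", "\\\\").replace('"', '\\"').replace("\n", "\\n")


def rows_to_ntriples(table, key_column, rows):
    """Map table -> class, column -> property, cell -> literal value."""
    lines = []
    for row in rows:
        subject = f"<{BASE}{table}/{quote(str(row[key_column]))}>"
        lines.append(f"{subject} <{RDF_TYPE}> <{BASE}vocab/{table}> .")
        for column, value in row.items():
            if column == key_column or value is None:
                continue  # the key is encoded in the URI; NULL cells yield no triple
            lines.append(
                f'{subject} <{BASE}vocab/{table}_{column}> "{escape_literal(value)}" .'
            )
    return lines


rows = [{"id": 1, "name": "Linggu Temple", "city": None}]
ntriples = rows_to_ntriples("ScenicSpot", "id", rows)
print("\n".join(ntriples))
```

In a real mapping the generated class and property URIs would be replaced by the terms of the tourism ontology, which is the purpose of editing the default mapping file.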
S6: Data storage: the tourism knowledge base is stored in a Neo4j database.
The tourism knowledge base is stored in the Neo4j database as follows:
The extension jar package for importing RDF into Neo4j is downloaded, the Neo4j configuration file is modified, a namespace prefix is created, and the tourism domain knowledge base in RDF format is imported into the Neo4j database by running a command in the Neo4j console, which completes the storage of the tourism knowledge base in Neo4j.
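The text does not name the import extension. Assuming it is the neosemantics (n10s) plugin, the statements run at the console would look roughly like the ones generated below; the procedure names are those of recent n10s releases and the constraint syntax is that of recent Neo4j versions, so both may differ from what the authors used. The file URL and the namespace prefix are hypothetical.

```python
def build_import_statements(rdf_url, rdf_format="Turtle", prefixes=None):
    """Return Cypher statements for an RDF import with neosemantics (assumed plugin).

    Values are interpolated directly for readability; real code should pass
    them as query parameters instead.
    """
    statements = [
        # n10s requires a uniqueness constraint on Resource.uri before importing.
        "CREATE CONSTRAINT n10s_unique_uri IF NOT EXISTS "
        "FOR (r:Resource) REQUIRE r.uri IS UNIQUE",
        "CALL n10s.graphconfig.init()",
    ]
    for prefix, namespace in (prefixes or {}).items():
        statements.append(f'CALL n10s.nsprefixes.add("{prefix}", "{namespace}")')
    statements.append(f'CALL n10s.rdf.import.fetch("{rdf_url}", "{rdf_format}")')
    return statements


statements = build_import_statements(
    "file:///import/tourism.ttl",  # hypothetical path to the exported RDF file
    prefixes={"tour": "http://example.org/tourism#"},  # hypothetical prefix
)
for statement in statements:
    print(statement + ";")
```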
S7: Platform construction: the tourism knowledge service platform is constructed once the tourism knowledge base has been stored.
The tourism knowledge service platform is constructed as follows:
Once the tourism knowledge base has been stored, the platform is built with the Java programming language and the SpringMVC architecture on the back end, and with JSP dynamic web page technology and the D3.js data-driven visualization component on the front end.
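Between the stored graph and a D3.js view there is usually a small reshaping step that turns triples into a node-link structure. The sketch below shows that step in Python for consistency with the other examples, although the platform's back end is written in Java; the `nodes`/`links` layout follows the common D3 force-layout convention, and the sample triples and the `located-in` relation are invented for the example.

```python
import json


def triples_to_d3_graph(triples):
    """Convert (subject, predicate, object) triples to a D3-style node-link dict."""
    nodes = []
    links = []
    seen = set()
    for subject, predicate, obj in triples:
        for label in (subject, obj):
            if label not in seen:  # emit each node once, in first-seen order
                seen.add(label)
                nodes.append({"id": label})
        links.append({"source": subject, "target": obj, "label": predicate})
    return {"nodes": nodes, "links": links}


graph = triples_to_d3_graph([
    ("Example Temple", "located-in", "Example City"),   # hypothetical relation
    ("Example City", "specialties-of", "Example Dish"),
])
payload = json.dumps(graph, ensure_ascii=False)
print(payload)
```

Serving this payload from a controller lets the front end bind it to a force-directed layout keyed on the `id` field.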
This completes the knowledge-graph-based method for constructing a Chinese tourism knowledge service platform.
By adding semantics (knowledge) to massive Internet tourism data, the data become intelligent: the transformation from data to information to knowledge, and finally to an intelligent application platform, is completed, and the goals of moving from information services to knowledge services and of spreading tourism culture are achieved.
It should be noted that the knowledge-graph-based method for constructing a Chinese tourism knowledge service platform provided in the foregoing embodiment is described only in terms of the functional steps above; in practical applications these steps may be rearranged and combined as needed to accomplish the corresponding functions. Any modification, equivalent replacement or improvement made by those skilled in the art without departing from the spirit and principle of the present invention shall fall within the protection scope of the present invention.

Claims (6)

1. A knowledge-graph-based method for constructing a Chinese tourism domain knowledge service platform, characterized by comprising the following steps:
S1, knowledge acquisition: acquiring structured travel knowledge from existing Chinese encyclopedia knowledge bases;
S2, knowledge fusion: first, calculating the semantic similarity between entities with the deep learning knowledge representation model BERT to complete entity alignment; then performing attribute fusion based on rules and statistical methods; and finally performing triple fusion with a majority voting algorithm;
the specific implementation process is as follows:
1) calculating the semantic similarity between entities with the deep learning knowledge representation model BERT to complete entity alignment comprises: first, using the Chinese BERT language model released by Google and, in the fine-tuning stage of the model, setting parameters so that the output of the penultimate layer is taken as the entity word vector; then, calculating the cosine distance between different entities from the obtained entity word vectors, which serves as their semantic similarity; finally, setting a threshold and aligning entities according to the semantic similarity;
2) attribute fusion based on rules and statistical methods can be performed in either of two ways: in the first, the Infobox attributes of tourism entities are acquired from the existing Chinese encyclopedia knowledge bases, rules are written in Python, and the different names used for the same attribute in different knowledge bases are counted to finally determine the Infobox attribute content of each entity; in the second, an entity and its attribute are regarded as a triple relation, the task is treated as a relation extraction problem, and attribute fusion is performed with a support vector machine and a text mining algorithm;
3) when triple fusion is performed with the majority voting algorithm, after entity alignment and attribute fusion, the entity triples that contain the same entity and attribute are fused, and a unique attribute value is determined for each attribute by majority voting;
S3, crawling travel website page data and completing the Infobox attribute knowledge of entities through attribute matching rules;
S4, ontology construction: constructing a tourism domain ontology with Protégé, the ontology modeling tool from Stanford;
S5, converting the data into RDF triple format by using D2RQ in combination with the tourism domain ontology to obtain a tourism domain knowledge graph;
this step is completed as follows: according to the R2RML standard defined by the W3C RDB2RDF Working Group, mapping rules are edited and configured to map the data in the database to the self-defined tourism domain ontology, and the D2RQ tool converts the travel data in the relational database into RDF-format data to obtain the tourism domain knowledge graph;
specifically, converting the data into RDF triple format by using D2RQ in combination with the constructed tourism domain ontology is realized as follows: first, a corresponding database table structure is designed and the acquired structured travel knowledge is stored in the relational database in triple form; second, a command is run with the D2RQ tool to generate a default mapping file, the mapping file is modified according to the defined tourism ontology, and the database tables are mapped to the corresponding classes of the constructed tourism domain ontology; finally, a command is run with the D2RQ tool to convert the data into RDF format, yielding the tourism domain knowledge graph;
S6, data storage: storing the tourism domain knowledge graph in a Neo4j graph database;
S7, constructing a travel knowledge service platform.
2. The knowledge-graph-based Chinese travel domain knowledge service platform construction method according to claim 1, wherein step S1 is completed through the following process: entity structured knowledge is obtained from the categories of existing Chinese encyclopedia knowledge bases, wherein the Chinese encyclopedia knowledge bases comprise Zhishi.me and CN-DBpedia, the categories comprise tourism, sightseeing and playing, the entity structured knowledge covers scenic spots, historic sites, cities, persons and cultural relics, and the triple data in the structured knowledge comprise entity names, entity introductions, entity Infobox attributes and entity pictures;
the finally determined attributes of tourism domain entities include Chinese name, opening hours, foreign-language name, ticket price, geographic location, year, cultural relic protection level, suggested visit duration, suitable season for visiting, city of interest, value, name, time of birth, time of death, nationality, alias, achievements, works, year, nationality, and native place.
3. The knowledge-graph-based Chinese travel domain knowledge service platform construction method according to claim 1, wherein step S3 is completed through the following process: travel website pages and data from encyclopedia, interactive encyclopedia and Chinese Wikipedia are crawled, and the attribute knowledge that is missing for some entities is completed through attribute matching rules.
4. The knowledge-graph-based Chinese travel domain knowledge service platform construction method according to claim 1, wherein step S4 is completed through the following process: the entities, attributes and relations in the tourism domain data are summarized, the hierarchical structure of the related concepts and categories of the tourism domain is determined, the attributes and value ranges of the entities are defined, the schema of the tourism knowledge graph is modeled and summarized from this knowledge, and the tourism domain ontology is constructed with the ontology modeling tool Protégé, following a top-down ontology construction approach combined with the ontology construction method of Stanford University.
5. The knowledge-graph-based Chinese travel domain knowledge service platform construction method according to claim 1, wherein step S6 is completed through the following process: the extension jar package for importing RDF into the Neo4j graph database is downloaded, the Neo4j configuration file is modified, a namespace prefix is created, and the tourism domain knowledge graph in RDF format is imported into the Neo4j graph database from the command line, which completes the storage of the tourism domain knowledge graph in the Neo4j graph database.
6. The knowledge-graph-based Chinese travel field knowledge service platform construction method according to claim 1, wherein in step S7, on the basis of completion of travel field knowledge graph storage, a background uses Java programming language and SpringMVC architecture, and a foreground uses JSP dynamic webpage technology and D3.js data-driven visualization components to construct a travel knowledge service platform.
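The knowledge fusion of step S2 in claim 1 (threshold-based entity alignment on cosine similarity, followed by majority voting over attribute values) can be sketched in Python as follows. This is a minimal illustration, not the patented implementation: the short vectors stand in for BERT entity word vectors, and the threshold of 0.9, the function names and the sample data are assumptions chosen for the example.

```python
import math
from collections import Counter, defaultdict
from typing import Dict, Iterable, List, Tuple


def cosine_similarity(u: List[float], v: List[float]) -> float:
    """Cosine of the angle between two vectors; 0.0 if either is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def align_entities(
    vectors_a: Dict[str, List[float]],
    vectors_b: Dict[str, List[float]],
    threshold: float = 0.9,  # assumed value; the claim only says "a threshold"
) -> Dict[str, str]:
    """Map each entity in A to its most similar entity in B, if above threshold."""
    aligned: Dict[str, str] = {}
    for name_a, vec_a in vectors_a.items():
        best_name, best_score = None, threshold
        for name_b, vec_b in vectors_b.items():
            score = cosine_similarity(vec_a, vec_b)
            if score >= best_score:
                best_name, best_score = name_b, score
        if best_name is not None:
            aligned[name_a] = best_name
    return aligned


def majority_vote(
    triples: Iterable[Tuple[str, str, str]]
) -> Dict[Tuple[str, str], str]:
    """Pick one value per (entity, attribute); ties go to the value seen first."""
    votes: Dict[Tuple[str, str], Counter] = defaultdict(Counter)
    for entity, attribute, value in triples:
        votes[(entity, attribute)][value] += 1
    return {key: counts.most_common(1)[0][0] for key, counts in votes.items()}


if __name__ == "__main__":
    # Toy 2-dimensional vectors standing in for BERT entity word vectors.
    kb_a = {"SpotA": [1.0, 0.0]}
    kb_b = {"Spot_A": [0.9, 0.1], "SpotZ": [0.0, 1.0]}
    print(align_entities(kb_a, kb_b))
    print(majority_vote([
        ("SpotA", "ticket price", "50"),
        ("SpotA", "ticket price", "50"),
        ("SpotA", "ticket price", "60"),
    ]))
```

Alignment has to run before voting so that values reported for the same real-world entity under different names in different knowledge bases are counted together.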
CN201910621399.8A 2019-07-10 2019-07-10 Knowledge map-based Chinese tourism field knowledge service platform construction method Active CN110347843B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910621399.8A CN110347843B (en) 2019-07-10 2019-07-10 Knowledge map-based Chinese tourism field knowledge service platform construction method

Publications (2)

Publication Number Publication Date
CN110347843A CN110347843A (en) 2019-10-18
CN110347843B true CN110347843B (en) 2022-04-15

Family

ID=68175783

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201910621399.8A Active CN110347843B (en) 2019-07-10 2019-07-10 Knowledge map-based Chinese tourism field knowledge service platform construction method

Country Status (1)

Country Link
CN (1) CN110347843B (en)

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106777274A (en) * 2016-06-16 2017-05-31 Beijing Institute of Technology Chinese tourism domain knowledge graph construction method and system
CN109284394A (en) * 2018-09-12 2019-01-29 Qingdao University Method for constructing an enterprise knowledge graph from a multi-source data integration perspective

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
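Scenic spot recommendation based on tourism knowledge graph feature learning; Jia Zhonghao et al.; 《智能系统学报》 (CAAI Transactions on Intelligent Systems); 2019-04-22 (No. 03); full text *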

Also Published As

Publication number Publication date
CN110347843A (en) 2019-10-18

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant