CN110569367A - Knowledge graph-based space keyword query method, device and equipment - Google Patents

Knowledge graph-based space keyword query method, device and equipment

Info

Publication number
CN110569367A
Authority
CN
China
Prior art keywords
concept
query
concept tag
user
tag
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201910854840.7A
Other languages
Chinese (zh)
Inventor
许佳捷
孙佳宝
周晓方
赵朋朋
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Suzhou University
Original Assignee
Suzhou University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Suzhou University
Priority to CN201910854840.7A
Publication of CN110569367A
Legal status: Pending

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00: Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30: Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/33: Querying
    • G06F16/3331: Query processing
    • G06F16/334: Query execution
    • G06F16/3344: Query execution using natural language analysis
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00: Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30: Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/36: Creation of semantic tools, e.g. ontology or thesauri
    • G06F16/367: Ontology

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Computational Linguistics (AREA)
  • Data Mining & Analysis (AREA)
  • Databases & Information Systems (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Artificial Intelligence (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Animal Behavior & Ethology (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention discloses a knowledge graph-based spatial keyword query method, device, and equipment, and a computer-readable storage medium. The method comprises the following steps: performing knowledge graph annotation on the user query input by means of a spatial keyword query mapped to a knowledge graph, to obtain a target concept tag set representing the user's query intent; and determining, as the target query results, the local objects related to the concept tags, based on the relation between each concept tag in the target concept tag set and each spatial text object in the spatial text object set mapped to the knowledge graph. The method, device, equipment, and computer-readable storage medium provided by the invention effectively enhance the understanding of the search intent behind a user's short-text query.

Description

Knowledge graph-based space keyword query method, device and equipment
Technical Field
The invention relates to the technical field of information retrieval, and in particular to a knowledge graph-based spatial keyword query method, device, and equipment, and a computer-readable storage medium.
Background
With the emergence and widespread use of portable communication devices such as smartphones and tablet computers, the mobile internet has grown explosively. Massive amounts of spatial text information, such as social media information, public service information, and point-of-interest information, are emerging on the internet. Such information has driven the creation and development of a large number of location-based information service tools, represented by *** maps, Baidu Maps, and the like. Spatial keyword query is a core technology of these location-based service (LBS) tools; it aims to understand the user's search intent efficiently and accurately and then find, among a large number of spatial text objects, query results that satisfy the user.
The understanding of the user's search intent determines the accuracy of a spatial keyword query. Generally, user searches in spatial keyword queries have the following two features: (1) Search intent covers a broad range. For example, when searching for a restaurant, a user may have requirements regarding the restaurant's cuisine, price, accessibility, and so on. (2) User searches are typically short texts. The search intent described by a short text is often unclear or even ambiguous; for example, when a user enters the query "apple", a search engine typically cannot tell whether the user means the fruit "apple" or the electronic product "Apple".
In recent years, researchers in academia and industry have conducted extensive and in-depth research on spatial keyword queries, focusing mainly on understanding the user's search intent. The related studies can be classified into conventional methods and machine learning methods.
Conventional methods mainly resolve the user's search intent by performing string matching on the query keywords. Spatial keyword queries based on conventional methods are mainly applied in keyword matching scenarios, where search intent is understood through exact or fuzzy string matching, as in spatial keyword Boolean queries and spatial keyword approximate queries. A spatial keyword Boolean query requires the query keywords to match the spatial text object exactly, whereas a spatial keyword approximate query requires only a fuzzy match between them.
With the development of machine learning and deep learning in recent years, more and more researchers have begun to use these techniques to analyze user search intent. Spatial keyword queries based on machine learning methods mainly use word embedding technology to map the query keywords to a point in a semantic space, thereby converting the user's search into an implicit semantic vector. In addition, researchers have also studied understanding the user's search intent through interaction with the user: interactive spatial keyword queries construct an interactive framework that learns the weights of the different keywords in a user's search from user feedback.
Because short texts carry insufficient information and are often ambiguous, neither the conventional methods nor the machine learning methods can handle spatial keyword queries whose query is a short text. In practical application scenarios, however, a large proportion of queries are short texts, so both kinds of methods are considerably limited in practice.
From the above, it can be seen that how to enhance understanding of the short text search intention of the user is a problem to be solved at present.
Disclosure of Invention
The invention aims to provide a knowledge graph-based spatial keyword query method, device, and equipment, and a computer-readable storage medium, so as to solve the problem that spatial keyword query methods in the prior art cannot understand the search intent behind a user's short text.
To solve the above technical problem, the invention provides a knowledge graph-based spatial keyword query method, comprising: performing knowledge graph annotation on the user query input by means of a spatial keyword query mapped to a knowledge graph, to obtain a target concept tag set representing the user's query intent; determining the correlation between each concept tag in the target concept tag set and each spatial text object in a set of spatial text objects mapped to the knowledge graph; and selecting, according to the correlation values, the spatial text objects corresponding to the k largest correlation values as the target query results.
Preferably, performing knowledge graph annotation on the user query input by means of a spatial keyword query mapped to a knowledge graph, to obtain a target concept tag set representing the user's query intent, comprises:
searching all concepts in the knowledge graph for the upper-level concepts of the user query input, and determining, from these upper-level concepts, a concept tag candidate set representing the user's query intent;
performing iterative operations on the concept tag candidate set using a greedy algorithm, and selecting the group of concept tags in the candidate set with the largest relevance evaluation value as the target concept tag set representing the user's query intent.
Preferably, performing iterative operations on the concept tag candidate set using a greedy algorithm, and selecting the group of concept tags in the candidate set with the largest relevance evaluation value as the target concept tag set representing the user's query intent, comprises:
iteratively performing the step of adding any one candidate concept tag from the concept tag candidate set to an initial concept tag set and then calculating and recording the relevance evaluation value between the updated concept tag set and the user query input, until all candidate concept tags in the candidate set have been added to the initial concept tag set, and selecting the concept tag set with the largest relevance evaluation value as a first target concept tag set;
iteratively performing the step of replacing any one concept tag in the initial concept tag set with any one candidate concept tag from the candidate set and then calculating and recording the relevance evaluation value between the updated concept tag set and the user query input, until all candidate concept tags in the candidate set have been added to the initial concept tag set, and selecting the concept tag set with the largest relevance evaluation value as a second target concept tag set;
iteratively performing the step of deleting any one concept tag from the initial concept tag set and then calculating and recording the relevance evaluation value between the updated concept tag set and the user query input, until all concept tags in the initial concept tag set have been deleted, and selecting the concept tag set with the largest relevance evaluation value as a third target concept tag set;
selecting, from the first, second, and third target concept tag sets, the concept tag set with the largest relevance evaluation value with respect to the user query input as the target concept tag set representing the user's query intent.
Preferably, calculating and recording the relevance evaluation value between the updated concept tag set and the user query input comprises:
determining a typicality evaluation value between the updated concept tag set and the user query input based on a naive Bayes model;
determining a tag granularity evaluation value between the updated concept tag set and the user query input according to the number of entities in the updated concept tag set and the distance evaluation value between the updated concept tag set and the user query input;
combining the typicality evaluation value and the tag granularity evaluation value to obtain the relevance evaluation value between the updated concept tag set and the user query input, and recording it.
Preferably, the user query input comprises: a query keyword input by the user and the query location of the user.
The invention also provides a knowledge graph-based spatial keyword query device, comprising:
an acquisition module, configured to perform knowledge graph annotation on the user query input by means of a spatial keyword query mapped to a knowledge graph, to obtain a target concept tag set representing the user's query intent;
a determining module, configured to determine the correlation between each concept tag in the target concept tag set and each spatial text object in a set of spatial text objects mapped to the knowledge graph;
a selecting module, configured to select, according to the correlation values, the spatial text objects corresponding to the k largest correlation values as the target query results.
Preferably, the acquisition module comprises:
a searching unit, configured to search all concepts in the knowledge graph for the upper-level concepts of the user query input, and to determine, from these upper-level concepts, the concept tag candidate set representing the user's query intent;
a selecting unit, configured to perform iterative operations on the concept tag candidate set using a greedy algorithm, and to select the group of concept tags in the candidate set with the largest relevance evaluation value as the target concept tag set representing the user's query intent.
Preferably, the selecting unit comprises:
an adding subunit, configured to iteratively perform the step of adding any one candidate concept tag from the concept tag candidate set to an initial concept tag set and then calculating and recording the relevance evaluation value between the updated concept tag set and the user query input, until all candidate concept tags in the candidate set have been added to the initial concept tag set, and to select the concept tag set with the largest relevance evaluation value as a first target concept tag set;
a replacing subunit, configured to iteratively perform the step of replacing any one concept tag in the initial concept tag set with any one candidate concept tag from the candidate set and then calculating and recording the relevance evaluation value between the updated concept tag set and the user query input, until all candidate concept tags in the candidate set have been added to the initial concept tag set, and to select the concept tag set with the largest relevance evaluation value as a second target concept tag set;
a deleting subunit, configured to iteratively perform the step of deleting any one concept tag from the initial concept tag set and then calculating and recording the relevance evaluation value between the updated concept tag set and the user query input, until all concept tags in the initial concept tag set have been deleted, and to select the concept tag set with the largest relevance evaluation value as a third target concept tag set;
a selecting subunit, configured to select, from the first, second, and third target concept tag sets, the concept tag set with the largest relevance evaluation value with respect to the user query input as the target concept tag set representing the user's query intent.
The invention also provides knowledge graph-based spatial keyword query equipment, comprising:
a memory for storing a computer program; and a processor for implementing the steps of the above knowledge graph-based spatial keyword query method when executing the computer program.
The invention also provides a computer-readable storage medium storing a computer program which, when executed by a processor, implements the steps of the above knowledge graph-based spatial keyword query method.
The knowledge graph-based spatial keyword query method provided by the invention uses a knowledge graph-driven spatial keyword query to annotate the user's query input against the knowledge graph, obtaining a target concept tag set that represents the user's query intent. It then calculates the correlation between each concept tag in the target concept tag set and each spatial text object in the spatial text object set mapped to the knowledge graph, and, according to the magnitude of the correlation, selects the k spatial text objects most closely associated with the user's query intent as the final query results. By introducing structured graph knowledge and associating the search input and the spatial text objects with the concepts and entities in the graph, the method formally defines a graph knowledge-driven spatial keyword query and effectively enhances the understanding of the search intent behind a user's short text.
Drawings
To illustrate the embodiments of the invention or the technical solutions of the prior art more clearly, the drawings used in describing them are briefly introduced below. The drawings described below show only some embodiments of the invention; those skilled in the art can obtain other drawings from them without creative effort.
FIG. 1 is a flowchart of a first embodiment of the knowledge graph-based spatial keyword query method provided by the invention;
FIG. 2 is a flowchart of a second embodiment of the knowledge graph-based spatial keyword query method provided by the invention;
FIG. 3 is a flowchart of a third embodiment of the knowledge graph-based spatial keyword query method provided by the invention;
FIG. 4 is a block diagram of a knowledge graph-based spatial keyword query device according to an embodiment of the invention.
Detailed Description
The core of the invention is to provide a knowledge graph-based space keyword query method, a knowledge graph-based space keyword query device, knowledge graph-based space keyword query equipment and a computer-readable storage medium, which effectively enhance the understanding of the short text search intention of a user.
So that those skilled in the art can better understand the disclosure, the invention is described in further detail below with reference to the accompanying drawings and specific embodiments. The described embodiments are only some of the embodiments of the invention, not all of them. All other embodiments that a person skilled in the art can derive from the embodiments given herein without creative effort fall within the protection scope of the invention.
Referring to fig. 1, fig. 1 is a flowchart illustrating a first embodiment of a knowledge-graph-based spatial keyword query method according to the present invention; the specific operation steps are as follows:
Step S101: performing knowledge graph annotation on the user query input by means of a spatial keyword query mapped to a knowledge graph, to obtain a target concept tag set representing the user's query intent;
The user query input includes a query keyword input by the user and the query location of the user. A spatial keyword query takes the query keywords and the user's query location as input and, by combining a spatial proximity measure and a text similarity measure between each spatial text object and the user query input, returns the spatial text objects most similar to the user query input.
A knowledge graph is a data-driven concept corpus composed mainly of a series of concepts, entities, and the relationships between them. Typical knowledge graphs include English knowledge graphs such as Microsoft's Probase and Google's Freebase, and Chinese knowledge graphs such as BigCilin from Harbin Institute of Technology and CN-DBpedia from Fudan University.
The knowledge graph is constructed mainly from data that users frequently browse and use in large corpora, and it therefore has two main features. The first is that the knowledge graph has a large, fine-grained concept structure. Based on these fine-grained concepts, a machine can attribute the user's query input to more specific concepts and tags. For example, given the query input {Happy Hotel, Hilton Hotel}, the best concept to attribute to this input is "luxury hotel", rather than "hotel", which is correct but too broad. The second feature is that the relationship between a concept and an entity in the knowledge graph is not a simple yes or no; instead, it is usually measured by a probability or weight derived from how frequently the two co-occur. The knowledge graph can therefore support concept reasoning. For example, Walmart and Metro are both shopping malls, but in the minds of most users Walmart is a more typical shopping mall than Metro. This intuitive cognition can be described formally with conditional probabilities: P(Walmart | shopping mall) > P(Metro | shopping mall).
In the embodiment of the invention, the isA relationship in the knowledge graph is mainly used. Given an entity o and a concept c, an isA relationship between o and c means that o is an entity under concept c and c is a concept of entity o. Each isA relationship between a concept and an entity also carries a frequency measure n(o, c), the number of times that entity o and concept c appear together in isA sentence patterns in the corpus. On this basis, the probability that concept c is an upper-level concept of entity o can be expressed as:
Meanwhile, the probability that the entity o belongs to the concept c can be expressed as:
In addition, the prior probability of any concept c and entity o in the atlas can be expressed as:
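The formulas for these three probabilities are not reproduced in this text. The following is a minimal sketch of one common way to derive them from the frequencies n(o, c), assuming plain relative-frequency normalization: P(c | o) divides n(o, c) by the total count for entity o, P(o | c) divides it by the total count for concept c, and the priors divide the per-concept or per-entity totals by the grand total. The function name `isa_probabilities` and the counts are hypothetical and purely illustrative.

```python
from collections import defaultdict


def isa_probabilities(counts):
    """counts maps (entity, concept) -> n(o, c), the isA co-occurrence frequency.

    Assumed normalization (not taken from the source): relative frequencies.
    """
    per_entity = defaultdict(int)   # sum over concepts of n(o, c)
    per_concept = defaultdict(int)  # sum over entities of n(o, c)
    total = 0
    for (o, c), n in counts.items():
        per_entity[o] += n
        per_concept[c] += n
        total += n
    p_c_given_o = {(o, c): n / per_entity[o] for (o, c), n in counts.items()}
    p_o_given_c = {(o, c): n / per_concept[c] for (o, c), n in counts.items()}
    p_c = {c: n / total for c, n in per_concept.items()}
    p_o = {o: n / total for o, n in per_entity.items()}
    return p_c_given_o, p_o_given_c, p_c, p_o


# Illustrative counts only; they are not data from the source.
counts = {
    ("Walmart", "shopping mall"): 80,
    ("Metro", "shopping mall"): 20,
    ("Walmart", "company"): 100,
}
p_c_given_o, p_o_given_c, p_c, p_o = isa_probabilities(counts)
```

With these made-up counts, the typicality of the first entity under "shopping mall" is higher than that of the second, which mirrors the kind of conditional-probability comparison the description relies on.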
In this embodiment, the spatial keyword query is formalized as q = (q.loc, q.term), where q.loc is the geographic location of the query point, i.e., the spatial location of the user, usually represented by two-dimensional spatial coordinates, and q.term is the user's query input, which in this embodiment may be mapped to a set of concepts and entities in the knowledge graph, for example "comedo, hilton", used to describe the user's query intent.
Step S102: determining a correlation between each concept tag in the target concept tag set and each spatial text object in a set of spatial text objects mapped to the knowledge-graph;
A spatial text object may be formalized as o = (o.loc, o.term), where o.loc is the geographic location of the spatial text object, typically represented by two-dimensional spatial coordinates, and o.term is the set of all upper-level concepts of the spatial text object. In this embodiment, each spatial text object may be mapped to an entity in the knowledge graph.
Step S103: and selecting the space text objects corresponding to the k maximum correlation values as target query results according to the correlation values.
In this embodiment, given a spatial keyword query q, a spatial text object set O, and a knowledge graph, the graph knowledge-driven spatial keyword query first performs knowledge graph annotation on the user query input to obtain a concept tag set C representing the user's query intent, and then, combining spatial distance and semantic relevance, returns the k spatial text objects most closely associated with the user's query intent as the final query results.
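The top-k selection in steps S102 and S103 can be sketched as follows. This is only an illustration under stated assumptions: the text does not give the correlation formula, so the `relevance` function below is a hypothetical linear mix of a distance-based spatial score and a concept-overlap semantic score, and `alpha`, `max_dist`, and the object names are invented for the example.

```python
import heapq
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class SpatialTextObject:
    name: str
    loc: tuple        # (x, y) coordinates, standing in for o.loc
    term: frozenset   # upper-level concepts, standing in for o.term


def relevance(obj, q_loc, concept_tags, alpha=0.5, max_dist=10.0):
    """Hypothetical score: alpha * spatial proximity + (1 - alpha) * concept overlap."""
    spatial = max(0.0, 1.0 - math.dist(obj.loc, q_loc) / max_dist)
    semantic = len(obj.term & concept_tags) / len(concept_tags) if concept_tags else 0.0
    return alpha * spatial + (1 - alpha) * semantic


def top_k(objects, q_loc, concept_tags, k):
    """Return the k objects with the largest relevance values."""
    return heapq.nlargest(k, objects, key=lambda o: relevance(o, q_loc, concept_tags))


hotel_a = SpatialTextObject("hotel_a", (1.0, 0.0), frozenset({"luxury hotel", "hotel"}))
hotel_b = SpatialTextObject("hotel_b", (0.0, 1.0), frozenset({"hotel"}))
hotel_c = SpatialTextObject("hotel_c", (9.0, 0.0), frozenset({"luxury hotel"}))

results = top_k([hotel_a, hotel_b, hotel_c], (0.0, 0.0), frozenset({"luxury hotel"}), k=2)
names = [o.name for o in results]
```

In this toy data, a nearby object that lacks the target concept ranks below a distant object that carries it, which is the trade-off between spatial distance and semantic relevance that the embodiment describes.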
Based on the above embodiments, in the present embodiment, a target concept tag set representing the user's intention is selected in the concept tag candidate set according to a greedy algorithm. Referring to fig. 2, fig. 2 is a flowchart illustrating a second embodiment of a knowledge-graph-based spatial keyword query method according to the present invention; the specific operation steps are as follows:
Step S201: performing knowledge graph annotation on the user query input by means of a spatial keyword query mapped to the knowledge graph;
Step S202: searching upper-layer concepts input by the user query in all concepts on the knowledge graph, and confirming the concept label candidate set representing the user query intention according to the upper-layer concepts;
Step S203: performing iterative operation on the concept label candidate set by using a greedy algorithm, and selecting a group of concept labels with the maximum correlation evaluation value in the concept label candidate set as the target concept label set representing the query intention of the user;
Because the knowledge graph contains millions of concepts and relationships, exhaustively searching the concept space is prohibitively expensive and cannot meet the requirements of practical applications; this embodiment therefore needs a tag search method that quickly finds the optimal concept tag set for annotating the user's query intent.
This embodiment adopts a heuristic greedy strategy and provides a greedy search method for annotating search intent. First, the upper-level concepts of the user query input are looked up among all concepts in the knowledge graph, ignoring concept tags that have only a small number of entity relationships (since such concepts cannot be suitable tags for describing the user's intent). Filtering the concepts in this way yields a concept tag candidate set Q. C is defined as the currently selected concept set and is initialized to the empty set. A greedy search iteration is then performed. In each iteration, the following three operations may be attempted repeatedly:
1) Adding: add an unvisited concept from Q \ C to the current set C. Each time, any one candidate concept tag from the concept tag candidate set is added to the initial concept tag set, and the relevance evaluation value between the updated concept tag set and the user query input is calculated and recorded, until all candidate concept tags in the candidate set have been added to the initial concept tag set.
2) Deleting: randomly remove a concept tag from the current set C. Each time, after any one concept tag in the initial concept tag set is deleted, the relevance evaluation value between the updated concept tag set and the user query input is calculated and recorded, until all concept tags in the initial concept tag set have been deleted.
3) Replacing: randomly select a concept tag from Q \ C to replace a random concept tag in C. Each time, after any one candidate concept tag from the candidate set replaces any one concept tag in the initial concept tag set, the relevance evaluation value between the updated concept tag set and the user query input is calculated and recorded, until all candidate concept tags in the candidate set have been added to the initial concept tag set.
After a number of trials of these three operations in each iteration, the group of concept tags that increases the relevance evaluation value the most (i.e., has the largest relevance evaluation value) is selected as the current target concept tag set describing the user's query intent. When the relevance can no longer grow, the iteration ends and the final concept tag set for the user's search intent is returned. Because the relevance between the tag set C and the user's search increases monotonically throughout the iteration over a finite candidate set, the greedy search process for search intent annotation is guaranteed to converge. The concept tag set with the largest relevance evaluation value with respect to the user query input, among those produced by the three operations, is selected as the target concept tag set representing the user's query intent.
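The add, delete, and replace moves described above can be sketched as a simple hill-climbing loop. This is a minimal sketch, not the patented procedure: it enumerates every possible move in each iteration instead of sampling moves at random, and the `score` function (typicality weights minus a size penalty) is a hypothetical stand-in for the relevance evaluation value, which the text defines elsewhere.

```python
def greedy_label_search(candidates, score):
    """Hill-climb over concept tag sets using add, delete, and replace moves.

    candidates: set of candidate concept tags (the set Q in the description).
    score: callable mapping a frozenset of tags to a relevance value (assumed).
    """
    current = frozenset()          # the set C, initialized to the empty set
    best = score(current)
    while True:
        unused = candidates - current
        neighbours = []
        for c in unused:                       # add a tag from Q \ C
            neighbours.append(current | {c})
        for c in current:                      # delete a tag from C
            neighbours.append(current - {c})
        for old in current:                    # replace a tag in C with one from Q \ C
            for new in unused:
                neighbours.append((current - {old}) | {new})
        if not neighbours:
            return current, best
        top = max(neighbours, key=score)
        top_score = score(top)
        if top_score <= best:                  # relevance can no longer grow: stop
            return current, best
        current, best = top, top_score


# Hypothetical weights and size penalty, for illustration only.
weights = {"luxury hotel": 0.9, "hotel": 0.4, "building": 0.1}


def toy_score(tags):
    return sum(weights[t] for t in tags) - 0.5 * len(tags)


best_tags, best_value = greedy_label_search(set(weights), toy_score)
```

Because each accepted move strictly increases the score and there are finitely many subsets of the candidate set, the loop always terminates, which matches the convergence argument in the description.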
Step S204: determining the correlation between each concept tag in the target concept tag set and each spatial text object in a set of spatial text objects mapped to the knowledge graph;
Step S205: selecting, according to the correlation values, the spatial text objects corresponding to the k largest correlation values as the target query results.
In this embodiment, the greedy algorithm makes it possible to quickly find, among the upper-level concepts of the user's query keywords, a target concept tag set representing the user's query intent.
Building on the foregoing embodiments and on the knowledge graph, this embodiment provides a multi-concept tag set evaluation method for representing spatial keyword search intent, in order to understand the user's search intent accurately while avoiding the situation in which few spatial text objects around the query point match the learned user intent.
Referring to fig. 3, fig. 3 is a flowchart illustrating a method for querying spatial keywords based on a knowledge-graph according to a third embodiment of the present invention; the specific operation steps are as follows:
Step S301: performing knowledge graph annotation on the user query input by means of a spatial keyword query mapped to the knowledge graph;
Step S302: searching all concepts in the knowledge graph for the upper-level concepts of the user query input, and determining, from these upper-level concepts, the concept tag candidate set representing the user's query intent;
Step S303: respectively performing iterative addition, replacement and deletion operations on the initial concept tag set by using a greedy algorithm and the concept tag candidate set to obtain an updated concept tag set;
Step S304: determining a typicality evaluation value between the updated concept tag set and the user query input based on a naive Bayes model;
The multi-concept tag set evaluation method for representing spatial keyword search intent mainly considers the following two aspects when evaluating a user search against concept tags: (1) Typicality evaluation. Typicality requires a strong correlation between the concept tags and the query keywords. (2) Tag granularity evaluation. Granularity evaluation requires that the concept tags describing the user query not be too broad. For example, under typicality evaluation alone, the concept tag for the query keywords "Happy Hotel, Hilton Hotel" may be "hotel" or "luxury hotel", but "luxury hotel" is clearly the finer-grained and more suitable concept tag.
Typicality evaluation mainly requires that a strong correlation exist between the concept tags and the query keywords, and that spatial text objects under the concept tags be densely distributed near the query point. Because a naive Bayes model can model this correlation effectively and can reasonably incorporate the spatial distribution of the concept tags, this embodiment uses a naive Bayes model to model typicality.
The standard naive Bayes evaluation formula is as follows:
P(C|q) = P(q|C)P(C) / P(q)
where q is the user query, C is a set of concept labels, and P(C|q) represents the probability that C is the set of concept labels for q. Since P(q) depends only on the user query, it can be ignored when ranking the concept labels, and we in turn obtain:
P(C|q) ∝ P(q|C)P(C)
Considering that the query location and the query keywords are logically independent of each other, we have:
where P(q.loc|C) and P(word|C) measure the probability that the concept tag set C is selected under the query q in terms of the spatial dimension and text relevance, respectively.
In terms of the spatial dimension, we want the concept tag set C to be frequently distributed around the query location q.loc. The spatial distribution probability of this concept can be modeled as:
Considering that the distribution of a concept label at any single spatial point is necessarily sparse, the probability must be calculated at a coarser granularity to measure the spatial distribution of the concept, from which the final spatial distribution probability of the concept label is obtained. Therefore, we introduce a grid index (GridIndex) that divides the entire geographic space into n × n cells, and model P(q.loc|C) again to get:
where g_j is cell number j in the grid index. Because we expect that the closer a concept label lies to the cell containing the query location, the higher the probability that it is the label of the user's final query intention, P(g_j) can be defined as follows:
where g_q represents the cell in which the query location is located, and dc(g_j, g_q) represents the Chebyshev distance between cell g_j and cell g_q. λ ∈ (0, 1) is a decay factor indicating that the farther a concept is from the cell in which the query location is located, the lower the likelihood that it becomes a concept tag of the user's query intention.
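The grid index and the distance decay described above can be sketched as follows. The text does not reproduce the formulas for P(g_j) or P(q.loc|C), so the forms used here (a weight of λ raised to the Chebyshev distance, and a decay-weighted share of objects per cell) are assumptions made for illustration; the function names and the bounding-box parameters are hypothetical as well.

```python
from typing import Dict, Tuple

Cell = Tuple[int, int]


def cell_of(x: float, y: float, min_x: float, min_y: float,
            max_x: float, max_y: float, n: int) -> Cell:
    """Map a point inside the bounding box to its (column, row) cell in an n x n grid."""
    col = min(int((x - min_x) / (max_x - min_x) * n), n - 1)
    row = min(int((y - min_y) / (max_y - min_y) * n), n - 1)
    return (col, row)


def chebyshev(a: Cell, b: Cell) -> int:
    """Chebyshev distance dc(a, b) between two grid cells."""
    return max(abs(a[0] - b[0]), abs(a[1] - b[1]))


def cell_weight(g_j: Cell, g_q: Cell, lam: float) -> float:
    """Assumed decay: lam ** dc(g_j, g_q) with 0 < lam < 1 (not the patent's exact formula)."""
    return lam ** chebyshev(g_j, g_q)


def spatial_score(objects_per_cell: Dict[Cell, int], g_q: Cell, lam: float) -> float:
    """Assumed stand-in for P(q.loc | C): decay-weighted share of the label set's objects."""
    total = sum(objects_per_cell.values())
    if total == 0:
        return 0.0
    return sum(cell_weight(g, g_q, lam) * count / total
               for g, count in objects_per_cell.items())
```

With this sketch, a label set whose objects sit in the query cell scores close to 1, and the score shrinks geometrically as the objects move to more distant cells, which is the behavior the decay factor λ is meant to produce.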
In terms of the textual relevance between the concept set C and the user query keywords, we wish to measure the relevance of the query to the concept set as a whole (rather than to individual concepts). Intuitively, the more concept labels we observe, the more specific and accurate the object we can identify. This observation means that as we observe more concepts, the relevance between the set of concept labels and the query should be strengthened. Therefore, we use a modeling approach based on noise reduction to measure the textual relevance of the concept set C and the user query keywords, formalized as follows:
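The formula itself is not reproduced in the text. One standard construction with the stated property (relevance can only grow as more concept labels are observed) is a noisy-OR combination; the sketch below uses it purely as an illustration, and whether it matches the formula of this embodiment is an assumption. The table `p_word_given_label` and both function names are hypothetical.

```python
from typing import Dict, Iterable, List, Tuple


def noisy_or(probabilities: Iterable[float]) -> float:
    """Probability that at least one independent piece of evidence fires."""
    remaining = 1.0
    for p in probabilities:
        remaining *= 1.0 - p
    return 1.0 - remaining


def text_relevance(words: List[str], labels: List[str],
                   p_word_given_label: Dict[Tuple[str, str], float]) -> float:
    """Assumed set-level relevance: per-word noisy-OR over labels, multiplied over words."""
    score = 1.0
    for word in words:
        score *= noisy_or(p_word_given_label.get((word, label), 0.0)
                          for label in labels)
    return score
```

Because each factor `1.0 - p` is at most 1, adding a label never lowers the noisy-OR value, so the set-level score is monotone in the label set.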
Step S305: determining the standard granularity evaluation values of the updated concept tag set and the user query input according to the number of entities in the updated concept tag set and the distance evaluation values of the updated concept tag set and the user query input;
The model in this embodiment is expected to prefer fine-grained concepts when describing user query keywords, because fine-grained concept tags are closer to the true intent of the user query. However, in a typicality evaluation of concept labels and user queries, a broad concept tends to score higher on typicality because it may be associated with more query keywords. Therefore, the annotation model provided by this embodiment also takes the granularity of the concept labels into account. This embodiment identifies two main features of a broad concept: (1) a broad concept tag has a large number of entities; (2) the distance from a broad concept to an entity in the knowledge graph is longer. Thus, both an entity-number-based evaluation and a distance-based evaluation are considered in the concept granularity evaluation.
Intuitively, concept labels associated with more entities tend to be broad, and the fewer entities are associated with a concept label, the closer that label is to the query intention of the user. Thus, an evaluation based on the number of entities is defined:
where O is the entire set of entities (spatial text objects) and Number(C) represents the number of all entities associated with the concept set C. Clearly, the evaluation based on the number of entities is independent of the specific user query, so this evaluation is biased toward fine-grained concepts.
Distance-based evaluation takes advantage of the hierarchy of the knowledge graph and measures the granularity of a concept label by the distance from the query keywords to the target concept in the knowledge graph. The expected number of steps a random walk needs to hit the target concept is used as the distance measure in this embodiment. In short, query keywords must travel a longer distance on the knowledge graph to reach a broad concept by random walk. For example, a macchem restaurant is both a restaurant and a luxury restaurant, and a luxury restaurant is also a restaurant. The access path from the macchem restaurant to the concept "restaurant" must pass through "luxury restaurant". The reciprocal of the expected number of steps is therefore used to define the distance-based evaluation:
where h(c|word) represents the expected number of steps needed to reach the concept label c from the keyword word by random walk in the knowledge graph.
where hyperm(word) is the set of all upper-layer concept labels of word in the knowledge graph, and P(c'|word) represents the transition probability from word to c' by random walk in the knowledge graph.
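The expected number of steps can be sketched with the usual hitting-time recursion, h(c|c) = 0 and h(c|word) = 1 + Σ P(c'|word) · h(c|c') over the upper-layer concepts c' of word. The text does not reproduce its own formula, so this recursion, the acyclic toy hierarchy, the treatment of unreachable concepts as infinitely far away, and the function names are all assumptions made for illustration.

```python
import math
from typing import Dict, Optional

# Hypothetical hierarchy format: node -> {upper-layer concept: transition probability > 0}
Hierarchy = Dict[str, Dict[str, float]]


def expected_steps(target: str, start: str, hypernyms: Hierarchy,
                   _memo: Optional[Dict[str, float]] = None) -> float:
    """Expected random-walk steps from start up to target in an acyclic hierarchy."""
    if _memo is None:
        _memo = {}
    if start == target:
        return 0.0
    if start in _memo:
        return _memo[start]
    parents = hypernyms.get(start, {})
    if not parents:
        result = math.inf  # the walk ended without reaching the target
    else:
        result = 1.0 + sum(p * expected_steps(target, parent, hypernyms, _memo)
                           for parent, p in parents.items())
    _memo[start] = result
    return result


def distance_score(target: str, word: str, hypernyms: Hierarchy) -> float:
    """Reciprocal of the expected number of steps; 0.0 when the concept is unreachable."""
    steps = expected_steps(target, word, hypernyms)
    if steps == 0.0:
        return 1.0  # assumption: the keyword is the concept itself
    return 1.0 / steps
```

For a chain such as keyword → "luxury restaurant" → "restaurant" with transition probability 1.0 on each edge, the finer concept is reached in 1 step and the broader one in 2, so the finer concept receives the higher distance score (1.0 versus 0.5).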
The entity-number-based evaluation and the distance-based evaluation described above measure the relevance of the concept label to the query from different features. Since the evaluation based on the number of entities is biased toward fine-grained concepts, it may produce labeling results with high accuracy but low recall. Therefore, this embodiment introduces the concept of the F value to define a granularity evaluation formula based on the F value:
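The F-value formula is not reproduced in the text. A minimal sketch, assuming the balanced F value (the harmonic mean of the two component scores, each scaled to the range 0 to 1), is shown below; the function name and the equal weighting are assumptions.

```python
def f_granularity(entity_score: float, distance_score: float) -> float:
    """Assumed granularity theta(C|q): harmonic mean of the two component evaluations."""
    total = entity_score + distance_score
    if total == 0.0:
        return 0.0
    return 2.0 * entity_score * distance_score / total
```

The harmonic mean stays low unless both components are reasonably high, so a label set that looks fine-grained by entity count alone but lies far from the query keywords in the knowledge graph is not rewarded.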
Step S306: combining the typicality evaluation and the label granularity evaluation to obtain the relevance evaluation value between the updated concept label set and the user query input, and recording the relevance evaluation value;
The typicality evaluation and the label granularity evaluation for expressing the search intention of spatial keywords are combined to define a comprehensive relevance evaluation formula between the user query and the concept labels:
rel(C|q)=P(C|q)θ(C|q)
where P (C | q) and θ (C | q) represent the typicality and granularity, respectively, of the concept tag set C.
Step S307: selecting, from the updated concept label sets, the concept label set corresponding to the maximum relevance evaluation value as the target concept label set representing the query intention of the user;
Step S308: determining a correlation between each concept tag in the target concept tag set and each spatial text object in a set of spatial text objects mapped to the knowledge-graph;
Step S309: selecting, according to the correlation values, the spatial text objects corresponding to the k largest correlation values as the target query results.
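Step S309 is a top-k selection. A minimal sketch using the standard library is given below; it assumes that the correlation value of each spatial text object has already been computed in step S308 and is passed in as a dictionary, and the function name is hypothetical.

```python
import heapq
from typing import Dict, List, Tuple


def top_k_objects(correlation: Dict[str, float], k: int) -> List[Tuple[str, float]]:
    """Return the k spatial text objects with the largest correlation values, best first."""
    return heapq.nlargest(k, correlation.items(), key=lambda item: item[1])
```

`heapq.nlargest` avoids sorting the whole object set, which matters when k is small relative to the number of spatial text objects.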
First, this embodiment introduces structured graph knowledge and associates the search input and the spatial text objects with the concepts and entities in the graph, so that a graph-knowledge-driven spatial keyword query is formally defined. Second, taking into account the spatial attribute of spatial keyword search, a multi-concept label set evaluation method for expressing the search intention of spatial keywords is provided. When the search intention is interpreted, concept labels far from the user's search point are penalized according to the user's spatial position, so that the learned search intention labels are frequently distributed around the search point, which avoids the situation in which few or even no spatial text objects related to the query semantics exist around the user's search position. Finally, to further improve the efficiency of search intention labeling, a greedy search method for spatial keyword search intention labeling is provided, which finds the group of labels with the highest relevance to the user search through addition, deletion and replacement operations while ensuring the convergence of the greedy search.
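The greedy search over addition, deletion and replacement operations can be sketched as a hill climb. This is an illustration under stated assumptions, not the exact procedure of the embodiment: the `score` callable stands in for rel(C|q) = P(C|q)θ(C|q), and the stopping rule (stop when no single operation strictly improves the score) is an assumption chosen because it guarantees termination on a finite candidate set.

```python
from typing import Callable, FrozenSet, Iterator, List, Tuple

LabelSet = FrozenSet[str]


def neighbours(current: LabelSet, candidates: List[str]) -> Iterator[LabelSet]:
    """All label sets reachable by one addition, deletion or replacement."""
    outside = [c for c in candidates if c not in current]
    for c in outside:
        yield current | {c}                    # addition
    for old in current:
        yield current - {old}                  # deletion
        for c in outside:
            yield (current - {old}) | {c}      # replacement


def greedy_label_search(candidates: List[str],
                        score: Callable[[LabelSet], float],
                        initial: LabelSet = frozenset()) -> Tuple[LabelSet, float]:
    """Repeatedly move to the best-scoring neighbour until no move improves the score."""
    best = frozenset(initial)
    best_score = score(best)
    while True:
        top = max(neighbours(best, candidates), key=score, default=None)
        if top is None or score(top) <= best_score:
            return best, best_score
        best, best_score = top, score(top)
```

Each accepted move strictly increases the score and there are finitely many subsets of the candidate set, so the loop cannot revisit a label set and must stop.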
Referring to fig. 4, fig. 4 is a block diagram illustrating a spatial keyword query apparatus based on a knowledge-graph according to an embodiment of the present invention; the specific device may include:
an obtaining module 100, configured to perform a knowledge graph annotation on a user query input by using a spatial keyword query mapped to a knowledge graph, so as to obtain a target concept tag set representing a user query intention;
A determining module 200, configured to determine a correlation between each concept tag in the target concept tag set and each spatial text object in the set of spatial text objects mapped to the knowledge graph;
And a selecting module 300, configured to select, according to the magnitude of the correlation, the spatial text objects corresponding to the k maximum correlation values as target query results.
The spatial keyword query device based on a knowledge graph of this embodiment is used to implement the aforementioned spatial keyword query method based on a knowledge graph, and therefore specific implementations of the spatial keyword query device based on a knowledge graph may be found in the foregoing embodiments of the spatial keyword query method based on a knowledge graph, for example, the obtaining module 100, the determining module 200, and the selecting module 300 are respectively used to implement steps S101, S102, and S103 in the aforementioned spatial keyword query method based on a knowledge graph, so that specific implementations thereof may refer to descriptions of corresponding embodiments of each part, and are not described herein again.
The specific embodiment of the invention also provides knowledge graph-based spatial keyword query equipment, which comprises: a memory for storing a computer program; and the processor is used for realizing the steps of the knowledge-graph-based spatial keyword query method when executing the computer program.
The specific embodiment of the present invention also provides a computer-readable storage medium on which a computer program is stored; when the computer program is executed by a processor, it implements the steps of the above-mentioned spatial keyword query method based on a knowledge graph.
The embodiments are described in a progressive manner; each embodiment focuses on its differences from the other embodiments, and the same or similar parts among the embodiments may be referred to one another. Because the device disclosed in the embodiments corresponds to the method disclosed in the embodiments, its description is brief, and the relevant points can be found in the description of the method.
Those of skill would further appreciate that the various illustrative elements and algorithm steps described in connection with the embodiments disclosed herein may be implemented as electronic hardware, computer software, or combinations of both, and that the various illustrative components and steps have been described above generally in terms of their functionality in order to clearly illustrate this interchangeability of hardware and software. Whether such functionality is implemented as hardware or software depends upon the particular application and design constraints imposed on the implementation. Skilled artisans may implement the described functionality in varying ways for each particular application, but such implementation decisions should not be interpreted as causing a departure from the scope of the present invention.
The steps of a method or algorithm described in connection with the embodiments disclosed herein may be embodied directly in hardware, in a software module executed by a processor, or in a combination of the two. A software module may reside in Random Access Memory (RAM), memory, Read Only Memory (ROM), electrically programmable ROM, electrically erasable programmable ROM, registers, hard disk, a removable disk, a CD-ROM, or any other form of storage medium known in the art.
The present invention provides a method, an apparatus, a device and a computer-readable storage medium for querying spatial keywords based on a knowledge graph. The principles and embodiments of the present invention are explained herein using specific examples, which are presented only to assist in understanding the method and its core concepts. It should be noted that those skilled in the art can make various improvements and modifications to the present invention without departing from its principle, and those improvements and modifications also fall within the scope of the claims of the present invention.

Claims (10)

1. A spatial keyword query method based on a knowledge graph, characterized by comprising the following steps:
performing knowledge graph annotation on a user query input by using a spatial keyword query mapped to a knowledge graph to obtain a target concept tag set representing the user query intention;
determining a correlation between each concept tag in the target concept tag set and each spatial text object in a set of spatial text objects mapped to the knowledge-graph;
and selecting, according to the correlation values, the spatial text objects corresponding to the k largest correlation values as target query results.
2. The spatial keyword query method of claim 1, wherein the performing knowledge graph annotation on a user query input by using a spatial keyword query mapped to a knowledge graph to obtain a target concept tag set representing the user query intention comprises:
searching all concepts on the knowledge graph for the upper-layer concepts of the user query input, and determining the concept label candidate set representing the user query intention according to the upper-layer concepts;
and performing an iterative operation on the concept label candidate set by using a greedy algorithm, and selecting the group of concept labels with the maximum relevance evaluation value in the concept label candidate set as the target concept label set representing the query intention of the user.
3. The method for querying spatial keywords according to claim 2, wherein the performing iterative operation on the concept tag candidate set by using a greedy algorithm, and selecting a set of concept tags with the largest relevance evaluation value in the concept tag candidate set as the target concept tag set representing the query intent of the user comprises:
iteratively performing the step of adding one candidate concept tag from the concept tag candidate set to an initial concept tag set and then calculating and recording the relevance evaluation value between the updated concept tag set and the user query input, until all candidate concept tags in the concept tag candidate set have been added to the initial concept tag set, and selecting the concept tag set corresponding to the maximum relevance evaluation value as a first target concept tag set;
iteratively performing the step of selecting one candidate concept tag from the concept tag candidate set to replace any concept tag in the initial concept tag set and then calculating and recording the relevance evaluation value between the updated concept tag set and the user query input, until all candidate concept tags in the concept tag candidate set have been added to the initial concept tag set, and selecting the concept tag set corresponding to the maximum relevance evaluation value as a second target concept tag set;
iteratively performing the step of deleting one concept tag from the initial concept tag set and then calculating and recording the relevance evaluation value between the updated concept tag set and the user query input, until all concept tags in the initial concept tag set have been deleted, and selecting the concept tag set corresponding to the maximum relevance evaluation value as a third target concept tag set;
and selecting, from the first target concept tag set, the second target concept tag set and the third target concept tag set, the concept tag set with the maximum relevance evaluation value with respect to the user query input as the target concept tag set representing the user query intention.
4. The spatial keyword query method of claim 3, wherein the calculating and recording the relevance evaluation value between the updated concept tag set and the user query input comprises:
determining a typicality evaluation value of the updated concept tag set and the user query input based on a naive Bayes model;
determining the standard granularity evaluation value of the updated concept tag set and the user query input according to the number of entities in the updated concept tag set and the distance evaluation value of the updated concept tag set and the user query input;
and combining the typicality evaluation and the label granularity evaluation to obtain the relevance evaluation value between the updated concept tag set and the user query input, and recording the relevance evaluation value.
5. The spatial keyword query method of any one of claims 1 to 4, wherein the user query input comprises: a query keyword input by a user and a query location of the user.
6. A spatial keyword query device based on knowledge graph is characterized by comprising:
The acquisition module is used for carrying out knowledge graph annotation on the user query input by utilizing the space keyword query mapped to the knowledge graph to obtain a target concept tag set representing the user query intention;
A determining module for determining a correlation between each concept tag in the target concept tag set and each spatial text object in a set of spatial text objects mapped to the knowledge-graph;
And the selecting module is used for selecting the space text objects corresponding to the k maximum correlation values as target query results according to the correlation values.
7. The spatial keyword query apparatus of claim 6, wherein the obtaining module comprises:
The searching unit is used for searching upper-layer concepts input by the user query in all the concepts on the knowledge graph and confirming the concept label candidate set representing the user query intention according to the upper-layer concepts;
And the selecting unit is used for carrying out iterative operation on the concept label candidate set by utilizing a greedy algorithm, and selecting a group of concept labels with the maximum relevance evaluation value in the concept label candidate set as the target concept label set representing the query intention of the user.
8. The spatial keyword query apparatus of claim 7, wherein the selection unit comprises:
an adding subunit, configured to iteratively perform the steps of adding any one candidate concept tag in the concept tag candidate set to an initial concept tag set each time, calculating and recording an updated concept tag set and a relevance evaluation value input by the user query until all candidate concept tags in the concept tag candidate set are added to the initial concept tag set, and selecting a group of concept tag sets corresponding to a maximum relevance evaluation value as a first target concept tag set;
a replacing subunit, configured to iteratively perform, after optionally selecting one candidate concept tag in the concept tag candidate set and replacing any concept tag in the initial concept tag set each time, the step of calculating and recording an updated concept tag set and a relevance evaluation value input by the user query until all candidate concept tags in the concept tag candidate set are added to the initial concept tag set, and selecting a set of concept tag sets corresponding to a maximum relevance evaluation value as a second target concept tag set;
A deleting subunit, configured to iteratively perform the step of calculating and recording the updated concept tag set and the relevance assessment value input by the user query after deleting any one of the concept tags in the initial concept tag set each time, until the concept tags in the initial concept tag set are completely deleted, and select a group of concept tag sets corresponding to the maximum relevance assessment value as a third target concept tag set;
And the selecting subunit is used for selecting, from the first target concept tag set, the second target concept tag set and the third target concept tag set, the concept tag set with the maximum relevance evaluation value with respect to the user query input as the target concept tag set representing the user query intention.
9. A spatial keyword query device based on a knowledge graph, comprising:
a memory for storing a computer program;
A processor for implementing the steps of a knowledge-graph based spatial keyword query method according to any one of claims 1 to 5 when executing said computer program.
10. A computer-readable storage medium having stored thereon a computer program which, when executed by a processor, implements the steps of the knowledge-graph-based spatial keyword query method according to any one of claims 1 to 5.

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910854840.7A CN110569367A (en) 2019-09-10 2019-09-10 Knowledge graph-based space keyword query method, device and equipment

Publications (1)

Publication Number Publication Date
CN110569367A true CN110569367A (en) 2019-12-13

Family

ID=68778917

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201910854840.7A Pending CN110569367A (en) 2019-09-10 2019-09-10 Knowledge graph-based space keyword query method, device and equipment

Country Status (1)

Country Link
CN (1) CN110569367A (en)

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20070282826A1 (en) * 2006-06-06 2007-12-06 Orland Harold Hoeber Method and apparatus for construction and use of concept knowledge base
CN101364239A (en) * 2008-10-13 2009-02-11 中国科学院计算技术研究所 Method for auto constructing classified catalogue and relevant system
US20100211566A1 (en) * 2009-02-13 2010-08-19 Yahoo! Inc. Entity-based search results and clusters on maps

Cited By (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111309872A (en) * 2020-03-26 2020-06-19 北京百度网讯科技有限公司 Search processing method, device and equipment
CN111309872B (en) * 2020-03-26 2023-08-08 北京百度网讯科技有限公司 Search processing method, device and equipment
CN112905884A (en) * 2021-02-10 2021-06-04 北京百度网讯科技有限公司 Method, apparatus, medium, and program product for generating sequence annotation model
CN112905884B (en) * 2021-02-10 2024-05-31 北京百度网讯科技有限公司 Method, apparatus, medium and program product for generating sequence annotation model
CN113961717A (en) * 2021-10-26 2022-01-21 上海石湾科技有限公司 Searching system based on knowledge graph
CN114168756A (en) * 2022-01-29 2022-03-11 浙江口碑网络技术有限公司 Query understanding method and apparatus for search intention, storage medium, and electronic device
CN117271577A (en) * 2023-11-21 2023-12-22 连邦网络科技服务南通有限公司 Keyword retrieval method based on intelligent analysis
CN117271577B (en) * 2023-11-21 2024-03-15 连邦网络科技服务南通有限公司 Keyword retrieval method based on intelligent analysis

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication

Application publication date: 20191213
