CN116010621B - Rule-guided self-adaptive path generation method - Google Patents

Rule-guided self-adaptive path generation method

Info

Publication number
CN116010621B
Authority
CN
China
Prior art keywords
rule
entity
path
action
candidate
Prior art date
Legal status
Active
Application number
CN202310032764.8A
Other languages
Chinese (zh)
Other versions
CN116010621A (en)
Inventor
周光有
陈昱丞
Current Assignee
Central China Normal University
Original Assignee
Central China Normal University
Priority date
Filing date
Publication date
Application filed by Central China Normal University
Priority to CN202310032764.8A
Publication of CN116010621A
Application granted
Publication of CN116010621B
Legal status: Active
Anticipated expiration


Landscapes

  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The application discloses a rule-guided self-adaptive path generation method, which comprises the following steps: filtering and screening the subject entity in a question and inputting it into a reinforcement learning environment; generating three query graphs with three symbol operations to form an initial space; performing rule induction on a large-scale database with a rule induction algorithm based on the subject entity, and complementing the intermediate entities missing from the sparse knowledge base to form a rule query graph, thereby forming a new action decision space composed of entities in the knowledge base and their corresponding relations; guiding the agent to select actions and giving rewards through a policy network; and finally obtaining the required relation path and reaching the target entity. The application provides a reinforcement learning method with rule guidance and combined rewards: the action decision space is adaptively generated under rule guidance, and the combined reward is reconstructed by exploiting the intrinsic relation between relation paths and rules to alleviate the sparse reward problem.

Description

Rule-guided self-adaptive path generation method
Technical Field
The application belongs to the technical field of automatic question answering over large-scale knowledge bases, and particularly relates to a rule-guided self-adaptive path generation method.
Background
There are three main families of methods for complex knowledge base question answering: semantic parsing based methods (SP-based), information retrieval based methods (IR-based), and reinforcement learning based methods (RL-based). The first family parses the question by constructing a semantic parser, thereby converting the natural language description into a structured query statement. Such methods are limited by the coverage of the query templates and cannot flexibly cope with complex questions. The core idea of the second family is to retrieve information associated with the question from the knowledge base, construct a question subgraph, and embed both the question and the subgraph, thereby building an end-to-end answer search. Although information retrieval based methods are more flexible, they have no traceable reasoning process and are poorly interpretable. Reinforcement learning based methods model complex knowledge base question answering as a sequential decision process and train an RL agent to perform a policy-guided random walk over the knowledge base (KB) until the target entity is reached. The advantage of this approach is that it provides better flexibility and interpretability.
Although reinforcement learning based methods achieve a significant improvement, they still face two challenges: 1) Most real-world knowledge bases are sparse, and the lack of intermediate entities during multi-hop question-answering reasoning produces a large number of false paths, which lowers the efficiency of model reasoning. Previous work designs policy networks based on path encoding and sequence encoding to guide the agent to the target entity; this cannot effectively eliminate false paths or complement the missing intermediate entities, so model efficiency decreases. 2) Existing work gives reward feedback only by using arrival at the target entity as the supervision signal, so during the exploration of a large number of paths the agent obtains reward feedback on only a small number of them; such extremely sparse rewards make the model unstable and lead to the sparse reward problem.
Disclosure of Invention
The application aims to provide a rule-guided adaptive path generation method to solve the problems in the prior art.
In order to achieve the above object, the present application provides a rule-guided adaptive path generation method, including:
constructing a policy network, and obtaining an action probability distribution through the policy network;
constructing an agent action decision space based on symbol operations and the policy network;
constructing a combined reward function based on the policy network and the agent action decision space;
and, based on the agent action decision space, the action probability distribution and the combined reward function, adopting an iterative optimization strategy to realize self-adaptive agent path generation.
Optionally, the process of constructing the policy network comprises: acquiring a candidate entity set for the complex question, and initializing an environment state and an action space state based on the candidate entity set; and encoding the complex question with a bidirectional gated recurrent neural network, wherein all entities and relations in the knowledge base are represented by embedding vectors.
Optionally, the process of obtaining the candidate entity set comprises: disambiguating the words constituting the complex question, extracting entity mentions in the question through an entity linking tool, and linking the entity mentions to candidate topic entities in the knowledge base to obtain the candidate entity set of the complex question.
Optionally, the process of constructing the agent action decision space comprises: generating three query graphs through three symbol operations, the query graphs forming a candidate action decision space; and generating a rule query graph with a rule induction algorithm, obtaining the missing intermediate entities and their relations based on the rule query graph, and obtaining the agent action decision space based on the rule query graph and the candidate action decision space, wherein the rule query graph is dynamically updated according to the actions of the agent.
Optionally, the process of obtaining the rule query graph comprises: selecting a path in the candidate action decision space, and guiding the agent to explore new paths using the entities on the path, based on a rule conversion set and the link relations in the knowledge base, to obtain the rule query graph, wherein the rule conversion set is obtained by the rule induction algorithm and comprises forward rules and reverse rules, the quality of a single rule is judged from the computed rule matching degree score and inference path matching degree score, the entities on the path conform to a rule application scope, and the rule application scope is obtained based on the rule induction algorithm.
Optionally, the process of obtaining the action probability distribution through the policy network comprises: based on the combination of the entity and relation after the agent takes an action, expanding the historical path of the agent with symbol operations to obtain candidate paths; constructing a graph attention network encoder, and encoding the candidate paths through the graph attention network encoder; and encoding the historical path of the agent through the bidirectional gated recurrent neural network, feeding the encoded historical path into the graph attention network encoder, and normalizing and max-pooling the outputs of the bidirectional gated recurrent neural network and the graph attention network encoder to obtain the action probability distribution.
Optionally, the combined reward function is composed of a rule reward function and a similarity reward function;
the rule reward function is a piecewise function constructed from the entity currently reached by the agent, the target entity, the inference path embedding vector, the embedding vector of the complex question, and the matching degree score of the rule;
the similarity reward function is constructed from the current entity embedding vector, the subject entity embedding vector, the question embedding vector and a similarity weight factor.
Optionally, the combined reward function is a weighted sum of the rule reward function and the similarity reward function, the weights summing to 1; and a cosine function is introduced to update the combined reward function so that the combined reward maintains an optimal policy, wherein the cosine function is computed from the current entity embedding vector and the next entity and relation after the agent takes an action.
The application has the technical effects that:
(1) An adaptively expanded action decision space is provided. Symbol operations are combined with rule guidance to complement the intermediate entities missing from the sparse knowledge base, which provides better navigation for the agent and increases the probability of reaching the target entity through a correct path. Compared with prior work that designs a policy network based on path encoding and sequence encoding to guide the agent to the target answer entity, combining rule guidance better eliminates false paths and complements the missing intermediate entities, so that the agent can cover all possible paths during exploration while eliminating the false paths caused by missing intermediate entities. The application also summarizes the application scope of the rules, which facilitates their further use.
(2) The combined reward is reconstructed by exploiting the intrinsic relation between relation paths and rules, so that the agent obtains reward feedback on as many paths as possible while exploring a large number of paths, thereby alleviating the sparse reward problem.
Drawings
The accompanying drawings, which are included to provide a further understanding of the application and are incorporated in and constitute a part of this specification, illustrate embodiments of the application and together with the description serve to explain the application. In the drawings:
FIG. 1 is a schematic diagram of a method according to an embodiment of the present application;
FIG. 2 is a schematic diagram of an example of guiding agent exploration through rules in an embodiment of the present application.
Detailed Description
It should be noted that, without conflict, the embodiments of the present application and features of the embodiments may be combined with each other. The application will be described in detail below with reference to the drawings in connection with embodiments.
Example 1
As shown in fig. 1, the present embodiment provides a rule guidance-based adaptive path generation method, which includes:
A given question is defined as q = {w_1, w_2, ..., w_n}, where w_i is a word constituting the question. The objective of the knowledge base question answering task is to find an inference path in the knowledge base that starts from the topic entity e_0 and finally reaches the target answer entity. The question-answer reasoning process of the proposed model mainly comprises the following parts: 1) Given a question q, the keywords or phrases in the question are first disambiguated, and candidates are collected from the knowledge base for each question term using a standard vocabulary matching score; the model extracts the entity mentions in the question through an entity linking tool and links them to candidate topic entities in the knowledge base to obtain the candidate entity set of the question. 2) Embeddings are created for all entities and relations in the knowledge base, and the initial environment state s_0 is initialized with the candidate entities extracted in the previous part. 3) The agent expands the path and updates the state. According to the current state, three query graphs are generated by the three symbol operations O_t ∈ {extend, bridge, unite} of Zhang et al. to form the initial candidate action decision space A_0; a rule induction algorithm is then used to generate a rule query graph forming an additional action decision space A_t, which complements the missing intermediate entities; these missing intermediate entities and their relations are added to the original action decision space to obtain the final action decision space A = A_0 + A_t. Combining symbol operations with rule guidance navigates the agent and increases the probability of reaching the target entity through a correct path; the specific process is shown in Algorithm 1. 4) The combined reward is reconstructed through the intrinsic relation between relation paths and rules, so that the agent obtains reward feedback on as many paths as possible during exploration and interacts fully with the environment.
Markov decision modeling
Under the reinforcement learning framework, the agent iteratively explores inference paths by interacting with the Environment, and the Policy Network provides a supervision signal that, in each iteration, instructs the agent to select the optimal path to explore until the target entity is reached. The complex knowledge base question answering task is modeled as a Markov Decision Process (MDP), defined as the quadruple (S, A, δ, R). The four elements of the MDP are: 1) S: the state space. The state of the environment describes the starting node of the inference process, the historical inference process, and the node currently reached by the agent. In particular, this work draws on the adaptive path generator proposed by Zhang et al. and introduces the symbol operations O_t ∈ {extend, bridge, unite}, so that inference paths are adaptively generated for different types of complex questions. 2) A: the action space. The action space corresponds to the three query graphs generated by the three symbol operations (TSO) and the rule query graph generated under rule guidance; to better transfer the structural information of the knowledge base, a graph attention mechanism is combined on top of the query graphs. 3) δ: the state transition. State transitions are based on the current state of the environment and the candidate action space at the current time step. 4) R: the reward feedback. Reinforcement learning expects the agent to explore effective inference paths and obtain timely reward feedback to optimize the model.
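By way of illustration only, the MDP elements above can be sketched with plain Python types as follows; the concrete state fields (question, start entity, current entity, history) follow the state description in 1), while the data layout itself is an assumption made for this sketch and not part of the claimed method.

from dataclasses import dataclass, field
from typing import List, Tuple

Action = Tuple[str, str, str]   # (symbol_operation, relation, next_entity)

@dataclass
class State:
    question: str
    start_entity: str                                     # starting node of the inference process
    current_entity: str                                   # node currently reached by the agent
    history: List[Action] = field(default_factory=list)   # historical inference process

def transition(state: State, action: Action) -> State:
    # delta: move to the chosen next entity and append the step to the history
    _, relation, next_entity = action
    return State(state.question, state.start_entity, next_entity, state.history + [action])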
Knowledge bases are generally sparse, and many intermediate entities are missing in multi-hop knowledge base question-answer reasoning, so a large number of false paths are easily produced, which reduces the accuracy of reaching the target entity and lowers reasoning efficiency. Previous work explores paths by designing policy networks based on path encoding and sequence encoding to guide the agent; such methods often lack interpretability and produce a large number of false paths because of the missing intermediate entities. To address these challenges, a reinforcement learning framework with a dynamic completion mechanism is presented, namely the rule-guided adaptive path generation model framework RS-DAG.
First, the model uses rules in conjunction with the symbol operations o_t to navigate for the agent and adaptively generate the action decision space. Second, the intrinsic relation between relation paths and rules is exploited to provide a combined reward, which alleviates the problems of low reasoning efficiency and sparse rewards; the main framework is shown in FIG. 1. The subject entity in the question is filtered and screened and then input into the reinforcement learning environment; three query graphs are generated with the three symbol operations defined by Zhang et al. to form the initial space; based on the subject entity, Rule Induction (RI) is performed on the KB with the AnyBURL rule induction algorithm proposed by Ganhotra et al., and the intermediate entities missing from the sparse knowledge base are complemented to form a rule query graph, which constitutes a new action decision space composed of entities in the knowledge base and their corresponding relations; the policy network guides the agent to select actions and gives rewards; and finally the required relation path is obtained and the target entity is reached.
TABLE 1
In the policy network, all entities and relations are represented by embedding vectors, and the global context information, the state information, and the query quadruple are encoded with a Bi-GRU (bidirectional gated recurrent neural network). Specifically, at time step t, a_t = (r_{t+1}, e_{t+1}, o_t, h_t), where h_t consists of the next entity and relation after the agent takes an action. In iterations t_0 to t_h, the agent selects different actions from the current entity to expand the corresponding historical path according to the different symbol operations, thereby obtaining the current candidate paths. To encode the historical paths of the query graph, a graph attention mechanism is used so that the information of adjacent nodes in the graph and the context information between distant nodes can be captured: the sub-paths generated by each iteration of the historical path are first encoded by the Bi-GRU and used as the input vectors of the graph attention network encoder, the graph attention encodings of the candidate paths are then obtained through normalization and a max-pooling layer, and finally the action probabilities are output.
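By way of illustration only, the following Python (PyTorch) sketch shows a policy network of the kind described above: a Bi-GRU encodes the historical path, a simple additive attention scores the candidate actions against the max-pooled history encoding, and a softmax yields the action probability distribution. The dimensions, the single attention layer standing in for the graph attention network encoder, and the class name PolicyNetwork are assumptions made for this sketch, not the patented implementation.

import torch
import torch.nn as nn
import torch.nn.functional as F

class PolicyNetwork(nn.Module):
    def __init__(self, emb_dim: int = 100, hidden_dim: int = 200):
        super().__init__()
        # Bi-GRU over the embeddings of the historical path (question / state / query encoding).
        self.bigru = nn.GRU(emb_dim, hidden_dim, batch_first=True, bidirectional=True)
        # Additive attention standing in for the graph attention network encoder.
        self.attn = nn.Linear(2 * hidden_dim + emb_dim, 1)

    def forward(self, history_emb: torch.Tensor, candidate_emb: torch.Tensor) -> torch.Tensor:
        # history_emb:   (batch, seq_len, emb_dim)   embeddings of the historical path
        # candidate_emb: (batch, n_actions, emb_dim) embeddings of candidate (relation, entity) actions
        enc, _ = self.bigru(history_emb)                     # (batch, seq_len, 2 * hidden_dim)
        pooled = enc.max(dim=1).values                       # max-pooling over the encoded sequence
        ctx = pooled.unsqueeze(1).expand(-1, candidate_emb.size(1), -1)
        scores = self.attn(torch.cat([ctx, candidate_emb], dim=-1)).squeeze(-1)
        return F.softmax(scores, dim=-1)                     # action probability distribution

# Usage sketch: probs = PolicyNetwork()(history_emb, candidate_emb); action = probs.argmax(dim=-1)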
The innovation of the designed policy network is that the action decision space is adaptively generated under rule guidance. The model mainly comprises two core parts: (1) an action decision space adaptively expanded based on the target entity; and (2) a combined reward based on rule paths.
Adaptive expansion action decision space based on target entity
The target-entity-based adaptive expansion of the action decision space is mainly accomplished by designing a policy network that guides the agent through rule and symbol operations, complementing the missing intermediate entities while eliminating false paths, thereby improving the reasoning performance of the model. This section describes in detail how the action decision space is updated by combining rule mining on the target entities in the KB with the three symbol operations, thereby reducing the impact of false paths on the reasoning effect.
TABLE 2
Mining missing intermediate entities using rules
Since the action decision space in reinforcement-learning-based question-answer reasoning is composed of entities and relations, inspired by Ganhotra et al., the intermediate entities missing from the sparse knowledge base and their corresponding relation paths are mined by inducing rules from the KB with the AnyBURL rule algorithm. On the basis of the initial action decision space A_0 composed of the query graphs generated by the symbol operations O_t, a rule query graph is added to dynamically update the action decision space, thereby addressing the large number of false paths caused by the intermediate entities missing from the sparse knowledge base. To learn to mine rules related to the subject entity, the AnyBURL rule algorithm first generates a knowledge graph from the symbol operations O_t ∈ {extend, bridge, unite} based on the current entity, selects an inference path, and uses the link relations of the entities on the inference path in the knowledge base to mine the missing intermediate entities and their corresponding relations through the conversion rules of Table 1.
The rule query graph composed of the relations and entities mined by rule learning based on the target entity is added into the initial action decision space to form the final agent action decision space. Experimental results show that combining rule guidance on top of the three symbol operations effectively addresses the problem of the large number of false paths caused by missing intermediate entities.
To mine the intermediate entities missing from the sparse knowledge base, the AnyBURL rule algorithm proposed by Ganhotra et al. induces the rule conversion sets (forward rules and reverse rules) from new facts, as shown in Table 1. A rule Q_t: F(a, b) can be represented in the recursive chain form F_1(a, o_1) ∧ F_2(o_1, o_2) ∧ F_3(o_2, o_3) ∧ ... ∧ F_n(o_n, b), where a, b and the o_i correspond to entities on a relation path in the knowledge base. To measure the quality of the mined rules, equations (1) and (2) give the rule matching degree score cf and the matching degree score f_i of the inference path generated under rule guidance, respectively, where H denotes the number of entities having a link relation with the subject entity in the rule conversion, M denotes the number of entities having a link relation with the subject entity in the initial action space, κ denotes the set of all triples {(e_0, F_i, e_n), i ∈ 0, ..., n} of the head entity e_0, f_i denotes the matching degree score of the i-th hop path, e_i denotes the entity obtained after the i-th hop path updates the rule fact, and the corresponding tail entity is the entity inferred by the i-th hop path.
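By way of illustration only, the chain form above can be sketched as follows: a rule body is an ordered list of relations F_1, ..., F_n, and walking it over the knowledge base from the head entity a recovers the intermediate entities o_i and candidate tail entities b. The adjacency-list layout of the knowledge base is an assumption of the sketch, and the matching degree scores of equations (1) and (2) are not reproduced here.

from typing import Dict, List, Tuple

KB = Dict[str, List[Tuple[str, str]]]   # entity -> list of outgoing (relation, tail_entity) edges

def apply_chain_rule(kb: KB, head: str, body: List[str]) -> List[List[str]]:
    """Return every entity chain [o_1, ..., o_n, b] reachable from `head` along `body`."""
    chains = [[head]]
    for relation in body:
        chains = [chain + [tail]
                  for chain in chains
                  for (rel, tail) in kb.get(chain[-1], [])
                  if rel == relation]
        if not chains:
            return []
    # Drop the head itself; the remaining entities are the recovered intermediates and the tail.
    return [chain[1:] for chain in chains]

# Example (using the Table 2 entities): kb = {"Heskey": [("teammate", "Gerard")], "Gerard": [("team", "LFC")]}
# apply_chain_rule(kb, "Heskey", ["teammate", "team"]) returns [["Gerard", "LFC"]].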
For clarity and intuition, Table 2 presents the rule mining process for the complex query (Heskey, Owen, team, ?). The complex question is first decomposed into two queries, (Heskey, team, ?) and (Owen, team, ?). As shown in Table 2, the common entity Gerard is obtained through conversion rule 1 and rule 2, so that a missing triple whose tail entity is LFC is mined; the missing intermediate entity and its corresponding relation are added into the rule query graph, which guides the agent to reason along inference paths not explored by the symbol operations o_t, thereby eliminating false paths and reaching the target entity LFC. In Table 2, the rule entities are the entities on the rule conversion relations, and the symbol entities are the entities obtained by the three symbol operations.
For the current entity, the first entity in the rule expression is matched first; if a chain rule summarized in Table 1 is matched, the corresponding plausibility score is obtained, and the tail entity of the chain rule with the highest plausibility score, together with its corresponding relation, is added to the generated rule query graph. The rule query graph composed of the entities and relations mined from the KB rules forms a new action decision space, which reduces the influence of the false paths, caused by the large number of missing intermediate entities, on the subsequent reasoning process.
In multi-hop knowledge base question answering, in order to accurately obtain the missing intermediate entities, an alternating induction strategy based on new facts is proposed, through which new rules are induced again from every newly mined fact. In FIG. 1, after the question attention encoding vector q is computed by the policy network, the K relations with the highest attention with respect to q, together with their entities, are selected to compose the rule query graph and update the action decision space. The rule query graph consists of the newly mined facts and their relations; the alternating induction module induces new rules from the inferred new facts with the AnyBURL rule induction algorithm and then adds them to the existing rule query graph, so that the two promote each other, guiding the agent to explore more effectively and improving the accuracy of reaching the target entity.
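By way of illustration only, the top-K selection described above can be sketched as follows; dot-product scoring of relation embeddings against the question attention encoding q is an assumption of this sketch, since the exact scoring form is not specified here.

import torch

def top_k_relations(q: torch.Tensor, rel_emb: torch.Tensor, k: int = 5) -> torch.Tensor:
    # q: (d,) question attention encoding; rel_emb: (n_rels, d) relation embedding matrix.
    scores = rel_emb @ q                                   # attention score of each relation w.r.t. q
    return torch.topk(scores, k=min(k, rel_emb.size(0))).indices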
Rule application scope
To facilitate further use of the missing intermediate entities in rule mining, the rule application scope is summarized in the following four aspects based on the AnyBURL rule algorithm proposed by Ganhotra et al.: (1) the questions involved in knowledge base reasoning question answering should contain multiple entities and multiple relations; (2) when the rules are used to mine intermediate entities, the link relations related to the subject entity should appear once or more in the sparse knowledge base; (3) path sampling randomly samples entities on the knowledge graph generated by the three symbol operations, while ensuring that the sampled entities have multiple link relations in the sparse knowledge base; and (4) the length L of the path mined by a rule should be greater than or equal to the number n of link relations existing between the missing intermediate entity and the subject entity.
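By way of illustration only, conditions (2)-(4) above can be checked for a single candidate rule path as follows; the adjacency-list knowledge base layout and the representation of a path as a list of (relation, entity) hops are assumptions of the sketch.

from typing import Dict, List, Tuple

def rule_path_applicable(kb: Dict[str, List[Tuple[str, str]]],
                         subject: str,
                         path: List[Tuple[str, str]],
                         n_links_to_subject: int) -> bool:
    # (2) the link relations related to the subject entity appear at least once in the KB
    if len(kb.get(subject, [])) < 1:
        return False
    # (3) every sampled entity on the path has more than one link relation in the KB
    if any(len(kb.get(entity, [])) <= 1 for _, entity in path):
        return False
    # (4) the mined path length L is at least the number n of link relations between
    #     the missing intermediate entity and the subject entity
    return len(path) >= n_links_to_subject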
Combined rewards based on rule paths
In complex KBQA, existing work gives reward feedback only by using arrival at the target entity as the supervision signal, so during the exploration of a large number of paths the agent obtains reward feedback on only a small number of them; such extremely sparse rewards make the model unstable and lead to the sparse reward problem.
For example, in the TUL model proposed by Lan et al. in 2019, the reward function is set so that a cumulative reward is obtained only when the answer is extracted; the KG-RS model proposed by He et al. in 2021 introduces a scoring function into the reward strategy, scoring fact triples with an existing pre-trained knowledge base model to obtain the corresponding reward. The problem with these reward function designs is that many reasonable actions receive no corresponding reward until the agent reaches the target entity. Therefore, compared with existing, relatively mature reinforcement learning reward mechanisms, the innovation here is to exclude false paths using rules while also considering the intrinsic relation between relation paths and rules, giving appropriate rewards to reasonable actions through a combined reward mechanism; experimental results show that this reward mechanism alleviates sparse rewards well. The combined reward is mainly divided into two parts: a rule reward and a similarity reward.
The first part, the rule reward, is defined as a piecewise function of the entity e_t currently reached by the agent, the target entity, the embedding vector of the rule-guided inference path, the embedding vector q of the question, and the matching degree score cf of the rule.
When the agent cannot obtain the target entity, the similarity reward provides appropriate reward feedback for more paths whose reasoning is reasonable. The similarity between the path relation and the question, and the similarity between the current entity and the subject entity, are computed with a pre-trained model. The similarity reward is defined as:
R_sim = φ·cos(h_t, q) + (1 − φ)·cos(e_0, e_c)   (6)
where e_c denotes the current entity embedding vector, e_0 denotes the embedding vector of the subject entity, q denotes the embedding vector of the question, and φ ∈ [0, 1] denotes the similarity weight factor.
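By way of illustration only, equation (6) can be computed as follows; the default value of the similarity weight factor φ is an assumption of the sketch.

import torch
import torch.nn.functional as F

def similarity_reward(h_t: torch.Tensor, q: torch.Tensor,
                      e_0: torch.Tensor, e_c: torch.Tensor,
                      phi: float = 0.5) -> torch.Tensor:
    # R_sim = phi * cos(h_t, q) + (1 - phi) * cos(e_0, e_c)   -- equation (6)
    return phi * F.cosine_similarity(h_t, q, dim=-1) + (1.0 - phi) * F.cosine_similarity(e_0, e_c, dim=-1)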
The proposed combined reward combines the similarity reward and the rule reward, and is defined as
R(s_t, a_t) = α·R_r + (1 − α)·R_sim   (7)
where α is the weight balancing the two rewards, s_t denotes the current state, and a_t denotes a candidate action (an entity in the combined reward together with its corresponding relation). To keep the combined reward consistent with the optimal policy, a cosine function ρ(s_t) = cos(e_c, h_t) is introduced, and the combined reward mechanism is updated to:
R(s_{t+1}, a_t) = R(s_t, a_t) + γ·ρ(s_{t+1}) − ρ(s_t)   (8)
where γ is a learnable parameter representing the discount factor. This combined reward mechanism alleviates the model instability caused by extremely sparse rewards and helps the agent make correct action selections.
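By way of illustration only, equations (7) and (8) and the cosine function ρ can be computed as follows; the default values of α and γ are assumptions of the sketch, and the piecewise rule reward R_r is passed in as an argument rather than reproduced.

import torch
import torch.nn.functional as F

def combined_reward(rule_reward: torch.Tensor,
                    sim_reward: torch.Tensor,
                    alpha: float = 0.5) -> torch.Tensor:
    # Equation (7): R(s_t, a_t) = alpha * R_r + (1 - alpha) * R_sim
    return alpha * rule_reward + (1.0 - alpha) * sim_reward

def rho(e_c: torch.Tensor, h_t: torch.Tensor) -> torch.Tensor:
    # rho(s_t) = cos(e_c, h_t)
    return F.cosine_similarity(e_c, h_t, dim=-1)

def shaped_reward(base_reward: torch.Tensor,
                  rho_next: torch.Tensor,
                  rho_curr: torch.Tensor,
                  gamma: float = 0.9) -> torch.Tensor:
    # Equation (8): R(s_{t+1}, a_t) = R(s_t, a_t) + gamma * rho(s_{t+1}) - rho(s_t)
    return base_reward + gamma * rho_next - rho_curr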
Training strategy optimization
The policy network is trained to maximize the expected return, and the cumulative reward of the policy network is defined over the explored path. The classical REINFORCE algorithm is used to maximize the total reward return.
where π_θ(a_t | s_{t+1}) is the probability distribution over candidate actions output by the policy network, and R(s_{t+1}, a_t) is the combined reward.
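By way of illustration only, a REINFORCE-style update over one explored path can be sketched as follows; the use of a discounted return-to-go and a single trajectory per update are assumptions of the sketch.

import torch

def reinforce_update(optimizer: torch.optim.Optimizer,
                     log_probs: list,      # log pi_theta(a_t | s) for each step of the episode
                     rewards: list,        # combined reward for each step of the episode
                     gamma: float = 0.9) -> float:
    returns, g = [], 0.0
    for r in reversed(rewards):            # discounted return-to-go
        g = r + gamma * g
        returns.insert(0, g)
    returns = torch.tensor(returns)
    loss = -(torch.stack(log_probs) * returns).sum()   # maximize expected return
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()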
The present application is not limited to the above-mentioned embodiments, and any changes or substitutions that can be easily understood by those skilled in the art within the technical scope of the present application are intended to be included in the scope of the present application. Therefore, the protection scope of the present application should be subject to the protection scope of the claims.

Claims (4)

1. A rule-guided self-adaptive path generation method, characterized by comprising the following steps:
constructing a policy network, and obtaining an action probability distribution through the policy network;
constructing an agent action decision space based on three symbol operations and the policy network;
constructing a combined reward function based on the policy network and the agent action decision space;
based on the agent action decision space, the action probability distribution and the combined reward function, adopting an iterative optimization strategy to realize self-adaptive agent path generation;
wherein constructing the policy network comprises: acquiring a candidate entity set for a complex question, and initializing an environment state and an action space state based on the candidate entity set; and encoding the complex question with a bidirectional gated recurrent neural network, wherein all entities and relations in a knowledge base are represented by embedding vectors;
the process of obtaining the candidate entity set comprises: disambiguating the words constituting the complex question, extracting entity mentions in the question through an entity linking tool and linking them to candidate subject entities in the knowledge base, to obtain the candidate entity set of the complex question;
the process of constructing the agent action decision space comprises: generating three query graphs through the three symbol operations, the query graphs forming a candidate action decision space; generating a rule query graph with a rule induction algorithm, obtaining missing intermediate entities and their relations based on the rule query graph, and adding the missing intermediate entities and their relations into the candidate action decision space to obtain the agent action decision space, wherein the rule query graph is dynamically updated according to the actions of the agent;
the process of obtaining the rule query graph comprises: selecting a path in the candidate action decision space, and guiding the agent to explore new paths using the entities on the path, based on a rule conversion set and the link relations in the knowledge base, to obtain the rule query graph, wherein the rule conversion set is obtained by the rule induction algorithm and comprises forward rules and reverse rules, the quality of a single rule is judged from the computed rule matching degree score and inference path matching degree score, the entities on the path conform to a rule application scope, and the rule application scope is obtained based on the rule induction algorithm.
2. The rule-guided self-adaptive path generation method of claim 1, wherein
the process of obtaining the action probability distribution through the policy network comprises: based on the combination of the entity and relation after the agent takes an action, expanding the historical path of the agent with symbol operations to obtain candidate paths; constructing a graph attention network encoder, and encoding the candidate paths through the graph attention network encoder; and encoding the historical path of the agent through the bidirectional gated recurrent neural network, feeding the encoded historical path into the graph attention network encoder, and normalizing and max-pooling the outputs of the bidirectional gated recurrent neural network and the graph attention network encoder to obtain the action probability distribution.
3. The rule-guided self-adaptive path generation method of claim 1, wherein
the combined reward function is composed of a rule reward function and a similarity reward function;
the rule reward function is a piecewise function constructed from the entity currently reached by the agent, the target entity, the inference path embedding vector, the embedding vector of the complex question, and the matching degree score of the rule;
the similarity reward function is constructed from the current entity embedding vector, the subject entity embedding vector, the question embedding vector and a similarity weight factor.
4. The rule-guided self-adaptive path generation method of claim 3, wherein
the combined reward function is a weighted sum of the rule reward function and the similarity reward function, the weights summing to 1; and a cosine function is introduced to update the combined reward function so that the combined reward maintains an optimal policy, wherein the cosine function is computed from the current entity embedding vector and the next entity and relation after the agent takes an action.
CN202310032764.8A 2023-01-10 2023-01-10 Rule-guided self-adaptive path generation method Active CN116010621B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202310032764.8A CN116010621B (en) 2023-01-10 2023-01-10 Rule-guided self-adaptive path generation method

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202310032764.8A CN116010621B (en) 2023-01-10 2023-01-10 Rule-guided self-adaptive path generation method

Publications (2)

Publication Number Publication Date
CN116010621A CN116010621A (en) 2023-04-25
CN116010621B (en) 2023-08-11

Family

ID=86035389

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202310032764.8A Active CN116010621B (en) 2023-01-10 2023-01-10 Rule-guided self-adaptive path generation method

Country Status (1)

Country Link
CN (1) CN116010621B (en)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116629356B (en) * 2023-05-09 2024-01-26 华中师范大学 Encoder and Gaussian mixture model-based small-sample knowledge graph completion method

Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
EP3690769A1 (en) * 2019-01-31 2020-08-05 StradVision, Inc. Learning method and learning device for supporting reinforcement learning by using human driving data as training data to thereby perform personalized path planning
CN112434171A (en) * 2020-11-26 2021-03-02 中山大学 Knowledge graph reasoning and complementing method and system based on reinforcement learning
CN112884548A (en) * 2021-02-01 2021-06-01 北京三快在线科技有限公司 Object recommendation method and device based on path reasoning and electronic equipment
CN113239211A (en) * 2021-06-17 2021-08-10 电子科技大学 Reinforced learning knowledge graph reasoning method based on course learning
CN113485380A (en) * 2021-08-20 2021-10-08 广东工业大学 AGV path planning method and system based on reinforcement learning
CN115526317A (en) * 2022-09-24 2022-12-27 中国人民解放军战略支援部队信息工程大学 Multi-agent knowledge inference method and system based on deep reinforcement learning
CN115526321A (en) * 2022-09-24 2022-12-27 中国人民解放军战略支援部队信息工程大学 Knowledge reasoning method and system based on intelligent agent dynamic path completion strategy


Also Published As

Publication number Publication date
CN116010621A (en) 2023-04-25

Similar Documents

Publication Publication Date Title
CN112417104B (en) Machine reading understanding multi-hop inference model and method with enhanced syntactic relation
WO2021000362A1 (en) Deep neural network model-based address information feature extraction method
CN115640410B (en) Knowledge map multi-hop question-answering method based on reinforcement learning path reasoning
Xiong et al. Knowledge graph question answering with semantic oriented fusion model
Wang et al. Interactive natural language processing
WO2022108664A1 (en) Automated merge conflict resolution with transformers
CN107451230A (en) A kind of answering method and question answering system
CN116010621B (en) Rule-guided self-adaptive path generation method
CN113780002A (en) Knowledge reasoning method and device based on graph representation learning and deep reinforcement learning
CN113360604A (en) Knowledge graph multi-hop question-answering method and model based on cognitive inference
CN115757715A (en) Complex problem multi-hop intelligent question-answering method based on knowledge graph representation learning
CN111639254A (en) System and method for generating SPARQL query statement in medical field
CN114327483A (en) Graph tensor neural network model establishing method and source code semantic identification method
Xia et al. Iterative rule-guided reasoning over sparse knowledge graphs with deep reinforcement learning
CN115964459B (en) Multi-hop reasoning question-answering method and system based on food safety cognition spectrum
CN116821294A (en) Question-answer reasoning method and device based on implicit knowledge ruminant
Pandey Context free grammar induction library using Genetic Algorithms
Kumar et al. Deep learning driven natural languages text to SQL query conversion: a survey
CN117634617A (en) Knowledge-intensive reasoning question-answering method, device, electronic equipment and storage medium
Hasegawa et al. Latent variable model for estimation of distribution algorithm based on a probabilistic context-free grammar
CN117033847B (en) Mathematical application problem solving method and system based on hierarchical recursive tree decoding model
CN113486180A (en) Remote supervision relation extraction method and system based on relation hierarchy interaction
Pandey State of the art on grammatical inference using evolutionary method
Liu et al. Participatory genetic learning in fuzzy system modeling
Roy et al. Multi-hop question answering

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant