CN115526321A - Knowledge reasoning method and system based on intelligent agent dynamic path completion strategy - Google Patents

Knowledge reasoning method and system based on intelligent agent dynamic path completion strategy

Info

Publication number
CN115526321A
Authority
CN
China
Prior art keywords
intelligent agent
path
agent
inference
knowledge
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202211168516.8A
Other languages
Chinese (zh)
Inventor
夏毅
罗军勇
兰明敬
周刚
陈晓慧
卢记仓
刘铄
章梦礼
黄宁博
李顺航
李珠峰
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Information Engineering University of PLA Strategic Support Force
Original Assignee
Information Engineering University of PLA Strategic Support Force
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Information Engineering University of PLA Strategic Support Force filed Critical Information Engineering University of PLA Strategic Support Force
Priority to CN202211168516.8A priority Critical patent/CN115526321A/en
Publication of CN115526321A publication Critical patent/CN115526321A/en
Pending legal-status Critical Current

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N5/00 Computing arrangements using knowledge-based models
    • G06N5/04 Inference or reasoning models
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N5/00 Computing arrangements using knowledge-based models
    • G06N5/02 Knowledge representation; Symbolic representation
    • G06N5/022 Knowledge engineering; Knowledge acquisition
    • G06N5/025 Extracting rules from data

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • General Engineering & Computer Science (AREA)
  • Computing Systems (AREA)
  • Evolutionary Computation (AREA)
  • Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • Computational Linguistics (AREA)
  • General Physics & Mathematics (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Artificial Intelligence (AREA)
  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)

Abstract

The invention belongs to the technical field of knowledge graphs, and particularly relates to a knowledge reasoning method and system based on an agent dynamic path completion strategy. Entities and the relationships among them are extracted from a target knowledge graph, and the rules in the graph and the confidence scores corresponding to those rules are mined. A reinforcement learning agent is constructed, and the rules dynamically guide the agent to complete knowledge graph paths according to the current entity state and historical path information; the agent's total reward is computed from the hit reward earned when the agent hits a target entity and the rule reward earned when the agent's inference path conforms to a rule, and the agent's policy network is trained by maximizing the expected value of the total reward. For a given target condition to be queried, the trained agent obtains the corresponding knowledge inference result through path inference in the knowledge graph. By dynamically completing the most probable path during inference, the dynamic path completion strategy yields a complete inference path and solves the inference truncation problem caused by missing paths in sparse knowledge graphs.

Description

Knowledge reasoning method and system based on intelligent agent dynamic path completion strategy
Technical Field
The invention belongs to the technical field of knowledge graphs, and particularly relates to a knowledge reasoning method and system based on an agent dynamic path completion strategy.
Background
With the continuous development of information technology, artificial intelligence has evolved from "computational intelligence", which can store and calculate, through "perceptual intelligence", which can listen, speak, see and recognize, toward the next stage of "cognitive intelligence", which can understand, reason and explain; the difficulty and value of realizing each stage rise in turn. The knowledge graph is one of the core technologies of current artificial intelligence. As a novel knowledge representation method, a knowledge graph contains a large amount of prior knowledge, organizes massive information in structured triples, and associates and deeply fuses different data sources in the form of entities and relationships. A large number of knowledge graphs, such as YAGO, DBpedia and Freebase, have been built, and the related technologies are widely applied to tasks such as intelligent question answering, recommendation systems and information security; their prominent effect has drawn wide attention from both academia and industry.
Knowledge reasoning is the process of generalizing from individual knowledge to general knowledge: starting from known knowledge, new facts are obtained by reasoning and mining, or a large amount of existing knowledge is generalized. Early reasoning research mostly lay in the fields of logic description and knowledge engineering; many scholars advocated formal methods for describing the objective world, held that all reasoning is based on existing logical knowledge such as first-order logic and predicate logic, and made how to draw correct conclusions from known propositions and predicates the focus of their research. In recent years, with the explosive growth of internet data, the traditional approach of manually building knowledge bases can no longer meet the big-data era's demand for mining large amounts of knowledge. Data-driven reasoning methods have therefore become the mainstream of knowledge reasoning research.
Existing knowledge reasoning methods perform well in experimental environments with sufficient samples. In real life, however, corpus samples generally obey a long-tailed distribution: most corpora are low-resource, and the knowledge graphs constructed from them are mostly sparse. The sparseness of the graph environment causes many key inference paths to be missing, so inference is frequently interrupted when reasoning over a sparse knowledge graph. Meanwhile, a sparse graph environment makes it difficult for a model to obtain enough training samples and provides insufficient information to guide training, so the policy decision network of a reinforcement learning agent is hard to train well. Experiments show that many current inference methods are not suited to the sparse knowledge graph environments found in real life; once applied there, their performance generally drops sharply.
Disclosure of Invention
Therefore, the invention provides a knowledge inference method and system based on an agent dynamic path completion strategy, which solve the inference truncation problem caused by missing paths in sparse knowledge graphs: in a sparse knowledge graph environment, the dynamic path completion strategy dynamically completes the most probable path during inference to obtain a complete inference path, from which the final result can be inferred.
According to the design scheme provided by the invention, a knowledge inference method based on an agent dynamic path completion strategy is provided, which comprises the following contents:
extracting entities and relationships among the entities in the target knowledge graph, and mining rules in the target knowledge graph and confidence score corresponding to the rules;
constructing a reinforcement learning intelligent agent, dynamically guiding the intelligent agent to perform knowledge graph path completion through rules according to the current entity state and historical path information, calculating the total reward of the intelligent agent according to the hit reward when the intelligent agent hits a target entity and the rule reward when the reasoning path of the intelligent agent accords with the rules, and training a strategy network of the intelligent agent by maximizing the expected value of the total reward of the intelligent agent;
and aiming at a given target condition to be inquired, acquiring a corresponding knowledge inference result by using the trained intelligent agent through path inference in the knowledge graph.
As a knowledge inference method based on an agent dynamic path completion strategy, further, in the process of mining the rules in the target knowledge graph, a rule induction method is used to induce structured chain rules in the target knowledge graph, and a preset score threshold is used to retain only the rules whose confidence scores exceed that threshold.
As the knowledge reasoning method based on the intelligent agent dynamic path completion strategy, the constructed reinforcement learning intelligent agent elements further comprise: the intelligent agent comprises a state, an action, a reward, a strategy network and a transfer function, wherein the state represents a node where the intelligent agent is embedded and represented, the action represents all possible next-step operations of the intelligent agent on the node where the intelligent agent is located, the reward represents feedback obtained after the intelligent agent takes the action, the strategy network represents a network for the intelligent agent to carry out reinforcement learning according to the node state, the action, the reward and the transfer function, and the transfer function represents a transfer result of the state after the intelligent agent carries out the next-step actions.
As the knowledge inference method based on the agent dynamic path completion strategy, further, in the agent path inference process, if the sequence of the inference path corresponds to a rule, the confidence corresponding to that rule is used as an extra rule reward, and when the agent hits the correct tail entity, a hit reward is obtained. The agent's total reward is denoted R_total = λ·R_r + (1−λ)·R_h, where R_h is the hit reward, R_r is the rule reward, and λ is a preset weight parameter.
As the knowledge inference method based on the intelligent agent dynamic path completion strategy, further, when intelligent agent rule reward is set, if two inference paths simultaneously meet a plurality of rules, confidence scores of the rules are compared, the rule corresponding to the maximum confidence score is selected as a path matching rule, and the path matching rule confidence score is used as the rule reward of the intelligent agent.
As the knowledge inference method based on the agent dynamic path completion strategy, further, a path completion algorithm is used to expand the agent's action space: positive and negative sample sets of the action space are first set, and the agent state vector and action vector are initialized; then an iterative expansion step and an iteration termination condition for the agent's action space are set. In the iterative expansion step, the agent's action space is updated by adding an additional action space, the agent's next action is obtained through the agent's random sampling strategy function, the positive and negative sample sets of the action space are updated according to whether the agent reaches the target entity by walking, and iteration terminates when the termination condition is met.
As the knowledge inference method based on the agent dynamic path completion strategy, further, the positive and negative sample sets of the action space are updated according to whether the agent reaches the target entity by walking, as follows: if the agent fails to reach the target entity, the next action and the agent state are added to the negative sample set, the agent's strategy function is resampled to obtain a new next action, and the resampled next action and state are added to the positive sample set; if the agent does reach the target entity, the next action and the agent state are added to the positive sample set.
As the knowledge inference method based on the agent dynamic path completion strategy, further, when the agent's action space is updated, the probabilities of all paths in the path candidate set inferred by the agent are measured through the probability of the agent's next action, and the attention weight of the relations in each path candidate set is calculated. According to the ranking of attention weights in the path candidate set, the first x relations are selected as the most probable completion path set of the agent in the current state. Based on the current entity e_t and the completion path set, link prediction with the embedding-based method ConvE provides the agent with an extra action space A_add of size N_add; combining this with the agent's original action space A_t at the current node, the search space at the node missing from the current inference path is updated, the updating process being represented as A'_t = A_t ∪ A_add.
As the knowledge inference method based on the agent dynamic path completion strategy, further, the probability of the agent's next action is computed as P((r, e) | s_t) = P(r | s_t)·P(e | r, s_t), where P(r | s_t) is the probability distribution over the next-step relations selected by the agent in the current state, and P(e | r, s_t) is the probability distribution over the next entities. The attention weight of the relations in each path candidate set is computed from an identity matrix, a rule-guided auxiliary matrix, the agent's candidate relation r in the current state, and the number of candidate relation elements.
Further, the present invention also provides a knowledge inference system based on the intelligent agent dynamic path completion strategy, comprising: a rule mining module, a path completion module and a knowledge reasoning module, wherein,
the rule mining module is used for extracting entities and relationships among the entities in the target knowledge graph and mining rules in the target knowledge graph and confidence scores corresponding to the rules;
the path completion module is used for constructing a reinforcement learning intelligent agent, dynamically guiding the intelligent agent to perform knowledge graph path completion through rules according to the current entity state and historical path information, calculating the total reward of the intelligent agent according to the hit reward when the intelligent agent hits a target entity and the rule reward when the inference path of the intelligent agent accords with the rules, and training a strategy network of the intelligent agent by maximizing the expected value of the total reward of the intelligent agent;
and the knowledge inference module is used for acquiring a corresponding knowledge inference result by utilizing the trained intelligent agent to perform path inference in the knowledge graph according to the given target condition to be queried.
The invention has the beneficial effects that:
the method utilizes dynamic path completion to realize knowledge reasoning under the sparse knowledge map environment and the condition that a reasoning path is missing, and combines the sequence decision process of the reinforcement learning agent to perform dynamic path completion on the missing path in the reasoning process, thereby realizing more comprehensive knowledge reasoning; the problem that the intelligent agent lacks information guidance in the reasoning process is solved by combining global knowledge and local knowledge, and more efficient knowledge reasoning is realized by iterative updating and mutual promotion of rule induction and factual reasoning. Compared with the current neural network which can not explain the output reasoning result, the application shows the reasoning process in an interpretable path mode, the model is relatively more transparent, the trust degree on the model decision is higher, the unexplainable property of the reasoning method has great influence on the reasoning result and the related backtracking, and the dominant reasoning scheme adopted by the application has the advantage that the wrong reasoning sample can be better backtracked, so that the application in the actual scene is facilitated.
Description of the drawings:
FIG. 1 is a schematic diagram of a knowledge inference process in an embodiment;
FIG. 2 is a schematic diagram of knowledge inference performed in a sparse graph environment in an embodiment;
FIG. 3 is a diagram illustrating an embodiment of an agent dynamic path completion strategy;
FIG. 4 is a schematic diagram of an iterative inference framework in an embodiment.
Detailed description:
in order to make the objects, technical solutions and advantages of the present invention clearer and more obvious, the present invention is further described in detail below with reference to the accompanying drawings and technical solutions.
The embodiment of the invention, as shown in fig. 1, provides a knowledge inference method based on an agent dynamic path completion strategy, comprising:
s101, extracting entities and relationships among the entities in the target knowledge graph, and mining rules in the target knowledge graph and confidence scores corresponding to the rules;
s102, constructing a reinforcement learning agent, dynamically guiding the agent to perform knowledge graph path completion through rules according to the current entity state and historical path information, calculating the total reward of the agent according to the hit reward when the agent hits a target entity and the rule reward when the agent reasoning path accords with the rules, and training a strategy network of the agent by maximizing the expected value of the total reward of the agent;
s103, aiming at the given target condition to be inquired, the trained agent is used for acquiring a corresponding knowledge reasoning result by performing path reasoning in the knowledge graph.
As shown in fig. 2, the sparseness of the graph environment causes many key inference paths to be missing, so inference is often interrupted when performing knowledge inference in a sparse knowledge graph environment. Meanwhile, a sparse graph environment makes it difficult for the model to obtain enough training samples and provides insufficient information to guide training, so the policy decision network of the reinforcement learning agent is hard to train well. In this embodiment, complete path reasoning is realized by dynamic path completion, which solves the inference truncation problem caused by missing paths in sparse knowledge graphs.
Further, in the process of mining the rules in the target knowledge graph, a rule induction method is used to induce structured chain rules, and a preset score threshold is used to retain the rules whose confidence scores exceed that threshold. The constructed reinforcement learning agent comprises the following elements: a state, an action, a reward, a strategy network and a transfer function, where the state is the embedded representation of the node where the agent is located, the action covers all possible next-step operations of the agent at that node, the reward is the feedback obtained after the agent takes an action, the strategy network is the network through which the agent performs reinforcement learning from the node state, action, reward and transfer function, and the transfer function gives the resulting state after the agent takes its next action.
Knowledge reasoning is performed based on reinforcement learning: the multi-hop reasoning problem is modeled as a sequential decision problem, and the reinforcement learning agent is trained through feedback and interaction. During path search, the agent learns high-reward paths to update its strategy network and performs path reasoning by reinforcement learning. When a path break during inference leaves the inference path missing, the reinforcement learning agent performs dynamic path completion according to the current entity state and the historical path information. The possible action space is enlarged by adding an additional action space that depends on the history information (encoded by an LSTM) and the current state information. As shown in fig. 3, when a new additional action space is constructed during dynamic path completion, the reinforcement learning agent dynamically expands the path in combination with global information, alleviating the path discontinuity problem in multi-hop inference.
As a preferred embodiment, further, in the agent path inference process, if the sequence of the inference path corresponds to a rule, the confidence corresponding to that rule is taken as an extra rule reward; when the agent hits the correct tail entity, a hit reward is obtained. The agent's total reward is denoted R_total = λ·R_r + (1−λ)·R_h, where R_h is the hit reward, R_r is the rule reward, and λ is a preset weight parameter.
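As a minimal illustrative sketch (function and parameter names are not from the patent), the weighted total reward can be computed as:

```python
def total_reward(hit_reward: float, rule_reward: float, lam: float = 0.5) -> float:
    """Combine the rule reward R_r and the hit reward R_h as
    R_total = lam * R_r + (1 - lam) * R_h, with lam a preset weight."""
    return lam * rule_reward + (1 - lam) * hit_reward
```

For example, with λ = 0.3, a hit reward of 1.0 and a rule reward of 0.8, the agent's total reward is 0.3·0.8 + 0.7·1.0 = 0.94.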
Rules are extracted from the knowledge graph by the rule induction method AnyBURL. These rules are frequently occurring path combination patterns in the graph and can provide global-information guidance for the path reasoning of the reinforcement learning agent. An example of an extracted rule is as follows:
concept:athlete_playsin_league(a,b)←concept:athlete_playsin_team(a,e)∧concept:team_playsin_league(e,b)
Each rule corresponds to a confidence score: the more frequently the pattern appears in the knowledge graph, the higher the confidence score, and the higher the confidence of the corresponding rule.
A chain rule with head r and body atoms b_1, …, b_n can be expressed as r(X, Y) ← b_1(X, A_1) ∧ b_2(A_1, A_2) ∧ … ∧ b_n(A_{n−1}, Y), where the capital letters represent variables and r(c_i, c_j) is equivalent to the triple (c_i, r, c_j) in the graph. To dynamically guide the agent's path search through the rules, the invention grants an extra rule reward to paths that conform to a rule during the agent's path exploration. If the sequence of the agent's inference path corresponds to a rule, i.e. b_1 = r_{j−i+1}, …, b_{i−1} = r_{j−1}, the confidence corresponding to that rule is used as the extra rule reward: the more reliable the rule matching the inference path, the higher the rule reward obtained. If the inference path explored by the reinforcement learning agent simultaneously satisfies two rules R_1 and R_2, the confidence scores corresponding to R_1 and R_2 are compared, and the rule with the higher confidence score is taken as the matching rule.
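Matching an inference path's relation sequence against the mined rules and taking the highest-confidence match as the extra reward can be sketched as follows (names illustrative, not from the patent):

```python
def rule_reward(path_relations, rules):
    """Extra rule reward for an inference path: among all rules whose body
    relation sequence matches the tail of the path, return the highest
    confidence score; 0.0 if no rule matches. `rules` maps a tuple of body
    relations to its confidence score."""
    path = tuple(path_relations)
    matching = [conf for body, conf in rules.items()
                if 0 < len(body) <= len(path) and path[-len(body):] == body]
    return max(matching, default=0.0)
```

When several rules match simultaneously, `max` realizes the comparison described above: the rule with the higher confidence becomes the matching rule.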
As a preferred embodiment, further, a path completion algorithm combining an attention mechanism and rule guidance is used to expand the agent's action space and complete knowledge graph paths. In the path completion algorithm, positive and negative sample sets of the action space are first set, and the agent state vector and action vector are initialized; then an iterative expansion step and an iteration termination condition are set. In each iteration, the agent's action space is updated by adding an additional action space, the agent's next action is obtained through its random sampling strategy function, the positive and negative sample sets of the action space are updated according to whether the agent reaches the target entity by walking, and iteration terminates once the termination condition is met.
A high-quality path is completed by combining the attention mechanism with rule guidance, further expanding the action space of the reinforcement learning agent. The probability of the agent's next action is calculated as P((r, e) | s_t) = P(r | s_t)·P(e | r, s_t).
By dynamically weighing the probabilities of all paths in the candidate set explored by the reinforcement learning agent, the attention weight w of the relations in each candidate set is calculated, and the top x relations ranked by w in the path candidate set are selected as the most probable completion path set of the agent in the current state. Next, based on the current entity e_t and the dynamically completed path set, link prediction is performed with the embedding method ConvE, providing the reinforcement learning agent an additional action space A_add of size N_add, i.e. N_add = k·x. Combining the agent's original action space A_t at the current node, the agent generates a larger search space A'_t at the node where the current inference path is missing, i.e. A'_t = A_t ∪ A_add.
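The action-space expansion A'_t = A_t ∪ A_add can be sketched as follows; `predict_entities` stands in for the ConvE link predictor, and all names are illustrative:

```python
import heapq

def expand_action_space(base_actions, relation_weights, x, predict_entities):
    """Dynamic path completion sketch: take the top-x relations by attention
    weight, query a link predictor for candidate tail entities of each, and
    union the resulting extra (relation, entity) actions with the agent's
    original action space at the current node."""
    top_relations = heapq.nlargest(x, relation_weights, key=relation_weights.get)
    extra = {(r, e) for r in top_relations for e in predict_entities(r)}
    return set(base_actions) | extra  # A'_t = A_t U A_add
```

With k candidate entities returned per relation, the extra space has size N_add = k·x, matching the text above.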
With this dynamic path completion strategy, when the reinforcement learning agent's inference path breaks at the current node, path completion can be inferred dynamically by combining the global information provided by the rules with the target prediction information, and the next inference hop can be taken. This gives the reinforcement learning agent a larger and more effective action search space, alleviating the inference truncation problem caused by missing edges in sparse knowledge graphs. The implementation algorithm proceeds as described below.
In the algorithm, the positive and negative sample sets of the action space are updated according to whether the agent reaches the target entity by walking: if the agent fails to reach the target entity, the next action and the agent state are added to the negative sample set, the agent's strategy function is resampled to obtain a new next action, and the resampled action and state are added to the positive sample set; if the agent does reach the target entity, the next action and agent state obtained before the iteration are added to the positive sample set. The steps of the algorithm can be summarized as:
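The sample-set update just described can be sketched as follows (a hedged sketch; names are illustrative, and `resample` stands in for drawing a fresh action from the strategy function):

```python
def update_samples(state, action, reached_target, resample, pos, neg):
    """Sample-set update from the path completion algorithm: a walk that fails
    to reach the target sends the tried (state, action) to the negative set and
    a resampled action to the positive set; a successful walk sends the
    original (state, action) to the positive set."""
    if reached_target:
        pos.append((state, action))
    else:
        neg.append((state, action))
        pos.append((state, resample(state)))
    return pos, neg
```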
I. Initialize the embedded representations of the entities and relations in the knowledge graph.
II. Mine the rules in the graph and their corresponding confidence scores with the rule induction method AnyBURL.
III. Explore paths in the graph with the reinforcement learning agent; when the agent encounters a path break due to the sparse graph environment, expand its action space with the dynamic path completion algorithm.
IV. Calculate the total reward of the reinforcement learning agent. The reward has two parts, a hit reward and a rule reward: when the agent hits the target entity, it receives the hit reward R_h; when the agent's inference path conforms to a rule combination, it obtains the rule reward R_r. The total reward finally earned by the agent is the weighted sum of the two parts, namely R_total = λ·R_r + (1−λ)·R_h.
V. Train the agent's strategy network by maximizing the expected total reward J(θ); the agent's parameters are optimized by maximizing this expectation with the REINFORCE algorithm.
J(θ) = E_{(e_s, r, e_t)} E_{a_1, …, a_T ~ π_θ}[R_total]
∇_θ J(θ) ≈ Σ_t R_total · ∇_θ log π_θ(a_t | s_t)
θ ← θ + η · ∇_θ J(θ)
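A toy REINFORCE update on a one-parameter, two-action policy illustrates the gradient-ascent step (illustrative only; the patent's policy network is a learned neural network, not a single logit):

```python
import math

def reinforce_update(theta, episodes, lr=0.01):
    """REINFORCE on a Bernoulli policy pi(1) = sigmoid(theta): for each
    (action, total_reward) episode, ascend reward * grad log pi(action),
    which for this parameterization equals (action - pi(1))."""
    for action, reward in episodes:
        p1 = 1.0 / (1.0 + math.exp(-theta))
        theta += lr * reward * (action - p1)  # gradient ascent on J(theta)
    return theta
```

Repeatedly rewarding one action raises its logit, which is exactly the "encourage high-reward paths" behavior described below.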
VI, performing iterative training on the rule induction module and the fact reasoning module through an iterative reasoning framework to realize more efficient knowledge reasoning.
Referring to fig. 4, the agent's lack of information guidance during inference is addressed by an iterative inference framework combining rule induction and fact inference. In rule induction, the AnyBURL model can be used to induce structured chain rules in the knowledge graph and screen the rules whose confidence exceeds a given threshold. These high-confidence rules represent frequent path patterns in the graph; they generalize part of the graph's global information and provide rule rewards and global-information guidance for the further training and reasoning of the reinforcement learning agent. In fact inference, reinforcement learning combined with the dynamic path completion strategy performs the inference. The reinforcement learning agent obtains the hit reward R_h by hitting the correct tail entity; meanwhile, explored paths that conform to rules receive additional rule rewards R_r. Based on the hit reward R_h and the rule reward R_r, the REINFORCE algorithm maximizes the agent's total reward and updates its parameters, encouraging the agent to explore high-reward paths, that is, to reason over and mine paths that hit correct entities and paths that conform to high-confidence rules. Through repeated exploration of knowledge graph paths, the trained, converged reinforcement learning agent, combined with the dynamic path completion strategy, infers new fact triples in the current knowledge graph environment.
Rule induction and fact inference are trained iteratively under a mutually reinforcing optimization strategy. Rule induction supplies fact inference with more high-quality rules, guiding the agent toward more efficient path search and better link prediction; fact inference, guided by those rules, produces more fact triples of higher accuracy, which in turn help the rule induction module derive more rules of higher confidence, so the two iteratively optimize each other. Through the iterative reasoning framework, rule induction and fact inference learn and reason in alternation, generating more high-quality fact triples and high-confidence rules. Under the guidance of this global information, the reinforcement learning agent can perform finer-grained reasoning tasks, alleviating to some extent the information scarcity caused by sparse graphs.
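The alternation between rule induction and fact inference can be sketched as a simple loop; `induce_rules` and `infer_facts` are placeholder callables standing in for the AnyBURL-style miner and the trained RL agent, and the threshold value is illustrative.

```python
def iterative_reasoning(kg_triples, induce_rules, infer_facts,
                        n_iters=3, conf_threshold=0.5):
    """Alternate rule induction and fact inference: each round's
    high-confidence rules guide the next round's inference, and each
    round's inferred triples enlarge the graph for the next induction."""
    kg = set(kg_triples)
    rules = {}
    for _ in range(n_iters):
        # rule induction: keep only rules above the confidence threshold
        rules = {r: c for r, c in induce_rules(kg).items() if c > conf_threshold}
        # fact inference: the RL agent explores paths under rule rewards
        kg |= set(infer_facts(kg, rules))
    return kg, rules
```

Each iteration enlarges the triple set and re-mines rules over it, which is the mutual-promotion effect the framework aims for.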
Further, based on the above method, an embodiment of the present invention provides a knowledge inference system based on an agent dynamic path completion strategy, comprising a rule mining module, a path completion module and a knowledge inference module, wherein:
the rule mining module is used for extracting entities and inter-entity relationships from the target knowledge graph, and for mining rules in the target knowledge graph together with the confidence score corresponding to each rule;
the path completion module is used for constructing a reinforcement learning agent; dynamically guiding the agent, according to the current entity state and historical path information, to perform knowledge graph path completion through rules; calculating the agent's total reward from the hit reward obtained when the agent hits the target entity and the rule reward obtained when the agent's inference path conforms to a rule; and training the agent's policy network by maximizing the expected value of the total reward;
and the knowledge inference module is used for obtaining the corresponding knowledge inference result for a given target condition to be queried, by performing path inference in the knowledge graph with the trained agent.
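As a toy illustration only, the three modules can be wired as below; the class name, function signatures, and stub behaviors are hypothetical, not the patented implementation.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Set, Tuple

Triple = Tuple[str, str, str]

@dataclass
class KnowledgeReasoningSystem:
    # graph -> {rule: confidence}; stands in for the rule mining module
    rule_miner: Callable[[Set[Triple]], Dict[tuple, float]]
    # (graph, rules) -> trained agent; stands in for the path completion module
    path_completer: Callable[[Set[Triple], Dict[tuple, float]], object]
    # (agent, (head, relation)) -> answer entity; the knowledge inference module
    reasoner: Callable[[object, Tuple[str, str]], str]

    def answer(self, graph: Set[Triple], query: Tuple[str, str]) -> str:
        rules = self.rule_miner(graph)             # mine rules and confidences
        agent = self.path_completer(graph, rules)  # train the RL agent
        return self.reasoner(agent, query)         # path inference for the query
```

This only shows the data flow between the modules: rules feed agent training, and the trained agent answers (head, relation) queries.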
Unless specifically stated otherwise, the relative steps, numerical expressions, and values of the components and steps set forth in these embodiments do not limit the scope of the present invention.
The embodiments in the present description are described in a progressive manner, each embodiment focuses on differences from other embodiments, and the same and similar parts among the embodiments are referred to each other. For the system disclosed by the embodiment, the description is relatively simple because the system corresponds to the method disclosed by the embodiment, and the relevant points can be referred to the method part for description.
Units and method steps of the examples described in connection with the embodiments disclosed herein may be implemented in electronic hardware, computer software, or a combination of both; to illustrate this interchangeability of hardware and software, the components and steps of each example have been described above in general functional terms. Whether such functionality is implemented as hardware or software depends upon the particular application and the design constraints of the technical solution. Skilled artisans may implement the described functionality in different ways for each particular application, but such implementation decisions should not be interpreted as causing a departure from the scope of the present invention.
Those skilled in the art will appreciate that all or part of the steps of the above methods may be implemented by instructing the relevant hardware through a program, which may be stored in a computer-readable storage medium, such as: read-only memory, magnetic or optical disk, and the like. Alternatively, all or part of the steps of the foregoing embodiments may also be implemented by using one or more integrated circuits, and accordingly, each module/unit in the foregoing embodiments may be implemented in the form of hardware, and may also be implemented in the form of a software functional module. The present invention is not limited to any specific form of combination of hardware and software.
Finally, it should be noted that the above embodiments are merely specific embodiments of the present invention, intended to illustrate rather than limit its technical solutions, and the protection scope of the present invention is not limited thereto. Although the present invention has been described in detail with reference to the foregoing embodiments, those skilled in the art should understand that any person skilled in the art may still modify the technical solutions described in the foregoing embodiments, readily conceive of changes, or make equivalent substitutions of some technical features within the technical scope of the present disclosure; such modifications, changes or substitutions do not depart from the spirit and scope of the embodiments of the present invention and shall be covered thereby. Therefore, the protection scope of the present invention shall be subject to the protection scope of the claims.

Claims (10)

1. A knowledge inference method based on an agent dynamic path completion strategy is characterized by comprising the following contents:
extracting entities in the target knowledge graph and the relation between the entities, and mining rules in the target knowledge graph and confidence score corresponding to the rules;
constructing a reinforcement learning intelligent agent, dynamically guiding the intelligent agent to perform knowledge graph path completion through rules according to the current entity state and historical path information, calculating the total reward of the intelligent agent according to the hit reward when the intelligent agent hits a target entity and the rule reward when the reasoning path of the intelligent agent accords with the rules, and training a strategy network of the intelligent agent by maximizing the expected value of the total reward of the intelligent agent;
and aiming at a given target condition to be inquired, acquiring a corresponding knowledge inference result by using the trained intelligent agent through path inference in the knowledge graph.
2. The knowledge inference method based on an agent dynamic path completion strategy according to claim 1, characterized in that, in mining the rules in the target knowledge graph, structured chain rules are induced from the target knowledge graph by a rule induction method, and a preset score threshold is used to screen out the rules whose confidence scores are higher than the score threshold.
3. The method of claim 1, wherein the elements of the constructed reinforcement learning agent comprise: a state, an action, a reward, a policy network and a transition function, wherein the state represents the node embedding of the agent, the action represents all possible next operations of the agent at the node, the reward represents the feedback obtained after the agent takes an action, the policy network represents the network by which the agent performs reinforcement learning according to the node state, action, reward and transition function, and the transition function represents the state transition result after the agent takes the next action.
4. The knowledge inference method based on an agent dynamic path completion strategy according to claim 1, wherein, in the agent path inference process, if the sequence of an inference path corresponds to a rule, the confidence corresponding to the rule is granted as an additional rule reward, and when the agent hits the correct tail entity, a hit reward is obtained; the total reward of the agent is denoted R_total = λR_r + (1-λ)R_h, wherein R_h is the hit reward, R_r is the rule reward, and λ is a preset weight parameter.
5. The knowledge inference method based on an agent dynamic path completion strategy according to claim 1 or 4, characterized in that, when setting the agent rule reward, if an inference path simultaneously satisfies multiple rules, the confidence scores of the multiple rules are compared, the rule corresponding to the maximum confidence score is selected as the path matching rule, and the confidence score of the path matching rule is used as the agent rule reward.
6. The knowledge inference method based on an agent dynamic path completion strategy according to claim 1 or 5, characterized in that the agent dynamic path completion strategy expands the agent action space and completes knowledge graph path completion by combining an attention mechanism with rule guidance, wherein, in the path completion algorithm, the positive and negative sample sets of the action space are first set and the agent state vector and action vector are initialized; then the iterative expansion step and the iteration termination condition of the agent action space are set; in the iterative expansion step, the action space of the agent is updated by adding an extra action space, the next action of the agent is obtained through the agent dynamic path completion strategy function, the positive and negative sample sets of the action space are updated according to whether the agent reaches the target entity by walking, and the iteration is terminated according to whether the termination condition is met.
7. The method of claim 6, wherein the positive and negative sample sets of the action space are updated according to whether the agent reaches the target entity by walking, the updating process being as follows: if the agent fails to reach the target entity by walking, the next action and the agent state are added to the negative sample set, the agent policy function is resampled to obtain a new next action, and the resampled next action and state of the agent are added to the positive sample set; if the agent reaches the target entity by walking, the next action and the agent state are added to the positive sample set.
8. The knowledge inference method based on an agent dynamic path completion strategy according to claim 6, characterized in that, when the agent action space is updated, the possible probabilities of all paths in the path candidate set inferred by the agent are measured by the probability of the agent's next action, and the attention weight of the relation in each path candidate set is calculated; according to the ranking of attention weights in the path candidate set, the top x relations are selected as the set of most probable completion paths for the agent in the current state; according to the current entity e_t and the completion path set, link prediction based on the ConvE embedding method provides the agent with an extra action space A_add of size N_add; combining this with the agent's original action space A_t at the current node, the search space of the node missing from the current inference path is updated, the updating process being represented as A_t' = A_t ∪ A_add.
9. The method of claim 7, wherein the probability of the agent's next action is calculated as P((r,e)|s_t) = P(r|s_t)P(e|r,s_t), wherein s_t represents the current state, r represents the relation the agent selects next, e represents an entity in the graph, P(r|s_t) represents the probability distribution over the next relation selected by the agent in the current state, and P(e|r,s_t) represents the probability distribution over the next entity selected by the agent; the attention weight of the relation in each path candidate set is expressed as α_r, computed from an identity matrix I, an auxiliary matrix M used to rule-guide the agent, and the candidate relation r of the agent in the current state, wherein |R_c| represents the number of candidate relation elements.
10. A knowledge inference system based on an agent dynamic path completion strategy is characterized by comprising: a rule mining module, a path completion module and a knowledge reasoning module, wherein,
the rule mining module is used for extracting entities and relationships among the entities in the target knowledge graph and mining rules in the target knowledge graph and confidence score corresponding to the rules;
the path completion module is used for constructing a reinforcement learning intelligent agent, dynamically guiding the intelligent agent to perform knowledge graph path completion through rules according to the current entity state and historical path information, calculating the total reward of the intelligent agent according to the hit reward when the intelligent agent hits a target entity and the rule reward when the inference path of the intelligent agent accords with the rules, and training a strategy network of the intelligent agent by maximizing the expected value of the total reward of the intelligent agent;
and the knowledge inference module is used for acquiring a corresponding knowledge inference result by utilizing the trained intelligent agent to perform path inference in the knowledge graph according to the given target condition to be queried.
CN202211168516.8A 2022-09-24 2022-09-24 Knowledge reasoning method and system based on intelligent agent dynamic path completion strategy Pending CN115526321A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202211168516.8A CN115526321A (en) 2022-09-24 2022-09-24 Knowledge reasoning method and system based on intelligent agent dynamic path completion strategy


Publications (1)

Publication Number Publication Date
CN115526321A true CN115526321A (en) 2022-12-27

Family

ID=84699465

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202211168516.8A Pending CN115526321A (en) 2022-09-24 2022-09-24 Knowledge reasoning method and system based on intelligent agent dynamic path completion strategy

Country Status (1)

Country Link
CN (1) CN115526321A (en)


Cited By (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116010621A (en) * 2023-01-10 2023-04-25 华中师范大学 Rule-guided self-adaptive path generation method
CN116010621B (en) * 2023-01-10 2023-08-11 华中师范大学 Rule-guided self-adaptive path generation method
CN116610822A (en) * 2023-07-21 2023-08-18 南京邮电大学 Knowledge graph multi-hop reasoning method for diabetes text
CN116663864A (en) * 2023-07-28 2023-08-29 天之翼(苏州)科技有限公司 Unmanned aerial vehicle flight scheduling analysis method, server and medium applying artificial intelligence
CN116663864B (en) * 2023-07-28 2023-10-10 天之翼(苏州)科技有限公司 Unmanned aerial vehicle flight scheduling analysis method, server and medium applying artificial intelligence

Similar Documents

Publication Publication Date Title
CN112818137B (en) Entity alignment-based multi-source heterogeneous knowledge graph collaborative reasoning method and device
CN115526321A (en) Knowledge reasoning method and system based on intelligent agent dynamic path completion strategy
Jeerige et al. Comparison of deep reinforcement learning approaches for intelligent game playing
Chen et al. An improved bat algorithm hybridized with extremal optimization and Boltzmann selection
CN113361680A (en) Neural network architecture searching method, device, equipment and medium
Zhou et al. Boosted local dimensional mutation and all-dimensional neighborhood slime mould algorithm for feature selection
CN112905801A (en) Event map-based travel prediction method, system, device and storage medium
CN114969278A (en) Knowledge enhancement graph neural network-based text question-answering model
CN115526317A (en) Multi-agent knowledge inference method and system based on deep reinforcement learning
CN116992008B (en) Knowledge graph multi-hop question-answer reasoning method, device and computer equipment
CN113360670A (en) Knowledge graph completion method and system based on fact context
CN117453915A (en) Complex task processing method of large language model based on programmable workflow
CN115526322A (en) Sequence generating type knowledge inference method and system based on precision transform
CN115905691A (en) Preference perception recommendation method based on deep reinforcement learning
CN116776895A (en) Knowledge-guided large language model query clarification method and system for API recommendation
US20190205763A1 (en) Information processing device, information processing method and information processing program
Trabelsi New structure learning algorithms and evaluation methods for large dynamic Bayesian networks
CN116010621B (en) Rule-guided self-adaptive path generation method
CN116610822A (en) Knowledge graph multi-hop reasoning method for diabetes text
CN107480768A (en) Bayesian network structure adaptive learning method and device, storage device and terminal device
CN114662693A (en) Reinforced learning knowledge graph reasoning method based on action sampling
CN112052386B (en) Information recommendation method, device and storage medium
Saha et al. How Does User Behavior Evolve During Exploratory Visual Analysis?
CN112287599B (en) Multi-step novelty-based temporal sequence iterative prediction algorithm, medium and device
Xue et al. Deep reinforcement learning based ontology meta-matching technique

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination