CN115526317A - Multi-agent knowledge inference method and system based on deep reinforcement learning - Google Patents

Multi-agent knowledge inference method and system based on deep reinforcement learning

Info

Publication number
CN115526317A
Authority
CN
China
Prior art keywords
agent
level
knowledge
reasoning
low
Prior art date
Legal status
Pending
Application number
CN202211168518.7A
Other languages
Chinese (zh)
Inventor
夏毅
罗军勇
兰明敬
周刚
陈晓慧
卢记仓
刘铄
章梦礼
黄宁博
孙业鹏
李珠峰
Current Assignee
Information Engineering University of PLA Strategic Support Force
Original Assignee
Information Engineering University of PLA Strategic Support Force
Priority date
Filing date
Publication date
Application filed by Information Engineering University of PLA Strategic Support Force filed Critical Information Engineering University of PLA Strategic Support Force
Priority to CN202211168518.7A priority Critical patent/CN115526317A/en
Publication of CN115526317A publication Critical patent/CN115526317A/en
Pending legal-status Critical Current

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N5/00 Computing arrangements using knowledge-based models
    • G06N5/02 Knowledge representation; Symbolic representation
    • G06N5/022 Knowledge engineering; Knowledge acquisition
    • G06N5/025 Extracting rules from data
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N5/00 Computing arrangements using knowledge-based models
    • G06N5/04 Inference or reasoning models

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Artificial Intelligence (AREA)
  • Data Mining & Analysis (AREA)
  • Computing Systems (AREA)
  • Computational Linguistics (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Evolutionary Computation (AREA)
  • Health & Medical Sciences (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • General Health & Medical Sciences (AREA)
  • Molecular Biology (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention belongs to the technical field of knowledge graphs and in particular relates to a multi-agent knowledge reasoning method and system based on deep reinforcement learning. Entities and the relations among them are extracted from a target knowledge graph, and a sequence of structured chain rules is induced from the graph. A hierarchical reinforcement-learning agent is constructed, comprising at least a high-level agent and a low-level agent: the high-level agent divides the reasoning process into subtasks by extracting abstract relations from the knowledge graph, and the low-level agent performs entity-level path reasoning within each subtask. On the basis of the extracted entities, relations, and chain-rule sequence, knowledge reasoning is carried out through subtask division by the high-level agent and entity-path exploration by the low-level agent, and a reasoning result is output. By reasoning with hierarchical multi-agents, the invention addresses the excessively large search space encountered in long-distance reasoning over current large-scale knowledge graphs.

Description

Multi-agent knowledge reasoning method and system based on deep reinforcement learning
Technical Field
The invention belongs to the technical field of knowledge graphs and in particular relates to a multi-agent knowledge reasoning method and system based on deep reinforcement learning.
Background
In recent years, artificial-intelligence research based on deep learning models has made continual breakthroughs, but most such models are black boxes that do not follow the human cognitive reasoning process, so the decisions of high-performance but complex algorithms, models, and systems generally lack transparency and interpretability. In critical fields with strict interpretability requirements, such as national defense, medical treatment, and network and information security, the unexplainability of a reasoning method strongly affects both the reasoning results and their traceability. Interpretability therefore needs to be built into these algorithms and systems, using explicit, interpretable knowledge reasoning to assist the relevant prediction tasks and to form a reliable mechanism for explaining behavior. The knowledge graph, one of the newest forms of knowledge representation, describes entities and relations in the objective world in a structured form by modeling a semantic network, and is widely used in knowledge reasoning. Knowledge reasoning over a knowledge graph explains the reasoning process through auxiliary means such as reasoning paths and logic rules built on discrete symbolic representations, providing an important route toward interpretable artificial intelligence.
Knowledge reasoning is the process of deriving new facts from known knowledge, generalizing from individual pieces of knowledge to general knowledge by mining or summarizing a large body of existing knowledge. Early reasoning research mostly belonged to logic and knowledge engineering: many scholars advocated formal methods for describing the objective world, held that all reasoning rests on existing logical knowledge such as first-order logic and predicate logic, and focused on how to draw correct conclusions from known propositions and predicates. In recent years, with the explosive growth of internet data, traditional manually constructed knowledge bases can no longer satisfy the mining demands that the big-data era places on large amounts of knowledge, so data-driven reasoning methods have become the mainstream of knowledge-reasoning research. Knowledge reasoning based on knowledge graphs is attractive not only for its efficiency but also for its explainability. First, the knowledge graph is interpretable in its mode of representation. Knowledge representation is a set of conventions for describing the world, the process of symbolizing, formalizing, or modeling knowledge. Common methods include predicate-logic representation, production-rule representation, and distributed knowledge representation; compared with traditional methods such as production rules, the knowledge graph offers rich semantics, a friendly structure, and an easily understood knowledge organization. Second, knowledge-graph-based reasoning is interpretable during the reasoning process. Humans mostly understand the world through concepts, attributes, and relations: for the question "Why can a bird fly?", a human explanation might be "because a bird has wings", which uses an attribute in the explanation. The knowledge graph is rich in entities, concepts, attributes, and relations; it organizes massive knowledge formally in a graph structure, models each real-world inference scenario intuitively, and lets the final decision be explained from multiple concrete sources. Finally, the knowledge graph is interpretable in storage and use: compared with other storage forms, it constructs and stores knowledge as triples, which matches the subject-predicate-object pattern by which people ordinarily come to know things, so it is friendlier to human understanding and more interpretable than other knowledge-representation methods.
Although existing knowledge-graph-based inference methods achieve good results and provide good interpretability, they still have the following defects. (1) During reasoning, the inference agent must explore a large number of paths, and the complexity rises exponentially with every additional hop of the walk, so the search space of current inference models is generally large and long-distance reasoning tasks cannot be performed. (2) Most current inference models run experiments only on benchmark datasets; because of problems such as an oversized search space and graph sparsity, they are difficult to apply to real-life large-scale knowledge graphs with millions of entities, such as Wikidata and Freebase.
Disclosure of Invention
Therefore, the invention provides a multi-agent knowledge reasoning method and system based on deep reinforcement learning, which perform knowledge reasoning with hierarchical multi-agents and can solve the problem of an excessively large search space when performing long-distance reasoning over current large-scale knowledge graphs.
According to the design scheme provided by the invention, a multi-agent knowledge inference method based on deep reinforcement learning is provided, comprising the following contents:
extracting entities and the relations among them from the target knowledge graph, and inducing a sequence of structured chain rules from the target knowledge graph;
constructing a hierarchical agent for reinforcement learning, the hierarchical agent comprising at least a high-level agent and a low-level agent, wherein the high-level agent divides the reasoning process into subtasks by extracting abstract relations from the knowledge graph, and the low-level agent performs entity-level path reasoning on each subtask;
and, on the basis of the extracted entities, the relations among them, and the chain-rule sequence, performing knowledge reasoning through subtask division by the high-level agent and entity-path exploration by the low-level agent, and outputting a reasoning result.
In the multi-agent knowledge inference method based on deep reinforcement learning, further, when mining rules in the target knowledge graph, a rule-induction method is used to induce structured chain rules from the graph, and a preset score threshold is used to retain only those rules whose confidence score exceeds the threshold.
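As an illustration, the confidence-based filtering step might look like the sketch below; the Rule structure and the 0.5 default threshold are assumptions for illustration, not values fixed by the method.

```python
from dataclasses import dataclass

@dataclass
class Rule:
    """A chain rule r_q <- r_1, ..., r_n mined from the knowledge graph."""
    head: str            # target relation r_q
    body: list[str]      # chain of body relations r_1 ... r_n
    confidence: float    # confidence score assigned by the rule miner

def filter_rules(rules: list[Rule], threshold: float = 0.5) -> list[Rule]:
    """Retain only rules whose confidence score exceeds the preset threshold."""
    return [r for r in rules if r.confidence > threshold]
```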
In the multi-agent knowledge inference method based on deep reinforcement learning, further, the elements of both the constructed high-level agent and low-level agent comprise: a state, an action, a reward, a policy network, and a transition function, where the state is the embedded representation of the node where the agent is located, the action covers all possible next-step operations of the agent at its current node, the reward is the feedback obtained after the agent takes an action, the policy network is the network with which the agent performs reinforcement learning from the node state, action, reward, and transition function, and the transition function gives the state that results after the agent performs its next action.
In the multi-agent knowledge inference method based on deep reinforcement learning, further, among the elements of the high-level agent, the state of the high-level agent at step t is expressed as $s_t^h = (e_t, r_q, h_t^h)$, where $e_t$ is the current entity node, $r_q$ is the relation being inferred at the current node, and $h_t^h$ is the history embedding of the high-level agent's search path; all possible next operations of the high-level agent at the current entity node are expressed as $a_t^h = (r_{t+1}, e_{t+1}) \in A_t^h$, where $A_t^h$ is the high-level agent's action space.
In the multi-agent knowledge inference method based on deep reinforcement learning, further, among the elements of the low-level agent, the state of the low-level agent at hop t can be expressed as $s_t^l = (e_t, r_t^h, h_t^l)$, where $e_t$ is the current entity node, $r_t^h$ is the sub-goal assigned by the high-level agent, and $h_t^l$ is the history embedding of the low-level agent's search path; the action space $A_t^l$ of the low-level agent consists of all possible outgoing edges of the current node in the graph, so the possible actions of the low-level agent's next hop are expressed as $a_t^l = (r_{t+1}, e_{t+1}) \in A_t^l = \{(r', e') \mid (e_t, r', e') \in \mathcal{G}\}$.
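To make the action-space definition concrete, the sketch below builds the low-level action space as the set of outgoing edges of the current node; the adjacency-dict representation of the graph is an assumption for illustration, and the self-loop mirrors the stop-in-place edge described later in the embodiment.

```python
from collections import defaultdict

# The graph as an adjacency dict: head entity -> list of (relation, tail).
# This representation is an assumption; any triple store with an
# outgoing-edge lookup would serve the same purpose.
Graph = dict[str, list[tuple[str, str]]]

def build_graph(triples: list[tuple[str, str, str]]) -> Graph:
    g: Graph = defaultdict(list)
    for head, rel, tail in triples:
        g[head].append((rel, tail))
    return g

def action_space(graph: Graph, e_t: str) -> list[tuple[str, str]]:
    """A_t: all outgoing edges (r', e') of the current entity node e_t,
    plus a self-loop edge that lets the agent stop in place."""
    return graph.get(e_t, []) + [("NO_OP", e_t)]
```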
as the multi-agent knowledge inference method based on deep reinforcement learning, the invention further utilizes a strategy network to guide the action selection of the agents in the inference environment in knowledge inference by utilizing high-level agents and low-level agents, the high-level agents and the low-level agents are used as the elements of the next decision according to the embedding of the historical search path, and the strategy network parameters are updated by maximizing the reward of the agents.
In the multi-agent knowledge inference method based on deep reinforcement learning, further, the high-level agent's policy function $\pi^h(a_t^h \mid s_t^h; \theta)$ is expressed as

$$o_t^h = \mathrm{LeakyReLU}\big(W_1^h\,[h_t^h; o_t]\big), \qquad \pi^h(a_t^h \mid s_t^h; \theta) = \sigma\big(A_t^h \times o_t^h\big),$$

where $o_t^h$ is the output vector of the high-level agent's policy network, $a_t^h$ a possible next-hop action of the high-level agent, $s_t^h$ the current high-level agent state, $\theta$ the agent's parameters, $h_t^h$ the history embedding of the high-level agent's search path, $o_t$ the embedded representation of the environment the agent can currently observe, and $\mathrm{LeakyReLU}(\cdot)$ an activation function. The low-level agent's policy function $\pi^l(a_t^l \mid s_t^l; \theta)$ is expressed as

$$\pi^l(a_t^l \mid s_t^l; \theta) = \sigma\big(A_t \times W_2\,\mathrm{LeakyReLU}(W_1\,[h_t^l; e_t; r_t^h])\big),$$

where $\sigma$ is the softmax function, $W_1$ and $W_2$ are the parameters to be trained in the low-level agent's policy network, $a_t^l$ is a possible next-hop action of the low-level agent, $s_t^l$ is the current low-level agent state, $h_t^l$ is the history embedding of the low-level agent's search path, $A_t$ is the low-level agent's action space, and $e_t$ is the current entity node in the low-level agent's reasoning.
In the multi-agent knowledge inference method based on deep reinforcement learning, further, in agent knowledge inference each agent's parameters are initialized with a meta-learning algorithm, and the MAML method is applied to train on the training samples, so that the trained agent parameters converge after N gradient-iteration updates, where N is less than the preset maximum number of iterations.
In the multi-agent knowledge reasoning method based on deep reinforcement learning, further, when computing the agent's reward, if the sequence of the reasoning path matches a rule, the confidence of that rule is granted as an additional rule reward, and when the agent hits the correct tail entity it obtains a hit reward; the agent's reward is denoted $R_{total} = \lambda R_r + (1-\lambda) R_h$, where $R_h$ is the hit reward, $R_r$ the rule reward, and $\lambda$ a preset weight parameter.
Further, the invention also provides a multi-agent knowledge inference system based on deep reinforcement learning, comprising a rule-mining module, an agent-construction module, and a knowledge-reasoning module, wherein:
the rule-mining module extracts entities and the relations among them from the target knowledge graph and induces a sequence of structured chain rules from the graph;
the agent-construction module constructs a hierarchical agent for reinforcement learning, comprising at least a high-level agent and a low-level agent, where the high-level agent divides the reasoning process into subtasks by extracting abstract relations from the knowledge graph and the low-level agent performs entity-level path reasoning on each subtask;
and the knowledge-reasoning module, on the basis of the extracted entities, relations, and chain-rule sequence, performs knowledge reasoning through subtask division by the high-level agent and entity-path exploration by the low-level agent, and outputs a reasoning result.
The invention has the following beneficial effects:
Based on deep reinforcement learning, the invention combines multiple agents to reason hierarchically over the entities and relations of the knowledge graph, achieving more efficient and longer-distance knowledge reasoning. Meta-information is used to initialize the parameters of the reinforcement-learning agents, so the model adapts effectively to training samples of few-shot relations in a large-scale graph environment, improving prediction for few-shot relations while also accelerating training for normal relations. The reasoning process can be displayed as a path, giving better interpretability: compared with current black-box neural networks, which cannot directly output reasoning results in an interpretable way, the model is relatively transparent and its decisions are more trustworthy. In addition, since the unexplainability of a reasoning method strongly affects reasoning results and their traceability, the explicit reasoning adopted in this scheme has the further advantage of better tracing back erroneous reasoning samples.
Description of the drawings:
FIG. 1 is a schematic diagram of a knowledge inference process in an embodiment;
FIG. 2 is a schematic representation of a generalized structured chain rule sequence in an example embodiment;
FIG. 3 is a schematic diagram of the multi-agent reasoning principle in the embodiment.
Detailed description of embodiments:
To make the objects, technical solutions, and advantages of the present invention clearer, the invention is described in further detail below with reference to the accompanying drawings and technical solutions.
An embodiment of the present invention, as shown in FIG. 1, provides a multi-agent knowledge inference method based on deep reinforcement learning, including:
S101, extracting entities and the relations among them from a target knowledge graph, and inducing a sequence of structured chain rules from the target knowledge graph;
S102, constructing a hierarchical agent for reinforcement learning, the hierarchical agent comprising at least a high-level agent and a low-level agent, wherein the high-level agent divides the reasoning process into subtasks by extracting abstract relations from the knowledge graph, and the low-level agent performs entity-level path reasoning on each subtask;
S103, on the basis of the extracted entities, the relations among them, and the chain-rule sequence, performing knowledge reasoning through subtask division by the high-level agent and entity-path exploration by the low-level agent, and outputting a reasoning result.
With ordinary chain reasoning, the complexity of the search space grows exponentially with the hop count, and most current reasoning stays within 4 hops. In the embodiment of this scheme, referring to FIG. 2, the reasoning process is divided into independent subtasks that each explore an abstract relation, after which specific fine-grained entity-level path exploration is performed for each subtask. The search space then no longer multiplies exponentially as the reasoning distance grows; the complexity is reduced to the sum of the complexities of two shorter search tasks, greatly shrinking the search space. At the same time, this hierarchical reasoning mode matches the human divide-and-conquer style of thinking, handling a problem layer by layer, so it is more interpretable and its reasoning results and process are easier for people to accept. A small worked example of this reduction follows.
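The numbers below are assumed purely for illustration (a branching factor of 50 and a 6-hop query are not values from the invention):

```latex
% Assumed numbers: branching factor b = 50, a T = 6 hop query split into
% T_h = 2 abstract sub-goals, each refined by a T_l = 3 hop low-level search.
\underbrace{b^{T}}_{\text{flat: } 50^{6} \,\approx\, 1.6\times 10^{10}}
\quad\gg\quad
\underbrace{b^{T_h} + T_h\, b^{T_l}}_{\text{hierarchical: } 50^{2} + 2\cdot 50^{3} \,\approx\, 2.5\times 10^{5}},
\qquad T = T_h \, T_l .
```

The exponent is divided between the two levels, so the costs of the shorter searches add instead of multiplying.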
As a preferred embodiment, further, when mining rules in the target knowledge graph, a rule-induction method is used to induce structured chain rules from the graph, and a preset score threshold is used to retain the rules whose confidence score exceeds the threshold.
In the embodiment of this scheme, hierarchical knowledge reasoning is realized by multiple agents through reinforcement learning at two levels: the high-level agent decomposes the abstract relation layer, and the low-level agent explores paths over concrete entities. Each reinforcement-learning agent has the following elements: state, action, reward, policy network, and a per-step transition function. The state is the embedded representation of the agent's node; the action covers all possible next-step operations of the agent at the current node; the reward is the feedback obtained after the agent acts; the policy network is the network with which the agent performs reinforcement learning from the node state, action, reward, and transition function; and the transition function gives the state that results after the agent performs its next action.
The high-level agent starts from the entity $e_s$, with $r_q$ as the relation to be inferred, and realizes reasoning over the abstract path relations. High-level reasoning focuses on abstract conceptual relations, decomposing the complex high-level reasoning task into subtasks that are assigned to the low-level agent, which then explores paths at the level of concrete entities. The reinforcement-learning elements of the high-level agent are as follows:
State: the embedded representation of the node where the agent currently is. In high-level knowledge reasoning, the high-level agent considers not only the current node $e_t$ and the relation $r_q$ being inferred, but also the history embedding $h_t$ of the agent's search path, so the state of the high-level agent at hop t can be represented as $s_t^h = (e_t, r_q, h_t^h)$. We encode the sequence of the historical inference path with a gated recurrent unit (GRU):

$$h_0 = \mathrm{GRU}(\mathbf{0}, [r_0; e_s]), \qquad h_t = \mathrm{GRU}(h_{t-1}, [r_t; e_t]).$$
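A minimal sketch of such a path-history encoder, assuming PyTorch and assuming the relation and entity embeddings are simply concatenated as the GRU input (the concatenation scheme is an assumption; the text only states that a GRU encodes the path):

```python
import torch
import torch.nn as nn

class PathHistoryEncoder(nn.Module):
    """Encodes the inference path (e_s, r_1, e_1, ..., r_t, e_t) into h_t."""

    def __init__(self, emb_dim: int, hidden_dim: int):
        super().__init__()
        # Input at each hop: [relation embedding ; entity embedding].
        self.gru = nn.GRUCell(2 * emb_dim, hidden_dim)

    def step(self, h_prev: torch.Tensor, r_emb: torch.Tensor,
             e_emb: torch.Tensor) -> torch.Tensor:
        """One hop: h_t = GRU(h_{t-1}, [r_t ; e_t])."""
        return self.gru(torch.cat([r_emb, e_emb], dim=-1), h_prev)
```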
Action: all possible next operations $a_t \in A_t$ of the agent at the current node. In high-level knowledge inference, the action is $a_t^h \in A_t^h$; if the high-level agent has reached the target node, it outputs $a_t^h = 0$ and the reasoning ends; otherwise $a_t^h \neq 0$ and low-level knowledge reasoning continues.
Transition function: the state that results from the agent's next action. If $a_t^h = 0$, reasoning ends at this moment and no state transition occurs; otherwise reasoning continues, and the transformation $\phi(\cdot)$ maps the high-level agent's state $s_t^h$ to the low-level agent's state $s_t^l$, i.e. $s_t^l = \phi(s_t^h, a_t^h)$, while simultaneously outputting the sub-goal $r_t^h$ for the low-level agent.
Reward: the feedback obtained after the reinforcement-learning agent takes an action, used to assess taking action $a_t$ in the current state $s_t$. In high-level knowledge reasoning, if the agent reaches the target node at the end of the search, the high-level agent receives a hit reward $R_{hit}^h = 1$; otherwise $R_{hit}^h = 0$.
The low-level agent then starts from the current entity $e_t$ specified by the high-level agent and, with the assigned sub-goal relation $r_t^h$ as the relation to infer, performs fine-grained entity-level path exploration. Low-level reasoning proceeds once the high-level agent has signaled continuation (i.e. $a_t^h \neq 0$) and has assigned the relational target $r_t^h$ to the low-level agent, after which concrete entity-level reasoning is carried out. The reinforcement-learning elements of the low-level agent are as follows:
state (state), in a low-level knowledge inference, the agent considers the current node e at the same time t Reasoning about the target relationship r t h And historical embedding h of agent search path t . At this time, the target relationship of the low-level agent is the sub-target r allocated by the high-level agent t h . Thus, the low-level agent state for the t-th hop may be represented as
Figure BDA00038625001800000616
We represent the sequence of the historical inference path by gating the loop unit GRU.
Action: in low-level knowledge inference the agent performs fine-grained entity-level path exploration, so its actions are all possible outgoing edges of the current node in the graph, i.e. $A_t^l = \{(r', e') \mid (e_t, r', e') \in \mathcal{G}\}$, with the possible next-hop actions $a_t^l = (r_{t+1}, e_{t+1}) \in A_t^l$. Meanwhile, because the agent's reasoning hop count is fixed, a self-loop edge is added to every node to prevent the agent from being forced onward after reaching the target node early; this is equivalent to the action space $A_t^l$ containing an action that stops in place.
Transition function: low-level knowledge reasoning is path exploration at the fine-grained entity level, and its transition function can be defined as $\delta(s_t, a_t) = s_{t+1}$; that is, when the agent in state $s_t = (r_q, e_t, h_t)$ selects the next action $a_t = (r_{t+1}, e_{t+1})$, the state becomes $s_{t+1} = (r_q, e_{t+1}, h_{t+1})$. Meanwhile, the agent's maximum hop count is limited to T; if the target entity has not been reached by hop T, the final state is $s_T = (r_q, e_T, h_T)$.
Reward: during model training, if the low-level agent finally hits the target entity after multi-hop reasoning, i.e. $\tau = (e_s, r_q, e_T) \in \mathcal{KG}$, we define the reward obtained by the low-level agent to be 1. At the same time, to better accommodate the open-world assumption, under which whether a triple holds depends on the current environment, we measure a soft reward with an embedding-based scoring function $f(\tau)$. The reward function of the low-level agent is therefore:

$$R^l(s_T) = \begin{cases} 1, & (e_s, r_q, e_T) \in \mathcal{KG},\\ f(e_s, r_q, e_T), & \text{otherwise.} \end{cases}$$
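A sketch of this reward, assuming a TransE-style scorer as the embedding-based $f(\tau)$; the concrete scoring function is an assumption for illustration, as the text only requires some embedding-based score:

```python
import torch

def soft_reward(e_s: torch.Tensor, r_q: torch.Tensor,
                e_T: torch.Tensor) -> torch.Tensor:
    """Embedding-based score f(tau); here a TransE-style distance squashed
    to (0, 1). The choice of scorer is an assumption for illustration."""
    return torch.sigmoid(-torch.norm(e_s + r_q - e_T, p=2))

def low_level_reward(triple: tuple, kg: set, e_s, r_q, e_T) -> torch.Tensor:
    """R^l = 1 if the predicted triple is in the KG, else the soft reward."""
    if triple in kg:
        return torch.tensor(1.0)
    return soft_reward(e_s, r_q, e_T)
```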
as a preferred embodiment, further, in knowledge inference by using high-level agents and low-level agents, the action selection of the agents in the inference environment is guided by using a policy network, the high-level agents and the low-level agents are used as elements for next decision according to the embedding of historical search paths, and the policy network parameters are updated by maximizing the reward of the agents. In the agent knowledge inference, agent parameters are initialized by using a meta-learning algorithm, and training is performed on a training sample by using an MAML (maximum likelihood model) method, so that the trained agent parameters are converged after being updated by N times of gradient iteration, wherein N is less than a preset maximum iteration number.
Referring to FIG. 3, in the above multi-agent decision process the policy network guides the multi-agents' action selection in the different reasoning environments so as to maximize the rewards of the high-level and low-level agents, update the parameters of the agents' policy networks, and realize efficient knowledge reasoning. The action of each step is composed of the next entity $e_{t+1}$ and the relation $r_{t+1}$ taken to reach it, i.e. $a_t = (r_{t+1}, e_{t+1}) \in A_t$. The history of the path search can be denoted $h_t = (e_s, r_1, e_1, \ldots, r_t, e_t)$, and both the high-level and low-level agents use the embedding of this historical search path as an input to the next decision.
During knowledge reasoning, the high-level agent produces an output vector $o_t^h$ through its policy network; if the agent's next-action output is 0, the inference ends; otherwise the output vector $o_t^h$ serves as the goal of the low-level agent and reasoning continues. The policy function $\pi^h(a_t^h \mid s_t^h; \theta)$ of the high-level agent is as follows:

$$o_t^h = \mathrm{LeakyReLU}\big(W_1^h\,[h_t^h; o_t]\big), \qquad \pi^h(a_t^h \mid s_t^h; \theta) = \sigma\big(A_t^h \times o_t^h\big).$$

If the high-level reasoning has not finished, the policy network continues to guide the agent through low-level reasoning, in which the low-level agent performs concrete path exploration between entities with action space $A_t^l$. Its policy function $\pi^l(a_t^l \mid s_t^l; \theta)$ is as follows:

$$\pi^l(a_t^l \mid s_t^l; \theta) = \sigma\big(A_t^l \times W_2\,\mathrm{LeakyReLU}(W_1\,[h_t^l; e_t; r_t^h])\big),$$

where $\sigma$ is the softmax function and $W_1$ and $W_2$ are the parameters of the policy network that need to be trained.
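A minimal PyTorch sketch of such a two-layer policy over the candidate actions follows; scoring candidates by a dot product with the MLP output and taking a softmax over the action set are standard choices assumed here, not details prescribed by the text:

```python
import torch
import torch.nn as nn

class LowLevelPolicy(nn.Module):
    """pi(a_t | s_t) = softmax(A_t x W2 LeakyReLU(W1 [h_t; e_t; r_h]))."""

    def __init__(self, emb_dim: int, hidden_dim: int):
        super().__init__()
        self.w1 = nn.Linear(3 * emb_dim, hidden_dim)   # W1
        self.w2 = nn.Linear(hidden_dim, 2 * emb_dim)   # W2

    def forward(self, h_t, e_t, r_h, cand_actions):
        # cand_actions: (num_actions, 2*emb_dim); each row is [r'; e'].
        state = torch.cat([h_t, e_t, r_h], dim=-1)
        query = self.w2(nn.functional.leaky_relu(self.w1(state)))
        scores = cand_actions @ query                  # A_t x (...)
        return torch.softmax(scores, dim=-1)           # distribution over A_t
```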
Multi-agent reasoning enables hierarchical, efficient inference, but one drawback it brings is an oversized parameter count during training: without sufficient training samples the model is hard to converge. For this purpose, a meta-learning method can be used to initialize the parameters of the reinforcement-learning agents; these initialization parameters are called the agents' meta-information. Many inference tasks share similar implicit structures, yet the sample counts of different relations follow a long-tailed distribution, so the training samples for normal relations and few-shot relations differ greatly in number. In the embodiment of this scheme, the parameters of the reinforcement-learning agents are initialized with the optimization-based meta-learning method MAML. MAML is trained on training samples of normal relations, and an agent starting from the trained parameters θ* can converge quickly with only a few iterative gradient updates. An agent combined with this meta-information therefore adapts effectively to training samples of few-shot relations, improving the prediction of few-shot relations while also accelerating training on samples of normal relations. The specific algorithm follows the standard MAML pattern of inner task-level gradient updates and an outer meta-update.
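A minimal sketch of that pattern, assuming a first-order simplification purely to keep the example short (the exact MAML variant, the `tasks.sample()` interface yielding per-relation support/query batches, and the `loss_fn` signature are assumptions for illustration):

```python
import copy
import torch

def maml_initialize(policy, tasks, loss_fn,
                    inner_lr=0.01, meta_lr=0.001, meta_steps=1000):
    """First-order MAML sketch for initializing the agents' parameters."""
    meta_opt = torch.optim.Adam(policy.parameters(), lr=meta_lr)
    for _ in range(meta_steps):
        meta_opt.zero_grad()
        for support, query in tasks.sample():
            # Inner loop: adapt a task-specific copy with one gradient step.
            fast = copy.deepcopy(policy)
            grads = torch.autograd.grad(loss_fn(fast, support),
                                        fast.parameters())
            with torch.no_grad():
                for p, g in zip(fast.parameters(), grads):
                    p -= inner_lr * g
            # Outer loop: meta-gradient from the adapted copy (first-order).
            loss_fn(fast, query).backward()
            with torch.no_grad():
                for p, fp in zip(policy.parameters(), fast.parameters()):
                    p.grad = fp.grad.clone() if p.grad is None \
                             else p.grad + fp.grad
        meta_opt.step()  # update the meta-initialization theta*
```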
the method is summarized as follows:
I. initializing an embedded representation of entities and relationships in the graph;
II. Extracting mining rules and corresponding confidence scores by using the rules;
III, initializing parameters of the reinforcement learning agent through a meta-learning algorithm;
and IV, calculating the total reward of the intelligent agent. The reward is divided into two parts, regular reward and hit reward. When the agent hits the target entity, the agent receives a hit reward R h (ii) a When the agent's inference path conforms to the combination of rules, the agent obtains a rule reward R at that time r (ii) a The total reward ultimately earned by the agent is the sum of two partial rewards, namely: r total =λR r +(1-λ)R h
V. Train the agents' policy networks by maximizing the expected reward of the query, the objectives for the high-level and low-level agents being respectively

$$J(\theta^h) = \mathbb{E}_{(e_s, r_q, e_o) \in \mathcal{KG}}\, \mathbb{E}_{a_1^h, \ldots, a_T^h \sim \pi_\theta^h}\big[R^h\big], \qquad J(\theta^l) = \mathbb{E}_{(e_s, r_q, e_o) \in \mathcal{KG}}\, \mathbb{E}_{a_1^l, \ldots, a_T^l \sim \pi_\theta^l}\big[R^l\big],$$

where the optimization of the agent parameters maximizes the expected rewards of the high-level and low-level agents by the REINFORCE algorithm, with the policy-gradient estimate

$$\nabla_\theta J(\theta) = \mathbb{E}_{\pi_\theta}\Big[ R \sum_{t} \nabla_\theta \log \pi_\theta(a_t \mid s_t) \Big].$$
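As an illustration, one REINFORCE update combined with the step-IV total reward might look like the sketch below; it reuses the Rule structure from the earlier sketch, and exact-sequence rule matching and λ = 0.5 are assumptions for illustration rather than choices fixed by the method:

```python
import torch

def total_reward(path_relations: list, hit: bool, rules: list,
                 lam: float = 0.5) -> float:
    """Step IV: R_total = lam * R_r + (1 - lam) * R_h. The rule reward is
    the confidence of a mined rule whose body matches the relation
    sequence of the inference path (exact matching assumed here)."""
    r_h = 1.0 if hit else 0.0
    r_r = max((r.confidence for r in rules if r.body == path_relations),
              default=0.0)
    return lam * r_r + (1.0 - lam) * r_h

def reinforce_step(optimizer: torch.optim.Optimizer,
                   log_probs: list[torch.Tensor], reward: float) -> None:
    """Step V: one REINFORCE update maximizing E[R * sum_t log pi(a_t|s_t)],
    i.e. minimizing the negative return-weighted log-likelihood."""
    loss = -reward * torch.stack(log_probs).sum()
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
```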
Further, based on the above method, an embodiment of the present invention also provides a multi-agent knowledge inference system based on deep reinforcement learning, comprising a rule-mining module, an agent-construction module, and a knowledge-reasoning module, wherein:
the rule-mining module extracts entities and the relations among them from the target knowledge graph and induces a sequence of structured chain rules from the graph;
the agent-construction module constructs a hierarchical agent for reinforcement learning, comprising at least a high-level agent and a low-level agent, where the high-level agent divides the reasoning process into subtasks by extracting abstract relations from the knowledge graph and the low-level agent performs entity-level path reasoning on each subtask;
and the knowledge-reasoning module, on the basis of the extracted entities, relations, and chain-rule sequence, performs knowledge reasoning through subtask division by the high-level agent and entity-path exploration by the low-level agent, and outputs a reasoning result.
Unless specifically stated otherwise, the relative steps, numerical expressions, and values of the components and steps set forth in these embodiments do not limit the scope of the present invention.
In the present specification, the embodiments are described in a progressive manner; each embodiment focuses on its differences from the others, and the same or similar parts among the embodiments can be referred to one another. Since the disclosed system corresponds to the disclosed method, its description is relatively brief, and the relevant points can be found in the description of the method.
The units and method steps of the examples described in connection with the embodiments disclosed herein can be implemented in electronic hardware, computer software, or a combination of the two. To illustrate the interchangeability of hardware and software, the components and steps of each example have been described above in general functional terms. Whether such functions are implemented in hardware or software depends on the particular application and the design constraints of the technical solution. Skilled artisans may implement the described functionality in different ways for each particular application, but such implementations should not be considered beyond the scope of the present invention.
Those skilled in the art will appreciate that all or part of the steps of the above methods can be implemented by a program instructing relevant hardware, and the program can be stored in a computer-readable storage medium such as a read-only memory, a magnetic disk, or an optical disk. Alternatively, all or part of the steps of the foregoing embodiments may be implemented with one or more integrated circuits, and accordingly each module/unit in the foregoing embodiments may be implemented in hardware or as a software functional module. The present invention is not limited to any specific form of combination of hardware and software.
Finally, it should be noted that the above embodiments are only specific embodiments of the present invention, used to illustrate rather than limit its technical solutions, and the protection scope of the present invention is not limited thereto. Although the present invention has been described in detail with reference to the foregoing embodiments, those skilled in the art should understand that any person skilled in the art can still modify or readily conceive of changes to the technical solutions described in the foregoing embodiments, or make equivalent substitutions of some technical features, within the technical scope of the present disclosure; such modifications, changes, or substitutions do not depart from the spirit and scope of the embodiments of the present invention and shall be covered by it. Therefore, the protection scope of the present invention shall be subject to the protection scope of the claims.

Claims (10)

1. A multi-agent knowledge inference method based on deep reinforcement learning, characterized by comprising the following contents:
extracting entities and the relations among them from the target knowledge graph, and inducing a sequence of structured chain rules from the target knowledge graph;
constructing a hierarchical agent for reinforcement learning, the hierarchical agent comprising at least a high-level agent and a low-level agent, wherein the high-level agent divides the reasoning process into subtasks by extracting abstract relations from the knowledge graph, and the low-level agent performs entity-level path reasoning on each subtask;
and, on the basis of the extracted entities, the relations among them, and the chain-rule sequence, performing knowledge reasoning through subtask division by the high-level agent and entity-path exploration by the low-level agent, and outputting a reasoning result.
2. The multi-agent knowledge inference method based on deep reinforcement learning of claim 1, characterized in that, when mining rules in the target knowledge graph, a rule-induction method is used to induce structured chain rules from the graph, and a preset score threshold is used to retain the rules whose confidence score exceeds the threshold.
3. The deep reinforcement learning-based multi-agent knowledge inference method according to claim 1, wherein the elements of both the constructed high-level agent and low-level agent comprise: a state, an action, a reward, a policy network, and a transition function, where the state is the embedded representation of the node where the agent is located, the action covers all possible next-step operations of the agent at its node, the reward is the feedback obtained after the agent takes an action, the policy network is the network with which the agent performs reinforcement learning from the node state, action, reward, and transition function, and the transition function gives the state that results after the agent takes its next action.
4. The multi-agent knowledge inference method based on deep reinforcement learning of claim 3, characterized in that, among the elements of the high-level agent, the state of the high-level agent at step t is represented as $s_t^h = (e_t, r_q, h_t^h)$, where $e_t$ is the current entity node, $r_q$ is the relation being inferred at the current node, and $h_t^h$ is the history embedding of the high-level agent's search path; all possible next operations of the high-level agent at the current entity node are expressed as $a_t^h = (r_{t+1}, e_{t+1}) \in A_t^h$, where $A_t^h$ is the high-level agent's action space.
5. The multi-agent knowledge inference method based on deep reinforcement learning as claimed in claim 3, wherein, among the elements of the low-level agent, the state of the low-level agent at hop t can be expressed as $s_t^l = (e_t, r_t^h, h_t^l)$, where $e_t$ is the current entity node, $r_t^h$ is the sub-goal assigned by the high-level agent, and $h_t^l$ is the history embedding of the low-level agent's search path; the action space $A_t^l$ of the low-level agent consists of all possible outgoing edges of the current node in the graph, so the possible actions of the low-level agent's next hop are expressed as $a_t^l = (r_{t+1}, e_{t+1}) \in A_t^l = \{(r', e') \mid (e_t, r', e') \in \mathcal{G}\}$.
6. The multi-agent knowledge inference method based on deep reinforcement learning of claim 1, characterized in that, during knowledge inference with the high-level and low-level agents, a policy network guides each agent's action selection in the inference environment; both the high-level and low-level agents take the embedding of their historical search paths as an input to the next decision and update the policy-network parameters by maximizing the agents' rewards.
7. The multi-agent knowledge inference method based on deep reinforcement learning of claim 6, characterized in that the high-level agent's policy function $\pi^h(a_t^h \mid s_t^h; \theta)$ is expressed as

$$o_t^h = \mathrm{LeakyReLU}\big(W_1^h\,[h_t^h; o_t]\big), \qquad \pi^h(a_t^h \mid s_t^h; \theta) = \sigma\big(A_t^h \times o_t^h\big),$$

where $o_t^h$ is the output vector of the high-level agent's policy network, $a_t^h$ a possible next-hop action of the high-level agent, $s_t^h$ the current high-level agent state, $\theta$ a parameter of the agent, $h_t^h$ the embedded representation of the high-level agent's search-history path, $o_t$ the embedded representation of the environment currently observed by the agent, and $\mathrm{LeakyReLU}(\cdot)$ an activation function; and the low-level agent's policy function $\pi^l(a_t^l \mid s_t^l; \theta)$ is expressed as

$$\pi^l(a_t^l \mid s_t^l; \theta) = \sigma\big(A_t \times W_2\,\mathrm{LeakyReLU}(W_1\,[h_t^l; e_t; r_t^h])\big),$$

where $\sigma$ is the softmax function, $W_1$ and $W_2$ are the parameters in the low-level agent's policy network that need to be trained, $a_t^l$ is a possible next-hop action of the low-level agent, $s_t^l$ is the current low-level agent state, $h_t^l$ is the history embedding of the low-level agent's search path, $A_t$ is the low-level agent's action space, and $e_t$ is the current entity node in the low-level agent's reasoning.
8. The multi-agent knowledge inference method based on deep reinforcement learning of claim 6, characterized in that, in agent knowledge inference, each agent's parameters are initialized with a meta-learning algorithm and trained on the training samples with the MAML method, so that the trained agent parameters converge after N gradient-iteration updates, where N is less than the preset maximum number of iterations.
9. The multi-agent knowledge inference method based on deep reinforcement learning of claim 6, characterized in that, when computing the agent's reward, if the sequence of the inference path matches a rule, the confidence of that rule is granted as an additional rule reward, and when the agent hits the correct tail entity it obtains a hit reward; the agent's reward is denoted $R_{total} = \lambda R_r + (1-\lambda) R_h$, where $R_h$ is the hit reward, $R_r$ the rule reward, and $\lambda$ a preset weight parameter.
10. A multi-agent knowledge inference system based on deep reinforcement learning, comprising a rule-mining module, an agent-construction module, and a knowledge-reasoning module, wherein:
the rule-mining module extracts entities and the relations among them from the target knowledge graph and induces a sequence of structured chain rules from the graph;
the agent-construction module constructs a hierarchical agent for reinforcement learning, comprising at least a high-level agent and a low-level agent, where the high-level agent divides the reasoning process into subtasks by extracting abstract relations from the knowledge graph and the low-level agent performs entity-level path reasoning on each subtask;
and the knowledge-reasoning module, on the basis of the extracted entities, relations, and chain-rule sequence, performs knowledge reasoning through subtask division by the high-level agent and entity-path exploration by the low-level agent, and outputs a reasoning result.
CN202211168518.7A 2022-09-24 2022-09-24 Multi-agent knowledge inference method and system based on deep reinforcement learning Pending CN115526317A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202211168518.7A CN115526317A (en) 2022-09-24 2022-09-24 Multi-agent knowledge inference method and system based on deep reinforcement learning

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202211168518.7A CN115526317A (en) 2022-09-24 2022-09-24 Multi-agent knowledge inference method and system based on deep reinforcement learning

Publications (1)

Publication Number Publication Date
CN115526317A true CN115526317A (en) 2022-12-27

Family

ID=84699095

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202211168518.7A Pending CN115526317A (en) 2022-09-24 2022-09-24 Multi-agent knowledge inference method and system based on deep reinforcement learning

Country Status (1)

Country Link
CN (1) CN115526317A (en)

Cited By (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116010621A (en) * 2023-01-10 2023-04-25 华中师范大学 Rule-guided self-adaptive path generation method
CN116010621B (en) * 2023-01-10 2023-08-11 华中师范大学 Rule-guided self-adaptive path generation method
CN116610822A (en) * 2023-07-21 2023-08-18 南京邮电大学 Knowledge graph multi-hop reasoning method for diabetes text
CN116911202A (en) * 2023-09-11 2023-10-20 北京航天晨信科技有限责任公司 Agent training method and device based on multi-granularity simulation training environment
CN116911202B (en) * 2023-09-11 2023-11-17 北京航天晨信科技有限责任公司 Agent training method and device based on multi-granularity simulation training environment

Similar Documents

Publication Publication Date Title
CN115526317A (en) Multi-agent knowledge inference method and system based on deep reinforcement learning
CN111581343B (en) Reinforced learning knowledge graph reasoning method and device based on graph convolution neural network
Hesp et al. A multi-scale view of the emergent complexity of life: A free-energy proposal
US11086938B2 (en) Interpreting human-robot instructions
Chen et al. An improved bat algorithm hybridized with extremal optimization and Boltzmann selection
CN114860893B (en) Intelligent decision-making method and device based on multi-mode data fusion and reinforcement learning
Wang et al. A novel discrete firefly algorithm for Bayesian network structure learning
CN115526321A (en) Knowledge reasoning method and system based on intelligent agent dynamic path completion strategy
Goertzel The general theory of general intelligence: a pragmatic patternist perspective
Keselman et al. Reinforcement learning with a* and a deep heuristic
Zheng et al. Hybrid particle swarm optimizer with fitness-distance balance and individual self-exploitation strategies for numerical optimization problems
CN112264999A (en) Method, device and storage medium for intelligent agent continuous space action planning
CN116128060A (en) Chess game method based on opponent modeling and Monte Carlo reinforcement learning
David et al. DEVS model construction as a reinforcement learning problem
Lu et al. A causal-based symbolic reasoning framework for uncertain knowledge graphs
Simmons-Edler et al. Program synthesis through reinforcement learning guided tree search
Salama et al. Extending the ABC-Miner Bayesian classification algorithm
CN115906673B (en) Combat entity behavior model integrated modeling method and system
CN116841708A (en) Multi-agent reinforcement learning method based on intelligent planning
CN114662693A (en) Reinforced learning knowledge graph reasoning method based on action sampling
Hegde et al. Analysing the practicality of drawing inferences in automation of commonsense reasoning
Iqbal Improving the scalability of XCS-based learning classifier systems
Peng et al. Conservative network for offline reinforcement learning
Durugkar et al. Multi-preference actor critic
Zhu et al. Learning bayesian networks in the space of structures by a hybrid optimization algorithm

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination