CN115526317A - Multi-agent knowledge inference method and system based on deep reinforcement learning - Google Patents

Multi-agent knowledge inference method and system based on deep reinforcement learning

Info

Publication number
CN115526317A
Authority
CN
China
Prior art keywords
agent
level
knowledge
reasoning
low
Prior art date
Legal status
Pending
Application number
CN202211168518.7A
Other languages
Chinese (zh)
Inventor
夏毅
罗军勇
兰明敬
周刚
陈晓慧
卢记仓
刘铄
章梦礼
黄宁博
孙业鹏
李珠峰
Current Assignee
Information Engineering University of PLA Strategic Support Force
Original Assignee
Information Engineering University of PLA Strategic Support Force
Priority date
Filing date
Publication date
Application filed by Information Engineering University of PLA Strategic Support Force filed Critical Information Engineering University of PLA Strategic Support Force
Priority to CN202211168518.7A priority Critical patent/CN115526317A/en
Publication of CN115526317A publication Critical patent/CN115526317A/en
Pending legal-status Critical Current

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N5/00 Computing arrangements using knowledge-based models
    • G06N5/02 Knowledge representation; Symbolic representation
    • G06N5/022 Knowledge engineering; Knowledge acquisition
    • G06N5/025 Extracting rules from data
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N5/00 Computing arrangements using knowledge-based models
    • G06N5/04 Inference or reasoning models

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Artificial Intelligence (AREA)
  • Data Mining & Analysis (AREA)
  • Computing Systems (AREA)
  • Computational Linguistics (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Evolutionary Computation (AREA)
  • Health & Medical Sciences (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • General Health & Medical Sciences (AREA)
  • Molecular Biology (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention belongs to the technical field of knowledge graphs and in particular relates to a multi-agent knowledge reasoning method and system based on deep reinforcement learning. Entities and the relations among them are extracted from a target knowledge graph, and a sequence of structured chain rules is induced from the graph. A hierarchical reinforcement-learning agent is constructed, comprising at least a high-level agent and a low-level agent: the high-level agent divides the reasoning process into subtasks by extracting abstract relations from the knowledge graph, and the low-level agent performs entity-level path reasoning within each subtask. On the basis of the extracted entities, relations, and chain-rule sequence, knowledge reasoning is carried out through subtask division by the high-level agent and entity-path exploration by the low-level agent, and a reasoning result is output. By reasoning with hierarchical multi-agents, the invention addresses the excessively large search space encountered in long-distance reasoning over current large-scale knowledge graphs.

Description

Multi-agent knowledge reasoning method and system based on deep reinforcement learning
Technical Field
The invention belongs to the technical field of knowledge graphs and in particular relates to a multi-agent knowledge reasoning method and system based on deep reinforcement learning.
Background
In recent years, artificial-intelligence research based on deep learning models has made continual breakthroughs, but most such models are black boxes that do not follow the human cognitive reasoning process, so the decisions of high-performance but complex algorithms, models, and systems generally lack transparency and interpretability. In critical fields with strict interpretability requirements, such as national defense, medical treatment, and network and information security, the unexplainability of a reasoning method strongly affects both the reasoning results and their traceability. Interpretability therefore needs to be built into these algorithms and systems, using explicit, interpretable knowledge reasoning to assist the relevant prediction tasks and to form a reliable mechanism for explaining behavior. The knowledge graph, one of the newest forms of knowledge representation, describes entities and relations in the objective world in a structured form by modeling a semantic network, and is widely used in knowledge reasoning. Knowledge reasoning over a knowledge graph explains the reasoning process through auxiliary means such as reasoning paths and logic rules built on discrete symbolic representations, providing an important route toward interpretable artificial intelligence.
Knowledge reasoning is the process of deriving new facts from known knowledge, generalizing from individual pieces of knowledge to general knowledge by mining or summarizing a large body of existing knowledge. Early reasoning research mostly belonged to logic and knowledge engineering: many scholars advocated formal methods for describing the objective world, held that all reasoning rests on existing logical knowledge such as first-order logic and predicate logic, and focused on how to draw correct conclusions from known propositions and predicates. In recent years, with the explosive growth of internet data, traditional manually constructed knowledge bases can no longer satisfy the mining demands that the big-data era places on large amounts of knowledge, so data-driven reasoning methods have become the mainstream of knowledge-reasoning research. Knowledge reasoning based on knowledge graphs is attractive not only for its efficiency but also for its explainability. First, the knowledge graph is interpretable in its mode of representation. Knowledge representation is a set of conventions for describing the world, the process of symbolizing, formalizing, or modeling knowledge. Common methods include predicate-logic representation, production-rule representation, and distributed knowledge representation; compared with traditional methods such as production rules, the knowledge graph offers rich semantics, a friendly structure, and an easily understood knowledge organization. Second, knowledge-graph-based reasoning is interpretable during the reasoning process. Humans mostly understand the world through concepts, attributes, and relations: for the question "Why can a bird fly?", a human explanation might be "because a bird has wings", which uses an attribute in the explanation. The knowledge graph is rich in entities, concepts, attributes, and relations; it organizes massive knowledge formally in a graph structure, models each real-world inference scenario intuitively, and lets the final decision be explained from multiple concrete sources. Finally, the knowledge graph is interpretable in storage and use: compared with other storage forms, it constructs and stores knowledge as triples, which matches the subject-predicate-object pattern by which people ordinarily come to know things, so it is friendlier to human understanding and more interpretable than other knowledge-representation methods.
Although existing knowledge-graph-based inference methods achieve good results and provide good interpretability, they still have the following defects. (1) During reasoning, the inference agent must explore a large number of paths, and the complexity rises exponentially with every additional hop of the walk, so the search space of current inference models is generally large and long-distance reasoning tasks cannot be performed. (2) Most current inference models run experiments only on benchmark datasets; because of problems such as an oversized search space and graph sparsity, they are difficult to apply to real-life large-scale knowledge graphs with millions of entities, such as Wikidata and Freebase.
Disclosure of Invention
Therefore, the invention provides a multi-agent knowledge reasoning method and system based on deep reinforcement learning, which perform knowledge reasoning with hierarchical multi-agents and can solve the problem of an excessively large search space when performing long-distance reasoning over current large-scale knowledge graphs.
According to the design scheme provided by the invention, a multi-agent knowledge inference method based on deep reinforcement learning is provided, comprising the following contents:
extracting entities and the relations among them from the target knowledge graph, and inducing a sequence of structured chain rules from the target knowledge graph;
constructing a hierarchical agent for reinforcement learning, the hierarchical agent comprising at least a high-level agent and a low-level agent, wherein the high-level agent divides the reasoning process into subtasks by extracting abstract relations from the knowledge graph, and the low-level agent performs entity-level path reasoning on each subtask;
and, on the basis of the extracted entities, the relations among them, and the chain-rule sequence, performing knowledge reasoning through subtask division by the high-level agent and entity-path exploration by the low-level agent, and outputting a reasoning result.
In the multi-agent knowledge inference method based on deep reinforcement learning, further, when mining rules in the target knowledge graph, a rule-induction method is used to induce structured chain rules from the graph, and a preset score threshold is used to retain only those rules whose confidence score exceeds the threshold.
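As an illustration, the confidence-based filtering step might look like the sketch below; the Rule structure and the 0.5 default threshold are assumptions for illustration, not values fixed by the method.

```python
from dataclasses import dataclass

@dataclass
class Rule:
    """A chain rule r_q <- r_1, ..., r_n mined from the knowledge graph."""
    head: str            # target relation r_q
    body: list[str]      # chain of body relations r_1 ... r_n
    confidence: float    # confidence score assigned by the rule miner

def filter_rules(rules: list[Rule], threshold: float = 0.5) -> list[Rule]:
    """Retain only rules whose confidence score exceeds the preset threshold."""
    return [r for r in rules if r.confidence > threshold]
```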
In the multi-agent knowledge inference method based on deep reinforcement learning, further, the elements of both the constructed high-level agent and low-level agent comprise: a state, an action, a reward, a policy network, and a transition function, where the state is the embedded representation of the node where the agent is located, the action covers all possible next-step operations of the agent at its current node, the reward is the feedback obtained after the agent takes an action, the policy network is the network with which the agent performs reinforcement learning from the node state, action, reward, and transition function, and the transition function gives the state that results after the agent performs its next action.
In the multi-agent knowledge inference method based on deep reinforcement learning, further, among the elements of the high-level agent, the state of the high-level agent at step t is expressed as $s_t^h = (e_t, r_q, h_t^h)$, where $e_t$ is the current entity node, $r_q$ is the relation being inferred at the current node, and $h_t^h$ is the history embedding of the high-level agent's search path; all possible next operations of the high-level agent at the current entity node are expressed as $a_t^h = (r_{t+1}, e_{t+1}) \in A_t^h$, where $A_t^h$ is the high-level agent's action space.
In the multi-agent knowledge inference method based on deep reinforcement learning, further, among the elements of the low-level agent, the state of the low-level agent at hop t can be expressed as $s_t^l = (e_t, r_t^h, h_t^l)$, where $e_t$ is the current entity node, $r_t^h$ is the sub-goal assigned by the high-level agent, and $h_t^l$ is the history embedding of the low-level agent's search path; the action space $A_t^l$ of the low-level agent consists of all possible outgoing edges of the current node in the graph, so the possible actions of the low-level agent's next hop are expressed as $a_t^l = (r_{t+1}, e_{t+1}) \in A_t^l = \{(r', e') \mid (e_t, r', e') \in \mathcal{G}\}$.
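To make the action-space definition concrete, the sketch below builds the low-level action space as the set of outgoing edges of the current node; the adjacency-dict representation of the graph is an assumption for illustration, and the self-loop mirrors the stop-in-place edge described later in the embodiment.

```python
from collections import defaultdict

# The graph as an adjacency dict: head entity -> list of (relation, tail).
# This representation is an assumption; any triple store with an
# outgoing-edge lookup would serve the same purpose.
Graph = dict[str, list[tuple[str, str]]]

def build_graph(triples: list[tuple[str, str, str]]) -> Graph:
    g: Graph = defaultdict(list)
    for head, rel, tail in triples:
        g[head].append((rel, tail))
    return g

def action_space(graph: Graph, e_t: str) -> list[tuple[str, str]]:
    """A_t: all outgoing edges (r', e') of the current entity node e_t,
    plus a self-loop edge that lets the agent stop in place."""
    return graph.get(e_t, []) + [("NO_OP", e_t)]
```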
as the multi-agent knowledge inference method based on deep reinforcement learning, the invention further utilizes a strategy network to guide the action selection of the agents in the inference environment in knowledge inference by utilizing high-level agents and low-level agents, the high-level agents and the low-level agents are used as the elements of the next decision according to the embedding of the historical search path, and the strategy network parameters are updated by maximizing the reward of the agents.
In the multi-agent knowledge inference method based on deep reinforcement learning, further, the high-level agent's policy function $\pi^h(a_t^h \mid s_t^h; \theta)$ is expressed as

$$o_t^h = \mathrm{LeakyReLU}\big(W_1^h\,[h_t^h; o_t]\big), \qquad \pi^h(a_t^h \mid s_t^h; \theta) = \sigma\big(A_t^h \times o_t^h\big),$$

where $o_t^h$ is the output vector of the high-level agent's policy network, $a_t^h$ a possible next-hop action of the high-level agent, $s_t^h$ the current high-level agent state, $\theta$ the agent's parameters, $h_t^h$ the history embedding of the high-level agent's search path, $o_t$ the embedded representation of the environment the agent can currently observe, and $\mathrm{LeakyReLU}(\cdot)$ an activation function. The low-level agent's policy function $\pi^l(a_t^l \mid s_t^l; \theta)$ is expressed as

$$\pi^l(a_t^l \mid s_t^l; \theta) = \sigma\big(A_t \times W_2\,\mathrm{LeakyReLU}(W_1\,[h_t^l; e_t; r_t^h])\big),$$

where $\sigma$ is the softmax function, $W_1$ and $W_2$ are the parameters to be trained in the low-level agent's policy network, $a_t^l$ is a possible next-hop action of the low-level agent, $s_t^l$ is the current low-level agent state, $h_t^l$ is the history embedding of the low-level agent's search path, $A_t$ is the low-level agent's action space, and $e_t$ is the current entity node in the low-level agent's reasoning.
In the multi-agent knowledge inference method based on deep reinforcement learning, further, in agent knowledge inference each agent's parameters are initialized with a meta-learning algorithm, and the MAML method is applied to train on the training samples, so that the trained agent parameters converge after N gradient-iteration updates, where N is less than the preset maximum number of iterations.
In the multi-agent knowledge reasoning method based on deep reinforcement learning, further, when computing the agent's reward, if the sequence of the reasoning path matches a rule, the confidence of that rule is granted as an additional rule reward, and when the agent hits the correct tail entity it obtains a hit reward; the agent's reward is denoted $R_{total} = \lambda R_r + (1-\lambda) R_h$, where $R_h$ is the hit reward, $R_r$ the rule reward, and $\lambda$ a preset weight parameter.
Further, the invention also provides a multi-agent knowledge inference system based on deep reinforcement learning, comprising a rule-mining module, an agent-construction module, and a knowledge-reasoning module, wherein:
the rule-mining module extracts entities and the relations among them from the target knowledge graph and induces a sequence of structured chain rules from the graph;
the agent-construction module constructs a hierarchical agent for reinforcement learning, comprising at least a high-level agent and a low-level agent, where the high-level agent divides the reasoning process into subtasks by extracting abstract relations from the knowledge graph and the low-level agent performs entity-level path reasoning on each subtask;
and the knowledge-reasoning module, on the basis of the extracted entities, relations, and chain-rule sequence, performs knowledge reasoning through subtask division by the high-level agent and entity-path exploration by the low-level agent, and outputs a reasoning result.
The invention has the following beneficial effects:
Based on deep reinforcement learning, the invention combines multiple agents to reason hierarchically over the entities and relations of the knowledge graph, achieving more efficient and longer-distance knowledge reasoning. Meta-information is used to initialize the parameters of the reinforcement-learning agents, so the model adapts effectively to training samples of few-shot relations in a large-scale graph environment, improving prediction for few-shot relations while also accelerating training for normal relations. The reasoning process can be displayed as a path, giving better interpretability: compared with current black-box neural networks, which cannot directly output reasoning results in an interpretable way, the model is relatively transparent and its decisions are more trustworthy. In addition, since the unexplainability of a reasoning method strongly affects reasoning results and their traceability, the explicit reasoning adopted in this scheme has the further advantage of better tracing back erroneous reasoning samples.
Description of the drawings:
FIG. 1 is a schematic diagram of a knowledge inference process in an embodiment;
FIG. 2 is a schematic representation of a generalized structured chain rule sequence in an example embodiment;
FIG. 3 is a schematic diagram of the multi-agent reasoning principle in the embodiment.
Detailed description of embodiments:
To make the objects, technical solutions, and advantages of the present invention clearer, the invention is described in further detail below with reference to the accompanying drawings and technical solutions.
An embodiment of the present invention, as shown in FIG. 1, provides a multi-agent knowledge inference method based on deep reinforcement learning, including:
S101, extracting entities and the relations among them from a target knowledge graph, and inducing a sequence of structured chain rules from the target knowledge graph;
S102, constructing a hierarchical agent for reinforcement learning, the hierarchical agent comprising at least a high-level agent and a low-level agent, wherein the high-level agent divides the reasoning process into subtasks by extracting abstract relations from the knowledge graph, and the low-level agent performs entity-level path reasoning on each subtask;
S103, on the basis of the extracted entities, the relations among them, and the chain-rule sequence, performing knowledge reasoning through subtask division by the high-level agent and entity-path exploration by the low-level agent, and outputting a reasoning result.
With ordinary chain reasoning, the complexity of the search space grows exponentially with the hop count, and most current reasoning stays within 4 hops. In the embodiment of this scheme, referring to FIG. 2, the reasoning process is divided into independent subtasks that each explore an abstract relation, after which specific fine-grained entity-level path exploration is performed for each subtask. The search space then no longer multiplies exponentially as the reasoning distance grows; the complexity is reduced to the sum of the complexities of two shorter search tasks, greatly shrinking the search space. At the same time, this hierarchical reasoning mode matches the human divide-and-conquer style of thinking, handling a problem layer by layer, so it is more interpretable and its reasoning results and process are easier for people to accept. A small worked example of this reduction follows.
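The numbers below are assumed purely for illustration (a branching factor of 50 and a 6-hop query are not values from the invention):

```latex
% Assumed numbers: branching factor b = 50, a T = 6 hop query split into
% T_h = 2 abstract sub-goals, each refined by a T_l = 3 hop low-level search.
\underbrace{b^{T}}_{\text{flat: } 50^{6} \,\approx\, 1.6\times 10^{10}}
\quad\gg\quad
\underbrace{b^{T_h} + T_h\, b^{T_l}}_{\text{hierarchical: } 50^{2} + 2\cdot 50^{3} \,\approx\, 2.5\times 10^{5}},
\qquad T = T_h \, T_l .
```

The exponent is divided between the two levels, so the costs of the shorter searches add instead of multiplying.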
As a preferred embodiment, further, when mining rules in the target knowledge graph, a rule-induction method is used to induce structured chain rules from the graph, and a preset score threshold is used to retain the rules whose confidence score exceeds the threshold.
In the embodiment of this scheme, hierarchical knowledge reasoning is realized by multiple agents through reinforcement learning at two levels: the high-level agent decomposes the abstract relation layer, and the low-level agent explores paths over concrete entities. Each reinforcement-learning agent has the following elements: state, action, reward, policy network, and a per-step transition function. The state is the embedded representation of the agent's node; the action covers all possible next-step operations of the agent at the current node; the reward is the feedback obtained after the agent acts; the policy network is the network with which the agent performs reinforcement learning from the node state, action, reward, and transition function; and the transition function gives the state that results after the agent performs its next action.
The high-level agent starts from the entity $e_s$, with $r_q$ as the relation to be inferred, and realizes reasoning over the abstract path relations. High-level reasoning focuses on abstract conceptual relations, decomposing the complex high-level reasoning task into subtasks that are assigned to the low-level agent, which then explores paths at the level of concrete entities. The reinforcement-learning elements of the high-level agent are as follows:
State: the embedded representation of the node where the agent currently is. In high-level knowledge reasoning, the high-level agent considers not only the current node $e_t$ and the relation $r_q$ being inferred, but also the history embedding $h_t$ of the agent's search path, so the state of the high-level agent at hop t can be represented as $s_t^h = (e_t, r_q, h_t^h)$. We encode the sequence of the historical inference path with a gated recurrent unit (GRU):

$$h_0 = \mathrm{GRU}(\mathbf{0}, [r_0; e_s]), \qquad h_t = \mathrm{GRU}(h_{t-1}, [r_t; e_t]).$$
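A minimal sketch of such a path-history encoder, assuming PyTorch and assuming the relation and entity embeddings are simply concatenated as the GRU input (the concatenation scheme is an assumption; the text only states that a GRU encodes the path):

```python
import torch
import torch.nn as nn

class PathHistoryEncoder(nn.Module):
    """Encodes the inference path (e_s, r_1, e_1, ..., r_t, e_t) into h_t."""

    def __init__(self, emb_dim: int, hidden_dim: int):
        super().__init__()
        # Input at each hop: [relation embedding ; entity embedding].
        self.gru = nn.GRUCell(2 * emb_dim, hidden_dim)

    def step(self, h_prev: torch.Tensor, r_emb: torch.Tensor,
             e_emb: torch.Tensor) -> torch.Tensor:
        """One hop: h_t = GRU(h_{t-1}, [r_t ; e_t])."""
        return self.gru(torch.cat([r_emb, e_emb], dim=-1), h_prev)
```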
Action: all possible next operations $a_t \in A_t$ of the agent at the current node. In high-level knowledge inference, the action is $a_t^h \in A_t^h$; if the high-level agent has reached the target node, it outputs $a_t^h = 0$ and the reasoning ends; otherwise $a_t^h \neq 0$ and low-level knowledge reasoning continues.
Transition function: the state that results from the agent's next action. If $a_t^h = 0$, reasoning ends at this moment and no state transition occurs; otherwise reasoning continues, and the transformation $\phi(\cdot)$ maps the high-level agent's state $s_t^h$ to the low-level agent's state $s_t^l$, i.e. $s_t^l = \phi(s_t^h, a_t^h)$, while simultaneously outputting the sub-goal $r_t^h$ for the low-level agent.
Reward: the feedback obtained after the reinforcement-learning agent takes an action, used to assess taking action $a_t$ in the current state $s_t$. In high-level knowledge reasoning, if the agent reaches the target node at the end of the search, the high-level agent receives a hit reward $R_{hit}^h = 1$; otherwise $R_{hit}^h = 0$.
The low-level agent then starts from the current entity $e_t$ specified by the high-level agent and, with the assigned sub-goal relation $r_t^h$ as the relation to infer, performs fine-grained entity-level path exploration. Low-level reasoning proceeds once the high-level agent has signaled continuation (i.e. $a_t^h \neq 0$) and has assigned the relational target $r_t^h$ to the low-level agent, after which concrete entity-level reasoning is carried out. The reinforcement-learning elements of the low-level agent are as follows:
state (state), in a low-level knowledge inference, the agent considers the current node e at the same time t Reasoning about the target relationship r t h And historical embedding h of agent search path t . At this time, the target relationship of the low-level agent is the sub-target r allocated by the high-level agent t h . Thus, the low-level agent state for the t-th hop may be represented as
Figure BDA00038625001800000616
We represent the sequence of the historical inference path by gating the loop unit GRU.
Action: in low-level knowledge inference the agent performs fine-grained entity-level path exploration, so its actions are all possible outgoing edges of the current node in the graph, i.e. $A_t^l = \{(r', e') \mid (e_t, r', e') \in \mathcal{G}\}$, with the possible next-hop actions $a_t^l = (r_{t+1}, e_{t+1}) \in A_t^l$. Meanwhile, because the agent's reasoning hop count is fixed, a self-loop edge is added to every node to prevent the agent from being forced onward after reaching the target node early; this is equivalent to the action space $A_t^l$ containing an action that stops in place.
Transition function: low-level knowledge reasoning is path exploration at the fine-grained entity level, and its transition function can be defined as $\delta(s_t, a_t) = s_{t+1}$; that is, when the agent in state $s_t = (r_q, e_t, h_t)$ selects the next action $a_t = (r_{t+1}, e_{t+1})$, the state becomes $s_{t+1} = (r_q, e_{t+1}, h_{t+1})$. Meanwhile, the agent's maximum hop count is limited to T; if the target entity has not been reached by hop T, the final state is $s_T = (r_q, e_T, h_T)$.
Reward: during model training, if the low-level agent finally hits the target entity after multi-hop reasoning, i.e. $\tau = (e_s, r_q, e_T) \in \mathcal{KG}$, we define the reward obtained by the low-level agent to be 1. At the same time, to better accommodate the open-world assumption, under which whether a triple holds depends on the current environment, we measure a soft reward with an embedding-based scoring function $f(\tau)$. The reward function of the low-level agent is therefore:

$$R^l(s_T) = \begin{cases} 1, & (e_s, r_q, e_T) \in \mathcal{KG},\\ f(e_s, r_q, e_T), & \text{otherwise.} \end{cases}$$
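A sketch of this reward, assuming a TransE-style scorer as the embedding-based $f(\tau)$; the concrete scoring function is an assumption for illustration, as the text only requires some embedding-based score:

```python
import torch

def soft_reward(e_s: torch.Tensor, r_q: torch.Tensor,
                e_T: torch.Tensor) -> torch.Tensor:
    """Embedding-based score f(tau); here a TransE-style distance squashed
    to (0, 1). The choice of scorer is an assumption for illustration."""
    return torch.sigmoid(-torch.norm(e_s + r_q - e_T, p=2))

def low_level_reward(triple: tuple, kg: set, e_s, r_q, e_T) -> torch.Tensor:
    """R^l = 1 if the predicted triple is in the KG, else the soft reward."""
    if triple in kg:
        return torch.tensor(1.0)
    return soft_reward(e_s, r_q, e_T)
```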
as a preferred embodiment, further, in knowledge inference by using high-level agents and low-level agents, the action selection of the agents in the inference environment is guided by using a policy network, the high-level agents and the low-level agents are used as elements for next decision according to the embedding of historical search paths, and the policy network parameters are updated by maximizing the reward of the agents. In the agent knowledge inference, agent parameters are initialized by using a meta-learning algorithm, and training is performed on a training sample by using an MAML (maximum likelihood model) method, so that the trained agent parameters are converged after being updated by N times of gradient iteration, wherein N is less than a preset maximum iteration number.
Referring to FIG. 3, in the above multi-agent decision process the policy network guides the multi-agents' action selection in the different reasoning environments so as to maximize the rewards of the high-level and low-level agents, update the parameters of the agents' policy networks, and realize efficient knowledge reasoning. The action of each step is composed of the next entity $e_{t+1}$ and the relation $r_{t+1}$ taken to reach it, i.e. $a_t = (r_{t+1}, e_{t+1}) \in A_t$. The history of the path search can be denoted $h_t = (e_s, r_1, e_1, \ldots, r_t, e_t)$, and both the high-level and low-level agents use the embedding of this historical search path as an input to the next decision.
During knowledge reasoning, the high-level agent produces an output vector $o_t^h$ through its policy network; if the agent's next-action output is 0, the inference ends; otherwise the output vector $o_t^h$ serves as the goal of the low-level agent and reasoning continues. The policy function $\pi^h(a_t^h \mid s_t^h; \theta)$ of the high-level agent is as follows:

$$o_t^h = \mathrm{LeakyReLU}\big(W_1^h\,[h_t^h; o_t]\big), \qquad \pi^h(a_t^h \mid s_t^h; \theta) = \sigma\big(A_t^h \times o_t^h\big).$$

If the high-level reasoning has not finished, the policy network continues to guide the agent through low-level reasoning, in which the low-level agent performs concrete path exploration between entities with action space $A_t^l$. Its policy function $\pi^l(a_t^l \mid s_t^l; \theta)$ is as follows:

$$\pi^l(a_t^l \mid s_t^l; \theta) = \sigma\big(A_t^l \times W_2\,\mathrm{LeakyReLU}(W_1\,[h_t^l; e_t; r_t^h])\big),$$

where $\sigma$ is the softmax function and $W_1$ and $W_2$ are the parameters of the policy network that need to be trained.
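A minimal PyTorch sketch of such a two-layer policy over the candidate actions follows; scoring candidates by a dot product with the MLP output and taking a softmax over the action set are standard choices assumed here, not details prescribed by the text:

```python
import torch
import torch.nn as nn

class LowLevelPolicy(nn.Module):
    """pi(a_t | s_t) = softmax(A_t x W2 LeakyReLU(W1 [h_t; e_t; r_h]))."""

    def __init__(self, emb_dim: int, hidden_dim: int):
        super().__init__()
        self.w1 = nn.Linear(3 * emb_dim, hidden_dim)   # W1
        self.w2 = nn.Linear(hidden_dim, 2 * emb_dim)   # W2

    def forward(self, h_t, e_t, r_h, cand_actions):
        # cand_actions: (num_actions, 2*emb_dim); each row is [r'; e'].
        state = torch.cat([h_t, e_t, r_h], dim=-1)
        query = self.w2(nn.functional.leaky_relu(self.w1(state)))
        scores = cand_actions @ query                  # A_t x (...)
        return torch.softmax(scores, dim=-1)           # distribution over A_t
```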
Multi-agent reasoning enables hierarchical, efficient inference, but one drawback it brings is an oversized parameter count during training: without sufficient training samples the model is hard to converge. For this purpose, a meta-learning method can be used to initialize the parameters of the reinforcement-learning agents; these initialization parameters are called the agents' meta-information. Many inference tasks share similar implicit structures, yet the sample counts of different relations follow a long-tailed distribution, so the training samples for normal relations and few-shot relations differ greatly in number. In the embodiment of this scheme, the parameters of the reinforcement-learning agents are initialized with the optimization-based meta-learning method MAML. MAML is trained on training samples of normal relations, and an agent starting from the trained parameters θ* can converge quickly with only a few iterative gradient updates. An agent combined with this meta-information therefore adapts effectively to training samples of few-shot relations, improving the prediction of few-shot relations while also accelerating training on samples of normal relations. The specific algorithm follows the standard MAML pattern of inner task-level gradient updates and an outer meta-update.
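A minimal sketch of that pattern, assuming a first-order simplification purely to keep the example short (the exact MAML variant, the `tasks.sample()` interface yielding per-relation support/query batches, and the `loss_fn` signature are assumptions for illustration):

```python
import copy
import torch

def maml_initialize(policy, tasks, loss_fn,
                    inner_lr=0.01, meta_lr=0.001, meta_steps=1000):
    """First-order MAML sketch for initializing the agents' parameters."""
    meta_opt = torch.optim.Adam(policy.parameters(), lr=meta_lr)
    for _ in range(meta_steps):
        meta_opt.zero_grad()
        for support, query in tasks.sample():
            # Inner loop: adapt a task-specific copy with one gradient step.
            fast = copy.deepcopy(policy)
            grads = torch.autograd.grad(loss_fn(fast, support),
                                        fast.parameters())
            with torch.no_grad():
                for p, g in zip(fast.parameters(), grads):
                    p -= inner_lr * g
            # Outer loop: meta-gradient from the adapted copy (first-order).
            loss_fn(fast, query).backward()
            with torch.no_grad():
                for p, fp in zip(policy.parameters(), fast.parameters()):
                    p.grad = fp.grad.clone() if p.grad is None \
                             else p.grad + fp.grad
        meta_opt.step()  # update the meta-initialization theta*
```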
the method is summarized as follows:
I. initializing an embedded representation of entities and relationships in the graph;
II. Extracting mining rules and corresponding confidence scores by using the rules;
III, initializing parameters of the reinforcement learning agent through a meta-learning algorithm;
and IV, calculating the total reward of the intelligent agent. The reward is divided into two parts, regular reward and hit reward. When the agent hits the target entity, the agent receives a hit reward R h (ii) a When the agent's inference path conforms to the combination of rules, the agent obtains a rule reward R at that time r (ii) a The total reward ultimately earned by the agent is the sum of two partial rewards, namely: r total =λR r +(1-λ)R h
V. Train the agents' policy networks by maximizing the expected reward of the query, the objectives for the high-level and low-level agents being respectively

$$J(\theta^h) = \mathbb{E}_{(e_s, r_q, e_o) \in \mathcal{KG}}\, \mathbb{E}_{a_1^h, \ldots, a_T^h \sim \pi_\theta^h}\big[R^h\big], \qquad J(\theta^l) = \mathbb{E}_{(e_s, r_q, e_o) \in \mathcal{KG}}\, \mathbb{E}_{a_1^l, \ldots, a_T^l \sim \pi_\theta^l}\big[R^l\big],$$

where the optimization of the agent parameters maximizes the expected rewards of the high-level and low-level agents by the REINFORCE algorithm, with the policy-gradient estimate

$$\nabla_\theta J(\theta) = \mathbb{E}_{\pi_\theta}\Big[ R \sum_{t} \nabla_\theta \log \pi_\theta(a_t \mid s_t) \Big].$$
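As an illustration, one REINFORCE update combined with the step-IV total reward might look like the sketch below; it reuses the Rule structure from the earlier sketch, and exact-sequence rule matching and λ = 0.5 are assumptions for illustration rather than choices fixed by the method:

```python
import torch

def total_reward(path_relations: list, hit: bool, rules: list,
                 lam: float = 0.5) -> float:
    """Step IV: R_total = lam * R_r + (1 - lam) * R_h. The rule reward is
    the confidence of a mined rule whose body matches the relation
    sequence of the inference path (exact matching assumed here)."""
    r_h = 1.0 if hit else 0.0
    r_r = max((r.confidence for r in rules if r.body == path_relations),
              default=0.0)
    return lam * r_r + (1.0 - lam) * r_h

def reinforce_step(optimizer: torch.optim.Optimizer,
                   log_probs: list[torch.Tensor], reward: float) -> None:
    """Step V: one REINFORCE update maximizing E[R * sum_t log pi(a_t|s_t)],
    i.e. minimizing the negative return-weighted log-likelihood."""
    loss = -reward * torch.stack(log_probs).sum()
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
```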
Further, based on the above method, an embodiment of the present invention also provides a multi-agent knowledge inference system based on deep reinforcement learning, comprising a rule-mining module, an agent-construction module, and a knowledge-reasoning module, wherein:
the rule-mining module extracts entities and the relations among them from the target knowledge graph and induces a sequence of structured chain rules from the graph;
the agent-construction module constructs a hierarchical agent for reinforcement learning, comprising at least a high-level agent and a low-level agent, where the high-level agent divides the reasoning process into subtasks by extracting abstract relations from the knowledge graph and the low-level agent performs entity-level path reasoning on each subtask;
and the knowledge-reasoning module, on the basis of the extracted entities, relations, and chain-rule sequence, performs knowledge reasoning through subtask division by the high-level agent and entity-path exploration by the low-level agent, and outputs a reasoning result.
Unless specifically stated otherwise, the relative steps, numerical expressions, and values of the components and steps set forth in these embodiments do not limit the scope of the present invention.
In the present specification, the embodiments are described in a progressive manner; each embodiment focuses on its differences from the others, and the same or similar parts among the embodiments can be referred to one another. Since the disclosed system corresponds to the disclosed method, its description is relatively brief, and the relevant points can be found in the description of the method.
The units and method steps of the examples described in connection with the embodiments disclosed herein can be implemented in electronic hardware, computer software, or a combination of the two. To illustrate the interchangeability of hardware and software, the components and steps of each example have been described above in general functional terms. Whether such functions are implemented in hardware or software depends on the particular application and the design constraints of the technical solution. Skilled artisans may implement the described functionality in different ways for each particular application, but such implementations should not be considered beyond the scope of the present invention.
Those skilled in the art will appreciate that all or part of the steps of the above methods can be implemented by a program instructing relevant hardware, and the program can be stored in a computer-readable storage medium such as a read-only memory, a magnetic disk, or an optical disk. Alternatively, all or part of the steps of the foregoing embodiments may be implemented with one or more integrated circuits, and accordingly each module/unit in the foregoing embodiments may be implemented in hardware or as a software functional module. The present invention is not limited to any specific form of combination of hardware and software.
Finally, it should be noted that the above embodiments are only specific embodiments of the present invention, used to illustrate rather than limit its technical solutions, and the protection scope of the present invention is not limited thereto. Although the present invention has been described in detail with reference to the foregoing embodiments, those skilled in the art should understand that any person skilled in the art can still modify or readily conceive of changes to the technical solutions described in the foregoing embodiments, or make equivalent substitutions of some technical features, within the technical scope of the present disclosure; such modifications, changes, or substitutions do not depart from the spirit and scope of the embodiments of the present invention and shall be covered by it. Therefore, the protection scope of the present invention shall be subject to the protection scope of the claims.

Claims (10)

1. A multi-agent knowledge inference method based on deep reinforcement learning, characterized by comprising the following contents:
extracting entities and the relations among them from the target knowledge graph, and inducing a sequence of structured chain rules from the target knowledge graph;
constructing a hierarchical agent for reinforcement learning, the hierarchical agent comprising at least a high-level agent and a low-level agent, wherein the high-level agent divides the reasoning process into subtasks by extracting abstract relations from the knowledge graph, and the low-level agent performs entity-level path reasoning on each subtask;
and, on the basis of the extracted entities, the relations among them, and the chain-rule sequence, performing knowledge reasoning through subtask division by the high-level agent and entity-path exploration by the low-level agent, and outputting a reasoning result.
2. The multi-agent knowledge inference method based on deep reinforcement learning of claim 1, characterized in that, when mining rules in the target knowledge graph, a rule-induction method is used to induce structured chain rules from the graph, and a preset score threshold is used to retain the rules whose confidence score exceeds the threshold.
3. The deep reinforcement learning-based multi-agent knowledge inference method according to claim 1, wherein the elements of both the constructed high-level agent and low-level agent comprise: a state, an action, a reward, a policy network, and a transition function, where the state is the embedded representation of the node where the agent is located, the action covers all possible next-step operations of the agent at its node, the reward is the feedback obtained after the agent takes an action, the policy network is the network with which the agent performs reinforcement learning from the node state, action, reward, and transition function, and the transition function gives the state that results after the agent takes its next action.
4. The multi-agent knowledge inference method based on deep reinforcement learning of claim 3, characterized in that, among the elements of the high-level agent, the state of the high-level agent at step t is represented as $s_t^h = (e_t, r_q, h_t^h)$, where $e_t$ is the current entity node, $r_q$ is the relation being inferred at the current node, and $h_t^h$ is the history embedding of the high-level agent's search path; all possible next operations of the high-level agent at the current entity node are expressed as $a_t^h = (r_{t+1}, e_{t+1}) \in A_t^h$, where $A_t^h$ is the high-level agent's action space.
5. The multi-agent knowledge inference method based on deep reinforcement learning as claimed in claim 3, wherein, among the elements of the low-level agent, the state of the low-level agent at hop t can be expressed as $s_t^l = (e_t, r_t^h, h_t^l)$, where $e_t$ is the current entity node, $r_t^h$ is the sub-goal assigned by the high-level agent, and $h_t^l$ is the history embedding of the low-level agent's search path; the action space $A_t^l$ of the low-level agent consists of all possible outgoing edges of the current node in the graph, so the possible actions of the low-level agent's next hop are expressed as $a_t^l = (r_{t+1}, e_{t+1}) \in A_t^l = \{(r', e') \mid (e_t, r', e') \in \mathcal{G}\}$.
6. The multi-agent knowledge inference method based on deep reinforcement learning of claim 1, characterized in that, during knowledge inference with the high-level and low-level agents, a policy network guides each agent's action selection in the inference environment; both the high-level and low-level agents take the embedding of their historical search paths as an input to the next decision and update the policy-network parameters by maximizing the agents' rewards.
7. The multi-agent knowledge inference method based on deep reinforcement learning of claim 6, characterized in that the high-level agent's policy function $\pi^h(a_t^h \mid s_t^h; \theta)$ is expressed as

$$o_t^h = \mathrm{LeakyReLU}\big(W_1^h\,[h_t^h; o_t]\big), \qquad \pi^h(a_t^h \mid s_t^h; \theta) = \sigma\big(A_t^h \times o_t^h\big),$$

where $o_t^h$ is the output vector of the high-level agent's policy network, $a_t^h$ a possible next-hop action of the high-level agent, $s_t^h$ the current high-level agent state, $\theta$ a parameter of the agent, $h_t^h$ the embedded representation of the high-level agent's search-history path, $o_t$ the embedded representation of the environment currently observed by the agent, and $\mathrm{LeakyReLU}(\cdot)$ an activation function; and the low-level agent's policy function $\pi^l(a_t^l \mid s_t^l; \theta)$ is expressed as

$$\pi^l(a_t^l \mid s_t^l; \theta) = \sigma\big(A_t \times W_2\,\mathrm{LeakyReLU}(W_1\,[h_t^l; e_t; r_t^h])\big),$$

where $\sigma$ is the softmax function, $W_1$ and $W_2$ are the parameters in the low-level agent's policy network that need to be trained, $a_t^l$ is a possible next-hop action of the low-level agent, $s_t^l$ is the current low-level agent state, $h_t^l$ is the history embedding of the low-level agent's search path, $A_t$ is the low-level agent's action space, and $e_t$ is the current entity node in the low-level agent's reasoning.
8. The multi-agent knowledge inference method based on deep reinforcement learning of claim 6, characterized in that, in agent knowledge inference, each agent's parameters are initialized with a meta-learning algorithm and trained on the training samples with the MAML method, so that the trained agent parameters converge after N gradient-iteration updates, where N is less than the preset maximum number of iterations.
9. The multi-agent knowledge inference method based on deep reinforcement learning of claim 6, characterized in that, when computing the agent's reward, if the sequence of the inference path matches a rule, the confidence of that rule is granted as an additional rule reward, and when the agent hits the correct tail entity it obtains a hit reward; the agent's reward is denoted $R_{total} = \lambda R_r + (1-\lambda) R_h$, where $R_h$ is the hit reward, $R_r$ the rule reward, and $\lambda$ a preset weight parameter.
10. A multi-agent knowledge inference system based on deep reinforcement learning, comprising a rule-mining module, an agent-construction module, and a knowledge-reasoning module, wherein:
the rule-mining module extracts entities and the relations among them from the target knowledge graph and induces a sequence of structured chain rules from the graph;
the agent-construction module constructs a hierarchical agent for reinforcement learning, comprising at least a high-level agent and a low-level agent, where the high-level agent divides the reasoning process into subtasks by extracting abstract relations from the knowledge graph and the low-level agent performs entity-level path reasoning on each subtask;
and the knowledge-reasoning module, on the basis of the extracted entities, relations, and chain-rule sequence, performs knowledge reasoning through subtask division by the high-level agent and entity-path exploration by the low-level agent, and outputs a reasoning result.
CN202211168518.7A 2022-09-24 2022-09-24 Multi-agent knowledge inference method and system based on deep reinforcement learning Pending CN115526317A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202211168518.7A CN115526317A (en) 2022-09-24 2022-09-24 Multi-agent knowledge inference method and system based on deep reinforcement learning

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202211168518.7A CN115526317A (en) 2022-09-24 2022-09-24 Multi-agent knowledge inference method and system based on deep reinforcement learning

Publications (1)

Publication Number Publication Date
CN115526317A true CN115526317A (en) 2022-12-27

Family

ID=84699095

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202211168518.7A Pending CN115526317A (en) 2022-09-24 2022-09-24 Multi-agent knowledge inference method and system based on deep reinforcement learning

Country Status (1)

Country Link
CN (1) CN115526317A (en)

Cited By (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116010621A (en) * 2023-01-10 2023-04-25 华中师范大学 Rule-guided self-adaptive path generation method
CN116010621B (en) * 2023-01-10 2023-08-11 华中师范大学 Rule-guided self-adaptive path generation method
CN116610822A (en) * 2023-07-21 2023-08-18 南京邮电大学 Knowledge graph multi-hop reasoning method for diabetes text
CN116911202A (en) * 2023-09-11 2023-10-20 北京航天晨信科技有限责任公司 Agent training method and device based on multi-granularity simulation training environment
CN116911202B (en) * 2023-09-11 2023-11-17 北京航天晨信科技有限责任公司 Agent training method and device based on multi-granularity simulation training environment

Similar Documents

Publication Publication Date Title
CN115526317A (en) Multi-agent knowledge inference method and system based on deep reinforcement learning
CN111581343B (en) Reinforced learning knowledge graph reasoning method and device based on graph convolution neural network
Hesp et al. A multi-scale view of the emergent complexity of life: A free-energy proposal
US11086938B2 (en) Interpreting human-robot instructions
Chen et al. An improved bat algorithm hybridized with extremal optimization and Boltzmann selection
CN114860893B (en) Intelligent decision-making method and device based on multi-mode data fusion and reinforcement learning
Wang et al. A novel discrete firefly algorithm for Bayesian network structure learning
CN115526321A (en) Knowledge reasoning method and system based on intelligent agent dynamic path completion strategy
Goertzel The general theory of general intelligence: a pragmatic patternist perspective
Keselman et al. Reinforcement learning with a* and a deep heuristic
Zheng et al. Hybrid particle swarm optimizer with fitness-distance balance and individual self-exploitation strategies for numerical optimization problems
CN112264999A (en) Method, device and storage medium for intelligent agent continuous space action planning
CN116128060A (en) Chess game method based on opponent modeling and Monte Carlo reinforcement learning
David et al. DEVS model construction as a reinforcement learning problem
Lu et al. A causal-based symbolic reasoning framework for uncertain knowledge graphs
Simmons-Edler et al. Program synthesis through reinforcement learning guided tree search
Salama et al. Extending the ABC-Miner Bayesian classification algorithm
CN115906673B (en) Combat entity behavior model integrated modeling method and system
CN116841708A (en) Multi-agent reinforcement learning method based on intelligent planning
CN114662693A (en) Reinforced learning knowledge graph reasoning method based on action sampling
Hegde et al. Analysing the practicality of drawing inferences in automation of commonsense reasoning
Iqbal Improving the scalability of XCS-based learning classifier systems
Peng et al. Conservative network for offline reinforcement learning
Durugkar et al. Multi-preference actor critic
Zhu et al. Learning bayesian networks in the space of structures by a hybrid optimization algorithm

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination