CN117575007B - Large model knowledge completion method and system based on post-decoding credibility enhancement


Info

Publication number
CN117575007B
CN117575007B (application CN202410063977.1A)
Authority
CN
China
Prior art keywords
post
large model
decoding
knowledge
entity
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202410063977.1A
Other languages
Chinese (zh)
Other versions
CN117575007A (en)
Inventor
陶建华
车飞虎
张帅
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Tsinghua University
Original Assignee
Tsinghua University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Tsinghua University filed Critical Tsinghua University
Priority to CN202410063977.1A
Publication of CN117575007A
Application granted
Publication of CN117575007B
Legal status: Active

Classifications

    • G06N 5/027 Computing arrangements using knowledge-based models; Knowledge representation; Symbolic representation; Frames
    • G06N 3/0499 Computing arrangements based on biological models; Neural networks; Architecture; Feedforward networks
    • G06N 3/08 Computing arrangements based on biological models; Neural networks; Learning methods
    • G06N 5/04 Computing arrangements using knowledge-based models; Inference or reasoning models


Abstract

The invention provides a large model knowledge completion method and system based on post-decoding credibility enhancement, relating to the technical field of knowledge completion. In the embodiment of the invention, for a target knowledge field, before the large model outputs an answer, a post-decoding module performs post-decoding processing on the hidden state output by the large model; the hidden state is appropriately adjusted by the post-decoding module corresponding to the target knowledge field, so that erroneous content can be corrected. A fusion module then fuses the large model output S_n with the decoded state R_n output by the post-decoding module and computes the final result used for knowledge completion. This alleviates the "hallucination" problem of large model outputs and enhances the accuracy of the knowledge completion process; moreover, the method can be efficiently extended to knowledge completion tasks in different knowledge fields and can accurately complete the missing parts of knowledge.

Description

Large model knowledge completion method and system based on post-decoding credibility enhancement
Technical Field
The embodiment of the invention relates to the technical field of knowledge completion, in particular to a large model knowledge completion method and system based on post-decoding credibility enhancement.
Background
Knowledge graph completion is the process of automatically filling in missing information or relationships in a knowledge graph using artificial intelligence techniques. A knowledge graph is a structured knowledge representation that models real-world entities, properties, and relationships in the form of a graph, so that a computer can better understand and reason over this knowledge. However, real-world knowledge is vast and complex, and building a complete knowledge graph is a difficult task. Even once an initial knowledge graph has been established, a significant amount of information may still be missing or relationships incomplete. This is where knowledge graph completion comes into play: by analyzing the existing knowledge and combining techniques such as natural language processing, machine learning, and graph neural networks, possible entities, attributes, or relationships can be predicted, thereby filling in the blank parts of the knowledge graph.
A knowledge completion system based on a large model adopts the large model as the basic framework for knowledge completion and uses the knowledge stored in the large model to complete the missing triples in the knowledge graph. This approach has many advantages: drawing on its rich training data and large parameter count, the large model can rapidly complete missing content. However, the content output by the large model may suffer from the "hallucination" problem, so some of the completed triples may be erroneous.
Thus, a new large model knowledge completion system is currently needed.
Disclosure of Invention
The embodiment of the invention provides a large model knowledge completion method and system based on post-decoding credibility enhancement, which are used to at least partially solve the problems existing in the related art.
The first aspect of the embodiment of the invention provides a large model knowledge completion method based on post-decoding credibility enhancement, which comprises the following steps:
constructing a prompt text (prompt) based on the triple to be completed in the target knowledge field;
performing layer-by-layer reasoning on the prompt through a large model to obtain the hidden layer state S_n;
post-decoding S_n through a multi-layer-perceptron-based post-decoding module, adjusting and correcting the large model output S_n to generate the decoded state R_n;
fusing the large model output S_n and the decoded state R_n through a fusion module and computing a final result G;
completing the triple to be completed according to the final result G.
Optionally, the method further comprises:
acquiring a known triplet in the target knowledge domain;
hiding any entity in the known triples, and constructing a training sample based on the triples after hiding any entity;
and training the post-decoding module to be trained and the fusion module to be trained based on the training samples.
Optionally, hiding any entity in the known triples, and constructing a training sample based on the triples after hiding any entity, including:
determining the occurrence frequency of each entity in the known triples;
hiding low-frequency entities with low occurrence frequency in the known triples;
training samples are constructed based on triplets after concealment of low frequency entities.
Optionally, training the post-decoding module to be trained and the fusion module to be trained based on the training samples includes:
inputting the training sample into a large model, inputting the sample hidden layer state output by the large model into a post-decoding module to be trained, adjusting and correcting the sample hidden layer state output by the large model through the post-decoding module to be trained to generate a sample decoded state, and inputting the sample hidden layer state and the sample decoded state into a fusion module to be trained to obtain a sample final result;
and aiming at minimizing the difference between the final result of the sample and the hidden entity, keeping the parameters of the large model unchanged, and updating the parameters of the post-decoding module to be trained and the fusion module to be trained.
Optionally, the model structure of the large model is based on the Transformer architecture, and the large model obtains the hidden layer state S_n by layer-by-layer reasoning on the prompt through the following formulas:

S_1 = T_1(prompt(h, r, ?))
S_2 = T_2(S_1)
S_3 = T_3(S_2)
...
S_n = T_n(S_{n-1})

where h, r, t respectively denote the head entity, relation, and tail entity in the triple, ? denotes the missing part of the triple to be completed, T_i denotes the i-th Transformer layer, and S_n denotes the Transformer hidden layer state output by the n-th layer.
Optionally, the post-decoding module has a neural network structure and post-decodes S_n through the following formulas, adjusting and correcting the large model output S_n to generate the decoded state R_n:

R_1 = σ(W_1 S_n + b_1)
R_2 = σ(W_2 R_1 + b_2)
...
R_n = σ(W_n R_{n-1} + b_n)

where W_1, W_2, ..., W_n denote weight parameters, b_1, b_2, ..., b_n denote bias parameters, and σ denotes a nonlinear activation function.
Optionally, the fusion module computes the final result G based on the following formulas:

z_1 = σ_1(W_1 [S_n; R_n] + b_1)
z_2 = σ_2(W_2 [S_n; R_n] + b_2)
G = z_1 ⊙ S_n + z_2 ⊙ R_n

where W_1 and W_2 denote the weight parameters of the gating mechanism, b_1 and b_2 denote the bias parameters of the gating mechanism, σ_1 and σ_2 denote the corresponding nonlinear activation functions, z_1 and z_2 respectively denote the intermediate results of the gating mechanism, [·; ·] denotes concatenation, and ⊙ denotes element-wise multiplication.
The second aspect of the embodiment of the invention provides a large model knowledge completion system based on post-decoding credibility enhancement, comprising: a large model, a multi-layer-perceptron-based post-decoding module, and a fusion module, wherein the large model knowledge completion system is configured to execute the steps of the method according to the first aspect of the invention.
A third aspect of the embodiments of the present invention provides an electronic device comprising a memory, a processor and a computer program stored on the memory and executable on the processor, the processor implementing the steps in the method according to the first aspect of the invention when the computer program is executed.
A fourth aspect of the embodiments of the present invention provides a computer-readable storage medium having stored thereon a computer program which, when executed by a processor, implements the steps of the method according to the first aspect of the present invention.
In the embodiment of the invention, for the target knowledge field, before the large model outputs an answer, a post-decoding module performs post-decoding processing on the hidden state output by the large model; the hidden state output by the large model is appropriately adjusted by the post-decoding module corresponding to the target knowledge field, so that erroneous content can be corrected. The large model output S_n and the decoded state R_n output by the post-decoding module are fused by the fusion module, and the final result is computed and used for knowledge completion, which can alleviate the "hallucination" problem of large model outputs and enhance the accuracy of the knowledge completion process; moreover, the method can be efficiently extended to knowledge completion tasks in different knowledge fields and can accurately complete the missing parts of knowledge.
Drawings
In order to more clearly illustrate the technical solutions of the embodiments of the present invention, the drawings that are needed in the description of the embodiments of the present invention will be briefly described below, it being obvious that the drawings in the following description are only some embodiments of the present invention, and that other drawings may be obtained according to these drawings without inventive effort for a person skilled in the art.
FIG. 1 is a flowchart of a large model knowledge completion method based on post-decoding credibility enhancement according to an embodiment of the invention;
FIG. 2 is an exemplary flow diagram of a large model knowledge completion method based on post-decoding credibility enhancement according to an embodiment of the invention;
FIG. 3 is a schematic diagram of the model structure of the large model involved in the large model knowledge completion method based on post-decoding credibility enhancement according to an embodiment of the invention;
FIG. 4 is a schematic diagram of the model structure of the post-decoding module involved in the large model knowledge completion method based on post-decoding credibility enhancement according to an embodiment of the invention;
FIG. 5 is a schematic diagram of the model structure of the fusion module involved in the large model knowledge completion method based on post-decoding credibility enhancement according to an embodiment of the invention.
Detailed Description
In order that the above-recited objects, features and advantages of the present invention will become more readily apparent, a more particular description of the invention will be rendered by reference to the appended drawings and appended detailed description.
Referring to fig. 1, a flowchart of a large model knowledge completion method based on post-decoding credibility enhancement according to an embodiment of the present invention is shown. The method is applied to a large model knowledge completion system comprising: a large model, a multi-layer-perceptron-based post-decoding module, and a fusion module. The method comprises the following steps:
s101, constructing a prompt text prompt based on triples to be completed in the target knowledge field.
S102, performing layer-by-layer reasoning based on promtt through a large model to obtain hidden layersS n
S103, passing through a post-decoding module pair based on a multi-layer perceptronS n Post-decoding to output the large modelSnPerforming adjustment and correction to generate decoded stateR n
S104, outputting the large model through a fusion moduleS n And the decoded stateR n And (5) fusing and calculating to obtain a final result G.
S105, completing the triples to be completed according to the final result G.
In the embodiment of the invention, the parameters of the post-decoding module and the fusion module are obtained by training on sample prompts and final-result labels constructed from known triples in the target knowledge field, and the parameters of the large model remain unchanged during training.
In the embodiment of the invention, the hidden layer states output by a preset number of final layers of the large model can be input into the post-decoding module, for example the last layer or the last two layers. In particular, the hidden layer state S_n output by the last layer of the large model can be input into the post-decoding module.
A knowledge graph is a structured knowledge representation method for modeling entities, attributes, and the relationships between them. It is a large-scale semantic network capable of storing and presenting knowledge in various fields and providing visual means for query and reasoning. Knowledge graphs are generally composed of three elements: entities, attributes, and relationships. An entity represents a particular thing or concept in the real world, attributes describe the characteristics or properties of an entity, and relationships represent connections or dependencies between entities. Through knowledge graphs, knowledge can be organized and managed in a structured manner, and useful information can be obtained from it. Knowledge graphs can be applied in many fields, such as search engines, intelligent question answering, natural language processing, and data analysis. In the embodiment of the invention, the knowledge graph obtained after knowledge completion can likewise be applied in many fields.
For ease of understanding, the large model knowledge completion method based on post-decoding trusted enhancement provided by the embodiment of the present invention is explained below in conjunction with fig. 2, and fig. 2 shows an exemplary flow diagram of the large model knowledge completion method based on post-decoding trusted enhancement provided by the embodiment of the present invention.
In the embodiment of the present invention, a triple in the knowledge graph comprises a head entity, a relation, and a tail entity. In step S101, a target relation template may be determined from the preset relation templates of the target knowledge field based on the relation type of the relation in the triple to be completed, and the prompt text may be constructed for the triple to be completed based on the target relation template. In the embodiment of the invention, the preset relation templates of the target knowledge field are constructed according to the common relations in that field. By way of example, take the triple (Paris, is the capital of, France): the relation type is "is the capital of", and the relation template may be "(head entity) is the capital of (tail entity)". If the head entity is to be completed, i.e. (?, is the capital of, France), candidate prompts include: 1. What is the capital of France? 2. Which city is the capital of France? 3. Which city is France's capital? If the tail entity is to be completed, i.e. (Paris, is the capital of, ?), candidate prompts include: 1. Paris is the capital of which country? 2. Which country has Paris as its capital? 3. Of which country is Paris the capital?
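By way of illustration only (not part of the patent text), step S101 can be sketched in Python as follows; the template strings and the build_prompt helper are hypothetical names introduced here:

```python
# Minimal sketch of template-based prompt construction for a triple whose head
# or tail entity is missing. Templates and names are illustrative assumptions.
HEAD_TEMPLATES = {"is_capital_of": "Which city is the capital of {tail}?"}
TAIL_TEMPLATES = {"is_capital_of": "{head} is the capital of which country?"}

def build_prompt(head, relation, tail):
    """Build a prompt for a triple with exactly one of head/tail missing."""
    if head is None:                       # completing the head entity
        return HEAD_TEMPLATES[relation].format(tail=tail)
    if tail is None:                       # completing the tail entity
        return TAIL_TEMPLATES[relation].format(head=head)
    raise ValueError("exactly one of head/tail must be missing")

print(build_prompt(None, "is_capital_of", "France"))
# -> "Which city is the capital of France?"
```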
In the embodiment of the invention, the model structure of the large model is based on the Transformer architecture, as shown in fig. 3, which shows a schematic diagram of the model structure of the large model involved in the large model knowledge completion method based on post-decoding credibility enhancement in the embodiment of the invention. Specifically, the large model obtains the hidden layer state S_n by layer-by-layer reasoning on the prompt through the following formulas:

S_1 = T_1(prompt(h, r, ?))
S_2 = T_2(S_1)
S_3 = T_3(S_2)
...
S_n = T_n(S_{n-1})

where h, r, t respectively denote the head entity, relation, and tail entity in the triple, ? denotes the missing part of the triple to be completed, T_i denotes the i-th Transformer layer, and S_n denotes the Transformer hidden layer state output by the n-th layer.
Here the omitted part refers to S_4 through S_{n-1}, which are obtained analogously to S_2 and S_3 above.
The Transformer is a deep learning model based on the self-attention mechanism, used for processing sequence data and widely applied in the field of natural language processing. Its core is the multi-head self-attention mechanism, which captures global context by computing self-attention for each position of the input sequence. A Transformer comprises an encoder and a decoder, and the hidden layers exist mainly in the encoder. The encoder is composed of multiple identical layers, each containing a self-attention sub-layer and a feed-forward neural network sub-layer. In each sub-layer, the input sequence is first linearly transformed, the weight of each position with respect to all positions is computed through the self-attention mechanism, and the context vector of each position is generated. These context vectors then undergo a nonlinear transformation through the feed-forward neural network to obtain the final hidden layer state. That is, the hidden layer state of a Transformer can be understood as a representation of each position of the input sequence after multiple layers of self-attention computation and nonlinear transformation. It contains the global context information of the input sequence and can be used for subsequent tasks such as machine translation and text generation.
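For illustration, a minimal sketch of obtaining the layer-by-layer hidden states S_1..S_n with a HuggingFace causal language model; the choice of "gpt2" is a stand-in for the large model, which the patent does not name:

```python
# Sketch: layer-by-layer hidden states of a causal LM for a prompt.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tok = AutoTokenizer.from_pretrained("gpt2")   # placeholder model
lm = AutoModelForCausalLM.from_pretrained("gpt2")
lm.eval()

prompt = "Which city is the capital of France?"
inputs = tok(prompt, return_tensors="pt")
with torch.no_grad():
    out = lm(**inputs, output_hidden_states=True)

# out.hidden_states = (embedding output, S_1, ..., S_n); take the last layer's
# state at the final token position as S_n.
s_n = out.hidden_states[-1][:, -1, :]         # shape: (1, hidden_dim)
```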
In the embodiment of the invention, considering the "hallucination" problem of large model outputs, before the large model outputs an answer, a post-decoding module performs post-decoding processing on the hidden state output by the large model, and the hidden state is appropriately adjusted by the post-decoding module corresponding to the target knowledge field.
In the embodiment of the invention, the multi-layer-perceptron-based post-decoding module has a multi-layer neural network structure, as shown in fig. 4, which shows a schematic diagram of the model structure of the post-decoding module involved in the large model knowledge completion method based on post-decoding credibility enhancement in the embodiment of the invention. The post-decoding module post-decodes S_n through the following formulas, adjusting and correcting the large model output S_n to generate the decoded state R_n:

R_1 = σ(W_1 S_n + b_1)
R_2 = σ(W_2 R_1 + b_2)
...
R_n = σ(W_n R_{n-1} + b_n)

where W_1, W_2, ..., W_n denote weight parameters, b_1, b_2, ..., b_n denote bias parameters, and σ denotes a nonlinear activation function.
Here the omitted part refers to R_3 through R_{n-1}, which are obtained analogously to R_1 and R_2 above.
The multilayer perceptron (MLP) is a classical feedforward artificial neural network model composed of multiple layers of neurons. Its structure comprises an input layer, hidden layers, and an output layer. Each layer consists of several neurons, which are connected by weights and subjected to the nonlinear transformation of an activation function. In a multilayer perceptron, information enters from the input layer, passes through the hidden layers in turn, and finally yields a prediction through the output layer. Each neuron weights its inputs and passes its output to the next layer after processing by the activation function. Training a multilayer perceptron typically uses the backpropagation algorithm to optimize the model parameters: the gradient of the loss function with respect to each parameter is computed, and the parameters are updated in the opposite direction of the gradient to minimize the loss. Multilayer perceptrons can be used to solve classification and regression problems; they can learn complex nonlinear relationships and have strong expressive ability.
In the embodiment of the invention, the model parameters of the multi-layer-perceptron-based post-decoding module are optimized on training samples from the target knowledge field, so that the hidden layer state output by the large model can be appropriately adjusted and erroneous content in the large model output can be corrected.
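A minimal PyTorch sketch of such a multi-layer-perceptron post-decoder follows; the layer count, width, and GELU activation are assumptions, as the patent does not fix them:

```python
# Sketch of the MLP-based post-decoding module: a stack of affine layers with
# a nonlinear activation mapping S_n to the decoded state R_n.
import torch.nn as nn

class PostDecoder(nn.Module):
    def __init__(self, hidden_dim: int, num_layers: int = 3):
        super().__init__()
        self.layers = nn.ModuleList(
            [nn.Linear(hidden_dim, hidden_dim) for _ in range(num_layers)]
        )
        self.act = nn.GELU()                 # activation choice is an assumption

    def forward(self, s_n):
        r = s_n
        for layer in self.layers:            # R_i = sigma(W_i R_{i-1} + b_i)
            r = self.act(layer(r))
        return r                             # decoded state R_n
```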
In the embodiment of the invention, the fusion module is connected to the large model and the post-decoding module; it combines the state output by the large model with the state output by the post-decoding module and computes a new knowledge completion result.
In the embodiment of the invention, the fusion module can realize state fusion through a gating mechanism, as shown in fig. 5, which shows a schematic diagram of the model structure of the fusion module involved in the large model knowledge completion method based on post-decoding credibility enhancement in the embodiment of the invention. Specifically, the fusion module receives as inputs the output state generated by the Transformer-based large model and the decoded state generated by the multi-layer-perceptron-based post-decoding module, fuses the two states through a gating mechanism to generate the final result, and can thereby predict a new fact triple. Specifically, the fusion module computes the final result G based on the following formulas:

z_1 = σ_1(W_1 [S_n; R_n] + b_1)
z_2 = σ_2(W_2 [S_n; R_n] + b_2)
G = z_1 ⊙ S_n + z_2 ⊙ R_n

where W_1 and W_2 denote the weight parameters of the gating mechanism, b_1 and b_2 denote its bias parameters, σ_1 and σ_2 denote the corresponding nonlinear activation functions, z_1 and z_2 respectively denote the intermediate results of the gating mechanism, [·; ·] denotes concatenation, and ⊙ denotes element-wise multiplication.
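The gated fusion above can be sketched as follows; sigmoid gates over the concatenated states are one plausible instantiation of the reconstructed formulas, not a definitive implementation:

```python
# Sketch of the gated fusion module: two gates computed from the concatenated
# states weight the contributions of S_n and R_n to the final result G.
import torch
import torch.nn as nn

class GatedFusion(nn.Module):
    def __init__(self, hidden_dim: int):
        super().__init__()
        self.gate_s = nn.Linear(2 * hidden_dim, hidden_dim)  # W_1, b_1
        self.gate_r = nn.Linear(2 * hidden_dim, hidden_dim)  # W_2, b_2

    def forward(self, s_n, r_n):
        x = torch.cat([s_n, r_n], dim=-1)     # [S_n; R_n]
        z1 = torch.sigmoid(self.gate_s(x))    # intermediate result z_1
        z2 = torch.sigmoid(self.gate_r(x))    # intermediate result z_2
        return z1 * s_n + z2 * r_n            # final result G
```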
In the embodiment of the invention, the triple to be completed can be completed based on the final result G output by the fusion module, enriching the existing knowledge graph and laying a foundation for downstream tasks.
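Building on the sketches above (tok, lm, PostDecoder, GatedFusion), completing a triple from G might look like this; scoring G through the language model's output embedding is an assumption about how the final result is decoded:

```python
# Sketch: completing a triple from the final result G. Trained weights for the
# post-decoder and fusion module are assumed to have been loaded.
import torch

post_decoder = PostDecoder(lm.config.hidden_size)
fusion = GatedFusion(lm.config.hidden_size)

def complete_triple(prompt: str) -> str:
    inputs = tok(prompt, return_tensors="pt")
    with torch.no_grad():
        s_n = lm(**inputs, output_hidden_states=True).hidden_states[-1][:, -1, :]
        g = fusion(s_n, post_decoder(s_n))        # fused final result G
        logits = lm.get_output_embeddings()(g)    # score G over the vocabulary
    return tok.decode(logits.argmax(dim=-1))      # top-scoring (first) token

print(complete_triple("Which city is the capital of France?"))
```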
In the embodiment of the invention, for the target knowledge field, before the large model outputs an answer, a post-decoding module performs post-decoding processing on the hidden state output by the large model; the hidden state output by the large model is appropriately adjusted by the post-decoding module corresponding to the target knowledge field, so that erroneous content can be corrected. The large model output S_n and the decoded state R_n output by the post-decoding module are fused by the fusion module, and the final result is computed and used for knowledge completion, which can alleviate the "hallucination" problem of large model outputs and enhance the accuracy of the knowledge completion process; moreover, the method can be efficiently extended to knowledge completion tasks in different knowledge fields and can accurately complete the missing parts of knowledge.
The large model knowledge completion method based on post-decoding credibility enhancement according to the embodiment of the present invention, in combination with the above embodiment, may further include the following steps:
s201, acquiring a known triplet in the target knowledge field.
S202, hiding any entity in the known triples, and constructing training samples based on the triples after hiding the entity.
S203, training the post-decoding module to be trained and the fusion module to be trained based on the training samples.
In the embodiment of the invention, the target knowledge field is the knowledge field of the triples that need to be completed in actual application. The knowledge fields are obtained by dividing the triples in a knowledge graph by domain. Complete triples known in the target knowledge field can be obtained from the existing knowledge graph.
In the embodiment of the present invention, in step S202, a training sample comprises a sample prompt and a final-result label, where the sample prompt is constructed similarly to the prompt in step S101 above. Specifically, the head entity or tail entity of a known triple can be randomly hidden, and the sample prompt is then constructed based on the relation template. By way of example, take the known triple (Paris, is the capital of, France): the relation type is "is the capital of", and the relation template may be "(head entity) is the capital of (tail entity)". If the head entity is hidden, i.e. (?, is the capital of, France), candidate sample prompts include: 1. What is the capital of France? 2. Which city is the capital of France? 3. Which city is France's capital? If the tail entity is hidden, i.e. (Paris, is the capital of, ?), candidate sample prompts include: 1. Paris is the capital of which country? 2. Which country has Paris as its capital? 3. Of which country is Paris the capital? In each case, the hidden entity serves as the final-result label.
Specifically, the step S202 includes the following sub-steps:
s2021, determining the frequency of occurrence of each entity in the known triplet.
And S2022, hiding the low-frequency entity with low occurrence frequency in the known triples.
S2023, training samples are constructed based on the triplets after hiding the low frequency entities.
In the embodiment of the invention, the construction of training samples takes into account that large models tend to output high-frequency content. To make the large model pay more attention to low-frequency knowledge, entities with low occurrence frequency can be preferentially selected for hiding, so that they participate in more training. Specifically, for an entity e, suppose its number of occurrences as a head entity is c(e) and the total number of triples is N; then the probability of hiding the entity when constructing training samples is positively correlated with 1 - c(e)/N. The case where the entity e is a tail entity is handled analogously.
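A small sketch of this frequency-aware hiding, assuming the hiding probability 1 - c(e)/N reconstructed above; the function name and sampling scheme are illustrative:

```python
# Sketch of frequency-aware entity hiding: low-frequency head entities are
# hidden with higher probability. Tail entities would be handled analogously.
import random
from collections import Counter

def build_masked_samples(triples):
    """triples: list of (head, relation, tail); returns ((None, r, t), label) pairs."""
    head_counts = Counter(h for h, _, _ in triples)
    n = len(triples)
    samples = []
    for h, r, t in triples:
        p_hide = 1.0 - head_counts[h] / n      # rarer heads are hidden more often
        if random.random() < p_hide:
            samples.append(((None, r, t), h))  # hidden entity is the label
    return samples
```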
Specifically, the step S203 includes the following sub-steps:
s2031, inputting the training sample into a large model, inputting the sample hidden layer state output by the large model into a post-decoding module to be trained, adjusting and correcting the sample hidden layer state output by the large model through the post-decoding module to be trained to generate a sample decoded state, and inputting the sample hidden layer state and the sample decoded state into a fusion module to be trained to obtain a sample final result.
S2032, aiming at minimizing the difference between the sample final result and the hidden entity, keeping the parameters of the large model unchanged and updating the parameters of the post-decoding module to be trained and the fusion module to be trained.
In the embodiment of the invention, the post-decoding module and the fusion module can be obtained by training based on the above steps S201-S203.
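A minimal training-step sketch under the assumptions above: the large model's parameters are frozen, and only the post-decoder and fusion module are updated toward the hidden entity (matching only its first token is a simplification). It reuses tok/lm and the PostDecoder/GatedFusion classes from the earlier sketches:

```python
# Sketch of one training step for S2031-S2032: the large model stays fixed;
# only the post-decoder and fusion module receive gradient updates.
import torch
import torch.nn.functional as F

for p in lm.parameters():
    p.requires_grad_(False)                    # large-model parameters stay fixed

post_decoder = PostDecoder(lm.config.hidden_size)   # instantiated fresh here
fusion = GatedFusion(lm.config.hidden_size)
optim = torch.optim.AdamW(
    list(post_decoder.parameters()) + list(fusion.parameters()), lr=1e-4
)

def train_step(sample_prompt: str, hidden_entity: str) -> float:
    inputs = tok(sample_prompt, return_tensors="pt")
    with torch.no_grad():                      # S_n comes from the frozen model
        s_n = lm(**inputs, output_hidden_states=True).hidden_states[-1][:, -1, :]
    g = fusion(s_n, post_decoder(s_n))         # sample final result
    logits = lm.get_output_embeddings()(g)     # score against the vocabulary
    target = tok(hidden_entity, return_tensors="pt").input_ids[:, 0]
    loss = F.cross_entropy(logits, target)     # gap to the hidden entity
    optim.zero_grad()
    loss.backward()
    optim.step()
    return loss.item()
```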
In the embodiment of the invention, a corresponding post-decoding module and fusion module can be trained separately for each target knowledge field, so that the knowledge completion method provided by the embodiment of the invention can be efficiently extended to knowledge completion tasks in different knowledge fields and can accurately complete the missing parts of knowledge. The post-decoding module corresponding to the target knowledge field can appropriately adjust the output content of the large model and correct erroneous content, thereby reducing the possibility of large model "hallucination" and enhancing the credibility of the large model.
In an optional implementation, training samples in which the head entity is hidden may further be used to train a post-decoding module and fusion module dedicated to completing head entities, and training samples in which the tail entity is hidden may be used to train a post-decoding module and fusion module dedicated to completing tail entities, which can further improve the accuracy of knowledge completion.
Based on the same inventive concept, the embodiment of the invention also provides a large model knowledge completion system based on post-decoding credibility enhancement, which comprises: the large model, the post-decoding module based on the multi-layer perceptron and the fusion module, wherein the large model knowledge completion system is used for executing the steps in the method described in any embodiment.
In the embodiment of the present invention, the large model knowledge completion system may further include an input module, which can determine a target relation template from the preset relation templates of the target knowledge field based on the relation type of the triple to be completed, and construct the prompt text for the triple to be completed based on the target relation template.
Based on the same inventive concept, the embodiment of the present invention further provides an electronic device, including a memory, a processor, and a computer program stored on the memory and executable on the processor, where the processor executes the computer program to implement the steps in the method described in any of the foregoing embodiments.
Based on the same inventive concept, the embodiments of the present invention also provide a computer-readable storage medium, on which a computer program is stored, which when being executed by a processor, implements the steps of the method according to any of the embodiments described above.
In this specification, each embodiment is described in a progressive manner, and each embodiment is mainly described by differences from other embodiments, and identical and similar parts between the embodiments are all enough to be referred to each other.
It will be apparent to those skilled in the art that embodiments of the present invention may be provided as a method, apparatus, or computer program product. Accordingly, embodiments of the present invention may take the form of an entirely hardware embodiment, an entirely software embodiment or an embodiment combining software and hardware aspects. Furthermore, embodiments of the invention may take the form of a computer program product on one or more computer-usable storage media (including, but not limited to, disk storage, CD-ROM, optical storage, etc.) having computer-usable program code embodied therein.
Embodiments of the present invention are described with reference to flowchart illustrations and/or block diagrams of methods, terminal devices (systems), and computer program products according to embodiments of the invention. It will be understood that each flow and/or block of the flowchart illustrations and/or block diagrams, and combinations of flows and/or blocks in the flowchart illustrations and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general purpose computer, special purpose computer, embedded processor, or other programmable terminal device to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable terminal device, create means for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be stored in a computer-readable memory that can direct a computer or other programmable terminal device to function in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means which implement the function specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be loaded onto a computer or other programmable terminal device to cause a series of operational steps to be performed on the computer or other programmable terminal device to produce a computer implemented process such that the instructions which execute on the computer or other programmable terminal device provide steps for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
While preferred embodiments of the present invention have been described, additional variations and modifications in those embodiments may occur to those skilled in the art once they learn of the basic inventive concepts. It is therefore intended that the following claims be interpreted as including the preferred embodiment and all such alterations and modifications as fall within the scope of the embodiments of the invention.
Finally, it is further noted that relational terms such as first and second, and the like are used solely to distinguish one entity or action from another entity or action without necessarily requiring or implying any actual such relationship or order between such entities or actions. Moreover, the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or terminal that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or terminal. Without further limitation, an element defined by the phrase "comprising one … …" does not exclude the presence of other like elements in a process, method, article or terminal device comprising the element.
The large model knowledge completion method and system based on post-decoding credibility enhancement provided by the invention have been described in detail above. Specific examples are used herein to illustrate the principle and implementation of the invention, and the description of the above embodiments is only intended to help understand the method and its core idea. Meanwhile, those skilled in the art may make variations to the specific embodiments and application scope according to the ideas of the invention. In view of the above, the contents of this description should not be construed as limiting the invention.

Claims (7)

1. A large model knowledge completion method based on post-decoding credibility enhancement, the method comprising:
constructing a prompt text (prompt) based on the triple to be completed in the target knowledge field;
performing layer-by-layer reasoning on the prompt through a large model to obtain the hidden layer state S_n;
post-decoding S_n through a multi-layer-perceptron-based post-decoding module, adjusting and correcting the large model output S_n to generate the decoded state R_n;
fusing the large model output S_n and the decoded state R_n through a fusion module and computing a final result G;
completing the triple to be completed according to the final result G;
wherein the model structure of the large model is based on the Transformer architecture, and the large model obtains the hidden layer state S_n by layer-by-layer reasoning on the prompt through the following formulas:

S_1 = T_1(prompt(h, r, ?))
S_2 = T_2(S_1)
...
S_n = T_n(S_{n-1})

where h, r, t respectively denote the head entity, relation, and tail entity in the triple, ? denotes the missing part of the triple to be completed, T_i denotes the i-th Transformer layer, and S_n denotes the Transformer hidden layer state output by the n-th layer;
the post-decoding module has a neural network structure and post-decodes S_n through the following formulas, adjusting and correcting the large model output S_n to generate the decoded state R_n:

R_1 = σ(W_1 S_n + b_1)
R_2 = σ(W_2 R_1 + b_2)
...
R_n = σ(W_n R_{n-1} + b_n)

where W_1, W_2, ..., W_n denote weight parameters, b_1, b_2, ..., b_n denote bias parameters, and σ denotes a nonlinear activation function;
the fusion module computes the final result G based on the following formulas:

z_1 = σ_1(W_1 [S_n; R_n] + b_1)
z_2 = σ_2(W_2 [S_n; R_n] + b_2)
G = z_1 ⊙ S_n + z_2 ⊙ R_n

where W_1 and W_2 denote the weight parameters of the gating mechanism, b_1 and b_2 denote the bias parameters of the gating mechanism, σ_1 and σ_2 denote the corresponding nonlinear activation functions, z_1 and z_2 respectively denote the intermediate results of the gating mechanism, [·; ·] denotes concatenation, and ⊙ denotes element-wise multiplication.
2. The large model knowledge completion method based on post-decoding credibility enhancement of claim 1, further comprising:
acquiring a known triplet in the target knowledge domain;
hiding any entity in the known triples, and constructing a training sample based on the triples after hiding any entity;
and training the post-decoding module to be trained and the fusion module to be trained based on the training samples.
3. The large model knowledge completion method based on post-decoding credibility enhancement of claim 2, wherein hiding any entity in the known triples and constructing training samples based on the triples after hiding the entity comprises:
determining the occurrence frequency of each entity in the known triples;
hiding low-frequency entities with low occurrence frequency in the known triples;
training samples are constructed based on triplets after concealment of low frequency entities.
4. The large model knowledge completion method based on post-decoding credibility enhancement according to claim 2, wherein training the post-decoding module to be trained and the fusion module to be trained based on the training samples comprises:
inputting the training sample into a large model, inputting the sample hidden layer state output by the large model into a post-decoding module to be trained, adjusting and correcting the sample hidden layer state output by the large model through the post-decoding module to be trained to generate a sample decoded state, and inputting the sample hidden layer state and the sample decoded state into a fusion module to be trained to obtain a sample final result;
and aiming at minimizing the difference between the final result of the sample and the hidden entity, keeping the parameters of the large model unchanged, and updating the parameters of the post-decoding module to be trained and the fusion module to be trained.
5. A large model knowledge completion system based on post-decoding credibility enhancement, the large model knowledge completion system comprising: a large model, a multi-layer-perceptron-based post-decoding module, and a fusion module, wherein the large model knowledge completion system is configured to implement the large model knowledge completion method based on post-decoding credibility enhancement as set forth in any one of claims 1-4.
6. An electronic device comprising a memory, a processor, and a computer program stored on the memory and executable on the processor, wherein the processor implements the steps of the large model knowledge completion method based on post-decoding credibility enhancement according to any one of claims 1-4 when executing the computer program.
7. A computer-readable storage medium having a computer program stored thereon, wherein the computer program, when executed by a processor, implements the steps of the large model knowledge completion method based on post-decoding credibility enhancement according to any one of claims 1-4.
CN202410063977.1A (filed 2024-01-17) Large model knowledge completion method and system based on post-decoding credibility enhancement, granted as CN117575007B (Active)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202410063977.1A 2024-01-17 2024-01-17 Large model knowledge completion method and system based on post-decoding credibility enhancement


Publications (2)

Publication Number Publication Date
CN117575007A CN117575007A (en) 2024-02-20
CN117575007B true CN117575007B (en) 2024-04-05

Family

Family ID: 89886692



Patent Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
KR102234850B1 (en) * 2019-11-15 2021-04-02 숭실대학교산학협력단 Method and apparatus for complementing knowledge based on relation network
CN112348191A (en) * 2020-10-26 2021-02-09 福州大学 Knowledge base completion method based on multi-mode representation learning
CN114579769A (en) * 2022-05-07 2022-06-03 中国科学技术大学 Small sample knowledge graph completion method, system, equipment and storage medium
CN115357728A (en) * 2022-08-22 2022-11-18 浙江大学 Large model knowledge graph representation method based on Transformer



Legal Events

PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant