WO2024066045A1

WO2024066045A1 - Guarantee information extraction and value prediction method and system, terminal, and storage medium

Info

Publication number: WO2024066045A1
Application number: PCT/CN2022/137062
Authority: WO
Inventors: 吴承科; 郭媛君; 杨之乐; 王尧; 刘祥飞
Original assignee: 深圳先进技术研究院
Priority date: 2022-09-27
Filing date: 2022-12-06
Publication date: 2024-04-04
Also published as: CN115619573A

Abstract

Disclosed in the present invention are a guarantee information extraction and value prediction method and system, a terminal, and a storage medium. The method comprises: obtaining a guarantee text and a trained target extraction model, and inputting the guarantee text into the target extraction model to obtain a plurality of triples corresponding to the guarantee text, wherein each triple reflects the relationship between two entities in the guarantee text; generating, according to the triples, a guarantee knowledge graph corresponding to the guarantee text; determining a guarantee type corresponding to the guarantee text, and determining, according to the guarantee type, a trained target prediction model corresponding to the guarantee text; and inputting the guarantee knowledge graph into the target prediction model to obtain a predicted guarantee value. By means of the target extraction model and the target prediction model, both constructed on the basis of a deep learning method, the present invention can accurately extract the information in the guarantee text and accurately estimate the value of the guarantee text on the basis of the extracted information. This solves the problem in the prior art that guarantee values are estimated manually and an objective guarantee value is therefore difficult to obtain.

Description

Method, system, terminal and storage medium for letter of guarantee information extraction and value prediction

Technical Field

The present invention relates to the field of letter of guarantee applications, and in particular to a method, system, terminal and storage medium for letter of guarantee information extraction and value prediction.

Background

Electronic letters of guarantee are an indispensable part of the construction engineering field. Such a letter is a written credit guarantee certificate issued to a third party by a bank, insurance company or guarantee company at the request of an applicant. Letters of guarantee are mostly issued by insurance companies with strong financial strength. Because they are based on credit and require no traditional mortgage or pledge, they relieve the financial pressure on enterprises in public resource transactions. A letter of guarantee therefore has a certain economic value. At present, however, that value is assessed manually, which makes it difficult to obtain an objective letter of guarantee value.

Therefore, the prior art still needs to be improved and developed.

Technical Problem

The technical problem to be solved by the present invention is, in view of the above defects of the prior art, to provide a method, system, terminal and storage medium for letter of guarantee information extraction and value prediction, aiming to solve the problem in the prior art that the value of a letter of guarantee is evaluated manually and an objective letter of guarantee value is therefore difficult to obtain.

Technical Solution

The technical solution adopted by the present invention to solve the problem is as follows:

In a first aspect, an embodiment of the present invention provides a method for letter of guarantee information extraction and value prediction, wherein the method comprises:

obtaining a letter of guarantee text and a trained target extraction model, and inputting the letter of guarantee text into the target extraction model to obtain a plurality of triples corresponding to the letter of guarantee text, each triple reflecting the relationship between two entities in the letter of guarantee text;

generating, according to the triples, a letter of guarantee knowledge graph corresponding to the letter of guarantee text;

determining the letter of guarantee type corresponding to the letter of guarantee text, and determining, according to the letter of guarantee type, a trained target prediction model corresponding to the letter of guarantee text;

inputting the letter of guarantee knowledge graph into the target prediction model to obtain a predicted letter of guarantee value.

In one embodiment, the training process corresponding to the target extraction model includes:

obtaining a historical letter of guarantee text, and determining a plurality of first training data according to the historical letter of guarantee text, wherein each first training data includes a sentence in the historical letter of guarantee text and annotation information corresponding to the sentence, the annotation information reflecting the entities contained in the sentence and the relationships between the entities;

obtaining a pre-trained target bidirectional language model, wherein the input data of the target bidirectional language model is a letter of guarantee text sentence masked by a mask, and the output data of the target bidirectional language model is the predicted word hidden by the mask;

adjusting the target bidirectional language model to obtain an extraction model, wherein the input data of the extraction model is a sentence in the historical letter of guarantee text, and the output data of the extraction model is the predicted entities contained in the sentence and the relationships between the entities;

iteratively training the extraction model according to the first training data to obtain the target extraction model.

In one embodiment, the training process corresponding to the target bidirectional language model includes:

determining a plurality of letter of guarantee text sentences according to the historical letter of guarantee text;

masking words in the letter of guarantee text sentence with a mask to obtain a masked sentence, and generating label information corresponding to the masked sentence according to the masked words;

inputting the masked sentence into a bidirectional language model that has not completed training, to obtain predicted words generated by the bidirectional language model on the basis of the masked sentence;

generating a first loss function value corresponding to the bidirectional language model according to the predicted words and the label information;

determining whether the first loss function value has converged to a target value and, if not, updating the parameters of the bidirectional language model according to the first loss function value to obtain an updated bidirectional language model;

taking the updated bidirectional language model as the bidirectional language model, and continuing to execute the step of masking words in the letter of guarantee text sentence with a mask to obtain a masked sentence, until the first loss function value converges to the target value, thereby obtaining the target bidirectional language model.

In one embodiment, masking words in the letter of guarantee text sentence with a mask to obtain a masked sentence includes:

determining whether the first loss function value corresponding to the previous round of training has converged to an intermediate value, wherein the intermediate value is a value between the first loss function value corresponding to the first round of training and the target value;

if the first loss function value corresponding to the previous round of training has not converged to the intermediate value, randomly masking words in the letter of guarantee text sentence with the mask to obtain the masked sentence;

if the first loss function value corresponding to the previous round of training has converged to the intermediate value, obtaining the masking probability corresponding to each word in the letter of guarantee text sentence, wherein the masking probability of each word is directly proportional to the contribution of that word to the letter of guarantee value of the historical letter of guarantee text;

masking words in the letter of guarantee text sentence with the mask, on the basis of the masking probabilities corresponding to the words in the sentence, to obtain the masked sentence.

In one embodiment, generating the letter of guarantee knowledge graph corresponding to the letter of guarantee text according to the triples includes:

generating the nodes of the letter of guarantee knowledge graph in one-to-one correspondence with the entities contained in the triples;

connecting the nodes of the letter of guarantee knowledge graph according to the relationships between the entities contained in the triples to obtain the letter of guarantee knowledge graph, wherein different types of relationships correspond to different types of connecting lines.
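As a non-limiting illustration, the graph construction described above can be sketched as follows. The function name `build_graph`, the tuple layout of a triple and the sample entities and relationship names are assumptions made for this sketch and do not come from the embodiment.

```python
from collections import defaultdict


def build_graph(triples):
    """Build a knowledge graph from (head, relation, tail) triples.

    Returns (nodes, edges). `nodes` holds one node per distinct entity.
    `edges` maps each relationship type to its own list of (head, tail)
    pairs, so each relationship type keeps its own kind of connection.
    """
    nodes = set()
    edges = defaultdict(list)
    for head, relation, tail in triples:
        nodes.update((head, tail))
        edges[relation].append((head, tail))
    return nodes, dict(edges)


# Hypothetical triples, for illustration only.
sample = [
    ("Contractor A", "contract_price", "5,000,000 CNY"),
    ("Contractor A", "hires", "Labor Subcontractor C"),
    ("Labor Subcontractor C", "dispatches_workers", "120"),
]
nodes, edges = build_graph(sample)
```

Keying the edges by relationship type is one simple way to keep different relationship types distinguishable; an edge attribute in a graph library would serve equally well.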

In one embodiment, the letter of guarantee knowledge graph further includes an attention weight corresponding to each node, and the process of determining the attention weight corresponding to each node includes:

taking the node as the starting point, performing a neighborhood search on the letter of guarantee knowledge graph to obtain all associated nodes corresponding to the node;

determining, according to the associated nodes, a relationship box corresponding to the node, wherein the relationship box is the minimum bounding box containing all of the associated nodes;

determining the attention weight corresponding to the node according to the size of the relationship box.
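As a non-limiting illustration, one possible reading of this weighting is sketched below. The text does not state how nodes are positioned or how box size maps to a weight, so the sketch assumes that each node has a 2D layout coordinate, that the neighborhood search returns directly connected nodes, and that the box area itself is used as the weight. All function names and sample data are hypothetical.

```python
def neighbors(node, edges):
    """Collect the nodes directly linked to `node` (one-hop search, assumed)."""
    found = set()
    for a, b in edges:
        if a == node:
            found.add(b)
        elif b == node:
            found.add(a)
    return found


def relationship_box(node, edges, positions):
    """Smallest axis-aligned box (xmin, ymin, xmax, ymax) around the associated nodes."""
    points = [positions[n] for n in neighbors(node, edges)]
    if not points:
        return None
    xs, ys = zip(*points)
    return min(xs), min(ys), max(xs), max(ys)


def attention_weight(node, edges, positions):
    """Use the box area as the weight; this mapping is an assumption."""
    box = relationship_box(node, edges, positions)
    if box is None:
        return 0.0
    xmin, ymin, xmax, ymax = box
    return float((xmax - xmin) * (ymax - ymin))


# Hypothetical graph and layout coordinates, for illustration only.
edges = [("A", "B"), ("A", "C"), ("C", "D")]
positions = {"A": (0, 0), "B": (2, 0), "C": (0, 3), "D": (5, 5)}
```

With these sample coordinates, node A has associated nodes B and C, whose bounding box spans 2 by 3 units. A node with a single associated node gets a degenerate box of zero area, which a practical implementation would probably need to handle separately.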

In one embodiment, the training process corresponding to the target prediction model includes:

obtaining a historical letter of guarantee knowledge graph and the letter of guarantee value corresponding to the historical letter of guarantee knowledge graph, and inputting the historical letter of guarantee knowledge graph into a prediction model that has not completed training to obtain a training letter of guarantee value, wherein the letter of guarantee type corresponding to the historical letter of guarantee knowledge graph is the same as that of the letter of guarantee text, and the prediction model is a graph attention model;

determining a second loss function value corresponding to the prediction model according to the minimum mean square error between the letter of guarantee value and the training letter of guarantee value;

determining whether the second loss function value has converged to a preset value and, if not, updating the parameters of the prediction model according to the second loss function value to obtain an updated prediction model;

taking the updated prediction model as the prediction model, and continuing to execute the step of obtaining a historical letter of guarantee knowledge graph and the corresponding letter of guarantee value and inputting the historical letter of guarantee knowledge graph into the prediction model that has not completed training, until the second loss function value obtained converges to the preset value, thereby obtaining the target prediction model.
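As a non-limiting illustration, the control flow of this training loop (predict, compute a mean square error loss, compare with a preset value, update, repeat) can be sketched as follows. To keep the sketch self-contained, the graph attention model is replaced by a single-weight linear model over one scalar feature per graph; this stand-in, the learning rate, the preset value and the sample data are all assumptions and are not part of the embodiment.

```python
def mse(predicted, actual):
    """Mean square error between predicted and recorded values."""
    return sum((p - a) ** 2 for p, a in zip(predicted, actual)) / len(actual)


def train(features, values, preset=1e-6, lr=0.05, max_rounds=1000):
    """Iterate until the loss reaches `preset` or `max_rounds` is exhausted.

    `features` stands in for the historical knowledge graphs (one scalar per
    graph) and `values` for their recorded letter of guarantee values.
    """
    w = 0.0
    loss = float("inf")
    for _ in range(max_rounds):
        predicted = [w * x for x in features]      # training values
        loss = mse(predicted, values)              # second loss function value
        if loss <= preset:                         # converged: stop training
            break
        # Gradient of the mean square error with respect to w.
        grad = sum(2 * (p - v) * x
                   for p, v, x in zip(predicted, values, features)) / len(values)
        w -= lr * grad                             # parameter update
    return w, loss


# Hypothetical data in which each value is exactly twice its feature.
w, loss = train([1.0, 2.0, 3.0, 4.0], [2.0, 4.0, 6.0, 8.0])
```

The `max_rounds` cap is an addition of this sketch; the text describes stopping only on convergence, but a bounded loop avoids running indefinitely when the preset value cannot be reached.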

In a second aspect, an embodiment of the present invention further provides a system for letter of guarantee information extraction and value prediction, wherein the system comprises:

an information extraction module, configured to obtain a letter of guarantee text and a trained target extraction model, and to input the letter of guarantee text into the target extraction model to obtain a plurality of triples corresponding to the letter of guarantee text, each triple reflecting the relationship between two entities in the letter of guarantee text;

a graph generation module, configured to generate, according to the triples, a letter of guarantee knowledge graph corresponding to the letter of guarantee text;

a model selection module, configured to determine the letter of guarantee type corresponding to the letter of guarantee text, and to determine, according to the letter of guarantee type, a trained target prediction model corresponding to the letter of guarantee text;

a value prediction module, configured to input the letter of guarantee knowledge graph into the target prediction model to obtain a predicted letter of guarantee value.
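As a non-limiting illustration, the four modules can be chained as shown below. Every name in the sketch (`predict_value`, the callables passed to it, and the type labels "bid" and "performance") is hypothetical; the stub lambdas merely stand in for trained models.

```python
def predict_value(text, extract, build_graph, classify, models):
    """Chain the four modules: extract, build graph, select model, predict."""
    triples = extract(text)            # information extraction module
    graph = build_graph(triples)       # graph generation module
    guarantee_type = classify(text)    # model selection module: find the type,
    model = models[guarantee_type]     # then pick the model trained for it
    return model(graph)                # value prediction module


# Stub components, for illustration only.
models = {
    "bid": lambda graph: 100.0 * len(graph),
    "performance": lambda graph: 250.0 * len(graph),
}
value = predict_value(
    "sample letter of guarantee text",
    extract=lambda text: [("A", "rel", "B")],
    build_graph=lambda triples: list(triples),
    classify=lambda text: "performance",
    models=models,
)
```

Holding one prediction model per letter of guarantee type in a dictionary reflects the statement that the target prediction model is determined according to the type; how the type itself is determined is left open here.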

In a third aspect, an embodiment of the present invention further provides a terminal, wherein the terminal includes a memory and one or more processors; the memory stores one or more programs; the programs contain instructions for executing any of the methods for letter of guarantee information extraction and value prediction described above; and the processors are configured to execute the programs.

In a fourth aspect, an embodiment of the present invention further provides a computer-readable storage medium on which a plurality of instructions are stored, wherein the instructions are adapted to be loaded and executed by a processor to implement the steps of any of the methods for letter of guarantee information extraction and value prediction described above.

Beneficial Effects

In the embodiments of the present invention, a letter of guarantee text and a trained target extraction model are obtained, and the letter of guarantee text is input into the target extraction model to obtain a plurality of triples corresponding to the letter of guarantee text, each triple reflecting the relationship between two entities in the letter of guarantee text; a letter of guarantee knowledge graph corresponding to the letter of guarantee text is generated according to the triples; the letter of guarantee type corresponding to the letter of guarantee text is determined, and a trained target prediction model corresponding to the letter of guarantee text is determined according to the letter of guarantee type; and the letter of guarantee knowledge graph is input into the target prediction model to obtain a predicted letter of guarantee value. With the target extraction model and the target prediction model, both constructed on the basis of deep learning, the present invention can accurately extract the information in the letter of guarantee text and accurately evaluate the value of the letter of guarantee text on the basis of the extracted information. This solves the problem in the prior art that the value of a letter of guarantee is evaluated manually and an objective letter of guarantee value is therefore difficult to obtain.

Brief Description of the Drawings

In order to explain the embodiments of the present invention or the technical solutions in the prior art more clearly, the drawings needed for describing the embodiments or the prior art are briefly introduced below. The drawings described below are merely some of the embodiments recorded in the present invention; a person of ordinary skill in the art can obtain other drawings from them without creative effort.

FIG. 1 is a schematic flow chart of the method for letter of guarantee information extraction and value prediction provided by an embodiment of the present invention.

FIG. 2 is a schematic module diagram of the system for letter of guarantee information extraction and value prediction provided by an embodiment of the present invention.

FIG. 3 is a functional block diagram of the terminal provided by an embodiment of the present invention.

Embodiments of the Present Invention

The present invention discloses a method, system, terminal and storage medium for letter of guarantee information extraction and value prediction. To make the purpose, technical solution and effects of the present invention clearer and more definite, the present invention is described in further detail below with reference to the drawings and embodiments. It should be understood that the specific embodiments described here serve only to explain the present invention and not to limit it.

Those skilled in the art will understand that, unless expressly stated otherwise, the singular forms "a", "an", "said" and "the" used herein may also include the plural forms. It should be further understood that the word "comprising" used in the specification of the present invention indicates the presence of the stated features, integers, steps, operations, elements and/or components, but does not exclude the presence or addition of one or more other features, integers, steps, operations, elements, components and/or groups thereof. It should be understood that when an element is said to be "connected" or "coupled" to another element, it may be directly connected or coupled to the other element, or intermediate elements may be present. In addition, "connected" or "coupled" as used herein may include a wireless connection or wireless coupling. The phrase "and/or" as used herein includes all or any unit and all combinations of one or more of the associated listed items.

Those skilled in the art will understand that, unless otherwise defined, all terms used herein (including technical and scientific terms) have the same meaning as commonly understood by a person of ordinary skill in the art to which the present invention belongs. It should also be understood that terms such as those defined in general dictionaries should be understood to have meanings consistent with their meanings in the context of the prior art and, unless specifically defined as here, are not to be interpreted in an idealized or overly formal sense.

In view of the above defects of the prior art, the present invention provides a method for letter of guarantee information extraction and value prediction. In the method, a letter of guarantee text and a trained target extraction model are obtained, and the letter of guarantee text is input into the target extraction model to obtain a plurality of triples corresponding to the letter of guarantee text, each triple reflecting the relationship between two entities in the letter of guarantee text; a letter of guarantee knowledge graph corresponding to the letter of guarantee text is generated according to the triples; the letter of guarantee type corresponding to the letter of guarantee text is determined, and a trained target prediction model corresponding to the letter of guarantee text is determined according to the letter of guarantee type; and the letter of guarantee knowledge graph is input into the target prediction model to obtain a predicted letter of guarantee value. With the target extraction model and the target prediction model, both constructed on the basis of deep learning, the present invention can accurately extract the information in the letter of guarantee text and accurately evaluate the value of the letter of guarantee text on the basis of the extracted information. This solves the problem in the prior art that the value of a letter of guarantee is evaluated manually and an objective letter of guarantee value is therefore difficult to obtain.

As shown in FIG. 1, the method comprises the following steps:

Step S100: obtaining a letter of guarantee text and a trained target extraction model, and inputting the letter of guarantee text into the target extraction model to obtain a plurality of triples corresponding to the letter of guarantee text, each triple reflecting the relationship between two entities in the letter of guarantee text.

In this embodiment, the letter of guarantee text may be the text of any electronic letter of guarantee whose value needs to be evaluated. To evaluate the value of the letter of guarantee text accurately, information must first be extracted from it. Specifically, this embodiment constructs a target extraction model in advance. The target extraction model has been trained on massive data and has learned to automatically extract the key information in a letter of guarantee text that a guarantee company cares about. When a new letter of guarantee text is input into the trained target extraction model, the model extracts the items of information corresponding to that text and presents them in the form of triples. Each triple contains three elements, namely entity-relationship-entity, so each triple reflects the relationship between two entities in the letter of guarantee text. The different entities in this embodiment correspond to different types of information in the letter of guarantee text, for example the project participants and the attributes of each project participant. The project participants include contractors, suppliers and labor subcontractors; the attributes of each project participant include the contract price, the number of labor personnel dispatched, the quantity or type of building materials supplied, and so on.
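As a non-limiting illustration, the entity-relationship-entity triples described above can be represented as shown below. The class name `Triple`, its field names, the relationship labels and the sample values are assumptions made for this sketch; the embodiment does not prescribe a data format.

```python
from typing import NamedTuple


class Triple(NamedTuple):
    """One entity-relationship-entity fact extracted from a letter of guarantee."""
    head: str      # first entity, e.g. a project participant
    relation: str  # relationship linking the two entities
    tail: str      # second entity, e.g. an attribute of that participant


# Hypothetical output of the extraction model for one letter of guarantee.
triples = [
    Triple("Contractor A", "contract_price", "5,000,000 CNY"),
    Triple("Supplier B", "supplies_material", "steel rebar"),
    Triple("Labor Subcontractor C", "dispatches_workers", "120"),
]


def entities(items):
    """Return the distinct entities in first-seen order."""
    seen = {}
    for t in items:
        seen.setdefault(t.head, None)
        seen.setdefault(t.tail, None)
    return list(seen)
```

Here `entities(triples)` lists each participant and attribute once, which is the set of entities from which the nodes of the knowledge graph are later generated.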

In one implementation, the training process corresponding to the target extraction model includes:

Step S10: obtaining a historical letter of guarantee text, and determining a plurality of first training data according to the historical letter of guarantee text, wherein each first training data includes a sentence in the historical letter of guarantee text and annotation information corresponding to the sentence, the annotation information reflecting the entities contained in the sentence and the relationships between the entities;

Step S20: obtaining a pre-trained target bidirectional language model, wherein the input data of the target bidirectional language model is a sentence in the historical letter of guarantee text that has been masked by a mask, and the output data of the target bidirectional language model is the predicted word hidden by the mask;

Step S30: adjusting the target bidirectional language model to obtain an extraction model, wherein the input data of the extraction model is a sentence in the historical letter of guarantee text, and the output data of the extraction model is the predicted entities contained in the sentence and the relationships between the entities;

Step S40: iteratively training the extraction model according to the first training data to obtain the target extraction model.

In short, to save training time, this embodiment generates the initial extraction model by fine-tuning a pre-trained target bidirectional language model. Because the extraction model already encodes a large number of semantic regularities, only a small amount of training data is needed to reach high accuracy. Specifically, the target bidirectional language model in this embodiment is the DeBERTa model. Its input data is a letter of guarantee text sentence masked by a mask, and it predicts the masked word by analyzing the text before and after the mask, so the trained target bidirectional language model has already learned a large number of semantic regularities. The input-output relationship of the target bidirectional language model is then adjusted so that the adjusted model suits the information extraction task, which gives the extraction model. Finally, the extraction model is iteratively trained with a small amount of manually annotated first training data to obtain a target extraction model of high accuracy.

In one implementation, the training process corresponding to the target bidirectional language model includes:

Step S21: determining a plurality of letter of guarantee text sentences according to the historical letter of guarantee text;

Step S22: masking words in the letter of guarantee text sentence with a mask to obtain a masked sentence, and generating label information corresponding to the masked sentence according to the masked words;

Step S23: inputting the masked sentence into a bidirectional language model that has not completed training, to obtain predicted words generated by the bidirectional language model on the basis of the masked sentence;

Step S24: generating a first loss function value corresponding to the bidirectional language model according to the predicted words and the label information;

Step S25: determining whether the first loss function value has converged to a target value and, if not, updating the parameters of the bidirectional language model according to the first loss function value to obtain an updated bidirectional language model;

Step S26: taking the updated bidirectional language model as the bidirectional language model, and continuing to execute the step of masking words in the letter of guarantee text sentence with a mask to obtain a masked sentence, until the first loss function value converges to the target value, thereby obtaining the target bidirectional language model.

Specifically, in this embodiment the model that has not completed training is referred to as the bidirectional language model, the model that has completed training is referred to as the target bidirectional language model, and the training process is iterative. Taking one round of training as an example, a letter of guarantee text sentence is first extracted from the historical letter of guarantee text, and one word in the sentence is hidden by a mask to obtain a masked sentence. Label information for the masked sentence is generated by manual annotation; the label information reflects the word at the masked position. The masked sentence is input into the bidirectional language model, which predicts the word at the masked position, giving the predicted word. Comparing the label information with the predicted word reveals the prediction deviation of the model, from which the first loss function value is generated; the larger the first loss function value, the larger the prediction deviation. A target value is set in advance according to the training requirements. If the first loss function value has not converged to the target value, the current accuracy of the model does not yet meet the training requirements, and the next round of training is carried out. If the first loss function value has converged to the target value, the current accuracy of the model meets the training requirements, and training stops, giving the target bidirectional language model.
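As a non-limiting illustration, the data preparation for one training round (hiding one word and recording it as the label) can be sketched as follows. The `[MASK]` token, the word-level split and the function name `mask_one_word` are assumptions of this sketch, and the label is taken directly from the hidden word rather than produced by manual annotation.

```python
import random

MASK = "[MASK]"  # placeholder token; the actual mask symbol is not specified


def mask_one_word(words, rng):
    """Hide one randomly chosen word; return (masked_words, position, label)."""
    position = rng.randrange(len(words))
    masked = list(words)          # copy so the original sentence is untouched
    label = masked[position]      # the hidden word is the training label
    masked[position] = MASK
    return masked, position, label


# Hypothetical sentence, already split into words.
rng = random.Random(0)
sentence = ["the", "contractor", "guarantees", "the", "contract", "price"]
masked, position, label = mask_one_word(sentence, rng)
```

The model under training would receive `masked` and be scored on whether its prediction at `position` matches `label`.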

In one implementation, masking the words in the guarantee text sentence with the mask to obtain the masked sentence includes:

Step S221: determining whether the first loss function value corresponding to the previous round of training has converged to an intermediate value, wherein the intermediate value is a value between the first loss function value corresponding to the first round of training and the target value;

Step S222: if the first loss function value corresponding to the previous round of training has not converged to the intermediate value, randomly masking words in the guarantee text sentence with the mask to obtain the masked sentence;

Step S223: if the first loss function value corresponding to the previous round of training has converged to the intermediate value, obtaining the masking probability corresponding to each word in the guarantee text sentence, wherein the masking probability of each word is proportional to the contribution of that word to the guarantee value of the historical guarantee text;

Step S224: masking words in the guarantee text sentence with the mask, based on the masking probabilities corresponding to the words in the guarantee text sentence, to obtain the masked sentence.

Specifically, this embodiment defines two masking modes in advance. The first is a random masking mode: every word in the guarantee text sentence has the same masking probability, and words are masked at random. The second is a non-random masking mode: the words in the guarantee text sentence have different masking probabilities, these probabilities are taken into account when masking, and words with a greater influence on the guarantee value of the guarantee text are more likely to be masked. For the current round of training, it is first determined whether the first loss function value of the previous round has converged to the intermediate value, that is, a value between the first loss function value of the first round of training and the target value. If it has not, the current model accuracy is low, and the random masking mode is used so that the model learns more general semantic regularities and its loss function value converges quickly. If it has, the current model accuracy is already high, and the non-random masking mode is used for refined training, so that the model learns in depth the semantic regularities of the words that most affect the guarantee value, which improves the accuracy of the subsequent guarantee value estimation.
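The switch between the two masking modes can be sketched as below. The sketch rests on several assumptions that the text does not fix: the intermediate value is taken to be the midpoint between the first-round loss and the target value, "converged" is read as "not greater than", and the per-word contribution scores are hypothetical inputs supplied by the caller.

```python
import random


def intermediate_value(first_round_loss, target_value):
    """Midpoint between the first-round loss and the target (an assumed reading)."""
    return (first_round_loss + target_value) / 2.0


def masking_probabilities(words, previous_loss, first_round_loss, target_value, contribution):
    """Return one masking probability per word for the current round."""
    uniform = [1.0 / len(words)] * len(words)
    if previous_loss > intermediate_value(first_round_loss, target_value):
        # Random mode: every word is equally likely to be masked.
        return uniform
    # Non-random mode: probability proportional to each word's contribution.
    scores = [contribution.get(word, 0.0) for word in words]
    total = sum(scores)
    if total == 0:
        return uniform  # no contribution information, fall back to random mode
    return [score / total for score in scores]


def choose_mask_position(probabilities, rng=random):
    """Sample the index of the word to mask according to the probabilities."""
    return rng.choices(range(len(probabilities)), weights=probabilities, k=1)[0]


words = ["guarantee", "amount", "is", "500000", "yuan"]
contribution = {"amount": 1.0, "500000": 3.0}  # hypothetical contribution scores

early = masking_probabilities(words, 4.0, 5.0, 1.0, contribution)  # loss above midpoint 3.0
late = masking_probabilities(words, 2.5, 5.0, 1.0, contribution)   # loss below midpoint 3.0
position = choose_mask_position(late, random.Random(0))
```

With these inputs the early round masks uniformly, while the late round can only mask "amount" or "500000", the two words with non-zero contribution.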

As shown in FIG. 1, the method further includes:

Step S200: generating a guarantee knowledge graph corresponding to the guarantee text according to the triples.

Specifically, to present the items of information in the guarantee text and the associations between them more intuitively, this embodiment converts the extracted triples into graph form, that is, the guarantee knowledge graph. The guarantee knowledge graph contains a node for each entity in the triples. The nodes in the graph show what types of information the guarantee text contains, and the connections between nodes show how those items of information are related.

In one implementation, step S200 specifically includes:

Step S201: generating the nodes of the guarantee knowledge graph in one-to-one correspondence with the entities contained in the triples;

Step S202: connecting the nodes of the guarantee knowledge graph according to the relationships between the entities contained in the triples, to obtain the guarantee knowledge graph.

Specifically, the guarantee knowledge graph in this embodiment consists of nodes and connecting lines. Each node represents one entity of the guarantee text, that is, one type of information. A line between two nodes indicates that the entities corresponding to those nodes are related, and the type of line indicates the type of relationship. For example, if the entities corresponding to two nodes have a containment relationship, the nodes may be connected by a solid line; if the entities have a sequential generation relationship, the nodes may be connected by a dashed line with an arrow.
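Steps S201 and S202 can be sketched as follows. The triple layout `(head, relation, tail)`, the relation names, and the `GuaranteeGraph` container are assumptions made for illustration; the text specifies only that entities become nodes and that the line type reflects the relationship type.

```python
from dataclasses import dataclass, field

# Hypothetical relation names mapped to the line styles mentioned in the text.
LINE_STYLES = {"contains": "solid", "precedes": "dashed-arrow"}


@dataclass
class GuaranteeGraph:
    nodes: set = field(default_factory=set)
    edges: list = field(default_factory=list)  # (head, tail, relation, line style)


def build_graph(triples):
    """Create one node per distinct entity and one typed line per triple."""
    graph = GuaranteeGraph()
    for head, relation, tail in triples:
        graph.nodes.update((head, tail))
        style = LINE_STYLES.get(relation, "solid")  # unknown relations default to solid
        graph.edges.append((head, tail, relation, style))
    return graph


triples = [
    ("guarantee", "contains", "guarantee amount"),
    ("guarantee", "contains", "validity period"),
    ("application", "precedes", "issuance"),
]
graph = build_graph(triples)
```

Because nodes are stored in a set, an entity that appears in several triples ("guarantee" here) still yields a single node, which matches the one-to-one correspondence in step S201.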

In one implementation, the guarantee knowledge graph further includes an attention weight for each node, and the attention weight of each node is determined as follows:

Step S203: taking the node as the starting point, performing a neighborhood search on the guarantee knowledge graph to obtain all associated nodes of the node;

Step S204: determining the relationship box corresponding to the node according to the associated nodes, wherein the relationship box is the minimum bounding box containing all of the associated nodes;

Step S205: determining the attention weight corresponding to the node according to the size of the relationship box.

Specifically, for each node in the guarantee knowledge graph, a neighborhood search determines all associated nodes in the graph that are directly or indirectly connected to the node, and the minimum bounding box containing all of these associated nodes is then constructed; this is the relationship box of the node. The larger the relationship box, the more important the node is to the guarantee knowledge graph; the smaller the relationship box, the less important the node is. The attention weight of the node can therefore be determined from the size of its relationship box, so that the model focuses more attention on the highly important nodes of the guarantee knowledge graph, thereby improving the prediction accuracy of the model.
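Steps S203 to S205 can be sketched as below. The text does not say how nodes are positioned or how box size maps to a weight, so the sketch assumes that each node has 2D layout coordinates, that the relationship box is axis-aligned, that its size is its area, and that weights are the areas normalized to sum to 1. The neighborhood search is implemented as a breadth-first search.

```python
from collections import deque


def associated_nodes(adjacency, start):
    """Breadth-first search for every node directly or indirectly connected to `start`."""
    seen = {start}
    queue = deque([start])
    while queue:
        current = queue.popleft()
        for neighbor in adjacency.get(current, ()):
            if neighbor not in seen:
                seen.add(neighbor)
                queue.append(neighbor)
    seen.discard(start)  # the starting node is not its own associated node
    return seen


def relationship_box_area(positions, nodes):
    """Area of the smallest axis-aligned box containing the given nodes."""
    if not nodes:
        return 0.0
    xs = [positions[n][0] for n in nodes]
    ys = [positions[n][1] for n in nodes]
    return (max(xs) - min(xs)) * (max(ys) - min(ys))


def attention_weights(adjacency, positions):
    """Normalize relationship-box areas into per-node weights (assumed mapping)."""
    areas = {n: relationship_box_area(positions, associated_nodes(adjacency, n))
             for n in adjacency}
    total = sum(areas.values())
    if total == 0:
        return {n: 1.0 / len(areas) for n in areas}
    return {n: area / total for n, area in areas.items()}


# Hypothetical undirected graph with two components and assumed layout coordinates.
adjacency = {"A": ["B", "C"], "B": ["A"], "C": ["A"], "D": ["E"], "E": ["D"]}
positions = {"A": (0, 0), "B": (2, 1), "C": (1, 2), "D": (5, 5), "E": (6, 5)}
weights = attention_weights(adjacency, positions)
```

Under the area assumption, a node with a single associated node gets a zero-area box; a perimeter or diagonal measure would avoid that and is an equally plausible reading of "size".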

As shown in FIG. 1, the method further includes:

Step S300: determining the guarantee type corresponding to the guarantee text, and determining, according to the guarantee type, the trained target prediction model corresponding to the guarantee text.

Specifically, existing guarantee texts are of many types, for example construction project guarantees, credit guarantees, bid guarantees, and quality guarantees. Different guarantees have different contents, so their value is evaluated differently. This embodiment builds a separate prediction model in advance for each guarantee type. Each prediction model has been trained on a large amount of data and has learned the complex mapping between guarantee knowledge graphs of its type and guarantee values. To predict the guarantee value of the current guarantee text, the target prediction model must therefore first be selected from the prediction models according to the guarantee type of the guarantee text.
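Selecting the target prediction model by guarantee type amounts to a lookup in a registry of trained models. In the sketch below, the type keys and the stand-in models (plain functions returning fixed values) are hypothetical, and the guarantee type is assumed to be already known, since the text does not describe how it is determined.

```python
def select_prediction_model(guarantee_type, models):
    """Return the trained model registered for the given guarantee type."""
    try:
        return models[guarantee_type]
    except KeyError:
        raise ValueError(
            f"no trained prediction model for guarantee type {guarantee_type!r}"
        ) from None


# Hypothetical registry: one stand-in model per guarantee type.
models = {
    "construction": lambda graph: 100.0,
    "credit": lambda graph: 200.0,
    "bid": lambda graph: 300.0,
    "quality": lambda graph: 400.0,
}

target_model = select_prediction_model("bid", models)
predicted_value = target_model({"nodes": [], "edges": []})
```

Raising an explicit error for an unknown type avoids silently scoring a guarantee with a model trained on a different type.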

Step S400: inputting the guarantee knowledge graph into the target prediction model to obtain the predicted guarantee value.

Specifically, because the target prediction model has been trained in advance on a large amount of data and has learned the complex mapping between guarantee knowledge graphs and guarantee values, inputting the guarantee knowledge graph of the current guarantee text into the target prediction model yields the predicted guarantee value.

In one implementation, the training process corresponding to the target prediction model includes:

Step S401: obtaining a historical guarantee knowledge graph and the guarantee value corresponding to the historical guarantee knowledge graph, and inputting the historical guarantee knowledge graph into a prediction model that has not completed training, to obtain a training guarantee value; wherein the guarantee type corresponding to the historical guarantee knowledge graph is the same as that of the guarantee text, and the prediction model is a graph attention model;

Step S402: determining a second loss function value corresponding to the prediction model according to the minimum mean square error between the guarantee value and the training guarantee value;

Step S403: determining whether the second loss function value has converged to a preset value, and if not, updating the parameters of the prediction model according to the second loss function value to obtain an updated prediction model;

Step S404: taking the updated prediction model as the prediction model, and repeating the steps of obtaining a historical guarantee knowledge graph and its corresponding guarantee value and inputting the historical guarantee knowledge graph into the prediction model that has not completed training, until the second loss function value converges to the preset value, thereby obtaining the target prediction model.

Specifically, the prediction model in this embodiment is the model that has not completed training, and the target prediction model is the corresponding model that has completed training. The prediction model is trained iteratively. Each round of training uses one set of training data, consisting of a historical guarantee knowledge graph and the real guarantee value corresponding to it. The historical guarantee knowledge graph is input into the prediction model, and the guarantee value that the model predicts is the training guarantee value. The second loss function value is calculated from the minimum mean square error between the training guarantee value and the real guarantee value, and reflects the prediction error of the prediction model. The training goal in this embodiment is for the second loss function value to converge to the preset value. If the second loss function value of the current round has not converged to the preset value, the model parameters are updated according to the second loss function value and the next round of training is carried out; if it has converged to the preset value, parameter updating and training stop, and the current prediction model is taken as the target prediction model.
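The stop-when-converged loop of steps S401 to S404 can be sketched as follows. To keep the example self-contained, a plain linear model over hand-made graph features is swapped in for the graph attention model named in the text, the whole data set is used in every round instead of one sample per round, and "converged to the preset value" is read as "not greater than the preset value". The feature values and the learning rate are illustrative assumptions.

```python
def mean_squared_error(predicted, actual):
    """Mean of the squared differences between predicted and real values."""
    return sum((p - a) ** 2 for p, a in zip(predicted, actual)) / len(actual)


def train_until_converged(features, values, preset_value,
                          learning_rate=0.05, max_rounds=10000):
    """Gradient descent on a linear stand-in model until the loss reaches the preset value."""
    weights = [0.0] * len(features[0])
    bias = 0.0
    n = len(values)
    for round_index in range(max_rounds):
        predictions = [sum(w * x for w, x in zip(weights, row)) + bias
                       for row in features]
        loss = mean_squared_error(predictions, values)
        if loss <= preset_value:
            # Converged: stop updating and return the current model.
            return weights, bias, loss, round_index
        errors = [p - v for p, v in zip(predictions, values)]
        # Parameter update from the gradient of the mean squared error.
        for j in range(len(weights)):
            gradient = 2.0 / n * sum(e * row[j] for e, row in zip(errors, features))
            weights[j] -= learning_rate * gradient
        bias -= learning_rate * 2.0 / n * sum(errors)
    raise RuntimeError("loss did not reach the preset value within max_rounds")


# Hypothetical data: one feature per historical graph, real value = 2 * feature.
features = [[1.0], [2.0], [3.0]]
values = [2.0, 4.0, 6.0]
weights, bias, final_loss, rounds = train_until_converged(features, values, 1e-4)
```

The `max_rounds` guard is an addition of this sketch: the text assumes the loss eventually reaches the preset value, but a practical loop needs a bound in case it does not.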

Based on the above embodiments, the present invention further provides a guarantee information extraction and value prediction system. As shown in FIG. 2, the system includes:

an information extraction module 01, configured to obtain a guarantee text and a trained target extraction model, and to input the guarantee text into the target extraction model to obtain a plurality of triples corresponding to the guarantee text, each triple reflecting the relationship between two entities in the guarantee text;

a graph generation module 02, configured to generate a guarantee knowledge graph corresponding to the guarantee text according to the triples;

a model selection module 03, configured to determine the guarantee type corresponding to the guarantee text, and to determine, according to the guarantee type, the trained target prediction model corresponding to the guarantee text;

a value prediction module 04, configured to input the guarantee knowledge graph into the target prediction model to obtain the predicted guarantee value.
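The way the four modules hand data to one another can be sketched as a small pipeline. The class name, the callable signatures, and the stub modules below are all hypothetical; they only illustrate the data flow from text to triples, to graph, to selected model, to predicted value.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple

Triple = Tuple[str, str, str]  # assumed layout: (head entity, relation, tail entity)


@dataclass
class GuaranteeValuePipeline:
    extract: Callable[[str], List[Triple]]        # information extraction module 01
    build_graph: Callable[[List[Triple]], dict]   # graph generation module 02
    classify: Callable[[str], str]                # type lookup used by module 03
    models: Dict[str, Callable[[dict], float]]    # trained model per guarantee type

    def predict(self, text: str) -> float:
        triples = self.extract(text)
        graph = self.build_graph(triples)
        model = self.models[self.classify(text)]  # model selection module 03
        return model(graph)                       # value prediction module 04


# Stub modules standing in for the trained models of the embodiment.
pipeline = GuaranteeValuePipeline(
    extract=lambda text: [("guarantee", "contains", "amount")],
    build_graph=lambda triples: {
        "nodes": sorted({e for h, _, t in triples for e in (h, t)}),
        "edges": list(triples),
    },
    classify=lambda text: "bid",
    models={"bid": lambda graph: 1000.0 * len(graph["edges"])},
)
value = pipeline.predict("sample guarantee text")
```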

Based on the above embodiments, the present invention further provides a terminal, a schematic block diagram of which may be as shown in FIG. 3. The terminal includes a processor, a memory, a network interface, and a display screen connected through a system bus. The processor of the terminal provides computing and control capabilities. The memory of the terminal includes a non-volatile storage medium and an internal memory. The non-volatile storage medium stores an operating system and a computer program. The internal memory provides an environment for running the operating system and the computer program stored in the non-volatile storage medium. The network interface of the terminal is used to communicate with external terminals over a network connection. When executed by the processor, the computer program implements the guarantee information extraction and value prediction method. The display screen of the terminal may be a liquid crystal display or an electronic ink display.

Those skilled in the art will understand that the schematic block diagram shown in FIG. 3 shows only the part of the structure that is relevant to the solution of the present invention, and does not limit the terminal to which the solution is applied. A specific terminal may include more or fewer components than shown in the figure, combine certain components, or have a different arrangement of components.

In one implementation, one or more programs are stored in the memory of the terminal and are configured to be executed by one or more processors, the one or more programs including instructions for performing the guarantee information extraction and value prediction method.

A person of ordinary skill in the art will understand that all or part of the processes of the methods in the above embodiments can be implemented by a computer program instructing the relevant hardware. The computer program can be stored in a non-volatile computer-readable storage medium and, when executed, can include the processes of the embodiments of the above methods. Any reference to memory, storage, database, or other media used in the embodiments provided by the present invention may include non-volatile and/or volatile memory. Non-volatile memory may include read-only memory (ROM), programmable ROM (PROM), electrically programmable ROM (EPROM), electrically erasable programmable ROM (EEPROM), or flash memory. Volatile memory may include random access memory (RAM) or external cache memory. By way of illustration and not limitation, RAM is available in many forms, such as static RAM (SRAM), dynamic RAM (DRAM), synchronous DRAM (SDRAM), double data rate SDRAM (DDR SDRAM), enhanced SDRAM (ESDRAM), Synchlink DRAM (SLDRAM), Rambus direct RAM (RDRAM), direct Rambus dynamic RAM (DRDRAM), and Rambus dynamic RAM (RDRAM).

In summary, the present invention discloses a guarantee information extraction and value prediction method, system, terminal, and storage medium. The method obtains a guarantee text and a trained target extraction model, and inputs the guarantee text into the target extraction model to obtain a plurality of triples corresponding to the guarantee text, each triple reflecting the relationship between two entities in the guarantee text; generates a guarantee knowledge graph corresponding to the guarantee text according to the triples; determines the guarantee type corresponding to the guarantee text, and determines, according to the guarantee type, the trained target prediction model corresponding to the guarantee text; and inputs the guarantee knowledge graph into the target prediction model to obtain the predicted guarantee value. By using a target extraction model and a target prediction model built with deep learning methods, the present invention can accurately extract the information in the guarantee text and accurately estimate the value of the guarantee text from the extracted information. This solves the problem in the prior art that the value of a guarantee is estimated manually, which makes an objective guarantee value difficult to obtain.

It should be understood that the application of the present invention is not limited to the above examples. Those of ordinary skill in the art can make improvements or modifications based on the above description, and all such improvements and modifications fall within the scope of protection of the appended claims of the present invention.

Claims

1. A guarantee information extraction and value prediction method, characterized in that the method comprises:

obtaining a guarantee text and a trained target extraction model, and inputting the guarantee text into the target extraction model to obtain a plurality of triples corresponding to the guarantee text, each triple reflecting the relationship between two entities in the guarantee text;

generating a guarantee knowledge graph corresponding to the guarantee text according to the triples;

determining the guarantee type corresponding to the guarantee text, and determining, according to the guarantee type, the trained target prediction model corresponding to the guarantee text;

inputting the guarantee knowledge graph into the target prediction model to obtain a predicted guarantee value.
2. The guarantee information extraction and value prediction method according to claim 1, characterized in that the training process corresponding to the target extraction model comprises:

obtaining a historical guarantee text, and determining a plurality of first training data according to the historical guarantee text, wherein each first training data item contains a sentence of the historical guarantee text and annotation information corresponding to that sentence, the annotation information reflecting the entities contained in the sentence and the relationships between the entities;

obtaining a pre-trained target bidirectional language model, wherein the input data of the target bidirectional language model is a guarantee text sentence masked with a mask, and the output data of the target bidirectional language model is the predicted word hidden by the mask;

adjusting the target bidirectional language model to obtain an extraction model, wherein the input data of the extraction model is a sentence of the historical guarantee text, and the output data of the extraction model is the predicted entities contained in the sentence and the relationships between the entities;

iteratively training the extraction model according to the first training data to obtain the target extraction model.
3. The guarantee information extraction and value prediction method according to claim 2, characterized in that the training process corresponding to the target bidirectional language model comprises:

determining a plurality of guarantee text sentences according to the historical guarantee text;

masking words in the guarantee text sentence with a mask to obtain a masked sentence, and generating label information corresponding to the masked sentence according to the masked words;

inputting the masked sentence into a bidirectional language model that has not completed training, to obtain the predicted words generated by the bidirectional language model based on the masked sentence;

generating a first loss function value corresponding to the bidirectional language model according to the predicted words and the label information;

determining whether the first loss function value has converged to a target value, and if not, updating the parameters of the bidirectional language model according to the first loss function value to obtain an updated bidirectional language model;

taking the updated bidirectional language model as the bidirectional language model, and repeating the step of masking words in the guarantee text sentence with the mask to obtain a masked sentence, until the first loss function value converges to the target value, thereby obtaining the target bidirectional language model.
4. The guarantee information extraction and value prediction method according to claim 3, characterized in that masking words in the guarantee text sentence with the mask to obtain the masked sentence comprises:

determining whether the first loss function value corresponding to the previous round of training has converged to an intermediate value, wherein the intermediate value is a value between the first loss function value corresponding to the first round of training and the target value;

if the first loss function value corresponding to the previous round of training has not converged to the intermediate value, randomly masking words in the guarantee text sentence with the mask to obtain the masked sentence;

if the first loss function value corresponding to the previous round of training has converged to the intermediate value, obtaining the masking probability corresponding to each word in the guarantee text sentence, wherein the masking probability of each word is proportional to the contribution of that word to the guarantee value of the historical guarantee text;

masking words in the guarantee text sentence with the mask, based on the masking probabilities corresponding to the words in the guarantee text sentence, to obtain the masked sentence.
5. The guarantee information extraction and value prediction method according to claim 1, characterized in that generating the guarantee knowledge graph corresponding to the guarantee text according to the triples comprises:

generating the nodes of the guarantee knowledge graph in one-to-one correspondence with the entities contained in the triples;

connecting the nodes of the guarantee knowledge graph according to the relationships between the entities contained in the triples, to obtain the guarantee knowledge graph, wherein different types of relationship correspond to different types of connecting line.
6. The guarantee information extraction and value prediction method according to claim 5, characterized in that the guarantee knowledge graph further includes an attention weight for each node, and the process of determining the attention weight of each node comprises:

taking the node as the starting point, performing a neighborhood search on the guarantee knowledge graph to obtain all associated nodes of the node;

determining the relationship box corresponding to the node according to the associated nodes, wherein the relationship box is the minimum bounding box containing all of the associated nodes;

determining the attention weight corresponding to the node according to the size of the relationship box.
7. The guarantee information extraction and value prediction method according to claim 1, characterized in that the training process corresponding to the target prediction model comprises:

obtaining a historical guarantee knowledge graph and the guarantee value corresponding to the historical guarantee knowledge graph, and inputting the historical guarantee knowledge graph into a prediction model that has not completed training, to obtain a training guarantee value; wherein the guarantee type corresponding to the historical guarantee knowledge graph is the same as that of the guarantee text, and the prediction model is a graph attention model;

determining a second loss function value corresponding to the prediction model according to the minimum mean square error between the guarantee value and the training guarantee value;

determining whether the second loss function value has converged to a preset value, and if not, updating the parameters of the prediction model according to the second loss function value to obtain an updated prediction model;

taking the updated prediction model as the prediction model, and repeating the steps of obtaining a historical guarantee knowledge graph and its corresponding guarantee value and inputting the historical guarantee knowledge graph into the prediction model that has not completed training, until the second loss function value converges to the preset value, thereby obtaining the target prediction model.
8. A guarantee information extraction and value prediction system, characterized in that the system comprises:

an information extraction module, configured to obtain a guarantee text and a trained target extraction model, and to input the guarantee text into the target extraction model to obtain a plurality of triples corresponding to the guarantee text, each triple reflecting the relationship between two entities in the guarantee text;

a graph generation module, configured to generate a guarantee knowledge graph corresponding to the guarantee text according to the triples;

a model selection module, configured to determine the guarantee type corresponding to the guarantee text, and to determine, according to the guarantee type, the trained target prediction model corresponding to the guarantee text;

a value prediction module, configured to input the guarantee knowledge graph into the target prediction model to obtain a predicted guarantee value.
9. A terminal, characterized in that the terminal comprises a memory and one or more processors; the memory stores one or more programs; the programs contain instructions for executing the guarantee information extraction and value prediction method according to any one of claims 1-7; and the processors are configured to execute the programs.

10. A computer-readable storage medium storing a plurality of instructions, characterized in that the instructions are adapted to be loaded and executed by a processor to implement the steps of the guarantee information extraction and value prediction method according to any one of claims 1-7.