CN115204120B

CN115204120B - Insurance field triplet extraction method and device, electronic equipment and storage medium

Info

Publication number: CN115204120B
Application number: CN202210875618.7A
Authority: CN
Inventors: 杨坤; 王燕蒙; 李剑锋; 王少军
Original assignee: Ping An Technology Shenzhen Co Ltd
Current assignee: Ping An Technology Shenzhen Co Ltd
Priority date: 2022-07-25
Filing date: 2022-07-25
Publication date: 2023-05-30
Anticipated expiration: 2042-07-25
Also published as: CN115204120A

Abstract

The invention relates to the field of natural language, and discloses an insurance field triplet extraction method and device, electronic equipment and a readable storage medium. The method comprises the following steps: encoding an insurance target text to obtain a text vector and a text vector position code; predicting the head and tail positions of a first entity in the insurance target text by using a first half-pointer half-label prediction model; searching for a first entity text vector and a first entity text vector position code of the first entity; encoding the first entity text vector, and splicing the encoded vector with the first entity text vector position code to obtain a first spliced vector; predicting the relation corresponding to the first entity, and a second entity, by using a second half-pointer half-label prediction model; and generating a triplet of the insurance target text according to the first entity, the relation corresponding to the first entity, and the second entity. The invention can improve the comprehensiveness and accuracy of triplet extraction in the insurance field.

Description

Insurance field triplet extraction method and device, electronic equipment and storage medium

Technical Field

The present invention relates to the field of natural language, and in particular, to a method and apparatus for extracting triples in the insurance field, an electronic device, and a readable storage medium.

Background

With the development of technology, a knowledge graph is often generated from an insurance document so that a user reading the document can quickly grasp its content. A triplet is one form of the knowledge graph. A triplet is a set of the form (s, p, o), where s is an entity, o is an entity or an attribute value, and p is a relationship between two entities or an attribute of an entity. For example, the triplet (personal safety insurance, beneficiary, Xiaoming) may be extracted from "the beneficiary of the personal safety insurance is Xiaoming".
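As a minimal illustration of the (s, p, o) form described above, a triplet can be held in a small named tuple. The class name `Triplet` and the English rendering of the entity names are assumptions made for this sketch and do not come from the source.

```python
from typing import NamedTuple


class Triplet(NamedTuple):
    """One (s, p, o) fact extracted from a sentence."""
    s: str  # subject entity
    p: str  # relationship between two entities, or an attribute of s
    o: str  # object entity, or the attribute value


# Hypothetical English rendering of the example sentence in the text.
example = Triplet(s="personal safety insurance", p="beneficiary", o="Xiaoming")
print(example)
```

A knowledge graph can then be stored as a plain list or set of such tuples, with s and o as nodes and p as the edge label.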

In conventional insurance field triplet extraction methods, the relationship types or attribute types of the triplets are all fixed. The extraction flow is therefore mainly to extract one entity, judge whether the entity belongs to one of the fixed relationship types or attribute types, and then find the other entity or the attribute value corresponding to the entity attribute. In the open domain, however, the same (s, o) pair in a text can have multiple relationship types or attribute types, so these methods cannot cover all relationship types or attribute types when extracting insurance field triplets. As a result, not all triplets in the text are extracted, which limits the extraction range of insurance field triplets and reduces the comprehensiveness and accuracy of the extraction.

Disclosure of Invention

The invention provides a method, a device, electronic equipment and a readable storage medium for extracting triples in the insurance field, and aims to improve the comprehensiveness and accuracy of the extraction of the triples in the insurance field.

In order to achieve the above object, the present invention provides a method for extracting triples in insurance field, which includes:

acquiring an insurance target text, and coding the insurance target text by using a coding module in a preset Bert model to obtain a text vector and a text vector position code;

adding a corresponding weight to the text vector by using a preset first self-attention mechanism to obtain a first weight text vector, and predicting the head and tail positions of a first entity in the insurance target text by using a preset first half pointer half-label prediction model based on the first weight text vector;

searching the first entity text vector and the first entity text vector position code according to the head-to-tail position of the first entity, the text vector and the text vector position code;

encoding the first entity text vector by using a preset bidirectional long-short-time memory network to obtain a first entity encoding vector, and splicing the first entity encoding vector and the first entity text vector position code to obtain a first splicing vector;

Adding a corresponding weight to the text vector by using a preset second self-attention mechanism to obtain a second weight text vector, and splicing the second weight text vector and the first splicing vector to obtain a second splicing vector;

according to the second splicing vector, predicting the corresponding relation of the first entity and the second entity by using a preset second half pointer half label prediction model;

when the first entity, the corresponding relation of the first entity and the second entity can form a closed loop, generating the triplet of the insurance target text according to the corresponding relation of the first entity, the first entity and the second entity.

Optionally, the encoding the insurance target text by using an encoding module in a preset Bert model to obtain a text vector and a text vector position code, including:

extracting text features of the insurance target text, and carrying out word mixed coding on the text features to obtain a word vector sequence;

performing position index coding on each text in the insurance target text to obtain text vector position coding;

and adding the word vector sequence and the text vector position codes to obtain text splicing vectors, and coding the text splicing vectors by using a coding module in a preset Bert model to obtain text vectors.

Optionally, performing word hybrid encoding on the text features to obtain a word vector sequence includes:

encoding each character in the insurance target text by using a preset character encoding layer to obtain a character vector sequence;

performing word segmentation on the insurance target text to obtain segmented words;

extracting feature vectors of the segmented words to obtain a segmented-word vector sequence;

repeating each segmented-word vector according to the number of characters in the corresponding word to obtain an aligned word vector sequence aligned with the character vector sequence;

multiplying the aligned word vector sequence by a preset transformation matrix to obtain a target word vector sequence with the same dimension as the character vector sequence;

and adding the target word vector sequence and the corresponding character vector sequence to obtain the word vector sequence.
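The hybrid encoding steps above can be sketched as follows. This sketch reads the first encoding step as per-character and the later ones as per-segmented-word, which is an interpretation of the translated text. All function names, the placeholder tokens and the toy numbers are hypothetical, and the "cross multiplication" with the transformation matrix is taken to be an ordinary matrix product.

```python
def expand_word_vectors(words, word_vectors):
    """Repeat each word vector once per character of its word."""
    aligned = []
    for word, vec in zip(words, word_vectors):
        aligned.extend([vec] * len(word))
    return aligned


def project(vectors, matrix):
    """Multiply each row vector by a (word_dim x char_dim) matrix."""
    return [
        [sum(v[k] * matrix[k][j] for k in range(len(v)))
         for j in range(len(matrix[0]))]
        for v in vectors
    ]


def hybrid_encode(char_vectors, words, word_vectors, matrix):
    """Add the projected, aligned word vectors to the character vectors."""
    target = project(expand_word_vectors(words, word_vectors), matrix)
    return [
        [c + t for c, t in zip(char_vec, target_vec)]
        for char_vec, target_vec in zip(char_vectors, target)
    ]


# Toy input: the text "abc" segmented into the words "ab" and "c".
words = ["ab", "c"]
word_vectors = [[1.0, 0.0, 2.0], [0.0, 1.0, 0.0]]      # word_dim = 3
char_vectors = [[0.5, 0.25], [1.0, 0.5], [0.0, 2.0]]   # char_dim = 2
matrix = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]          # 3 x 2

mixed = hybrid_encode(char_vectors, words, word_vectors, matrix)
print(mixed)
```

Both characters of "ab" receive the same projected word vector, so the result keeps one vector per character while carrying word-level information.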

Optionally, the predicting, based on the first weight text vector, the head and tail positions of the first entity in the insurance target text by using a preset first half pointer half label prediction model includes:

screening the head and tail positions of a first entity to be selected by using a starting pointer and an ending pointer in a preset first half pointer half-labeling prediction model;

Performing position coding on the first entity to be selected according to the head-to-tail position of the first entity to be selected to obtain a first entity to be selected position coding vector;

performing point multiplication on the first entity position coding vector to be selected and the first weight text vector to obtain a first weighted summation vector;

and screening a first target weighted summation vector larger than a first preset value from the first weighted summation vector, and taking the head and tail positions of a first entity to be selected corresponding to the first target weighted summation vector as the head and tail positions of the first entity.

Optionally, after predicting the head-tail position of the first entity in the insurance target text by using a preset first half pointer half label prediction model based on the first weight text vector, the method further includes:

inquiring the insurance target text according to the head-to-tail position of the first entity to obtain the first entity;

and classifying the first entity according to a preset entity class comparison table to obtain a first entity class.

Optionally, predicting, according to the second stitching vector, the relationship corresponding to the first entity and the second entity by using a preset second half-pointer half-label prediction model includes:

Screening the head and tail positions of a second entity to be selected from the first entity by utilizing a starting pointer and an ending pointer in a preset second half pointer half-labeling prediction model;

performing position coding on the second entity to be selected according to the head and tail positions of the second entity to be selected to obtain a second entity to be selected position coding vector;

performing dot multiplication on the second entity position coding vector to be selected and the second weight text vector to obtain a second weighted summation vector;

screening a second target weighted sum vector larger than a second preset value from the second weighted sum vector, and taking the head and tail positions of a second entity to be selected corresponding to the second target weighted sum vector as the head and tail positions of the second entity; inquiring the insurance target text according to the head-to-tail position of the second entity to obtain the second entity;

performing entity classification on the second entity according to the entity class comparison table to obtain a second entity class;

and predicting the corresponding relation of the first entity according to the first entity class and the second entity class.

Optionally, the encoding the first entity text vector by using a preset bidirectional long-short-time memory network to obtain a first entity encoded vector includes:

Acquiring the number of the first entity text vectors, selecting the preset bidirectional long and short time memory units which are the same as the number of the first entity text vectors, and splicing all the preset bidirectional long and short time memory units to obtain a target bidirectional long and short time memory network;

performing dimension reduction on the first entity text vector by utilizing an embedded layer in the target bidirectional long-short-term memory network to obtain a first entity text dimension reduction vector;

and respectively inputting the first entity text dimension-reducing vector into one of the forward and reverse long-short-time memory network units of the network layer in each layer of the target bidirectional long-short-time memory network, and splicing the output results of the forward and reverse long-short-time memory network units of the network layer by utilizing the connecting layer in the target bidirectional long-short-time memory network to obtain the first entity coding vector.

In order to solve the above problems, the present invention further provides an insurance field triplet extraction apparatus, the apparatus includes:

the first entity prediction module is used for obtaining an insurance target text, coding the insurance target text by using a coding module in a preset Bert model to obtain a text vector and a text vector position code, adding a corresponding weight to the text vector by using a preset first self-attention mechanism to obtain a first weight text vector, and predicting the head and tail positions of a first entity in the insurance target text by using a preset first half pointer half-labeling prediction model based on the first weight text vector;

The second entity prediction module is used for searching the first entity text vector and the first entity text vector position code according to the head-to-tail position of the first entity, the text vector and the text vector position code, utilizing a preset bidirectional long-short-time memory network to encode the first entity text vector to obtain a first entity encoding vector, splicing the first entity encoding vector and the first entity text vector position code to obtain a first splicing vector, utilizing a preset second self-attention mechanism to add corresponding weights to the text vector to obtain a second weight text vector, splicing the second weight text vector and the first splicing vector to obtain a second splicing vector, and predicting the corresponding relation of the first entity and the second entity by utilizing a preset second half-pointer half-label prediction model according to the second splicing vector;

and the triplet splicing module is used for generating the triplet of the insurance target text according to the first entity, the relation corresponding to the first entity and the second entity when the first entity, the relation corresponding to the first entity and the second entity can form a closed loop.

In order to solve the above-mentioned problems, the present invention also provides an electronic apparatus including:

a memory storing at least one computer program; and

And the processor executes the computer program stored in the memory to realize the insurance domain triplet extraction method.

In order to solve the above-mentioned problems, the present invention also provides a computer readable storage medium having stored therein at least one computer program that is executed by a processor in an electronic device to implement the insurance domain triplet extraction method described above.

According to the embodiment of the invention, the insurance target text is obtained and encoded by using the encoding module in the preset Bert model to obtain a text vector and a text vector position code, and a preset first self-attention mechanism is used to add a corresponding weight to the text vector to obtain a first weight text vector. This fully considers the interrelation among texts and improves the accuracy of text prediction, thereby improving the accuracy of first entity extraction. Secondly, based on the first weight text vector, the head and tail positions of the first entity in the insurance target text are predicted by using a preset first half-pointer half-label prediction model, which reduces false and missed detections when predicting entities and improves the accuracy of triplet extraction. Further, the first entity text vector and the first entity text vector position code are searched for according to the head and tail positions of the first entity, the text vector and the text vector position code; the first entity text vector is encoded by using a preset bidirectional long-short-time memory network to obtain a first entity encoding vector; the first entity encoding vector and the first entity text vector position code are spliced to obtain a first splicing vector; a preset second self-attention mechanism is used to add a corresponding weight to the text vector to obtain a second weight text vector; and the second weight text vector and the first splicing vector are spliced to obtain a second splicing vector. Prediction is thus carried out for each predicted entity so that no entity is missed, which improves the comprehensiveness of triplet extraction. Finally, according to the second splicing vector, the relation corresponding to the first entity, and the second entity, are predicted by using a preset second half-pointer half-label prediction model, and when the first entity, the relation corresponding to the first entity and the second entity can form a closed loop, the triplet of the insurance target text is generated from them. This fully incorporates the position code of the first entity, improves the prediction accuracy of the second entity and the relation, and ensures the accuracy and comprehensiveness of the triplets of the insurance target text. Therefore, the insurance field triplet extraction method, device, electronic equipment and storage medium provided by the invention can improve the accuracy and comprehensiveness of triplet extraction in the insurance field.

Drawings

FIG. 1 is a flow chart of a method for extracting triples in an insurance domain according to an embodiment of the present invention;

FIGS. 2-3 are flowcharts illustrating a detailed implementation of one of the steps in the insurance domain triplet extraction method according to an embodiment of the present invention;

FIG. 4 is a schematic block diagram of a triple extracting device in an insurance domain according to an embodiment of the present invention;

FIG. 5 is a schematic diagram of an internal structure of an electronic device for implementing a method for extracting triples in an insurance field according to an embodiment of the present invention.

The achievement of the objects, functional features and advantages of the present invention will be further described with reference to the accompanying drawings, in conjunction with the embodiments.

Detailed Description

It should be understood that the specific embodiments described herein are for purposes of illustration only and are not intended to limit the scope of the invention.

The embodiment of the invention provides a method for extracting triples in the insurance field. The execution body of the insurance domain triplet extraction method includes, but is not limited to, at least one of a server, a terminal, and the like, which can be configured to execute the method provided by the embodiment of the application. In other words, the insurance domain triplet extraction method may be performed by software or hardware installed in a terminal device or a server device, where the software may be a blockchain platform. The server may include an independent server, and may also include a cloud server that provides cloud services, cloud databases, cloud computing, cloud functions, cloud storage, network services, cloud communications, middleware services, domain name services, security services, content delivery networks (Content Delivery Network, CDN), and basic cloud computing services such as big data and artificial intelligence platforms.

Referring to fig. 1, which is a schematic flow chart of an insurance domain triplet extraction method according to an embodiment of the present invention, in an embodiment of the present invention, the insurance domain triplet extraction method includes steps S1 to S8 as follows:

S1, acquiring an insurance target text, and coding the insurance target text by using a coding module in a preset Bert model to obtain a text vector and a text vector position code.

In the embodiment of the invention, the insurance target text can be a logically coherent, fluent text, such as an insurance manual or an insurance contract. The preset Bert model may be Bidirectional Encoder Representations from Transformers (BERT), which is a pre-trained language representation model. The coding module may be a module that converts signals or data into a form that can be used for communication, transmission and storage, and may be a 12-layer DGCNN network.

In an alternative embodiment of the invention, text data can be obtained by retrieving it from a text database or by downloading text from a webpage, and a series of pre-processing operations, such as recognition of wrongly written characters, is performed on the obtained text data to obtain a logically coherent insurance target text.

According to the embodiment of the invention, the coding module in the preset Bert model is used to code the insurance target text to obtain the text vector and the text vector position code, so that the insurance target text can be weighted according to the position of each word. This improves the keyword extraction capability of the Bert model on the insurance target text and therefore the accuracy of insurance field triplet extraction.

Further, as an optional embodiment of the present invention, referring to fig. 2, the encoding module in the preset Bert model is used to encode the insurance target text to obtain a text vector and a text vector position code, which includes steps S11-S13:

S11, extracting text features of the insurance target text, and carrying out word mixed coding on the text features to obtain a word vector sequence;

S12, carrying out position index coding on each word in the insurance target text to obtain text vector position coding;

S13, adding the word vector sequence and the text vector position codes to obtain text splicing vectors, and coding the text splicing vectors by using a coding module in a preset Bert model to obtain text vectors.

In an embodiment of the invention, the word vector sequence is obtained by performing word mixing dimension reduction processing on the insurance target text. This converts the insurance target text into low-dimensional vectors, so that the encoding of the insurance target text can be carried out smoothly and quickly and the efficiency of triplet extraction is ensured. Furthermore, position index coding is performed on each word in the insurance target text to define the position of each word, which facilitates the subsequent weighting calculation. Finally, the word vector sequence and the text vector position code are added to obtain the text splicing vector, and the text splicing vector is encoded by the encoding module in the preset Bert model to obtain the text vector, which improves the keyword extraction capability of the Bert model on the insurance target text and the accuracy of insurance field triplet extraction.
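The position index coding and the addition step described above can be sketched as a table lookup followed by an element-wise sum. The lookup table, its values and the function names are hypothetical. In a real BERT-style model the position table is learned, which this toy example does not attempt to show.

```python
def position_index_codes(text, position_table):
    """Look up one position vector per character index (0, 1, 2, ...)."""
    if len(text) > len(position_table):
        raise ValueError("text is longer than the position table")
    return [position_table[i] for i in range(len(text))]


def add_position_codes(word_vectors, position_codes):
    """Element-wise sum of each word vector and its position code."""
    return [
        [w + p for w, p in zip(word_vec, pos_vec)]
        for word_vec, pos_vec in zip(word_vectors, position_codes)
    ]


# Hypothetical 2-dimensional toy values.
position_table = [[0.0, 1.0], [1.0, 0.0], [0.5, 0.5], [0.25, 0.75]]
text = "abc"
word_vectors = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]

position_codes = position_index_codes(text, position_table)
encoder_input = add_position_codes(word_vectors, position_codes)
print(encoder_input)
```

Because the two sequences are added rather than concatenated, the input to the encoding module keeps the same dimension as the word vector sequence.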

Further, in an optional embodiment of the present invention, the performing word mixing dimension reduction processing on the insurance target text to obtain a word vector sequence includes:

In the embodiment of the invention, the preset character dimension reduction layer is mainly used for dimension reduction processing of characters to obtain a character vector sequence with shorter codes. The preset transformation matrix may be a matrix which is artificially set to change according to the dimension change of the word vector sequence.

In an alternative embodiment of the invention, using word vectors alone easily leads to entity recognition errors. The word vector sequence is therefore obtained by performing word mixing dimension reduction processing on the insurance target text, which helps improve the accuracy and overall coverage of the final insurance field triplet extraction for the insurance target text.

S2, adding a corresponding weight to the text vector by using a preset first self-attention mechanism to obtain a first weight text vector.

In the embodiment of the present invention, the preset first self-attention mechanism has the capability of determining which part of the input needs to be focused and allocating limited information processing resources to important parts, and weighting the important parts.

In the embodiment of the invention, the preset first self-attention mechanism is composed of parameter matrices, a normalized exponential function, a matrix dot-product formula and a variance reduction (scaling) formula. The parameter matrices are used to calculate the Q, K and V matrices of the text vector entering the first self-attention mechanism. The matrix dot-product formula multiplies the Q matrix by the transpose of the K matrix. The variance reduction formula performs the Scale calculation on the result of that product. The normalized exponential function, commonly the softmax function, normalizes the result of the Scale calculation.
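The components listed above match the usual scaled dot-product self-attention. The sketch below is written under that assumption: it takes the Scale step to be division by the square root of the key dimension, which the text does not state explicitly, and it uses identity parameter matrices only to keep the toy example readable. All names are illustrative.

```python
import math


def matmul(a, b):
    """Plain matrix product of two lists of rows."""
    return [
        [sum(a[i][k] * b[k][j] for k in range(len(b)))
         for j in range(len(b[0]))]
        for i in range(len(a))
    ]


def transpose(m):
    return [list(row) for row in zip(*m)]


def softmax(row):
    """Normalized exponential function over one row of scores."""
    peak = max(row)
    exps = [math.exp(x - peak) for x in row]
    total = sum(exps)
    return [e / total for e in exps]


def self_attention(x, w_q, w_k, w_v):
    """Return the weighted vectors and the attention weights."""
    q, k, v = matmul(x, w_q), matmul(x, w_k), matmul(x, w_v)
    scores = matmul(q, transpose(k))                 # Q times K transposed
    scale = math.sqrt(len(k[0]))                     # assumed Scale step
    weights = [softmax([s / scale for s in row]) for row in scores]
    return matmul(weights, v), weights


# Toy text vector: three tokens, two dimensions each.
x = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
identity = [[1.0, 0.0], [0.0, 1.0]]
weighted, weights = self_attention(x, identity, identity, identity)
print(weights)
```

Each row of `weights` sums to one, so every output vector is a weighted average of the value vectors, with larger weights on the tokens most similar to the query token.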

According to the embodiment of the invention, the corresponding weight is added to the text vector by using the preset first self-attention mechanism, so that the importance of the entities in the insurance target text is increased, entity labeling becomes more accurate and faster, and the accuracy of insurance field triplet extraction for the insurance target text is improved.

Further, as an optional embodiment of the present invention, the approximate location of an entity may first be determined according to the attributes of the entity, and the approximate location may be weighted by using the normalized exponential function in the preset first self-attention mechanism to obtain the first weight text vector. Determining the approximate location of an entity according to its attributes means determining the location of the entity in the insurance target text according to attributes such as whether it is a noun, a subject or an object. For example, in "the applicant of the pension insurance is the elderly", "is" is the predicate, so "the elderly" following "is" may be one of the entities.

S3, based on the first weight text vector, predicting the head and tail positions of the first entity in the insurance target text by using a preset first half pointer half-labeling prediction model.

In the embodiment of the invention, the preset first half pointer half annotation prediction model can be a CNN model constructed based on a half pointer-half annotation structure and is mainly used for predicting the starting position and the ending position of the target word. The first entity may be all entities predicted for the first time.

In an alternative embodiment of the invention, since the insurance target text may contain a plurality of relation triples and the triples overlap, a half pointer half label prediction model is required to predict all entities in the insurance target text, thereby improving the overall coverage rate of the insurance target text triplet extraction and ensuring that the situation of missing triples does not occur in the insurance target text triplet extraction process.

According to the embodiment of the invention, the head and tail positions of the first entity in the insurance target text are predicted by using the preset first half-pointer half-label prediction model, which reduces false and missed detections when predicting entities and improves the accuracy of triplet extraction.

Further, as an optional embodiment of the present invention, referring to fig. 3, the predicting, based on the first weight text vector, the head-tail position of the first entity in the insurance target text by using a preset first half pointer half label prediction model includes S31-S34:

S31, screening the head and tail positions of a first entity to be selected by using a starting pointer and an ending pointer in a preset first half pointer half-labeling prediction model;

S32, carrying out position coding on the first entity to be selected according to the head-to-tail position of the first entity to be selected to obtain a first entity to be selected position coding vector;

S33, performing dot multiplication on the first entity position coding vector to be selected and the first weight text vector to obtain a first weighted summation vector;

S34, screening out a first target weighted summation vector larger than a first preset value from the first weighted summation vector, and taking the head and tail positions of the first entity to be selected corresponding to the first target weighted summation vector as the head and tail positions of the first entity.
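For orientation, the sketch below shows the decoding commonly used with half-pointer half-label taggers: one score per token for "entity starts here", one for "entity ends here", a threshold, and pairing of each start with the nearest following end. It is a simplified stand-in and not necessarily the exact procedure of S31 to S34. The function name, the scores and the threshold of 0.5 are assumptions.

```python
def decode_spans(start_probs, end_probs, threshold=0.5):
    """Pair each start above the threshold with the nearest end at or after it."""
    starts = [i for i, p in enumerate(start_probs) if p > threshold]
    ends = [j for j, p in enumerate(end_probs) if p > threshold]
    spans = []
    for i in starts:
        following = [j for j in ends if j >= i]
        if following:
            spans.append((i, following[0]))  # (head, tail), both inclusive
    return spans


# Hypothetical per-token scores for a 7-token text.
start_probs = [0.9, 0.1, 0.1, 0.1, 0.8, 0.2, 0.1]
end_probs = [0.1, 0.1, 0.7, 0.1, 0.1, 0.9, 0.2]

spans = decode_spans(start_probs, end_probs)
print(spans)
```

Because every token is scored independently, several entities can be found in one pass over the text, which is what allows multiple triplets per sentence.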

In the embodiment of the present invention, the start pointer and the end pointer may be memory addresses of the head and tail positions of the entity.

In addition, as an optional embodiment of the present invention, after the predicting the head and tail positions of the first entity in the insurance target text by using a preset first half pointer half label prediction model based on the first weight text vector, the method further includes:

In the embodiment of the present invention, the preset entity class comparison table may be a comparison table of each entity and its class, for example, the entity class of Xiaoming may be a name of a person, the entity class of Beijing may be an address, and the entity class of pension may be an insurance name.
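A comparison table of this kind can be sketched as a dictionary lookup applied to the text span given by the head and tail positions. The table entries mirror the examples in the text in an assumed English rendering. The function names, the sample sentence and the "unknown" fallback are hypothetical.

```python
# Hypothetical entity class comparison table.
ENTITY_CLASS_TABLE = {
    "Xiaoming": "person name",
    "Beijing": "address",
    "pension insurance": "insurance name",
}


def extract_entity(text, head, tail):
    """Return the entity text between head and tail (both inclusive)."""
    return text[head:tail + 1]


def classify_entity(entity, table=ENTITY_CLASS_TABLE, default="unknown"):
    """Look the entity up in the comparison table."""
    return table.get(entity, default)


text = "The beneficiary is Xiaoming"
head = text.index("Xiaoming")
tail = head + len("Xiaoming") - 1

entity = extract_entity(text, head, tail)
entity_class = classify_entity(entity)
print(entity, entity_class)
```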

In an alternative embodiment of the present invention, since the relationship between entities needs to be determined by entity class, the first entity needs to be classified before the relationship between the first entity and the second entity is predicted.

S4, searching the first entity text vector and the first entity text vector position code according to the head and tail positions of the first entity, the text vector and the text vector position code.

In an alternative embodiment of the present invention, since the text vector and the text vector position code include the vector and the vector position code of all words in the insurance target text, the prediction of the second entity and the relation will be affected, and therefore, the first entity text vector and the first entity text vector position code need to be searched according to the head-tail position of the first entity, the text vector and the text vector position code.

Further, as an optional embodiment of the present invention, a corresponding first entity may be found from the insurance target text according to the head-tail position of the first entity, and a first entity text vector position code and a first entity word vector sequence may be determined according to the position of the first entity in the insurance target text, and finally the first entity text vector position code and the first entity word vector sequence may be added to obtain a first entity text vector.
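The lookup described above amounts to slicing the per-token sequences with the predicted head and tail positions and then adding the two slices. A minimal sketch, with hypothetical function names and toy values:

```python
def slice_entity(sequence, head, tail):
    """Keep the items from head to tail, both inclusive."""
    return sequence[head:tail + 1]


def first_entity_text_vector(word_vectors, position_codes, head, tail):
    """Return the entity text vectors and the entity position codes."""
    entity_words = slice_entity(word_vectors, head, tail)
    entity_positions = slice_entity(position_codes, head, tail)
    entity_vectors = [
        [w + p for w, p in zip(word_vec, pos_vec)]
        for word_vec, pos_vec in zip(entity_words, entity_positions)
    ]
    return entity_vectors, entity_positions


# Toy sequences for a 4-token text; the entity covers tokens 1 and 2.
word_vectors = [[1.0, 1.0], [2.0, 2.0], [3.0, 3.0], [4.0, 4.0]]
position_codes = [[0.0, 0.5], [0.5, 0.0], [0.25, 0.25], [1.0, 1.0]]

entity_vectors, entity_positions = first_entity_text_vector(
    word_vectors, position_codes, head=1, tail=2
)
print(entity_vectors, entity_positions)
```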

S5, coding the first entity text vector by using a preset bidirectional long-short-time memory network to obtain a first entity coding vector, and splicing the first entity coding vector and the first entity text vector position code to obtain a first spliced vector.

In the embodiment of the invention, the preset bidirectional long-short-time memory network may be a neural network for deep learning, and the preset bidirectional long-short-time memory network is composed of an embedded layer, a network layer, a connection layer, an output layer and the like, wherein the embedded layer may be a level for reducing the vector dimension, the network layer may be a level for forward calculation and backward calculation of the vector, i.e. a level for connecting up and down vectors, the connection layer may be a level for splicing the output vectors in the network layer, and the output layer may be a level for reducing the vector dimension of the vector spliced by the connection layer.
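The network layer and connection layer described above can be sketched in plain Python as one forward pass, one backward pass and a per-token concatenation. This is a simplified illustration: the embedded layer and output layer are omitted, biases are left out, the weights are random rather than trained, and every name is hypothetical.

```python
import math
import random


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def make_cell(input_dim, hidden_dim, rng):
    """One weight matrix per gate: input, forget, output, candidate."""
    width = input_dim + hidden_dim
    return {
        gate: [[rng.uniform(-0.1, 0.1) for _ in range(width)]
               for _ in range(hidden_dim)]
        for gate in ("i", "f", "o", "g")
    }


def lstm_step(cell, x, h, c):
    """Advance one time step and return the new hidden and cell states."""
    z = list(x) + list(h)

    def gate(name, activation):
        return [activation(sum(w * v for w, v in zip(row, z)))
                for row in cell[name]]

    i, f, o = gate("i", sigmoid), gate("f", sigmoid), gate("o", sigmoid)
    g = gate("g", math.tanh)
    c_new = [fj * cj + ij * gj for fj, cj, ij, gj in zip(f, c, i, g)]
    h_new = [oj * math.tanh(cj) for oj, cj in zip(o, c_new)]
    return h_new, c_new


def run_direction(cell, sequence, hidden_dim):
    h, c = [0.0] * hidden_dim, [0.0] * hidden_dim
    outputs = []
    for x in sequence:
        h, c = lstm_step(cell, x, h, c)
        outputs.append(h)
    return outputs


def bilstm(sequence, forward_cell, backward_cell, hidden_dim):
    """Concatenate forward and backward hidden states for each token."""
    fwd = run_direction(forward_cell, sequence, hidden_dim)
    bwd = run_direction(backward_cell, sequence[::-1], hidden_dim)[::-1]
    return [f + b for f, b in zip(fwd, bwd)]


rng = random.Random(0)
hidden_dim = 3
entity_vectors = [[0.5, -0.2], [0.1, 0.9], [-0.4, 0.3]]
forward_cell = make_cell(2, hidden_dim, rng)
backward_cell = make_cell(2, hidden_dim, rng)
encoded = bilstm(entity_vectors, forward_cell, backward_cell, hidden_dim)
print(len(encoded), len(encoded[0]))
```

Each output vector has twice the hidden size because the forward and backward states are joined, so every entity token sees context from both sides.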

In an alternative embodiment of the present invention, a bidirectional long-short-time memory network can be adopted to encode the first entity text vector, in order to ensure that gradient vanishing and gradient explosion do not occur while the first entity text vector is encoded and that the encoding result can be retained over long spans.

Further, as an optional embodiment of the present invention, the encoding the first entity text vector using a preset bidirectional long-short-time memory network to obtain a first entity encoded vector includes:

In an alternative embodiment of the present invention, in order to ensure that the bidirectional long-short-time memory network can encode every first entity, the number of layers of the bidirectional long-short-time memory network is set equal to the number of first entities.

Further, as an optional embodiment of the present invention, the first entity encoding vector and the first entity text vector position code may be spliced by simple vector concatenation, so that the relation corresponding to the first entity, and the second entity, can be found more accurately, thereby improving the accuracy of insurance target text triplet extraction.

S6, adding a corresponding weight to the text vector by using a preset second self-attention mechanism to obtain a second weight text vector, and splicing the second weight text vector and the first spliced vector to obtain a second spliced vector.

In the embodiment of the present invention, the preset second self-attention mechanism has the same function as the preset first self-attention mechanism.

In the embodiment of the present invention, the process of adding a corresponding weight to the text vector by using the preset second self-attention mechanism to obtain the second weight text vector is similar to the process of adding a corresponding weight to the text vector by using the preset first self-attention mechanism to obtain the first weight text vector, and is therefore not described again here.
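For orientation only, one common way of adding weights to a sequence of vectors is scaled dot-product self-attention. The sketch below is an assumption about what such a mechanism may look like, not the preset self-attention mechanism of the embodiment: it uses the text vectors themselves as queries, keys and values, without any learned projection matrices, and the function name is hypothetical.

```python
import math


def self_attention(vectors):
    """Scaled dot-product self-attention with queries = keys = values.

    Assumption: no learned projections; each output vector is a
    softmax-weighted average of all input vectors.
    """
    dim = len(vectors[0])
    weighted = []
    for query in vectors:
        scores = [
            sum(q * k for q, k in zip(query, key)) / math.sqrt(dim)
            for key in vectors
        ]
        peak = max(scores)  # subtract the maximum for numerical stability
        exps = [math.exp(s - peak) for s in scores]
        total = sum(exps)
        weights = [e / total for e in exps]
        weighted.append(
            [sum(w * v[d] for w, v in zip(weights, vectors)) for d in range(dim)]
        )
    return weighted


toy_text_vector = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]  # toy text vector, 3 words
weight_text_vector = self_attention(toy_text_vector)
```

Because the weights of each word sum to one, every output component lies between the smallest and largest input component of the same dimension.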

In an alternative embodiment of the invention, the second weight text vector and the first spliced vector are spliced by a vector splicing method to obtain the second spliced vector, which helps ensure that the second entity corresponding to the first entity can be found accurately according to the second weight text vector, thereby improving the accuracy of insurance target text triplet extraction.
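The two splicing operations of steps S5 and S6 can be sketched as plain concatenation. The embodiment does not state how vectors of different sequence lengths are aligned before splicing; the sketch below assumes, purely for illustration, that the first spliced vector is mean-pooled over the words of the first entity and then appended to every word of the second weight text vector. All names and toy values are hypothetical.

```python
def splice(left, right):
    """Concatenate two vectors."""
    return list(left) + list(right)


def mean_pool(vectors):
    """Average a sequence of vectors into a single vector."""
    count = len(vectors)
    return [sum(column) / count for column in zip(*vectors)]


def first_spliced_vector(entity_encoding, entity_position_codes):
    # S5: splice each encoded word of the first entity with its position code.
    return [splice(e, p) for e, p in zip(entity_encoding, entity_position_codes)]


def second_spliced_vector(weight_text_vector, first_spliced):
    # S6 (assumed alignment): append a pooled entity summary to every word.
    summary = mean_pool(first_spliced)
    return [splice(word, summary) for word in weight_text_vector]


entity_encoding = [[0.1, 0.2, 0.3, 0.4], [0.5, 0.6, 0.7, 0.8]]  # 2 entity words
entity_position_codes = [[0.0, 1.0], [1.0, 0.0]]
weight_text_vector = [[0.2, 0.1], [0.4, 0.3], [0.6, 0.5]]       # 3 text words

first_spliced = first_spliced_vector(entity_encoding, entity_position_codes)
second_spliced = second_spliced_vector(weight_text_vector, first_spliced)
```

With these toy sizes, each word of the second spliced vector carries its own two weighted components followed by six components that summarize the first entity.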

S7, predicting the relation corresponding to the first entity, and the second entity, by using a preset second half pointer half label prediction model according to the second spliced vector.

In the embodiment of the invention, the preset second half pointer half label prediction model is similar to the preset first half pointer half label prediction model.

Predicting the relation corresponding to the first entity, and the second entity, by using the preset second half pointer half label prediction model according to the second spliced vector allows the first entity, the relation corresponding to the first entity, and the second entity to form a closed loop, from which a triplet of the insurance target text is generated.

Further, as an optional embodiment of the present invention, the predicting, according to the second stitching vector, the relationship corresponding to the first entity and the second entity by using a preset second half-pointer half-label prediction model includes:

In an alternative embodiment of the present invention, the head-tail position of the second entity may be predicted in the same way as the head-tail position of the first entity in the insurance target text is predicted by the preset first half pointer half label prediction model. The category of the second entity may then be determined according to the head-tail position of the second entity and the entity category comparison table, and the relation corresponding to the first entity is predicted according to the category of the first entity and the category of the second entity.
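The procedure above can be sketched as follows: start-pointer and end-pointer scores are thresholded to obtain a head-tail position, the span is looked up in an entity category comparison table, and the pair of categories selects a relation. The scores, the threshold of 0.5, the English toy text, and both tables are hypothetical placeholders; the pairing rule (each head is matched with the nearest following tail) is also an assumption.

```python
def decode_spans(start_scores, end_scores, threshold=0.5):
    """Pair each start position above the threshold with the nearest end position."""
    spans = []
    for head, start_score in enumerate(start_scores):
        if start_score <= threshold:
            continue
        for tail in range(head, len(end_scores)):
            if end_scores[tail] > threshold:
                spans.append((head, tail))
                break
    return spans


# Hypothetical entity category comparison table and category-pair relation table.
ENTITY_CLASSES = {"critical illness plan": "product", "30 days": "waiting_period"}
RELATIONS = {("product", "waiting_period"): "has_waiting_period"}


def predict_relation(first_entity_class, second_entity_text):
    """Classify the second entity, then look up the relation for the class pair."""
    second_entity_class = ENTITY_CLASSES.get(second_entity_text)
    return RELATIONS.get((first_entity_class, second_entity_class))


tokens = ["critical", "illness", "plan", "waits", "30", "days"]
start_scores = [0.1, 0.2, 0.1, 0.3, 0.9, 0.2]  # toy start-pointer outputs
end_scores = [0.1, 0.1, 0.2, 0.1, 0.3, 0.8]    # toy end-pointer outputs

spans = decode_spans(start_scores, end_scores)
head, tail = spans[0]
second_entity = " ".join(tokens[head:tail + 1])
relation = predict_relation("product", second_entity)
```

In this toy run the only span above the threshold covers the last two tokens, so the second entity is "30 days" and the looked-up relation is "has_waiting_period".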

S8, when the first entity, the relation corresponding to the first entity, and the second entity can form a closed loop, generating a triplet of the insurance target text according to the first entity, the relation corresponding to the first entity, and the second entity.

In this alternative embodiment of the present invention, because both the relation corresponding to the first entity and the second entity are predicted on the basis of the first entity, the pairwise associations among the first entity, the relation corresponding to the first entity, and the second entity can be obtained. When these associations are consistent with one another, it is determined that the first entity, the relation corresponding to the first entity, and the second entity can form a closed loop.

Further, as an optional embodiment of the present invention, the first entity, the relationship corresponding to the first entity, and the second entity are extracted, and a triplet of the insurance target text is generated according to a triplet structure, so that the accuracy of extracting the triplet of the insurance target text is ensured.
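A minimal sketch of the triplet generation step is given below. The closed-loop criterion is simplified here to the assumption that all three components have actually been predicted; the criterion of the embodiment may be stricter. The `Triplet` type and the function name are hypothetical.

```python
from typing import NamedTuple, Optional


class Triplet(NamedTuple):
    first_entity: str
    relation: str
    second_entity: str


def build_triplet(first_entity: Optional[str],
                  relation: Optional[str],
                  second_entity: Optional[str]) -> Optional[Triplet]:
    """Return a triplet only when all three components are present (assumed criterion)."""
    parts = (first_entity, relation, second_entity)
    if any(part is None or part == "" for part in parts):
        return None  # the loop is not closed, so no triplet is generated
    return Triplet(*parts)


complete = build_triplet("critical illness plan", "has_waiting_period", "30 days")
incomplete = build_triplet("critical illness plan", None, "30 days")
```

A missing relation or second entity therefore yields no triplet rather than a partially filled one.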

FIG. 4 is a functional block diagram of the insurance field triplet extraction apparatus of the present invention.

The insurance domain triplet extraction apparatus 100 of the present invention may be installed in an electronic device. Depending on the implementation, the insurance domain triplet extraction apparatus 100 may include a first entity prediction module 101, a second entity prediction module 102, and a triplet splicing module 103. A module, which may also be referred to as a unit, refers to a series of computer program segments that are stored in a memory of the electronic device, can be executed by a processor of the electronic device, and perform a fixed function.

In the present embodiment, the functions concerning the respective modules/units are as follows:

The first entity prediction module 101 is configured to obtain an insurance target text, encode the insurance target text by using an encoding module in a preset Bert model to obtain a text vector and a text vector position code, add a corresponding weight to the text vector by using a preset first self-attention mechanism to obtain a first weight text vector, and predict the head-tail position of a first entity in the insurance target text by using a preset first half pointer half label prediction model based on the first weight text vector.

The second entity prediction module 102 is configured to search for the first entity text vector and the first entity text vector position code according to the head-tail position of the first entity, the text vector and the text vector position code; encode the first entity text vector by using a preset bidirectional long-short-time memory network to obtain a first entity encoding vector; splice the first entity encoding vector and the first entity text vector position code to obtain a first spliced vector; add a corresponding weight to the text vector by using a preset second self-attention mechanism to obtain a second weight text vector; splice the second weight text vector and the first spliced vector to obtain a second spliced vector; and predict the relation corresponding to the first entity, and a second entity, by using a preset second half pointer half label prediction model according to the second spliced vector.

The triplet splicing module 103 is configured to generate a triplet of the insurance target text according to the first entity, the relation corresponding to the first entity, and the second entity when the first entity, the relation corresponding to the first entity, and the second entity can form a closed loop.
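The division into modules 101 to 103 can be pictured as three callables wired together. The skeleton below is only an illustrative arrangement: the class name, the callable signatures and the stub implementations are hypothetical, and real modules would wrap the trained models described in the method embodiment.

```python
from dataclasses import dataclass
from typing import Callable, Optional, Tuple


@dataclass
class InsuranceTripletExtractor:
    # Module 101: text -> first entity.
    first_entity_module: Callable[[str], str]
    # Module 102: (text, first entity) -> (relation, second entity).
    second_entity_module: Callable[[str, str], Tuple[Optional[str], Optional[str]]]
    # Module 103: (first entity, relation, second entity) -> triplet or None.
    triplet_module: Callable[
        [str, Optional[str], Optional[str]], Optional[Tuple[str, str, str]]
    ]

    def extract(self, text: str) -> Optional[Tuple[str, str, str]]:
        first = self.first_entity_module(text)
        relation, second = self.second_entity_module(text, first)
        return self.triplet_module(first, relation, second)


# Stub modules standing in for the trained models (hypothetical behaviour).
extractor = InsuranceTripletExtractor(
    first_entity_module=lambda text: "critical illness plan",
    second_entity_module=lambda text, first: ("has_waiting_period", "30 days"),
    triplet_module=lambda f, r, s: (f, r, s) if r and s else None,
)
result = extractor.extract("critical illness plan waits 30 days")
```

Keeping the three modules as separate callables mirrors the logical function division of the apparatus, so that any one of them can be replaced without touching the others.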

Fig. 5 is a schematic structural diagram of an electronic device for implementing the insurance field triplet extraction method according to the present invention.

The electronic device may comprise a processor 10, a memory 11, a communication bus 12 and a communication interface 13, and may further comprise a computer program, such as an insurance domain triplet extraction program, stored in the memory 11 and executable on the processor 10.

The memory 11 includes at least one type of readable storage medium, including flash memory, a mobile hard disk, a multimedia card, a card memory (e.g., SD or DX memory, etc.), a magnetic memory, a magnetic disk, an optical disk, etc. The memory 11 may in some embodiments be an internal storage unit of the electronic device, such as a mobile hard disk of the electronic device. The memory 11 may in other embodiments also be an external storage device of the electronic device, such as a plug-in mobile hard disk, a Smart Media Card (SMC), a Secure Digital (SD) Card, a Flash memory Card (Flash Card) or the like, which are provided on the electronic device. Further, the memory 11 may also include both an internal storage unit and an external storage device of the electronic device. The memory 11 may be used not only to store application software installed in an electronic device and various data, such as codes of a triplet extraction program in an insurance field, but also to temporarily store data that has been output or is to be output.

The processor 10 may be comprised of integrated circuits in some embodiments, for example, a single packaged integrated circuit, or may be comprised of multiple integrated circuits packaged with the same or different functions, including one or more central processing units (Central Processing Unit, CPU), microprocessors, digital processing chips, graphics processors, combinations of various control chips, and the like. The processor 10 is a Control Unit (Control Unit) of the electronic device, connects various components of the entire electronic device using various interfaces and lines, and executes various functions of the electronic device and processes data by running or executing programs or modules (e.g., insurance field triplet extraction programs, etc.) stored in the memory 11, and calling data stored in the memory 11.

The communication bus 12 may be a Peripheral Component Interconnect (PCI) bus, an Extended Industry Standard Architecture (EISA) bus, or the like. The bus may be classified as an address bus, a data bus, a control bus, etc. The communication bus 12 is arranged to enable connection and communication between the memory 11, the at least one processor 10, and other components. For ease of illustration, only one bold line is shown in the figure, but this does not mean that there is only one bus or only one type of bus.

Fig. 5 shows only an electronic device with certain components. It will be understood by those skilled in the art that the structure shown in Fig. 5 does not limit the electronic device, which may include fewer or more components than shown, may combine certain components, or may have a different arrangement of components.

For example, although not shown, the electronic device may further include a power source (such as a battery) for supplying power to the respective components. Preferably, the power source may be logically connected to the at least one processor 10 through a power management device, so that functions such as charge management, discharge management and power consumption management are implemented through the power management device. The power source may also include one or more of a direct current or alternating current power supply, a recharging device, a power failure detection circuit, a power converter or inverter, a power status indicator, and the like. The electronic device may further include various sensors, Bluetooth modules, Wi-Fi modules, etc., which are not described herein.

Optionally, the communication interface 13 may comprise a wired interface and/or a wireless interface (e.g., a Wi-Fi interface, a Bluetooth interface, etc.), typically used to establish a communication connection between the electronic device and other electronic devices.

Optionally, the communication interface 13 may further comprise a user interface, which may be a display or an input unit such as a keyboard, or may be a standard wired interface or wireless interface. Alternatively, in some embodiments, the display may be an LED display, a liquid crystal display, a touch-sensitive liquid crystal display, an OLED (Organic Light-Emitting Diode) touch device, or the like. The display may also be referred to as a display screen or display unit, as appropriate, and is used for displaying information processed in the electronic device and for displaying a visual user interface.

It should be understood that the embodiments described are for illustrative purposes only, and the scope of the patent application is not limited to this configuration.

The insurance domain triplet extraction program stored by the memory 11 in the electronic device is a combination of a plurality of computer programs, which when run in the processor 10, can implement:

In particular, the specific implementation method of the processor 10 on the computer program may refer to the description of the relevant steps in the corresponding embodiment of fig. 1, which is not repeated herein.

Further, the electronic device integrated modules/units, if implemented in the form of software functional units and sold or used as stand-alone products, may be stored in a computer readable storage medium. The computer readable medium may be non-volatile or volatile. The computer readable medium may include: any entity or device capable of carrying the computer program code, a recording medium, a U disk, a removable hard disk, a magnetic disk, an optical disk, a computer Memory, a Read-Only Memory (ROM).

Embodiments of the present invention may also provide a computer readable storage medium storing a computer program which, when executed by a processor of an electronic device, may implement:

Further, the computer-usable storage medium may mainly include a storage program area and a storage data area, wherein the storage program area may store an operating system, an application program required for at least one function, and the like; the storage data area may store data created from the use of blockchain nodes, and the like.

In the embodiments provided in the present invention, it should be understood that the disclosed electronic device, apparatus and method may be implemented in other manners. For example, the above-described apparatus embodiments are merely illustrative, and for example, the division of the modules is merely a logical function division, and there may be other manners of division when actually implemented.

The modules described as separate components may or may not be physically separate, and components shown as modules may or may not be physical units, may be located in one place, or may be distributed over multiple network units. Some or all of the modules may be selected according to actual needs to achieve the purpose of the solution of this embodiment.

In addition, the functional modules in the embodiments of the present invention may be integrated in one processing unit, or each unit may exist alone physically, or two or more units may be integrated in one unit. The integrated units may be implemented in the form of hardware, or in the form of hardware plus software functional modules.

It will be evident to those skilled in the art that the invention is not limited to the details of the foregoing illustrative embodiments, and that the present invention may be embodied in other specific forms without departing from the spirit or essential characteristics thereof.

The present embodiments are, therefore, to be considered in all respects as illustrative and not restrictive, the scope of the invention being indicated by the appended claims rather than by the foregoing description, and all changes which come within the meaning and range of equivalency of the claims are therefore intended to be embraced therein. Any reference signs in the claims shall not be construed as limiting the claim concerned.

Blockchain is a novel application mode of computer technologies such as distributed data storage, point-to-point transmission, consensus mechanisms and encryption algorithms. A blockchain is essentially a decentralized database: a chain of data blocks generated in association with one another by cryptographic means, each data block containing a batch of network transaction information that is used to verify the validity of the information (anti-counterfeiting) and to generate the next block. The blockchain may include a blockchain underlying platform, a platform product services layer, an application services layer, and the like.

Furthermore, it is evident that the word "comprising" does not exclude other elements or steps, and that the singular does not exclude a plurality. A plurality of units or means recited in the system claims may also be implemented by one unit or means through software or hardware. Terms such as first and second are used to denote names and do not indicate any particular order.

Finally, it should be noted that the above-mentioned embodiments are merely for illustrating the technical solution of the present invention and not for limiting the same, and although the present invention has been described in detail with reference to the preferred embodiments, it should be understood by those skilled in the art that modifications and equivalents may be made to the technical solution of the present invention without departing from the spirit and scope of the technical solution of the present invention.

Claims

1. An insurance field triplet extraction method, the method comprising:

adding a corresponding weight to the text vector by using a preset first self-attention mechanism to obtain a first weight text vector;

based on the first weight text vector, predicting the head and tail positions of a first entity in the insurance target text by using a preset first half pointer half annotation prediction model;

searching a first entity text vector and a first entity text vector position code according to the head-to-tail position of the first entity, the text vector and the text vector position code;

when the first entity, the corresponding relation of the first entity and the second entity can form a closed loop, generating a triplet of the insurance target text according to the corresponding relation of the first entity, the first entity and the second entity;

the encoding module in the preset Bert model is used for encoding the insurance target text to obtain a text vector and a text vector position code, and the encoding module comprises: extracting text features of the insurance target text, and carrying out word mixed coding on the text features to obtain a word vector sequence; performing position index coding on each text in the insurance target text to obtain text vector position coding; adding the word vector sequence and the text vector position codes to obtain text splicing vectors, and coding the text splicing vectors by using a coding module in a preset Bert model to obtain text vectors;

The predicting the head and tail positions of the first entity in the insurance target text by using a preset first half pointer half annotation prediction model based on the first weight text vector comprises the following steps: screening the head and tail positions of a first entity to be selected by using a starting pointer and an ending pointer in a preset first half pointer half-labeling prediction model; performing position coding on the first entity to be selected according to the head-to-tail position of the first entity to be selected to obtain a first entity to be selected position coding vector; performing point multiplication on the first entity position coding vector to be selected and the first weight text vector to obtain a first weighted summation vector; screening a first target weighted summation vector larger than a first preset value from the first weighted summation vector, and taking the head-tail position of a first entity to be selected corresponding to the first target weighted summation vector as the head-tail position of the first entity;

and predicting the corresponding relation of the first entity and the second entity by using a preset second half pointer half label prediction model according to the second splicing vector, wherein the method comprises the following steps: screening the head and tail positions of a second entity to be selected from the first entity by utilizing a starting pointer and an ending pointer in a preset second half pointer half-labeling prediction model; performing position coding on the second entity to be selected according to the head and tail positions of the second entity to be selected to obtain a second entity to be selected position coding vector; performing dot multiplication on the second entity position coding vector to be selected and the second weight text vector to obtain a second weighted summation vector; screening a second target weighted sum vector larger than a second preset value from the second weighted sum vector, and taking the head and tail positions of a second entity to be selected corresponding to the second target weighted sum vector as the head and tail positions of the second entity; inquiring the insurance target text according to the head-to-tail position of the second entity to obtain the second entity; performing entity classification on the second entity according to a preset entity class comparison table to obtain a second entity class; predicting the corresponding relation of the first entity according to a predetermined first entity class and the second entity class;

The encoding the first entity text vector by using a preset bidirectional long-short-time memory network to obtain a first entity encoding vector comprises the following steps: acquiring the number of the first entity text vectors, selecting preset bidirectional long and short time memory units which are the same as the number of the first entity text vectors, and splicing all the preset bidirectional long and short time memory units to obtain a target bidirectional long and short time memory network; performing dimension reduction on the first entity text vector by utilizing an embedded layer in the target bidirectional long-short-term memory network to obtain a first entity text dimension reduction vector; and respectively inputting the first entity text dimension-reducing vector into one of the forward and reverse long-short-time memory network units of the network layer in each layer of the target bidirectional long-short-time memory network, and splicing the output results of the forward and reverse long-short-time memory network units of the network layer by utilizing the connecting layer in the target bidirectional long-short-time memory network to obtain the first entity coding vector.

2. The insurance field triplet extraction method according to claim 1, wherein said performing word hybrid coding on the text feature to obtain a word vector sequence includes:

3. The insurance domain triplet extraction method of claim 1, wherein said first entity class is determined according to the steps of:

4. An insurance field triplet extraction device, characterized in that the device comprises:

the second entity prediction module is used for searching a first entity text vector and a first entity text vector position code according to the head-to-tail position of the first entity, the text vector and the text vector position code, utilizing a preset bidirectional long-short-time memory network to encode the first entity text vector to obtain a first entity encoding vector, splicing the first entity encoding vector and the first entity text vector position code to obtain a first splicing vector, utilizing a preset second self-attention mechanism to add corresponding weights to the text vector to obtain a second weight text vector, splicing the second weight text vector and the first splicing vector to obtain a second splicing vector, and predicting a corresponding relation of the first entity and the second entity by utilizing a preset second half-pointer semi-label prediction model according to the second splicing vector;

the triplet splicing module is used for generating a triplet of the insurance target text according to the first entity, the relation corresponding to the first entity, and the second entity when the first entity, the relation corresponding to the first entity, and the second entity can form a closed loop;

5. An electronic device, the electronic device comprising:

at least one processor; and

a memory communicatively coupled to the at least one processor; wherein,

The memory stores computer program instructions executable by the at least one processor to enable the at least one processor to perform the insurance domain triplet extraction method according to any one of claims 1 to 3.

6. A computer readable storage medium storing a computer program, wherein the computer program when executed by a processor implements the insurance domain triplet extraction method according to any one of claims 1 to 3.