CN115114906A - Method and device for extracting entity content, electronic equipment and storage medium - Google Patents

Method and device for extracting entity content, electronic equipment and storage medium

Info

Publication number
CN115114906A
CN115114906A (application number CN202210435626.XA)
Authority
CN
China
Prior art keywords
level
time step
sequence
text
entity
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202210435626.XA
Other languages
Chinese (zh)
Inventor
孟朋
林卫亮
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Tencent Technology Shenzhen Co Ltd
Original Assignee
Tencent Technology Shenzhen Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Tencent Technology Shenzhen Co Ltd filed Critical Tencent Technology Shenzhen Co Ltd
Priority to CN202210435626.XA priority Critical patent/CN115114906A/en
Publication of CN115114906A publication Critical patent/CN115114906A/en
Pending legal-status Critical Current

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/20 Natural language analysis
    • G06F40/279 Recognition of textual entities
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval of structured data, e.g. relational data
    • G06F16/22 Indexing; Data structures therefor; Storage structures
    • G06F16/28 Databases characterised by their database models, e.g. relational or object models
    • G06F16/284 Relational databases
    • G06F16/285 Clustering or classification
    • G06F16/288 Entity relationship models
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Databases & Information Systems (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • General Health & Medical Sciences (AREA)
  • Software Systems (AREA)
  • Health & Medical Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Computational Linguistics (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Evolutionary Computation (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • Mathematical Physics (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The application relates to the technical field of artificial intelligence, and discloses a method, an apparatus, an electronic device, and a storage medium for extracting entity content. The method includes: acquiring grouping information of N entities; determining the extraction sequence corresponding to each group according to the grouping information; splicing the target text with the identification texts corresponding to the entities among the N entities to obtain an input text; encoding the input text to obtain an encoded hidden state sequence corresponding to the input text; and, based on a mask attention mechanism, performing M-way parallel decoding on the encoded hidden state sequence according to the extraction sequence to obtain M output texts. The mask attention mechanism masks the attention scores, relative to a word to be decoded, of the text that is not relevant to that word. With this scheme, the entity contents of multiple entities are extracted in a serial-parallel manner, improving the efficiency of entity content extraction.

Description

Method and device for extracting entity content, electronic equipment and storage medium
Technical Field
The present application relates to the field of artificial intelligence technologies, and in particular, to a method and an apparatus for extracting entity content, an electronic device, and a storage medium.
Background
Since an OCR text obtained by performing OCR (Optical Character Recognition) on characters in an image may contain wrong or disordered information, the OCR text needs to be structured to facilitate subsequent processing, that is, converted into text with structural information, for example into Key-Value pair form. Structuring the OCR text is the process of extracting the entity content of each entity, where the Key may be regarded as an entity and the Value is the entity content corresponding to that entity. The upper part of Fig. 1 shows a schematic diagram of an OCR text. To convert the OCR text into Key-Value pairs, the related art uses a single-word decoding manner, that is, one word is obtained per decoding time step. In the lower part of Fig. 1, the part before ":" represents a Key, such as "name", and the part after ":" represents the Value.
Because the related art decodes one word per time step when extracting entity content, which amounts to serial decoding, extracting the entity content of multiple entities takes a long time and the efficiency of extracting entity content is low.
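To make the cost comparison concrete, the following hypothetical sketch counts the decoding time steps needed by serial one-word-per-step decoding versus M-way parallel decoding; the entity names and length budgets are invented for illustration and are not from the patent.

```python
# Hypothetical illustration: number of decoding time steps needed to extract
# entity contents, comparing serial decoding with M-way parallel decoding.
# Entity names and length budgets below are made up for illustration.

entity_lengths = {"name": 8, "type": 12, "address": 40, "scope": 100}

# Serial decoding: one word per time step across all entities.
serial_steps = sum(entity_lengths.values())  # 8 + 12 + 40 + 100 = 160

# M-way parallel decoding: entities are split into M groups; each group is
# decoded serially on its own path while all paths run simultaneously, so the
# total number of time steps equals the largest group's total length budget.
groups = [["scope"], ["address", "type", "name"]]  # M = 2 groups
parallel_steps = max(sum(entity_lengths[e] for e in g) for g in groups)  # 100

print(serial_steps, parallel_steps)
```

Under these invented budgets, parallel decoding finishes in 100 time steps instead of 160, and the gap widens as more entities share the same total budget.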
Disclosure of Invention
In view of the foregoing problems, embodiments of the present application provide a method, an apparatus, an electronic device, and a storage medium for extracting entity content, so as to improve the foregoing problems.
According to an aspect of an embodiment of the present application, there is provided a method for extracting entity content, including: acquiring grouping information of N entities, wherein the grouping information indicates that the N entities are divided into M groups and the entities included in each group, M and N are positive integers, and 2 ≤ M < N; the sum of the text length thresholds corresponding to the entities belonging to the same group is not greater than a target length threshold; determining an extraction sequence corresponding to each of M decoding paths according to the grouping information, wherein one decoding path is used for extracting the entity content corresponding to the entities included in one group; splicing the target text with the identification texts corresponding to the entities among the N entities to obtain an input text; encoding the input text to obtain an encoded hidden state sequence corresponding to the input text; and, based on a mask attention mechanism, decoding the encoded hidden state sequence corresponding to the input text in parallel according to the extraction sequence to obtain M output texts, wherein one output text includes the entity contents, in the target text, corresponding to all entities belonging to one group, and the text length of the entity content corresponding to each entity does not exceed the text length threshold corresponding to that entity.
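As a rough sketch of the splicing step above, the input text can be formed by concatenating the target text with an identification text (for example, the entity name) of each of the N entities. The `[SEP]` separator token below is an assumed convention for illustration; the claim does not prescribe a particular separator.

```python
def build_input_text(target_text, entity_names, sep="[SEP]"):
    """Splice the target text with the identification text of each entity
    to form the input text fed to the encoder. The [SEP] separator is an
    assumed convention, not specified by the patent."""
    return sep.join([target_text] + list(entity_names))

# Invented example: a two-entity target text and its spliced input text.
text = build_input_text("Name: ACME Ltd. Capital: 1M",
                        ["name", "registered capital"])
```

Encoding this single spliced sequence lets one encoder pass produce hidden states covering both the target text and every entity's identification text.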
According to an aspect of an embodiment of the present application, there is provided an apparatus for extracting entity content, including: an obtaining module, configured to obtain grouping information of N entities, where the grouping information indicates that the N entities are divided into M groups and entities included in each group, where M and N are positive integers, and M is greater than or equal to 2 and less than N; the sum of the text length thresholds corresponding to the entities belonging to the same group is not greater than the target length threshold; the sequence determining module is used for determining an extraction sequence corresponding to each decoding in the M decoding paths according to the grouping information, wherein one decoding path is used for extracting entity content corresponding to an entity included in one grouping; the splicing module is used for splicing the target text with the identification texts corresponding to the entities in the N entities to obtain an input text; the encoding module is used for encoding an input text to obtain an encoding hidden state sequence corresponding to the input text; and the parallel decoding module is used for performing parallel decoding on the coding hidden state sequence corresponding to the input text according to the extraction sequence based on a mask attention mechanism to obtain M paths of output texts, wherein one path of output text comprises entity contents corresponding to all entities belonging to one group in the target text, and the text length of the entity contents corresponding to each entity does not exceed the text length threshold corresponding to the entity.
According to an aspect of an embodiment of the present application, there is provided an electronic device including: a processor; a memory having computer readable instructions stored thereon which, when executed by the processor, implement a method of extracting physical content as described above.
According to an aspect of the embodiments of the present application, there is provided a computer readable storage medium having stored thereon computer readable instructions, which, when executed by a processor, implement the method of extracting entity content as described above.
According to an aspect of embodiments of the present application, there is provided a computer program product comprising computer instructions which, when executed by a processor, implement a method of extracting entity content as described above.
In the present application, the N entities are divided into M groups according to text length thresholds, where N > M, so at least one group contains two or more entities. In the process of extracting entity content, entities in different groups are decoded in parallel while entities in the same group are decoded serially, realizing serial-parallel entity content extraction. Compared with the serial extraction of the related art, which decodes one word at one time step, this method can decode M words at one time step, greatly shortening the time spent extracting the entity content of multiple entities and improving extraction efficiency.
In addition, in the decoding process, a mask attention mechanism is adopted to mask the attention scores, relative to a word to be decoded, of the text that is not relevant to that word. Thus, when decoding a word, the feature information of non-relevant text is ignored and only the feature information of the relevant text is attended to, so that undesired interdependence between words on different decoding paths is avoided during serial-parallel decoding, further ensuring the accuracy of the entity contents extracted for the multiple entities.
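The masking described above is commonly implemented by replacing the attention scores of non-relevant positions with a large negative value before the softmax, so they receive near-zero attention weight. The following is a minimal sketch under that assumption; the shapes and the use of NumPy are illustrative, not the patent's implementation.

```python
import numpy as np

def masked_attention_weights(scores, relevant):
    """scores: (num_queries, num_keys) raw attention scores.
    relevant: boolean mask, True where a key position belongs to the
    relevant text of the query's word to be decoded.
    Non-relevant positions get a large negative score so that, after the
    softmax, they receive (near-)zero attention weight."""
    masked = np.where(relevant, scores, -1e9)
    # Numerically stable softmax over the key dimension.
    shifted = masked - masked.max(axis=-1, keepdims=True)
    weights = np.exp(shifted)
    return weights / weights.sum(axis=-1, keepdims=True)

# One query word; the middle key position is non-relevant text.
scores = np.array([[1.0, 2.0, 3.0]])
relevant = np.array([[True, False, True]])
w = masked_attention_weights(scores, relevant)
```

The masked position ends up with essentially zero weight even though its raw score was higher than the first position's, which is exactly the "do not attend to non-relevant text" behavior described above.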
Drawings
The accompanying drawings, which are incorporated in and constitute a part of this specification, illustrate embodiments consistent with the present application and together with the description, serve to explain the principles of the application. It is obvious that the drawings in the following description are only some embodiments of the application, and that for a person skilled in the art, other drawings can be derived from them without inventive effort.
Fig. 1 is a schematic diagram of structuring an OCR text in the related art.
Fig. 2 is a schematic diagram illustrating an application scenario to which the technical solution of the embodiment of the present application may be applied.
FIG. 3 is a flow diagram illustrating a method of extracting entity content according to one embodiment of the present application.
Fig. 4 illustrates a schematic view of a business license.
Fig. 5A-5C are schematic diagrams illustrating the application of attention in accordance with an embodiment of the present application.
FIG. 6 is a flowchart illustrating step 240 according to an embodiment of the present application.
FIG. 7 is a flowchart illustrating step 630 according to an embodiment of the present application.
FIG. 8 is a flowchart illustrating step 720 according to an embodiment of the present application.
Fig. 9 is a schematic diagram of a decoder subnetwork shown in accordance with an embodiment of the present application.
Fig. 10 is a schematic diagram of a structure of a sub-network of encoders according to an embodiment of the present application.
Fig. 11 is a flow chart illustrating extracting entity content according to an embodiment of the application.
Fig. 12 is a flowchart illustrating step 630 according to another embodiment of the present application.
Fig. 13 is a flow chart illustrating extracting entity content according to another embodiment of the present application.
Fig. 14 is a flow chart illustrating extracting entity content according to another embodiment of the present application.
Fig. 15 is a block diagram illustrating an apparatus for extracting entity content according to an embodiment of the present application.
FIG. 16 illustrates a schematic structural diagram of a computer system suitable for use in implementing the electronic device of an embodiment of the present application.
Detailed Description
Example embodiments will now be described more fully with reference to the accompanying drawings. Example embodiments may, however, be embodied in many different forms and should not be construed as limited to the examples set forth herein; rather, these embodiments are provided so that this disclosure will be thorough and complete, and will fully convey the concept of example embodiments to those skilled in the art.
Furthermore, the described features, structures, or characteristics may be combined in any suitable manner in one or more embodiments. In the following description, numerous specific details are provided to give a thorough understanding of embodiments of the application. One skilled in the relevant art will recognize, however, that the subject matter of the present application can be practiced without one or more of the specific details, or with other methods, components, devices, steps, and so forth. In other instances, well-known methods, devices, implementations, or operations have not been shown or described in detail to avoid obscuring aspects of the application.
The block diagrams shown in the figures are functional entities only and do not necessarily correspond to physically separate entities. I.e. these functional entities may be implemented in the form of software, or in one or more hardware modules or integrated circuits, or in different networks and/or processor means and/or microcontroller means.
The flow charts shown in the drawings are merely illustrative and do not necessarily include all of the contents and operations/steps, nor do they necessarily have to be performed in the order described. For example, some operations/steps may be decomposed, and some operations/steps may be combined or partially combined, so that the actual execution sequence may be changed according to the actual situation.
It should be noted that: reference herein to "a plurality" means two or more. "And/or" describes the association relationship of associated objects, indicating that three relationships may exist; for example, "A and/or B" may mean: A exists alone, A and B exist simultaneously, or B exists alone. The character "/" generally indicates that the associated objects before and after it are in an "or" relationship.
Artificial Intelligence (AI) is a theory, method, technique and application system that uses a digital computer or a machine controlled by a digital computer to simulate, extend and expand human Intelligence, perceive the environment, acquire knowledge and use the knowledge to obtain the best results. In other words, artificial intelligence is a comprehensive technique of computer science that attempts to understand the essence of intelligence and produce a new intelligent machine that can react in a manner similar to human intelligence. Artificial intelligence is the research of the design principle and the realization method of various intelligent machines, so that the machines have the functions of perception, reasoning and decision making.
Natural Language Processing (NLP) is an important direction in the fields of computer science and artificial intelligence. It studies various theories and methods that enable effective communication between humans and computers using natural language. Natural language processing is a science integrating linguistics, computer science, and mathematics; research in this field involves natural language, that is, the language people use every day, so it is closely related to the study of linguistics. Natural language processing techniques typically include text processing, semantic understanding, machine translation, question answering, knowledge graphs, and the like.
Fig. 2 is a schematic diagram illustrating an application scenario to which the technical solution of the embodiment of the present application may be applied. As shown in fig. 2, the application scenario includes a terminal 210 and a server 220, and the terminal 210 establishes a communication connection with the server 220 through a wired network or a wireless network. The terminal 210 may be a smart phone, a tablet computer, a notebook computer, a desktop computer, a self-service terminal, a vehicle-mounted terminal, and other electronic devices that can interact with a user, and is not limited in particular.
The server 220 may be an independent physical server, a server cluster or a distributed system formed by a plurality of physical servers, or a cloud server providing basic cloud computing services such as a cloud service, a cloud database, cloud computing, a cloud function, cloud storage, a Network service, cloud communication, a middleware service, a domain name service, a security service, a CDN (Content Delivery Network), a big data and artificial intelligence platform, and the like.
The terminal 210 may display an interactive interface, based on which a user may trigger selection of a target text, and the terminal 210 extracts entity content corresponding to each of the N entities from the target text.
Further, based on the interactive interface in the terminal 210, the user may also select the entities whose entity content needs to be extracted from the target text, set a text length threshold for each selected entity, and send content extraction configuration information to the server 220. The content extraction configuration information indicates the entity identifier (for example, the entity name) of each selected entity and the text length threshold set for it. On this basis, the server 220 extracts the entity content corresponding to each entity from the target text according to the method of the present application, in combination with the content extraction configuration information.
Of course, in other embodiments, the content extraction configuration information may also be pre-stored in the server 220, so that after receiving the target text, the server 220 correspondingly extracts entity content corresponding to each entity from the target text according to the content extraction configuration information.
In other embodiments, the target text may also be pre-stored in the server 220, together with the text length threshold corresponding to each entity to be subjected to entity content extraction. The terminal 210 may send entity extraction indication information to the server 220, which may include a text identifier corresponding to the target text and an entity identifier corresponding to each entity to be subjected to entity content extraction. The server 220 then determines, according to the entity extraction indication information, the text length threshold corresponding to each indicated entity and the target text indicated by the text identifier, and extracts, based on the determined text length thresholds and the target text, the entity content corresponding to each entity from the target text according to the method of the present application.
In other embodiments, the method of the present application may be applied to the structured processing of OCR text. In this application scenario, the target text may be an OCR text obtained by performing OCR (Optical Character Recognition) on characters in an image. Because the obtained OCR text is a single line of characters, it cannot be quickly determined which text corresponds to which entity. According to the method of the present application, the entities to be subjected to entity content extraction are specified and a text length threshold is set for each entity; the OCR text is then processed according to the specified entities and their corresponding text length thresholds to obtain the entity content corresponding to each entity. On this basis, each entity and its entity content are converted into Key-Value pair form, where the Key represents the entity and the Value represents the entity content (which can also be understood as an attribute value). In other words, by this method, the OCR text can be structured and converted into Key-Value pair form, facilitating subsequent data processing.
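The Key-Value structuring described above amounts to pairing each entity with its extracted content. A minimal sketch follows; the entity names and extracted contents are invented for illustration.

```python
# Minimal sketch of turning extracted entity contents into Key-Value pairs,
# as described for structuring OCR text. Entities and contents are invented.
entities = ["name", "legal representative", "registered capital"]
extracted = ["ACME Trading Co.", "Jane Doe", "1,000,000"]

# Key = entity, Value = entity content (attribute value).
structured = dict(zip(entities, extracted))
```

The resulting mapping is the structured form of the OCR text and can be stored or queried directly in later processing.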
It should be noted that the method of the present application is not limited to be executed by the server 220, and the method of the present application may also be executed by the terminal 210 with sufficient processing capability, and may also be executed interactively by a system formed by the server 220 and the terminal 210.
The implementation details of the technical solution of the embodiment of the present application are set forth in detail below:
Fig. 3 is a flowchart illustrating a method for extracting entity content according to an embodiment of the present application. The method may be performed by an electronic device with processing capability, such as a server, a terminal, or a system of a server and a terminal, which is not specifically limited herein. Referring to Fig. 3, the method includes at least steps 310 to 350, which are described in detail as follows:
step 310, obtaining grouping information of N entities, wherein the grouping information indicates that the N entities are divided into M groups and the entities in each group, M and N are positive integers, and M is more than or equal to 2 and is less than N; the sum of the text length thresholds corresponding to the entities belonging to the same group is not greater than the target length threshold.
The N entities are entities to be subjected to entity content extraction from the target text, and N is more than or equal to 3. The target text is the text from which the entity content to be extracted originates. In other words, the extracted entity content in the present disclosure is extracted from the target text. Therefore, any text that needs to extract entity content can be regarded as the target text in the present application. In the present application, extracting entity content corresponding to an entity from a target text may also be understood as extracting an attribute value corresponding to the entity from the target text.
In some application scenarios, the target text may be text resulting from performing OCR on an image. The image may be an image of a business license, an invoice, a train ticket, a document (for example, an identification card or a bank card), and the like. Fig. 4 is a schematic diagram illustrating an exemplary business license. As shown in Fig. 4, the unified social credit code, license number, name, type, legal representative, business scope, registered capital, date of establishment, business term, residence, registration authority, etc. shown on the business license may each be regarded as an entity, and what follows each entity is the entity content corresponding to that entity.
In other embodiments, the target text may also be any given piece of text. For example, the target text may be an introduction text; after a question text is received, the identification text that characterizes the entity being asked about is extracted from the question text, and the entity content of that entity can then be extracted from the introduction text according to the method of the present application. For example, suppose the target text is the introduction text of a scenic spot and three question texts are received, where the entity indicated by one question text is the closing time of the scenic spot, the entity indicated by another is its opening time, and the entity indicated by the third is its construction time. For these three entities, the corresponding entity contents can all be extracted from the introduction text of the scenic spot according to the method of the present application.
Therefore, the method can be applied to scenarios in which an automatic question-answering service is provided for multiple users, so that answers to multiple questions can be provided at one time, improving the efficiency of the automatic text question-answering service. In this case, the N entities to be subjected to entity content extraction may be extracted from keywords in the multiple question texts, or may of course be determined by user-triggered selection.
In different application scenarios, there may be differences between entities that need to extract entity content from the target text, and therefore, a plurality of entities that need to extract entity content may be preset, or a plurality of entities that need to extract entity content may be determined in real time according to a received question.
The text length threshold corresponding to the entity is used for limiting the maximum length of the corresponding entity content, namely the length of the entity content extracted from the target text does not exceed the corresponding text length threshold.
It is understood that the text length thresholds corresponding to different entities may be the same or different, and may be set according to actual needs. For example, when entity extraction is performed on the OCR text of a business license, the text length thresholds set for the legal representative and for the unified social credit code may differ.
In some embodiments, prior to step 310, the method further comprises: acquiring text length thresholds respectively set for N entities; taking the maximum value of the text length threshold values corresponding to the N entities as a target length threshold value; and dividing the N entities into M groups according to the text length threshold and the target length threshold corresponding to each entity to obtain group information.
The entity to be subjected to entity content extraction from the target text can be selected by the user, and similarly, the text length threshold corresponding to the entity can also be set by the user in a self-defined manner. In other embodiments, the entity to be subjected to entity content extraction may also be preset and stored, and the text length threshold corresponding to the entity may also be preset and stored.
The text length threshold corresponding to the entity may be set according to actual needs. After the text length threshold corresponding to each entity in the N entities is obtained, grouping the N entities based on the text length threshold to obtain grouping information.
The process of grouping entities is described below with a specific embodiment. Table 1 shows text length thresholds set for entities in the OCR recognition result of a business license image. Among the text length thresholds in Table 1, the maximum is 100, so the target length threshold can be determined to be 100. On this basis, the 10 entities in Table 1 can be divided into 3 groups based on the target length threshold and the text length threshold corresponding to each entity.
TABLE 1
[Table 1, shown as an image in the original publication, lists the 10 entities together with the text length threshold set for each; the maximum threshold is 100.]
Table 2 shows the result of grouping the 10 entities in Table 1 according to a target length threshold of 100. As shown in Table 2, the 10 entities are divided into 3 groups: group 1 includes the unified social credit code, certificate number, name, type, legal representative, and registered capital; group 2 includes the business scope; group 3 includes the date of establishment, business term, and residence.
TABLE 2
[Table 2, shown as an image in the original publication, lists the grouping result: group 1 — unified social credit code, certificate number, name, type, legal representative, registered capital; group 2 — business scope; group 3 — date of establishment, business term, residence.]
In some embodiments, in the grouping process, the grouping may be performed with a target length threshold as a target, that is, the sum of text length thresholds corresponding to entities in the same group is as close as possible to the target length threshold, so that the number of groups may be reduced.
In other embodiments, given the target length threshold and the total number of groups M, grouping may be performed so that the N entities are divided into exactly M groups while the sum of the text length thresholds corresponding to the entities in each group does not exceed the target length threshold.
It can be understood that N entities are grouped based on the text length threshold corresponding to each of the N entities, so that the sum of the text length thresholds corresponding to the entities in the same group is not greater than the target length threshold, and the obtained grouping result may not be unique.
In other embodiments, the target length threshold may be obtained by adding a set length value to the maximum of the text length thresholds corresponding to the N entities; adding this set length value to the maximum text length threshold reserves a certain length margin for entity content extraction.
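As an illustration of the grouping step described above, the sketch below implements a greedy first-fit-decreasing grouping under a target length threshold. The function name, entity names, and threshold values are hypothetical; the embodiment does not prescribe any particular grouping algorithm.

```python
def group_entities(length_thresholds, target_threshold):
    """Greedy first-fit-decreasing grouping: each entity is placed into the
    first group whose running sum of text length thresholds still fits
    under target_threshold; otherwise a new group is opened."""
    groups, sums = [], []
    # Seed groups with the largest entities first so big items are not stranded.
    for name, thr in sorted(length_thresholds.items(), key=lambda kv: -kv[1]):
        for i, s in enumerate(sums):
            if s + thr <= target_threshold:
                groups[i].append(name)
                sums[i] += thr
                break
        else:  # no existing group has room
            groups.append([name])
            sums.append(thr)
    return groups
```

With hypothetical thresholds for the 10 business-license entities (business scope set to the maximum, 100), such a routine yields 3 groups whose threshold sums do not exceed 100, mirroring the 3-group result of Table 2 in spirit.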
In some embodiments, a mapping relationship between a text type and content extraction indication information may be preset, where the content extraction indication information corresponding to a text type is used to indicate an entity identifier of an entity that needs to perform entity content extraction and a text length threshold corresponding to each entity, where the text type may be set according to actual needs, for example, the text type may include a license OCR text, an invoice OCR text, a train ticket OCR text, an identification card OCR text, a scenic spot introduction text, a business introduction text, and the like, and is not specifically limited herein. On the basis, the text type to which the target text belongs can be selected by the user, so that the content extraction indication information corresponding to the text type selected by the user can be determined according to the text type selected by the user and the mapping relation between the text type and the content extraction indication information, and entity grouping is carried out according to the determined content extraction indication information.
In other embodiments, a mapping relationship between a text type and grouping information may also be preset, where the grouping information corresponding to a text type indicates a grouping result of an entity for which entity content extraction is required for a target text of the text type. On this basis, after the text type to which the target text belongs is determined, the grouping information corresponding to the text type to which the target text belongs can be correspondingly obtained.
In the present application, since the N entities are divided into M groups with M < N, there is at least one group that includes at least two entities. In this application, an entity belongs to only one group.
Step 320, determining the corresponding extraction sequence of each group according to the grouping information.
The extraction sequence corresponding to a group indicates the order in which all the entities in that group are extracted. For example, if the entities in a group include establishment date, business term, and residence, the first-to-last extraction order corresponding to the group can be set as: establishment date → business term → residence.
When a group includes a plurality of entities, the extraction order corresponding to the group may be specified according to actual needs or determined randomly, and is not specifically limited herein.
And step 330, splicing the target text with the identification texts corresponding to the entities in the N entities to obtain an input text.
The identification text corresponding to the entity may be an entity name of the entity. In the method, the target text is spliced with the identification texts corresponding to the N entities, and the entity needing entity content extraction is indicated by using the identification text corresponding to the entity, so that the entity content corresponding to the entity can be extracted in a targeted manner in the subsequent decoding process.
In some embodiments, the target text and the identification texts corresponding to the entities may be spliced according to the grouping result of the N entities. Specifically, in the splicing process, the identification texts corresponding to the entities belonging to the same group may be spliced adjacently, that is, the identification texts corresponding to the entities of the same group are adjacent in position in the input text. For example, if the target text is XXXX, the entities in one group include establishment date, business term, and residence, and another group includes business scope, the input text may be: XXXX establishment date business term residence business scope. It can be seen that in the input text, the identification texts of the three entities establishment date, business term, and residence are located adjacently.
In other embodiments, the identification texts corresponding to the entities in the same group in the input text may not be adjacent to each other, and the specific splicing position may be set according to actual needs.
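A minimal sketch of the splicing step, assuming the adjacent-splicing embodiment (identification texts of entities in the same group concatenated next to each other); the function name and the separator-free concatenation are illustrative choices:

```python
def build_input_text(target_text, grouped_id_texts):
    """Splice the target text with the identification texts of all entities,
    keeping identification texts of entities in the same group adjacent.
    grouped_id_texts: list of groups, each a list of identification texts."""
    parts = [target_text]
    for group in grouped_id_texts:
        parts.extend(group)  # same-group identification texts stay adjacent
    return "".join(parts)
```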
And 340, coding the input text to obtain a coded hidden state sequence corresponding to the input text.
In some embodiments, the input text may be encoded by an encoder network to obtain the encoded hidden state sequence corresponding to the input text, where the encoder network may be constructed from at least one of a convolutional neural network, a recurrent neural network, a fully-connected neural network, a feed-forward neural network, and the like. In other embodiments, the encoder network may be the encoder of a Transformer model.
The encoding hidden state sequence is a vectorized description of the text features of the input text, namely the encoding hidden state sequence indicates the text features of the input text, and the text features of the input text at least comprise semantic features of the input text.
And 350, performing M-path parallel decoding on the coded hidden state sequence corresponding to the input text according to the extraction sequence based on a mask attention mechanism to obtain M-path output texts, wherein one path of the output texts comprises entity contents corresponding to all entities belonging to one group in the target text, and the text length of the entity contents corresponding to each entity does not exceed the text length threshold corresponding to the entity.
The mask attention mechanism masks, for each word to be decoded, the attention scores of the non-relevant text with respect to that word. The non-relevant text corresponding to a word to be decoded includes the identification texts of, and the decoded words of, entities other than the target entity, where the target entity is the entity to which the word to be decoded belongs.
Specifically, the encoded hidden state sequence corresponding to the input text may be decoded in parallel by a decoder network based on a mask attention mechanism. The decoder network may be constructed from one or more neural networks, such as a convolutional neural network, a recurrent neural network, a fully-connected neural network, a long short-term memory network (unidirectional or bidirectional), a feed-forward neural network, a pooling neural network, and the like, which are not particularly limited. In one embodiment, the decoder network may be the decoder of a Transformer model.
In the decoding process, decoding proceeds time step by time step, with M paths decoded in parallel; that is, at each time step, one word is decoded in parallel for each of M entities (hereinafter, target entities). In this application, the word currently being decoded is referred to as a word to be decoded; therefore, at each time step there are M words to be decoded, and different words to be decoded at the same time step correspond to different entities. Each time step of decoding yields the feature information (i.e., the decoding hidden state vectors below) of the M words to be decoded. Because the extraction order of the entity contents of the entities in the same group is predetermined, during decoding each path outputs according to the extraction order of its corresponding group, yielding one output text per group. The entity contents of the entities in the group corresponding to an output text are then obtained from that output text according to the determined extraction order and the text length threshold corresponding to each entity.
Masking is to mask certain values so that they do not have an effect when updating parameters, and thus, it is equivalent to ignoring the existence of the masked positions. In the method, at each time step, the attention score of the non-relevant text corresponding to each word to be decoded is masked on the word to be decoded, so that the word to be decoded is determined without depending on the characteristic information of the non-relevant text corresponding to the word to be decoded, and the word to be decoded is determined depending on the characteristic information of the relevant text corresponding to the word to be decoded, and therefore, the influence of the dependence between the words on the accuracy of parallel decoding in the parallel decoding process can be avoided.
For example, suppose a group includes entity 1, entity 2, and entity 3, the extraction order corresponding to the group is entity 1 → entity 2 → entity 3, the text length threshold corresponding to entity 1 is 2, that corresponding to entity 2 is 3, that corresponding to entity 3 is 2, and the target length threshold is 7. The output text corresponding to the group then includes 7 words; assume the output text is Q1Q2Q3Q4Q5Q6Q7 (where Qi denotes the ith word in the output text). According to the extraction order corresponding to the group and the text length threshold corresponding to each entity, first the text length threshold (namely 2) of words corresponding to entity 1 is extracted from the output text, yielding Q1Q2, which is the entity content corresponding to entity 1; similarly, the text length threshold (namely 3) of words corresponding to entity 2 is then extracted from the remaining text, yielding Q3Q4Q5, which is the entity content corresponding to entity 2; finally, the remaining text of the output text is taken as the entity content corresponding to entity 3.
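The slicing of one output text into per-entity contents described above can be sketched as follows (a hypothetical helper; the decoded word sequence is modeled as a Python list):

```python
def split_output_text(output_words, extraction_order, length_thresholds):
    """Slice one decoded output text into per-entity contents: following the
    group's extraction order, each entity except the last takes exactly its
    text length threshold of words, and the last entity takes the rest."""
    contents, pos = {}, 0
    for entity in extraction_order[:-1]:
        thr = length_thresholds[entity]
        contents[entity] = output_words[pos:pos + thr]
        pos += thr
    contents[extraction_order[-1]] = output_words[pos:]  # remaining text
    return contents
```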
In the application, through a mask attention mechanism, the irrelevant text corresponding to the word to be decoded is masked relative to the attention score of the word to be decoded, so that in the process of decoding the word to be decoded, information of words in the irrelevant text corresponding to the word to be decoded is not concerned, only the relevant text corresponding to the word to be decoded is concerned, further, the influence of the irrelevant text corresponding to the word to be decoded on the decoding accuracy of the word to be decoded is avoided, and the accuracy of the extracted entity content is ensured. It can be understood that, since the different words to be decoded at each time step belong to different entities, the non-relevant text corresponding to the different words to be decoded at each time step is also different.
For a word to be decoded, the relevant text corresponding to the word to be decoded includes the target text, the identification text of the entity corresponding to the word to be decoded, and the decoded word for the entity corresponding to the word to be decoded.
In a specific embodiment, the non-relevant text corresponding to each word to be decoded may be determined according to the words to be decoded at each time step. The attention score of each piece of input information of the time step with respect to each word to be decoded is first computed by an attention mechanism; then, according to the non-relevant text corresponding to the word to be decoded at the current time step, the attention scores of that non-relevant text are masked, and only the attention scores of the relevant text corresponding to the word are retained. Specifically, the attention scores may be determined by a single-head or multi-head attention mechanism.
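As a sketch of this masking of attention scores, the helper below tags each input position with an "owner" (the target text or one entity) and zeroes out, via a -inf mask applied before softmax, every position that is non-relevant for the current word to be decoded. This is an illustrative single-query, single-head version with hypothetical names, not the claimed network:

```python
import math

def masked_attention_weights(scores, position_owners, target_entity):
    """Mask the attention scores of non-relevant positions for one word to
    be decoded, then normalize the remaining scores with softmax.

    scores: raw attention scores, one per input position.
    position_owners: per position, either "target" (a target-text token) or
        an entity id (an identification-text or decoded-word token of that entity).
    target_entity: the entity to which the word to be decoded belongs."""
    masked = [s if owner in ("target", target_entity) else float("-inf")
              for s, owner in zip(scores, position_owners)]
    exps = [math.exp(s) for s in masked]  # exp(-inf) == 0.0: masked positions vanish
    total = sum(exps)
    return [e / total for e in exps]
```

Setting the masked scores to -inf before the softmax is the standard trick: the exponential of -inf is exactly 0, so masked positions contribute nothing to the normalized weights.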
The input information of each time step includes the encoded hidden state sequence corresponding to the input text (which reflects the semantic information of the target text and of the identification text corresponding to each entity), and the decoding input information of the current time step at least includes the decoding hidden state vectors of the words already decoded before the current time step. It can be understood that, in the decoding process, once a decoding hidden state vector is determined, it may be classified to determine the word it represents; therefore, determining a decoding hidden state vector is equivalent to determining the word represented by that vector.
In the application, according to a text length threshold, dividing N entities into M groups, where N > M, it may be determined that at least one group including two entities exists, so that in the process of extracting entity content, decoding is performed in parallel for entities in different groups, and decoding is performed in series for entities in the same group, thereby realizing serial-parallel entity content extraction. Compared with the mode of serially extracting the entity content and decoding one word at one time step in the related technology, the method can greatly shorten the time spent on extracting the entity content aiming at a plurality of entities, thereby improving the efficiency of extracting the entity content.
In addition, in the decoding process, a mask attention mechanism is adopted to mask the attention scores of the words to be decoded with the non-relevant texts corresponding to the words to be decoded, so that in the process of decoding and determining the words to be decoded, the feature information of the non-relevant texts corresponding to the words to be decoded is not concerned, and only the feature information of the relevant texts corresponding to the words to be decoded is concerned, so that the interdependence between the words can be avoided in the serial parallel decoding process, and the accuracy of the entity contents extracted by aiming at a plurality of entities is further ensured.
There is also another way of extracting entity content in the related art: the N entities are divided into N groups, and N decoding paths run during decoding. The number of steps of each path is determined by the maximum text length threshold among the N entities, so a path whose entity has a text length threshold smaller than the maximum still requires the same number of decoding steps as the path corresponding to the maximum threshold. In this process, every path except the one corresponding to the maximum text length threshold performs invalid decoding steps, i.e., redundant computation. If the N entities include two entities whose text length thresholds differ greatly, invalid decoding occurs frequently and computing resources are wasted.
For example, for the 10 entities in table 1, the maximum text length threshold is 100. If the 10 entities are divided into 10 groups (i.e., one entity per group) for parallel decoding, 10 words are decoded at each time step over 100 time steps before the entity contents of the 10 entities in table 1 are obtained. With the method of the present application, however, the 10 entities are divided into 3 groups, so only 3 words are decoded at each time step over the same 100 time steps. Although the total number of time steps spent in decoding is the same, the number of words decoded per time step is reduced, so the probability of invalid decoding and the amount of redundant computation are reduced, saving computation.
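A back-of-the-envelope comparison of the two schemes in the example above (illustrative arithmetic only, counting decoded word-slots rather than any particular hardware cost):

```python
steps = 100                   # time steps, fixed by the maximum text length threshold
words_per_step_related = 10   # related art: one group per entity, 10 words per step
words_per_step_grouped = 3    # this application: 3 groups, 3 words per step

decoded_slots_related = words_per_step_related * steps  # total word-slots decoded
decoded_slots_grouped = words_per_step_grouped * steps  # total word-slots decoded
```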
The method is applied to the situation that entities with large text length threshold difference exist in a plurality of entities, the N entities are divided into M groups based on the text length threshold, and the difference value between the sum of the text lengths of the entity contents corresponding to all the entities in the same group and the target length threshold is smaller, so that the calculation amount of entity content extraction can be greatly reduced, and the redundant calculation amount is reduced.
Fig. 5A-5C are schematic diagrams of attention in extracting entity content from a target text according to an embodiment of the application. In this embodiment, the entities to be subjected to entity content extraction include 4 entities (entities K0, K10, K11, and K12); the text length threshold corresponding to the entity K0 is set to 10, that corresponding to the entity K10 to 2, that corresponding to the entity K11 to 4, and that corresponding to the entity K12 to 3. The 4 entities can be divided into two groups according to the text length threshold corresponding to each entity: group 1 includes one entity, namely the entity K0; group 2 includes three entities, namely the entities K10, K11, and K12. The first-to-last extraction order corresponding to group 2 is set to entity K10 → entity K11 → entity K12.
In this embodiment, since the number of groups is 2 (i.e., M = 2), each time step of decoding yields the decoding hidden state vectors of two words to be decoded. Thus, to ensure that one time step of decoding yields two decoding hidden state vectors instead of one, two placeholders are input to the decoder network at each time step. Suppose t = 1 is the first time step of the decoding process.
As shown in fig. 5A, at time step 1 (i.e., t = 1), two placeholders (i.e., mask01 and mask11) are input to the decoder network. These two placeholders indicate that two words of information (decoding hidden state vectors) need to be decoded at this time step, which is equivalent to indicating that the decoder network performs two-path decoding in parallel. According to the extraction sequences corresponding to group 1 and group 2, at the 1st time step, mask01 corresponds to the first character of the entity content corresponding to the entity K0, and mask11 corresponds to the first character of the entity content corresponding to the entity K10.
Since this is the first time step, there is no decoded word for any target entity. Therefore, for the entity K0 as the target entity at the 1st time step, the non-relevant text corresponding to the entity K0 includes the identification texts corresponding to the entities K10, K11, and K12; the attention scores of these non-relevant texts with respect to mask01 are masked, and only the attention scores of the target text and the identification text corresponding to the entity K0 with respect to mask01 are retained.
Similarly, for the entity K10 as the target entity at the 1st time step, the non-relevant text corresponding to the entity K10 includes the identification texts corresponding to the entities K0, K11, and K12. The attention scores of these non-relevant texts with respect to mask11 are masked, and only the attention scores of the target text and the identification text corresponding to the entity K10 with respect to mask11 are retained. The objects attended to by the placeholders mask01 and mask11 at the 1st time step are indicated by the arrows in fig. 5A; where fig. 5A has no arrow, the corresponding text's attention to the placeholder is masked.
By decoding at the time step t = 1, the decoded hidden state vector O01 of the first character in the entity content corresponding to the entity K0 and the decoded hidden state vector O11 of the first character in the entity content corresponding to the entity K10 can be obtained.
Thereafter, at the 2nd time step (i.e., when t = 2), the two decoded hidden state vectors obtained at the 1st time step (O01 and O11) and two placeholders (mask02 and mask12) are input to the decoder network. Similarly, the placeholders mask02 and mask12 indicate that the decoder network needs to decode two paths in parallel at this time step and output two decoding hidden state vectors.
At the 2nd time step, the non-relevant text corresponding to the entity K0 includes the identification texts corresponding to the entities K10, K11, and K12 and the decoded hidden state vector O11 of the first character corresponding to the entity K10. The attention scores of these non-relevant texts with respect to mask02 are masked, and only the attention scores of the target text, the identification text corresponding to the entity K0, and the decoded word corresponding to the entity K0 (i.e., the decoded hidden state vector O01) with respect to mask02 are retained. A schematic diagram of the corresponding attention scores is shown in fig. 5B.
Similarly, at the 2nd time step, the non-relevant text corresponding to the entity K10 includes the identification texts corresponding to the entities K0, K11, and K12 and the decoded hidden state vector O01 of the first character corresponding to the entity K0. The attention scores of these non-relevant texts with respect to mask12 are masked, and only the attention scores of the target text, the identification text corresponding to the entity K10, and the decoded hidden state vector O11 of the decoded word corresponding to the entity K10 with respect to mask12 are retained. A schematic diagram of the corresponding attention scores is shown in fig. 5B.
By decoding at the 2 nd time step, the decoded hidden state vector O02 of the 2 nd word in the entity content corresponding to the entity K0 and the decoded hidden state vector O12 of the 2 nd word in the entity content corresponding to the entity K10 can be obtained.
Thereafter, at the 3rd time step (i.e., when t = 3), since the decoding hidden state vectors of the two words of the entity content corresponding to the entity K10 were output at the 1st and 2nd time steps, and the text length threshold corresponding to the entity K10 is 2, the entity K10 has been fully decoded; according to the extraction order corresponding to group 2, extraction of the entity K11 therefore starts at the 3rd time step. Similarly, since the text length threshold corresponding to the entity K0 is 10, the entity content corresponding to the entity K0 continues to be extracted for group 1 at the 3rd time step.
At time step 3, four decoded hidden vectors (i.e., O01, O11, O02, and O12) obtained at time step 1 and time step 2, and two placeholders (mask03 and mask13) are input to the decoder network.
At the 3rd time step, for the entity K0, the non-relevant text corresponding to the entity K0 includes the identification texts corresponding to the entities K10, K11, and K12 and the decoded words corresponding to the entity K10 (i.e., the 1st and 2nd words of the entity content corresponding to the entity K10; during decoding these two words are represented by their decoding hidden state vectors O11 and O12). The attention scores of these non-relevant texts with respect to mask03 are masked, and only the attention scores of the target text, the identification text corresponding to the entity K0, and the decoded words corresponding to the entity K0 (similarly represented by the decoding hidden state vectors O01 and O02) with respect to mask03 are retained. A schematic diagram of the corresponding attention scores is shown in fig. 5C.
Similarly, at the 3rd time step, for the entity K11, the non-relevant text corresponding to the entity K11 includes the identification texts corresponding to the entities K0, K10, and K12, the decoded words corresponding to the entity K10, and the decoded words corresponding to the entity K0. The attention scores of these non-relevant texts with respect to the placeholder mask13 are therefore masked, and only the attention scores of the target text and the identification text corresponding to the entity K11 with respect to the placeholder mask13 are retained. A schematic diagram of the corresponding attention scores is shown in fig. 5C. The decoding process at subsequent time steps is similar to the above and is not described again herein.
Through the above process, the decoding hidden state vectors of a plurality of words can be obtained at each time step, and the mask attention mechanism prevents the words to be decoded from depending on their non-relevant texts, thereby ensuring the accuracy and precision of decoding and, in turn, the accuracy of the entity content extracted for each entity. Moreover, because parallel decoding is performed at each time step, each time step yields the decoding hidden state vectors of a plurality of words; compared with decoding one word per time step, the method of the application greatly improves the decoding speed, in other words, the speed of extracting entity content.
In some embodiments, as shown in fig. 6, step 240, comprises:
step 610, determining target entities respectively corresponding to the M words to be decoded at the t-th time step according to the extraction sequence, wherein different words to be decoded correspond to different target entities; wherein t is a positive integer.
As described above, decoding is divided into M paths performed in parallel, so that at each time step each path decodes one word of the entity content of one entity. Therefore, in the present scheme, M words to be decoded are obtained by decoding at the t-th time step, and the M words to be decoded correspond to different entities.
The word to be decoded at the tth time step is a word corresponding to the hidden state vector to be decoded at the tth time step. For example, if it is required to decode at the t-th time step to obtain a decoding hidden state vector corresponding to the 3 rd word in the entity content corresponding to the entity 1, the 3 rd word in the entity content corresponding to the entity 1 is a word to be decoded at the t-th time step. In this application, for convenience of description, an entity corresponding to a word to be decoded at present is referred to as a target entity, and it is understood that, at different time steps, target entities for the same packet may be the same or different.
In some embodiments, prior to step 610, the method further comprises: determining a time step interval according to the target length threshold, the time step interval being 1 to Q, where Q is the target length threshold; and dividing, for a group, the time step interval into a target number of time step subintervals according to the target number of entities included in the group, the text length threshold corresponding to each entity in the group, and the extraction order corresponding to the group, where one time step subinterval corresponds to one entity in the group, and the interval length of the time step subinterval corresponding to each entity other than the last entity in the extraction order equals that entity's text length threshold. In this embodiment, step 610 includes: for each of the M groups, determining, among the time step subintervals corresponding to the group, the time step subinterval in which t lies, and taking the entity corresponding to that subinterval as the target entity of the group at the t-th time step.
In this embodiment, the time step intervals corresponding to different packets are the same and all are 1 to Q. It can be understood that, since the sum of the text length thresholds corresponding to all entities in a group may be smaller than the target length threshold, in this case, according to the extraction sequence corresponding to the group, the interval lengths of the time step sub-intervals corresponding to other entities except the last entity in the extraction sequence are equal to the corresponding text length threshold, and the interval length of the time step sub-interval corresponding to the last entity in the extraction sequence is not smaller than the text length threshold corresponding to the entity. It is understood that the time step subintervals divided for the time step interval corresponding to a packet are consecutive.
Continuing with the example of the embodiment corresponding to fig. 5A-5C, the target length threshold Q is 10, and the 4 entities are divided into two groups, where group 1 includes one entity, namely the entity K0 (whose corresponding text length threshold is 10), and group 2 includes three entities, namely the entity K10 (text length threshold 2), the entity K11 (text length threshold 4), and the entity K12 (text length threshold 3). The first-to-last extraction order corresponding to group 2 is entity K10 → entity K11 → entity K12.
On the basis, the time step interval corresponding to the group 1 is determined to be 1-10, and similarly, the time step interval corresponding to the group 2 is also determined to be 1-10. Since the packet 1 includes only 1 entity, the time step interval corresponding to the packet 1 does not need to be divided.
For group 2, whose extraction order is entity K10 → entity K11 → entity K12, time steps 1 to 10 are divided into 3 time step subintervals according to the text length thresholds of entities K10, K11, and K12: subinterval 1 to 2 corresponds to entity K10, subinterval 3 to 6 corresponds to entity K11, and subinterval 7 to 10 corresponds to entity K12. Note that for entity K12, the last entity in the extraction order of group 2, the corresponding subinterval 7 to 10 has interval length 4, which is greater than entity K12's text length threshold of 3.
On the basis of this time step subinterval division, for group 1, the words to be decoded at time steps 1 through 10 are all words in the entity content corresponding to entity K0. For group 2, the words to be decoded at time steps 1 and 2 are words in the entity content corresponding to entity K10, the words to be decoded at time steps 3 through 6 are words in the entity content corresponding to entity K11, and the words to be decoded at time steps 7 through 9 are words in the entity content corresponding to entity K12.
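The subinterval arithmetic above can be sketched in a few lines of Python. This is an illustrative sketch under the stated rules, not code from the application; the function names and the list-based interval representation are assumptions made here.

```python
# Sketch: divide the time step interval 1..Q into per-entity subintervals.
# Every entity except the last gets exactly its text length threshold;
# the last entity in the extraction order absorbs all remaining steps.

def divide_time_steps(q, thresholds):
    """Return (start, end) subintervals, 1-based and inclusive.

    q:          target length threshold Q
    thresholds: text length threshold of each entity, in extraction order
    """
    intervals = []
    start = 1
    for th in thresholds[:-1]:
        intervals.append((start, start + th - 1))
        start += th
    intervals.append((start, q))  # last entity: length >= its threshold
    return intervals

def target_entity_at(t, intervals):
    """Index of the entity whose subinterval contains time step t."""
    for i, (s, e) in enumerate(intervals):
        if s <= t <= e:
            return i
    raise ValueError("t lies outside the time step interval 1..Q")
```

For group 2 of the example (Q = 10, thresholds 2, 4, 3 in extraction order) this yields the subintervals 1 to 2, 3 to 6, and 7 to 10 described above.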
Step 620: concatenate the embedded vectors corresponding to the M placeholders with the decoded hidden state vectors obtained before the t-th time step, to obtain the input sequence corresponding to the decoder network at the t-th time step.
Since the target entities corresponding to the decoded words at different time steps differ across groups, in the present application the decoded hidden state vectors obtained before the t-th time step include not only the decoded hidden state vectors corresponding to the M target entities at the t-th time step but also decoded hidden state vectors corresponding to entities other than those M target entities. The embedded vector corresponding to a placeholder is a vectorized representation of the placeholder.
For example, in the embodiment corresponding to fig. 5C, the target entities at time step 3 include entity K0 and entity K11, and the decoded hidden state vectors before time step 3 include the decoded hidden state vectors O01 and O02 obtained for entity K0 and the decoded hidden state vectors O11 and O12 obtained for entity K10. As above, the M placeholders indicate that M decoded hidden state vectors need to be obtained by decoding at this time step. In a specific embodiment, the placeholder may be embedded to obtain the embedded vector corresponding to the placeholder.
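A minimal sketch of the splicing in step 620, assuming toy two-dimensional vectors and a fixed placeholder embedding (both assumptions made here, not values from the application):

```python
# Step 620 sketch: the decoder input at time step t is the decoded hidden
# state vectors produced before t, followed by M placeholder embeddings
# (one per decoding path, each marking a word still to be decoded).

PLACEHOLDER_EMBEDDING = [0.0, 0.0]  # assumed 2-dim embedding of the placeholder

def build_decoder_input(decoded_vectors, m):
    """decoded_vectors: decoded hidden state vectors before time step t."""
    return list(decoded_vectors) + [list(PLACEHOLDER_EMBEDDING) for _ in range(m)]
```

At time step 3 of the fig. 5C example, `decoded_vectors` would hold O01, O02, O11, and O12, and `m` would be 2.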
Step 630, the decoder network performs parallel decoding by using the input sequence corresponding to the t-th time step and the coded hidden state sequence corresponding to the input text based on a mask attention mechanism to obtain a decoded hidden state sequence at the t-th time step, where the decoded hidden state sequence includes M decoded hidden state vectors, and one decoded hidden state vector is used to determine a word to be decoded.
The input sequence corresponding to the t time step indicates the decoded hidden state vector corresponding to the decoded word before the t time step and the information of the M placeholders input at the t time step.
In step 630, the attention scores are first determined from the input sequence corresponding to the t-th time step and the encoded hidden state sequence corresponding to the input text. Since each placeholder represents a current word to be decoded, the non-relevant text corresponding to each word to be decoded at the t-th time step is then determined, the attention scores of the corresponding placeholder over that non-relevant text are masked, and the decoded hidden state vectors corresponding to the M words to be decoded at the t-th time step are determined using the attention scores that were not masked.
Repeating the above process, if the number of decoded hidden state vectors obtained for a decoding path reaches the target length threshold, or every decoding path has produced a decoded hidden state vector representing the end token, the decoder network need not be invoked to continue decoding.
Step 640: combine, in time order, the words corresponding to the decoded hidden state vectors obtained for the same decoding path, to obtain the M output texts.
In a specific embodiment, after each decoded hidden state vector is obtained, it may be linearly transformed through a fully connected layer, and the vector obtained by the linear transformation may then be classified through a classification function in a classification layer, so as to determine the word corresponding to that decoded hidden state vector. Processing all decoded hidden state vectors of one decoding path in time order (that is, the order, from first to last, in which they were output in that decoding path) determines the one output text corresponding to that decoding path.
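As an illustration of the word lookup and per-path combination just described, the following sketch replaces the fully connected layer with a toy weight matrix and the classification function with an argmax; the vocabulary and all values are assumptions made here, not part of the application:

```python
# Sketch of step 640: project each decoded hidden state vector to a word,
# then join the words of one decoding path in time order, stopping at the
# end token.

VOCAB = ["<end>", "a", "b", "c"]  # toy vocabulary (assumed)

def classify(hidden, weight):
    # linear transform followed by argmax, standing in for the fully
    # connected layer plus the classification layer
    logits = [sum(h * w for h, w in zip(hidden, row)) for row in weight]
    return VOCAB[logits.index(max(logits))]

def assemble_outputs(paths, weight):
    """paths: for each decoding path, its hidden vectors in time order."""
    texts = []
    for vectors in paths:
        words = []
        for h in vectors:
            word = classify(h, weight)
            if word == "<end>":
                break
            words.append(word)
        texts.append("".join(words))
    return texts
```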
In some embodiments, the decoder network includes a cascade of a level 1 decoder subnetwork to a level P decoder subnetwork, P being a positive integer; in this embodiment, as shown in fig. 7, step 630 includes:
step 710, acquiring a K-level decoding input sequence of the K-level decoder subnetwork at the t time step; wherein K is a positive integer, and K is more than or equal to 1 and less than or equal to P; when K is 1, the K-level decoded input sequence at the t-th time step is a sequence obtained by processing the input sequence corresponding to the t-th time step; when K > 1, the K-level decoded input sequence at the t-th time step is the K-1-level decoded output sequence output by the K-1-level decoder subnetwork at the t-th time step.
Step 720: the K-th-level decoder subnetwork performs parallel decoding based on a masked attention mechanism, according to the K-level decoding input sequence at the t-th time step and the encoded hidden state sequence corresponding to the input text, to obtain the K-level decoding output sequence of the K-th-level decoder subnetwork at the t-th time step.
Step 730: if K is less than P, the K-level decoding output sequence of the K-th-level decoder subnetwork at the t-th time step is taken as the K+1-level decoding input sequence corresponding to the K+1-th-level decoder subnetwork at the t-th time step, and the K+1-th-level decoder subnetwork continues decoding.
Step 740: if K is equal to P, the K-level decoding output sequence of the K-th-level decoder subnetwork at the t-th time step is taken as the decoded hidden state sequence at the t-th time step.
The network structures of the decoder subnetworks of different levels, from the level 1 decoder subnetwork to the level P decoder subnetwork, may be the same or different, and are not particularly limited herein. In one embodiment, each level of decoder subnetwork in the decoder network may be a decoder in a Transformer model.
The input sequence corresponding to the t-th time step may be processed by applying position encoding and segment encoding to the embedded vector of each placeholder at the t-th time step and to each decoded hidden state vector obtained before the t-th time step, and then combining the embedded vector, position encoding (Position encoding), and segment encoding (Segment encoding) of each placeholder with each decoded hidden state vector and its position encoding and segment encoding to generate the level 1 decoding input sequence. The position encoding indicates the position in the input sequence corresponding to the t-th time step, and the segment encoding indicates whether the element belongs to the entity content to be output. In this application, since both the decoded hidden state vectors and the embedded vectors of the placeholders represent words in the entity content, at the t-th time step the segment encoding corresponding to a decoded hidden state vector and the segment encoding corresponding to a placeholder's embedded vector are the same, and both indicate belonging to the entity content to be output.
In this embodiment, for convenience of description, a decoded input sequence input to the K-th-level decoder subnetwork at the t-th time step is referred to as a K-level decoded input sequence, and a decoded output sequence output by the K-th-level decoder subnetwork at the t-th time step is referred to as a K-level decoded output sequence.
The K-th-level decoder subnetwork may determine, according to an attention mechanism, an attention matrix from the K-level decoding input sequence at the t-th time step and the encoded hidden state sequence corresponding to the input text. The attention matrix indicates the attention of the existing information to each current word to be decoded, where the existing information includes the target text, the identification texts corresponding to the N entities, and the decoded hidden state vectors obtained before the t-th time step. It will be appreciated that, in each level of decoder subnetwork, the target text and the identification texts corresponding to the N entities are represented by the encoded hidden state sequence of the input text.
Then, according to the M words to be decoded at the t-th time step, the non-relevant text corresponding to each word to be decoded is determined, and the attention scores of each word to be decoded over its non-relevant text are determined as the attention scores that need to be masked. Based on the attention scores so determined, the attention at the corresponding positions in the attention matrix is masked, so that in subsequent processing the K-th-level decoder subnetwork does not attend to the attention scores of a word to be decoded over its non-relevant text.
Each level of decoder subnetwork decodes step by step according to the above process to obtain its decoding output sequence. If the subnetwork that produced the current decoding output sequence is not the last decoder subnetwork in the decoder network, the decoding output sequence is input into the next decoder subnetwork, which continues the decoding processing according to that sequence and the encoded hidden state sequence corresponding to the input text; conversely, if it is the last decoder subnetwork in the decoder network, the decoding output sequence is taken as the decoded hidden state sequence at the t-th time step.
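The level-by-level hand-off of steps 710 to 740 reduces to a simple fold over the subnetworks. In this sketch the per-level decoders are stand-in callables (an assumption made here for illustration), not real attention layers:

```python
# Steps 710-740 sketch: level 1 consumes the processed input sequence,
# every later level consumes its predecessor's output, and the output of
# level P is the decoded hidden state sequence at the current time step.

def run_decoder_stack(subnetworks, input_seq, encoded_states):
    seq = input_seq
    for decode in subnetworks:  # level 1 .. level P, in cascade order
        seq = decode(seq, encoded_states)
    return seq
```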
In some embodiments, the kth-level decoder subnetwork includes a kth-level mask attention layer and a kth-level decoding processing layer; in this embodiment, as shown in fig. 8, step 720 includes:
Step 810: the K-th-level mask attention layer determines the K-level attention matrix corresponding to the t-th time step according to the K-level decoding input sequence and the encoded hidden state sequence corresponding to the input text.
In this application, each level of decoder subnetwork in the decoder network includes a mask attention layer; the mask attention layer in the K-th-level decoder subnetwork is referred to as the K-th-level mask attention layer, and correspondingly, the attention matrix determined by the K-th-level mask attention layer is referred to as the K-level attention matrix. It will be appreciated that, because the inputs to the decoder network differ across time steps, the attention matrices determined by the K-th-level mask attention layer at different time steps also differ.
The K-level attention matrix corresponding to the t-th time step indicates, for each placeholder (namely, each word to be decoded at the t-th time step), the attention scores of each word in the input text, each word in the identification text corresponding to each entity, and each decoded hidden state vector obtained before the t-th time step.
In a specific embodiment, the kth-level mask attention layer may determine the K-level attention matrix corresponding to the t-th time step according to the kth-level decoded input sequence and the encoded hidden state sequence corresponding to the input text based on a single head attention mechanism or a multi-head attention mechanism.
Step 820, determining a K-level mask attention matrix corresponding to the t-th time step by the K-level mask attention layer according to the K-level attention matrix corresponding to the t-th time step and the mask matrix corresponding to the t-th time step; the mask matrix corresponding to the t-th time step indicates the attention scores that need to be masked and the attention scores that do not need to be masked in the K-th order attention matrix corresponding to the t-th time step.
In this application, for convenience of description, the masking attention layer in the K-th decoder sub-network is referred to as a K-th masking attention layer, and the other neural network layers after the K-th masking attention layer in the K-th decoder sub-network are referred to as K-th decoding processing layers, where the neural network layers after the masking attention layer in the decoder sub-network may be one layer or multiple layers, for example, a fully connected network layer, a convolutional network layer, a feedforward network layer, and the like, and are not particularly limited herein.
It is understood that the decoding input sequences received by mask attention layers of different levels at the same time step differ; therefore, the attention matrices determined by mask attention layers of different levels at the same time step may also differ.
In some embodiments, at the t-th time step, the mask matrix corresponding to the t-th time step, once determined by the following process, may be shared by the decoder subnetworks of the respective levels, without repeatedly determining it. In this embodiment, before step 820, the mask matrix corresponding to the t-th time step may be determined as follows: for the target entity corresponding to each word to be decoded at the t-th time step, determine the non-relevant text corresponding to that word to be decoded from the target text and the decoded words determined before the t-th time step; then determine the mask matrix corresponding to the t-th time step according to the positions, in the K-level decoding input sequence and the encoded hidden state sequence, of the words in the non-relevant text corresponding to each word to be decoded at the t-th time step.
In some embodiments, the K-level attention matrix corresponding to the t-th time step may be added, as matrices, to the mask matrix corresponding to the t-th time step, and the resulting matrix used as the K-level mask attention matrix corresponding to the t-th time step. In the mask matrix corresponding to the t-th time step, the attention score of each word in the non-relevant text corresponding to a word to be decoded, with respect to that word to be decoded, is set to a first specified number, for example minus infinity, and the attention score of each word in the relevant text corresponding to a word to be decoded, with respect to that word to be decoded, is set to a second specified number, for example 0.
In this way, after the addition, the attention scores in the K-level mask attention matrix from the relevant text of a word to be decoded are the same as those in the K-level attention matrix, while the attention scores from the non-relevant text of a word to be decoded are reset to minus infinity. In subsequent processing, normalization (such as softmax) converts an attention score of minus infinity to 0, so the decoded hidden state vector of a word to be decoded is determined without using any information from that word's non-relevant text.
In other embodiments, after determining the non-relevant text corresponding to each word to be decoded at the t-th time step, the attention score in the mask matrix for a word in the relevant text of a word to be decoded may be set to 0, while the attention score for a word in the non-relevant text of a word to be decoded is set to the negative of the attention score at the corresponding position in the K-level attention matrix. After matrix addition of the K-level attention matrix corresponding to the t-th time step and the mask matrix corresponding to the t-th time step, the resulting K-level mask attention matrix then has attention scores from the relevant text of a word to be decoded equal to those in the K-level attention matrix, and attention scores from the non-relevant text of a word to be decoded equal to 0.
In the Transformer model, a sequence mask matrix prevents the model from seeing future information; that is, for a sequence, at time step t the decoded output may depend only on outputs before time t, not on outputs after it. The specific method is as follows: a lower-triangular mask (attention mask) matrix is generated, in which all values of the upper triangle are 0; applying this matrix to each sequence allows time t to see only the values before time t. In this scheme, the mask matrix at each time step may be determined in a similar manner, so that the attention score of a word in the non-relevant text corresponding to a word to be decoded, with respect to that word to be decoded, is zero.
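The effect of the additive minus-infinity mask, and the lower-triangular sequence mask, can be checked numerically with the short sketch below. This is pure illustration following the first-specified-number (additive) convention described above; all names are chosen here:

```python
import math

def masked_softmax(scores, mask):
    """mask[i] is 0.0 for relevant positions and -inf for masked ones;
    after normalization the masked positions get attention weight 0."""
    shifted = [s + m for s, m in zip(scores, mask)]
    mx = max(x for x in shifted if x != float("-inf"))
    exps = [math.exp(x - mx) if x != float("-inf") else 0.0 for x in shifted]
    total = sum(exps)
    return [e / total for e in exps]

def causal_mask(n):
    """Additive lower-triangular sequence mask: time step t may attend
    only to positions up to and including t."""
    neg_inf = float("-inf")
    return [[0.0 if j <= i else neg_inf for j in range(n)] for i in range(n)]
```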
And step 830, the K-level decoding processing layer performs processing according to the K-level mask attention matrix corresponding to the t-th time step, the K-level decoding input sequence and the coding hidden state sequence corresponding to the input text to obtain a K-level decoding output sequence of the K-level decoder subnetwork at the t-th time step.
In some embodiments, the K-level decoding processing layer may weight the vector representations of all words in the K-level decoding input sequence and the encoded hidden state sequence corresponding to the input text according to the attention scores indicated by the K-level mask attention matrix corresponding to the t-th time step, so as to obtain the K-level decoding output sequence of the K-th-level decoder subnetwork at the t-th time step.
In some embodiments, if the decoder subnetwork is one of the decoders in a Transformer model, the K-level decoding processing layer may include a first add & normalize layer, a feed-forward network layer, and a second add & normalize layer, cascaded in series after the mask attention layer.
Fig. 9 is a schematic diagram of the structure of a decoder subnetwork according to an embodiment of the present application. In this embodiment, the decoder subnetwork is a decoder in the Transformer model. As shown in fig. 9, the decoder subnetwork comprises a cascaded mask attention layer, first add & normalize layer, first feed-forward network layer, and second add & normalize layer; the decoder subnetwork further comprises an identity mapping from the input of the mask attention layer to the first add & normalize layer and an identity mapping from the input of the first feed-forward network layer to the second add & normalize layer (such identity mappings are also referred to as residual connections). In this embodiment, the mask attention layer is a masked multi-head attention layer (Masked Multi-Head Attention). Through residual connections, shallow-layer information is carried into deep-layer computation, which helps avoid vanishing and exploding gradients.
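The residual ("add") and normalization wiring of fig. 9 can be sketched as follows, with the masked attention layer and the feed-forward layer left as stand-in callables (an assumption made here for illustration):

```python
import math

def layer_norm(x, eps=1e-6):
    # plain layer normalization over one vector
    mean = sum(x) / len(x)
    var = sum((v - mean) ** 2 for v in x) / len(x)
    return [(v - mean) / math.sqrt(var + eps) for v in x]

def decoder_block(x, masked_attention, feed_forward):
    # residual connection 1: the attention layer's input is added back
    h = layer_norm([a + b for a, b in zip(x, masked_attention(x))])
    # residual connection 2: the feed-forward layer's input is added back
    return layer_norm([a + b for a, b in zip(h, feed_forward(h))])
```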
In some embodiments, based on the embodiment corresponding to fig. 7, the encoded hidden state sequence corresponding to the input text may be obtained by encoding step by step through a multi-level encoder subnetwork. Specifically, in this embodiment, step 230 includes: acquiring the R-level encoding input sequence corresponding to the R-th-level encoder subnetwork, where, when R = 1, the R-level encoding input sequence is a sequence obtained by processing the input text, and when R > 1, the R-level encoding input sequence is the R-1-level encoding output sequence output by the R-1-th-level encoder subnetwork; R and S are positive integers, 1 ≤ R ≤ S, and S is the total number of encoder subnetworks. The R-th-level encoder subnetwork encodes the R-level encoding input sequence and outputs the R-level encoding output sequence. If R is less than S, the R-level encoding output sequence is taken as the R+1-level encoding input sequence corresponding to the R+1-th-level encoder subnetwork and encoded by the R+1-th-level encoder subnetwork; if R is equal to S, the R-level encoding output sequence is taken as the encoded hidden state sequence corresponding to the input text.
In this embodiment, for convenience of description, the encoded input sequence input to the sub-network of the R-th level encoder is referred to as an R-level encoded input sequence, and the encoded output sequence output from the sub-network of the R-th level encoder is referred to as an R-level encoded output sequence.
In this embodiment, the feature extraction is performed on the level 1 coded input sequence step by step through a multi-level encoder subnetwork, so as to obtain a coded hidden state sequence that accurately reflects the semantic information of the input text. Similarly, the structures of the encoder subnetworks of different levels may be the same or different, and are not particularly limited herein.
In some embodiments, each word in the input text may be token-embedded (Token Embedding), position encoded, and segment encoded, and the embedded vector, position encoding, and segment encoding corresponding to each word in the input text may then be superimposed and combined to obtain the level 1 encoding input sequence corresponding to the input text. As before, the position encoding of a word indicates the position of the word in the input text, and the segment encoding indicates whether the word belongs to the entity content to be output.
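A minimal sketch of this superposition, with toy two-dimensional vectors standing in for the real embedding tables (an assumption made here):

```python
# Sketch: the level 1 encoding input sequence is the element-wise sum of
# each word's token embedding, position encoding, and segment encoding.

def encode_inputs(token_vecs, position_vecs, segment_vecs):
    """Each argument holds one vector per word; returns their sums."""
    return [[t + p + s for t, p, s in zip(tv, pv, sv)]
            for tv, pv, sv in zip(token_vecs, position_vecs, segment_vecs)]
```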
The encoder subnetworks may be constructed by one or more neural networks, such as at least one of convolutional neural networks, fully-connected neural networks, recurrent neural networks, feed-forward neural networks, and the like. The structures of the encoder subnetworks of different levels may be the same or different. In some embodiments, the encoder subnetworks may be encoders in a Transformer model (Transformer network model).
Fig. 10 is a schematic diagram of the structure of an encoder subnetwork according to an embodiment of the present application. In this embodiment, the encoder subnetwork is an encoder in the Transformer model. As shown in fig. 10, the encoder subnetwork comprises an attention layer, a third summing and normalization layer, a second feed-forward network layer, and a fourth summing and normalization layer, cascaded in sequence; the encoder subnetwork further comprises an identity mapping from the input of the attention layer to the third summing and normalization layer, and an identity mapping from the input of the second feed-forward network layer to the fourth summing and normalization layer. The attention layer may compute the corresponding attention scores based on a single-head or multi-head attention mechanism.
Fig. 11 is a flowchart illustrating extracting entity content according to an embodiment of the present application. As shown in fig. 11, the encoder network 1110 includes multiple levels of encoder subnetworks, and the decoder network 1120 includes multiple levels of decoder subnetworks. It is assumed that the entities whose entity content is to be extracted are those of the embodiments corresponding to figs. 5A to 5C, with the grouping result as described above. In fig. 11, "mask" is abbreviated as "m".
As shown in fig. 11, position encoding (Position embedding) and segment encoding (Segment embedding) are generated for each word in the input text according to its position, the word vector (Token embedding), position encoding, and segment encoding of each word are superimposed, and the level 1 encoding input sequence obtained by this superposition is input into the level 1 encoder subnetwork. Each level of encoder subnetwork then encodes, starting from the level 1 encoding input sequence, and the S-level encoding output sequence output by the last encoder subnetwork (that is, the S-th-level encoder subnetwork) is taken as the encoded hidden state sequence corresponding to the input text.
Then, the encoded hidden state sequence corresponding to the input text is input into each level of decoder subnetwork, and the decoder subnetworks decode step by step using that sequence and the decoding output sequence output by the previous level of decoder subnetwork, so as to obtain the M decoded hidden state vectors at each time step.
For the decoder network, at each time step, after determining the input sequence corresponding to the decoder network at the t-th time step, position coding and segmentation coding are correspondingly performed to obtain a 1-level decoded input sequence, and then, decoding processing is performed step by each level of decoder subnetworks to obtain M decoded hidden state vectors corresponding to the time step, where u is a target text length threshold in fig. 11.
In this embodiment, the total number (or the total number of stages) of the encoder subnetworks and the total number (or the total number of stages) of the decoder subnetworks may be the same or different, and may be specifically set according to actual needs, and is not specifically limited herein.
In the example above, the information from the encoder network utilized by decoder subnetworks of different levels is the same. In other embodiments, the information from the encoder network utilized by decoder subnetworks of different levels may also differ.
Specifically, in the present embodiment, the decoder network includes a level 1 decoder subnetwork to a level P decoder subnetwork cascaded in sequence, where P is a positive integer, and the information from the encoder network utilized by decoder subnetworks of different levels may differ. Specifically, in this embodiment, the encoded hidden state sequence includes level 1 to level P encoded hidden state sequences. In this embodiment, as shown in fig. 12, step 630 includes:
Step 1210: acquire the K-level decoding input sequence corresponding to the K-th-level decoder subnetwork, where K is a positive integer and 1 ≤ K ≤ P. When K = 1, the K-level decoding input sequence is a sequence obtained by processing the input sequence corresponding to the t-th time step; when K > 1, the K-level decoding input sequence is the K-1-level decoding output sequence output by the K-1-th-level decoder subnetwork at the t-th time step.
Step 1220: the K-th-level decoder subnetwork decodes, based on a masked attention mechanism, according to the K-level decoding input sequence and the K-level encoded hidden state sequence, to obtain the K-level decoding output sequence of the K-th-level decoder subnetwork at the t-th time step.
In some embodiments, the kth-level decoder subnetwork includes a kth-level mask attention layer and a kth-level decoding processing layer; in this embodiment, step 1220 includes: determining a K-level attention matrix corresponding to the t-th time step by the K-level mask attention layer according to the K-level decoding input sequence and the K-level coding hidden state sequence; determining a K-level mask attention matrix corresponding to the t-th time step by the K-level mask attention layer according to the K-level attention matrix corresponding to the t-th time step and a mask matrix corresponding to the t-th time step; the mask matrix corresponding to the t time step indicates the attention scores needing to be masked and the attention scores not needing to be masked in the K-level attention matrix corresponding to the t time step; and processing by the K-level decoding processing layer according to the K-level mask attention matrix corresponding to the t-th time step, the K-level decoding input sequence and the coding hidden state sequence corresponding to the input text to obtain the K-level decoding output sequence of the K-level decoder sub-network at the t-th time step.
The processes of calculating the K-level attention matrix, the mask matrix corresponding to the t-th time step, and the K-level mask attention matrix corresponding to the t-th time step are described above and are not repeated here.
Step 1230: if K is less than P, the K-level decoding output sequence at the t-th time step is taken as the K+1-level decoding input sequence corresponding to the K+1-th-level decoder subnetwork, and the K+1-th-level decoder subnetwork continues the decoding processing.
Step 1240: if K is equal to P, the K-level decoding output sequence at the t-th time step is taken as the decoded hidden state sequence at the t-th time step.
In this embodiment, during decoding, the information utilized from the encoder network is not only the feature information output by the last encoder subnetwork in the encoder network but also the feature information output by intermediate encoder subnetworks. More feature information is therefore used in the decoding process, which can further improve decoding accuracy.
In some embodiments, based on the embodiment shown in fig. 12, step 230 includes: acquiring the T-level encoding input sequence corresponding to the T-th-level encoder subnetwork, where 1 ≤ T ≤ P-1; when T = 1, the T-level encoding input sequence is the level 1 encoded hidden state sequence obtained by embedding the input text, and when T > 1, the T-level encoding input sequence is the T-level encoded hidden state sequence output by the T-1-th-level encoder subnetwork. The T-th-level encoder subnetwork encodes the T-level encoding input sequence to obtain the T+1-level encoded hidden state sequence output by the T-th-level encoder subnetwork. If T is less than P-1, the T+1-level encoded hidden state sequence is taken as the T+1-level encoding input sequence corresponding to the T+1-th-level encoder subnetwork and encoded by the T+1-th-level encoder subnetwork. If T is equal to P-1, the level 1 to level P encoded hidden state sequences are taken as the encoded hidden state sequence corresponding to the input text.
In this embodiment, since the sequence obtained by performing the embedding process on the input text is regarded as the level 1 encoding hidden state sequence, the total number of encoder subnetworks is one less than the total number of decoder subnetworks. The structures of the encoder sub-networks at each level may be the same or different, for example, the structures of the encoder sub-networks at each level may be as shown in fig. 10.
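A minimal sketch of this encoder arrangement, using toy stand-ins for the real embedding and encoder blocks (all names are illustrative assumptions): the embedding output counts as the level-1 encoding hidden state sequence, so P-1 encoder sub-networks add the remaining levels 2 through P.

```python
import numpy as np

def encode_levels(embed, encoder_subnets, text_ids):
    """Produce the level-1..P encoding hidden state sequences.

    The embedding of the input text counts as the level-1 hidden state, so
    P-1 encoder sub-networks yield levels 2..P (one fewer encoder
    sub-network than decoder sub-networks, as in this embodiment).
    """
    levels = [embed(text_ids)]           # level 1: embedding output
    for subnet in encoder_subnets:       # P-1 sub-networks
        levels.append(subnet(levels[-1]))
    return levels

# Toy stand-ins for the real embedding and encoder blocks.
embed = lambda ids: np.asarray(ids, dtype=float)[:, None]
subnets = [lambda x: x * 2.0, lambda x: x + 1.0]  # P - 1 = 2 sub-networks
levels = encode_levels(embed, subnets, [1, 2, 3])
```

With P = 3, the list `levels` holds three hidden state sequences, all of which are later exposed to the decoder rather than only the last one.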
Fig. 13 is a flow chart illustrating extracting entity content according to another embodiment of the present application. In contrast to the embodiment of fig. 11, in this embodiment the feature information from the encoder utilized by each level of decoder sub-network is different: the feature information from the encoder utilized by the level 1 decoder sub-network is the level 1 encoding input sequence of the level 1 encoder sub-network; in other words, the K-level decoder sub-network utilizes the K-level encoding input sequence of the K-level encoder sub-network.
In another embodiment, the total number of encoder sub-networks may be set equal to the total number of decoder sub-networks, so that the information from the encoder utilized by the level 1 decoder sub-network is the level 1 encoded output sequence of the level 1 encoder sub-network. Fig. 14 is a flow chart illustrating extracting entity content according to another embodiment of the present application. In contrast to the embodiment of fig. 13, in this embodiment the feature information from the encoder utilized by the K-th-level decoder sub-network is the K-level encoded output sequence output by the K-th-level encoder sub-network.
In the above embodiments, the encoder sub-networks may be encoders in a Transformer model and the decoder sub-networks may be decoders in a Transformer model. Extracting entity content with a Transformer model according to the method of the present application realizes parallel extraction of the entity content corresponding to a plurality of entities and greatly improves extraction efficiency, and the entity content is not limited in text length, so the method can be widely applied to entity content extraction scenarios such as OCR text structuring (certificates, receipts, bills and the like) and machine question answering. Because a mask attention mechanism is adopted, during parallel decoding the attention scores of the non-relevant text corresponding to a word to be decoded are masked relative to that word, so that the feature information obtained for the word to be decoded depends only on the feature information of its relevant text and is not influenced by the feature information of its non-relevant text. As a result, the entity contents extracted according to this method are completely consistent with the entity contents decoded in a word-by-word serial manner, and the training process of the Transformer model does not need to be adjusted.
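The core of the mask attention mechanism can be sketched in a few lines of numpy; the function name and the sample score/mask values are illustrative assumptions. Masked positions receive a large negative score before the softmax, so they obtain numerically zero attention weight and cannot influence the word being decoded.

```python
import numpy as np

def masked_attention(scores: np.ndarray, mask: np.ndarray) -> np.ndarray:
    """Apply a mask to raw attention scores before the softmax.

    scores: (num_queries, num_keys) raw attention scores.
    mask:   (num_queries, num_keys) boolean; True marks positions that must
            be masked (non-relevant text for the word being decoded).
    Returns row-wise attention weights; masked positions get ~0 weight.
    """
    masked = np.where(mask, -1e9, scores)  # suppress non-relevant positions
    exp = np.exp(masked - masked.max(axis=-1, keepdims=True))
    return exp / exp.sum(axis=-1, keepdims=True)

# Two parallel decoding lanes: lane 0 must not attend to key 2 (another
# entity's identification text), lane 1 must not attend to key 0.
scores = np.array([[1.0, 0.5, 2.0],
                   [0.3, 1.2, 0.7]])
mask = np.array([[False, False, True],
                 [True,  False, False]])
weights = masked_attention(scores, mask)
```

Because the masked positions contribute (effectively) zero weight, each lane's result matches what word-by-word serial decoding of that entity alone would produce, which is the consistency property claimed above.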
The method can be applied to extracting entity content in the cloud, where it can greatly reduce user-perceived latency and improve the user experience of a cloud product, and, by reducing redundant computation, greatly reduce hardware cost.
Embodiments of the apparatus of the present application are described below, which may be used to perform the methods of the above-described embodiments of the present application. For details which are not disclosed in the embodiments of the apparatus of the present application, reference is made to the above-described embodiments of the method of the present application.
Fig. 15 is a block diagram illustrating an apparatus for extracting entity content according to an embodiment of the present application, where the apparatus for extracting entity content includes, as shown in fig. 15: an obtaining module 1510, configured to obtain grouping information of N entities, where the grouping information indicates that the N entities are divided into M groups and entities included in each group, where M and N are positive integers, and M is greater than or equal to 2 and less than N; the sum of the text length thresholds corresponding to the entities belonging to the same group is not greater than the target length threshold; an order determining module 1520, configured to determine, according to the grouping information, an extraction order corresponding to each group; the splicing module 1530 is configured to splice the target text with the identification texts corresponding to each of the N entities to obtain an input text; the encoding module 1540 is configured to perform encoding processing on the input text to obtain an encoded hidden state sequence corresponding to the input text; and the parallel decoding module 1550 is configured to perform M-way parallel decoding on the coded hidden state sequence corresponding to the input text according to the extraction order based on a mask attention mechanism, so as to obtain M-way output texts, where one way of the output texts includes entity contents corresponding to all entities belonging to a group in the target text, and a text length of the entity content corresponding to each entity does not exceed a text length threshold corresponding to the entity.
In some embodiments, parallel decoding module 1550 includes: a target entity determining unit, configured to determine, according to the extraction order, target entities respectively corresponding to M words to be decoded at the t-th time step, where different words to be decoded correspond to different target entities, and t is a positive integer; a first splicing unit, configured to splice the embedded vectors corresponding to the M placeholders with the decoded hidden state vectors before the t-th time step to obtain an input sequence corresponding to the decoder network at the t-th time step; a parallel decoding unit, configured to perform, by the decoder network based on a mask attention mechanism, parallel decoding using the input sequence corresponding to the t-th time step and the encoding hidden state sequence corresponding to the input text to obtain a decoding hidden state sequence at the t-th time step, where the decoding hidden state sequence includes M decoding hidden state vectors and one decoding hidden state vector is used for determining one word to be decoded; the mask attention mechanism is used for masking the attention scores of the non-relevant text corresponding to a word to be decoded relative to that word, where the non-relevant text corresponding to a word to be decoded includes the identification texts corresponding to entities other than the target entity and the decoded words corresponding to entities other than the target entity; and a combining unit, configured to combine the words respectively corresponding to the M decoding hidden state vectors obtained at each time step in time order to obtain M paths of output texts.
In some embodiments, the apparatus for extracting entity content further comprises: a time step interval determining module, configured to determine a time step interval according to the target length threshold, the time step interval being 1 to Q, where Q is the target length threshold; and a time step subinterval dividing module, configured to divide, according to the extraction order corresponding to a group, the time step interval into a target number of time step subintervals based on the target number of entities included in the group and the text length thresholds corresponding to those entities, where one time step subinterval corresponds to one entity in the group and the length of a time step subinterval is the text length threshold corresponding to its entity. In this embodiment, the target entity determining unit is further configured to: for each of the M groups, determine, among the time step subintervals corresponding to the group, the time step subinterval in which t is located, and take the entity corresponding to that subinterval as the target entity of the group at the t-th time step.
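The mapping from a time step to a group's target entity can be sketched as follows; the entity names and thresholds are illustrative assumptions. Each entity owns a contiguous sub-interval of time steps whose length equals its text length threshold, in the group's extraction order.

```python
from typing import List, Tuple

def target_entity_at(t: int, group: List[Tuple[str, int]]) -> str:
    """Return the target entity of one group at time step t (1-based).

    group: ordered (entity_name, text_length_threshold) pairs, following
    the group's extraction order; each entity owns a contiguous
    sub-interval of time steps of length equal to its threshold.
    """
    end = 0
    for name, threshold in group:
        end += threshold
        if t <= end:
            return name
    return group[-1][0]  # past the last boundary: stay on the final entity

# A group extracting "name" (threshold 3) and then "date" (threshold 4):
# time steps 1-3 decode "name", time steps 4-7 decode "date".
group = [("name", 3), ("date", 4)]
```

Running this lookup independently for each of the M groups yields the M target entities, one per parallel decoding lane, at every time step.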
In some embodiments, the decoder network includes a cascade of a level 1 decoder subnetwork to a level P decoder subnetwork, P being a positive integer; in this embodiment, the parallel decoding unit includes: a first obtaining unit, configured to obtain a K-level decoded input sequence of a K-level decoder subnetwork at the t-th time step; wherein K is a positive integer, and K is more than or equal to 1 and less than or equal to P; when K is 1, the K-level decoded input sequence at the t-th time step is a sequence obtained by processing the input sequence corresponding to the t-th time step; when K > 1, the K-level decoded input sequence at the t-th time step is a K-1-level decoded output sequence output by the K-1-level decoder subnetwork at the t-th time step; the first decoding unit is used for carrying out parallel decoding on the K-level decoding input sequence at the t-th time step and the coding hidden state sequence corresponding to the input text by the K-level decoder sub-network based on a mask attention mechanism to obtain a K-level decoding output sequence of the K-level decoder sub-network at the t-th time step; the first continuous decoding unit is used for taking the K-level decoding output sequence of the K-level decoder sub-network at the t-th time step as a K + 1-level decoding input sequence corresponding to the K + 1-level decoder sub-network at the t-th time step if K is less than P, and decoding by the K + 1-level decoder sub-network; and a first determining unit, configured to take a K-level decoding output sequence of the K-level decoder subnetwork at the t-th time step as a decoding hidden state sequence at the t-th time step if K is equal to P.
In some embodiments, an encoding module comprises: the second acquisition unit is used for acquiring an R-level coding input sequence corresponding to the R-level encoder subnetwork, wherein when R is 1, the R-level coding input sequence is a sequence obtained by processing an input text; when R > 1, the R-level encoded input sequence is an R-1 level encoded output sequence output by the R-1 level encoder subnetwork; r is more than or equal to 1 and less than or equal to S, and S is the total number of the encoder sub-networks; r and S are positive integers; a first encoding unit, for encoding the R-level encoded input sequence by the sub-network of the R-level encoder, and outputting an R-level encoded output sequence; the first continuous coding unit is used for taking the R-level coding output sequence as an R + 1-level coding input sequence corresponding to the R + 1-level encoder sub-network if R is less than S, and the R + 1-level coding input sequence is coded by the R + 1-level encoder sub-network; and the second determining unit is used for taking the R-level coded output sequence as a coded hidden state sequence corresponding to the input text if R is equal to S.
In other embodiments, the decoder network includes a cascade of a level 1 decoder subnetwork to a level P decoder subnetwork, P being a positive integer; the coding hidden state sequence comprises a level 1 to level P coding hidden state sequence; in this embodiment, the parallel decoding unit includes: a third obtaining unit, configured to obtain a K-level decoded input sequence corresponding to a K-level decoder subnetwork; wherein K is a positive integer, and K is more than or equal to 1 and less than or equal to P; when K is 1, the K-level decoding input sequence is a sequence obtained by processing the input sequence corresponding to the t-th time step; when K is more than 1, the K-level decoding input sequence is a K-1-level output sequence of the K-1-level decoder subnetwork at the t time step; the second decoding unit is used for decoding the K-th decoding input sequence and the K-th coding hidden state sequence by the K-th decoder sub-network based on a mask attention mechanism to obtain a K-th decoding output sequence of the K-th decoder sub-network at the t-th time step; a second continuous decoding unit, configured to, if K is less than P, use the K-level decoded output sequence at the t-th time step as a K + 1-level decoded input sequence corresponding to the K + 1-level decoder subnetwork, and continue decoding processing by the K + 1-level decoder subnetwork; and a third determining unit, configured to determine the K-level decoded output sequence at the t-th time step as a decoded hidden state sequence at the t-th time step if K is equal to P.
In some embodiments, an encoding module comprises: a fourth obtaining unit, configured to obtain a T-level coding input sequence corresponding to a T-level encoder subnetwork, where T is greater than or equal to 1 and is less than or equal to P-1, and when T is equal to 1, the T-level coding input sequence is a 1-level coding hidden state sequence obtained by performing embedding processing on an input text; when T is more than 1, the T-level coding input sequence is a T-level coding hidden state sequence output by the T-1 level coder network; the second coding unit is used for coding the T-level coding input sequence by the T-level encoder sub-network to obtain a T + 1-level coding hidden state sequence output by the T-level encoder sub-network; the second continuous coding unit is used for taking the T + 1-level coding hidden state sequence as a T + 1-level coding input sequence corresponding to the T + 1-level encoder sub-network if T is less than P-1, and the T + 1-level encoder sub-network performs coding processing; and if T is equal to P-1, the fourth determining unit is used for taking the coding hidden state sequence from the 1-level coding hidden state sequence to the P-level coding hidden state sequence as the coding hidden state sequence corresponding to the input text.
In some embodiments, the kth-level decoder subnetwork includes a kth-level mask attention layer and a kth-level decoding processing layer; in this embodiment, the second decoding unit includes: the attention matrix determining unit is used for determining a K-level attention matrix corresponding to the t-th time step by the K-level mask attention layer according to the K-level decoding input sequence and the K-level coding hidden state sequence; a mask attention matrix determining unit, configured to determine, by the kth-level mask attention layer, a K-level mask attention matrix corresponding to the tth time step according to the K-level attention matrix corresponding to the tth time step and a mask matrix corresponding to the tth time step; the mask matrix corresponding to the t time step indicates the attention scores needing to be masked and the attention scores not needing to be masked in the K-level attention matrix corresponding to the t time step; and the processing unit is used for processing the K-level decoding processing layer according to the K-level mask attention matrix corresponding to the t-th time step, the K-level decoding input sequence and the coding hidden state sequence corresponding to the input text to obtain a K-level decoding output sequence of the K-level decoder sub-network at the t-th time step.
In some embodiments, the apparatus for extracting entity content further comprises: the irrelevant text determining module is used for determining irrelevant text corresponding to the words to be decoded at the t-th time step from the target text and the decoded words determined before the t-th time step aiming at the target entity corresponding to each word to be decoded at the t-th time step; and the mask matrix determining module is used for determining a mask matrix corresponding to the t time step according to the corresponding positions of the words in the non-relevant text corresponding to the words to be decoded in the t step in the K-level decoding input sequence and the K-level coding hidden state sequence.
In some embodiments, the apparatus for extracting entity content further comprises: a text length threshold acquisition module, configured to acquire text length thresholds respectively set for the N entities; a target length threshold determination module, configured to use a maximum value of text length thresholds corresponding to the N entities as a target length threshold; and the grouping module is used for dividing the N entities into M groups according to the text length threshold value and the target length threshold value corresponding to each entity to obtain grouping information.
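The grouping module's constraint, namely that the sum of text length thresholds within a group must not exceed the target length threshold (here taken as the maximum single-entity threshold, as produced by the target length threshold determination module), can be satisfied with a simple first-fit-decreasing sketch. The helper name and sample thresholds are illustrative assumptions, not the application's actual grouping procedure.

```python
from typing import Dict, List

def group_entities(thresholds: Dict[str, int]) -> List[List[str]]:
    """Greedily pack entities into groups so that the sum of text length
    thresholds in each group stays within the target length threshold."""
    target = max(thresholds.values())    # target length threshold
    groups: List[List[str]] = []
    sums: List[int] = []
    # Place entities with larger thresholds first (first-fit decreasing).
    for name in sorted(thresholds, key=thresholds.get, reverse=True):
        for i, s in enumerate(sums):
            if s + thresholds[name] <= target:
                groups[i].append(name)   # fits into an existing group
                sums[i] += thresholds[name]
                break
        else:
            groups.append([name])        # open a new group
            sums.append(thresholds[name])
    return groups

thresholds = {"name": 8, "date": 4, "id": 4, "amount": 8}
groups = group_entities(thresholds)
```

With these sample thresholds the sketch yields M = 3 groups for N = 4 entities, so three parallel decoding lanes replace four serial extractions while every lane stays within the target length threshold of 8.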
FIG. 16 illustrates a schematic structural diagram of a computer system suitable for use in implementing the electronic device of an embodiment of the present application. It should be noted that the computer system 1600 of the electronic device shown in fig. 16 is only an example, and should not bring any limitation to the functions and the scope of use of the embodiments of the present application.
As shown in fig. 16, the computer system 1600 includes a Central Processing Unit (CPU) 1601 which can perform various appropriate actions and processes, such as executing the method in the above-described embodiment, according to a program stored in a Read-Only Memory (ROM) 1602 or a program loaded from a storage portion 1608 into a Random Access Memory (RAM) 1603. In the RAM 1603, various programs and data necessary for system operation are also stored. The CPU 1601, ROM 1602, and RAM 1603 are connected to each other via a bus 1604. An Input/Output (I/O) interface 1605 is also connected to the bus 1604.
The following components are connected to the I/O interface 1605: an input portion 1606 including a keyboard, a mouse, and the like; an output section 1607 including a Cathode Ray Tube (CRT), a Liquid Crystal Display (LCD), and the like, a speaker, and the like; a storage portion 1608 including a hard disk and the like; and a communication section 1609 including a Network interface card such as a LAN (Local Area Network) card, a modem, or the like. The communication section 1609 performs communication processing via a network such as the internet. The driver 1610 is also connected to the I/O interface 1605 as needed. A removable medium 1611 such as a magnetic disk, an optical disk, a magneto-optical disk, a semiconductor memory, or the like is mounted on the drive 1610 as necessary, so that a computer program read out therefrom is mounted in the storage portion 1608 as necessary.
In particular, according to embodiments of the application, the processes described above with reference to the flow diagrams may be implemented as computer software programs. For example, embodiments of the present application include a computer program product comprising a computer program embodied on a computer readable medium, the computer program comprising program code for performing the method illustrated by the flow chart. In such embodiments, the computer program may be downloaded and installed from a network via the communication portion 1609, and/or installed from the removable media 1611. When the computer program is executed by a Central Processing Unit (CPU)1601, various functions defined in the system of the present application are executed.
It should be noted that the computer readable medium shown in the embodiments of the present application may be a computer readable signal medium or a computer readable storage medium or any combination of the two. A computer readable storage medium may be, for example, but not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any combination of the foregoing. More specific examples of the computer readable storage medium may include, but are not limited to: an electrical connection having one or more wires, a portable computer diskette, a hard disk, a Random Access Memory (RAM), a Read-Only Memory (ROM), an Erasable Programmable Read-Only Memory (EPROM), a flash Memory, an optical fiber, a portable Compact Disc Read-Only Memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing. In the present application, a computer readable storage medium may be any tangible medium that can contain, or store a program for use by or in connection with an instruction execution system, apparatus, or device. In this application, however, a computer readable signal medium may include a propagated data signal with computer readable program code embodied therein, for example, in baseband or as part of a carrier wave. Such a propagated data signal may take many forms, including, but not limited to, electro-magnetic, optical, or any suitable combination thereof. A computer readable signal medium may also be any computer readable medium that is not a computer readable storage medium and that can communicate, propagate, or transport a program for use by or in connection with an instruction execution system, apparatus, or device. Program code embodied on a computer readable medium may be transmitted using any appropriate medium, including but not limited to: wireless, wired, etc., or any suitable combination of the foregoing.
The flowchart and block diagrams in the figures illustrate the architecture, functionality, and operation of possible implementations of systems, methods and computer program products according to various embodiments of the present application. Each block in the flowchart or block diagrams may represent a module, segment, or portion of code, which comprises one or more executable instructions for implementing the specified logical function(s). It should also be noted that, in some alternative implementations, the functions noted in the block may occur out of the order noted in the figures. For example, two blocks shown in succession may, in fact, be executed substantially concurrently, or the blocks may sometimes be executed in the reverse order, depending upon the functionality involved. It will also be noted that each block of the block diagrams or flowchart illustration, and combinations of blocks in the block diagrams or flowchart illustration, can be implemented by special purpose hardware-based systems which perform the specified functions or acts, or combinations of special purpose hardware and computer instructions.
The units described in the embodiments of the present application may be implemented by software, or may be implemented by hardware, and the described units may also be disposed in a processor. Wherein the names of the elements do not in some way constitute a limitation on the elements themselves.
As another aspect, the present application also provides a computer-readable storage medium, which may be contained in the electronic device described in the above embodiments; or may exist separately without being assembled into the electronic device. The computer readable storage medium carries computer readable instructions which, when executed by a processor, implement the method of any of the embodiments described above.
According to an aspect of the present application, there is also provided an electronic device, including: a processor; a memory having computer readable instructions stored thereon which, when executed by the processor, implement the method of any of the above embodiments.
According to an aspect of an embodiment of the present application, there is provided a computer program product or a computer program comprising computer instructions stored in a computer readable storage medium. The processor of the computer device reads the computer instructions from the computer-readable storage medium, and the processor executes the computer instructions to cause the computer device to perform the method of any of the above embodiments.
It should be noted that although in the above detailed description several modules or units of the device for action execution are mentioned, such a division is not mandatory. Indeed, the features and functions of two or more modules or units described above may be embodied in one module or unit according to embodiments of the application. Conversely, the features and functions of one module or unit described above may be further divided into embodiments by a plurality of modules or units.
Through the above description of the embodiments, those skilled in the art will readily understand that the exemplary embodiments described herein may be implemented by software, or by software in combination with necessary hardware. Therefore, the technical solution according to the embodiments of the present application can be embodied in the form of a software product, which can be stored in a non-volatile storage medium (which can be a CD-ROM, a usb disk, a removable hard disk, etc.) or on a network, and includes several instructions to enable a computing device (which can be a personal computer, a server, a touch terminal, or a network device, etc.) to execute the method according to the embodiments of the present application.
Other embodiments of the present application will be apparent to those skilled in the art from consideration of the specification and practice of the embodiments disclosed herein. This application is intended to cover any variations, uses, or adaptations of the invention following, in general, the principles of the application and including such departures from the present disclosure as come within known or customary practice within the art to which the invention pertains.
It will be understood that the present application is not limited to the precise arrangements described above and shown in the drawings and that various modifications and changes may be made without departing from the scope thereof. The scope of the application is limited only by the appended claims.

Claims (14)

1. A method for extracting entity content, comprising:
acquiring grouping information of N entities, wherein the grouping information indicates that the N entities are divided into M groups and entities included in each group, M and N are positive integers, and M is more than or equal to 2 and is less than N; the sum of the text length thresholds corresponding to the entities belonging to the same group is not greater than the target length threshold;
determining the extraction sequence corresponding to each group according to the group information;
splicing the target text with the identification texts corresponding to the entities in the N entities to obtain an input text;
coding an input text to obtain a coding hidden state sequence corresponding to the input text;
and performing M-path parallel decoding on the coded hidden state sequence corresponding to the input text according to the extraction sequence based on a mask attention mechanism to obtain M-path output texts, wherein one path of the output texts comprises entity contents corresponding to all entities belonging to one group in the target text, and the text length of the entity contents corresponding to each entity does not exceed the text length threshold corresponding to the entity.
2. The method according to claim 1, wherein the performing M-way parallel decoding on the coded hidden state sequence corresponding to the input text according to the extraction order based on the mask attention mechanism to obtain M-way output texts comprises:
determining target entities respectively corresponding to M words to be decoded at the t-th time step according to the extraction order, wherein different words to be decoded correspond to different target entities; wherein t is a positive integer;
splicing the embedded vectors corresponding to the M placeholders with the decoded hidden state vector before the tth time step to obtain an input sequence of the decoder network corresponding to the tth time step;
performing parallel decoding by using an input sequence corresponding to a t-th time step and a coding hidden state sequence corresponding to the input text by the decoder network based on a mask attention mechanism to obtain a decoding hidden state sequence at the t-th time step, wherein the decoding hidden state sequence comprises M decoding hidden state vectors, and one decoding hidden state vector is used for determining a word to be decoded; the mask attention mechanism is used for performing mask processing on the attention scores of the non-related texts corresponding to the words to be decoded relative to the words to be decoded, wherein the non-related texts corresponding to the words to be decoded comprise identification texts corresponding to other entities except a target entity and decoded words corresponding to other entities except the target entity;
and combining words corresponding to the decoding hidden state vectors obtained by decoding the same path according to the time sequence to obtain M paths of output texts.
3. The method according to claim 2, wherein before determining target entities respectively corresponding to M words to be decoded at a tth time step according to the extraction order, the method further comprises:
determining a time step interval according to the target length threshold; the time step interval is 1-Q, and Q is a target length threshold;
dividing the time step interval into a target number of time step subintervals according to the target number of entities included in a group and a text length threshold value corresponding to the entities included in the group and an extraction sequence corresponding to the group, wherein one time step subinterval corresponds to one entity in the corresponding group, and the interval lengths of the time step subintervals corresponding to other entities in one group except the last entity in the extraction sequence are the corresponding text length threshold values;
the determining, according to the extraction order, target entities corresponding to M words to be decoded at the t-th time step includes:
for each group in M groups, determining the time step subinterval where t is located in the time step subinterval corresponding to the group, and taking the entity corresponding to the time step subinterval where t is located as the target entity of the corresponding group at the t-th time step.
4. The method of claim 2, wherein the decoder network comprises a cascade of a level 1 decoder sub-network to a level P decoder sub-network, P being a positive integer;
the performing, by the decoder network based on a mask attention mechanism, of parallel decoding using the input sequence corresponding to the t-th time step and the encoded hidden state sequence corresponding to the input text to obtain the decoded hidden state sequence at the t-th time step comprises:
acquiring a K-level decoding input sequence of the K-level decoder sub-network at the t-th time step, wherein K is a positive integer and 1 ≤ K ≤ P; when K = 1, the K-level decoding input sequence at the t-th time step is a sequence obtained by processing the input sequence corresponding to the t-th time step; when K > 1, the K-level decoding input sequence at the t-th time step is the (K-1)-level decoding output sequence output by the (K-1)-level decoder sub-network at the t-th time step;
performing, by the K-level decoder sub-network based on the mask attention mechanism, parallel decoding according to the K-level decoding input sequence at the t-th time step and the encoded hidden state sequence corresponding to the input text, to obtain a K-level decoding output sequence of the K-level decoder sub-network at the t-th time step;
if K < P, taking the K-level decoding output sequence of the K-level decoder sub-network at the t-th time step as the (K+1)-level decoding input sequence corresponding to the (K+1)-level decoder sub-network at the t-th time step, and continuing decoding by the (K+1)-level decoder sub-network; and if K = P, taking the K-level decoding output sequence of the K-level decoder sub-network at the t-th time step as the decoded hidden state sequence at the t-th time step.
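The level-1 to level-P decoder cascade of claim 4 amounts to threading one sequence through P sub-networks, each also seeing the shared encoded hidden states. A minimal sketch follows, with each sub-network modeled as a callable; this representation is an assumption for illustration, not the patent's implementation.

```python
def cascade_decode(sub_networks, input_seq_t, encoded_hidden):
    """Run the level-1..level-P decoder sub-networks in sequence.

    sub_networks:   list of P callables f(decode_input, encoded_hidden),
                    one per decoder level.
    input_seq_t:    input sequence for the t-th time step (level-1 input,
                    after whatever preprocessing the claim refers to).
    encoded_hidden: encoded hidden state sequence of the input text,
                    shared by every level.
    """
    seq = input_seq_t
    for subnet in sub_networks:   # level K consumes level (K-1)'s output
        seq = subnet(seq, encoded_hidden)
    return seq                    # level-P output = decoded hidden states
```

With three toy sub-networks that each add the encoded value to every element, an input of [1, 2] and encoded value 10 comes out as [31, 32].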
5. The method according to claim 4, wherein the encoding of the input text to obtain the encoded hidden state sequence corresponding to the input text comprises:
acquiring an R-level encoding input sequence corresponding to the R-level encoder sub-network, wherein R and S are positive integers, 1 ≤ R ≤ S, and S is the total number of encoder sub-networks; when R = 1, the R-level encoding input sequence is a sequence obtained by processing the input text; when R > 1, the R-level encoding input sequence is the (R-1)-level encoding output sequence output by the (R-1)-level encoder sub-network;
encoding the R-level encoding input sequence by the R-level encoder sub-network, and outputting an R-level encoding output sequence;
if R < S, taking the R-level encoding output sequence as the (R+1)-level encoding input sequence corresponding to the (R+1)-level encoder sub-network, and continuing encoding by the (R+1)-level encoder sub-network; and if R = S, taking the R-level encoding output sequence as the encoded hidden state sequence corresponding to the input text.
6. The method of claim 2, wherein the decoder network comprises a cascade of a level 1 decoder sub-network to a level P decoder sub-network, P being a positive integer, and the encoded hidden state sequence comprises level-1 to level-P encoded hidden state sequences;
the performing, by the decoder network based on a mask attention mechanism, of parallel decoding using the input sequence corresponding to the t-th time step and the encoded hidden state sequence corresponding to the input text to obtain the decoded hidden state sequence at the t-th time step comprises:
acquiring a K-level decoding input sequence corresponding to the K-level decoder sub-network, wherein K is a positive integer and 1 ≤ K ≤ P; when K = 1, the K-level decoding input sequence is a sequence obtained by processing the input sequence corresponding to the t-th time step; when K > 1, the K-level decoding input sequence is the (K-1)-level output sequence of the (K-1)-level decoder sub-network at the t-th time step;
decoding, by the K-level decoder sub-network based on the mask attention mechanism, according to the K-level decoding input sequence and the K-level encoded hidden state sequence, to obtain a K-level decoding output sequence of the K-level decoder sub-network at the t-th time step;
if K < P, taking the K-level decoding output sequence at the t-th time step as the (K+1)-level decoding input sequence corresponding to the (K+1)-level decoder sub-network, and continuing decoding by the (K+1)-level decoder sub-network; and if K = P, taking the K-level decoding output sequence at the t-th time step as the decoded hidden state sequence at the t-th time step.
7. The method according to claim 6, wherein the encoding of the input text to obtain the encoded hidden state sequence corresponding to the input text comprises:
acquiring a T-level encoding input sequence corresponding to the T-level encoder sub-network, wherein 1 ≤ T ≤ P-1; when T = 1, the T-level encoding input sequence is the level-1 encoded hidden state sequence obtained by embedding the input text; when T > 1, the T-level encoding input sequence is the T-level encoded hidden state sequence output by the (T-1)-level encoder sub-network;
encoding the T-level encoding input sequence by the T-level encoder sub-network to obtain the (T+1)-level encoded hidden state sequence output by the T-level encoder sub-network;
if T < P-1, taking the (T+1)-level encoded hidden state sequence as the (T+1)-level encoding input sequence corresponding to the (T+1)-level encoder sub-network, and continuing encoding by the (T+1)-level encoder sub-network; and if T = P-1, taking the level-1 to level-P encoded hidden state sequences as the encoded hidden state sequence corresponding to the input text.
8. The method of claim 6, wherein the K-level decoder sub-network comprises a K-level mask attention layer and a K-level decoding processing layer;
the obtaining, by the K-level decoder sub-network based on the mask attention mechanism, of the K-level decoding output sequence at the t-th time step according to the K-level decoding input sequence and the K-level encoded hidden state sequence comprises:
determining, by the K-level mask attention layer, a K-level attention matrix corresponding to the t-th time step according to the K-level decoding input sequence and the K-level encoded hidden state sequence;
determining, by the K-level mask attention layer, a K-level mask attention matrix corresponding to the t-th time step according to the K-level attention matrix corresponding to the t-th time step and a mask matrix corresponding to the t-th time step, wherein the mask matrix corresponding to the t-th time step indicates which attention scores in the K-level attention matrix corresponding to the t-th time step are to be masked and which are not;
processing, by the K-level decoding processing layer, the K-level mask attention matrix corresponding to the t-th time step, the K-level decoding input sequence, and the encoded hidden state sequence corresponding to the input text, to obtain the K-level decoding output sequence of the K-level decoder sub-network at the t-th time step.
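The mask attention computation in claim 8 (attention scores between the decoding input and the encoded hidden states, with masked positions suppressed) is, in outline, masked scaled dot-product attention. The sketch below uses that common formulation; the patent does not spell out the exact arithmetic, so the shapes and the zero-weight masking convention are assumptions.

```python
import math


def masked_attention(decode_input, encoded_hidden, mask):
    """decode_input: m x d rows; encoded_hidden: n x d rows; mask: m x n
    booleans, True where the attention score must be masked out."""
    d = len(decode_input[0])
    outputs = []
    for i, q in enumerate(decode_input):
        # scaled dot-product scores against every encoded position
        scores = [sum(a * b for a, b in zip(q, k)) / math.sqrt(d)
                  for k in encoded_hidden]
        # masked positions get zero weight after the softmax
        exps = [0.0 if mask[i][j] else math.exp(s)
                for j, s in enumerate(scores)]
        total = sum(exps)
        weights = [e / total for e in exps]
        # weighted sum of the encoded hidden states
        outputs.append([sum(w * k[j] for w, k in zip(weights, encoded_hidden))
                        for j in range(d)])
    return outputs
```

With a single query and the second of two encoded positions masked, all attention collapses onto the first position.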
9. The method of claim 8, wherein before the determining, by the K-level mask attention layer, of the K-level mask attention matrix corresponding to the t-th time step according to the K-level attention matrix corresponding to the t-th time step and the mask matrix corresponding to the t-th time step, the method further comprises:
for the target entity corresponding to each word to be decoded at the t-th time step, determining, from the target text and the decoded words determined before the t-th time step, the non-relevant text corresponding to that word to be decoded at the t-th time step;
determining the mask matrix corresponding to the t-th time step according to the positions, in the K-level decoding input sequence and the K-level encoded hidden state sequence, of the words in the non-relevant text corresponding to each word to be decoded at the t-th time step.
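Claim 9's mask matrix can be pictured as a boolean table: one row per word to be decoded at step t, one column per position in the decoding input / encoded hidden state sequences, with True marking a non-relevant position. The position bookkeeping below is illustrative; `build_mask` and its arguments are not names from the patent.

```python
def build_mask(num_decode_words, seq_len, non_relevant_positions):
    """Build the mask matrix for one time step.

    non_relevant_positions maps each word-to-decode index to the set of
    sequence positions it must not attend to (its non-relevant text).
    """
    mask = [[False] * seq_len for _ in range(num_decode_words)]
    for word_idx, positions in non_relevant_positions.items():
        for pos in positions:
            mask[word_idx][pos] = True   # True = this score will be masked
    return mask
```

For two words over a length-4 sequence, masking positions {1, 3} for word 0 and {0} for word 1 yields the expected boolean rows.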
10. The method of claim 1, wherein before the obtaining the grouping information of the N entities, the method further comprises:
acquiring the text length thresholds respectively set for the N entities;
taking the maximum of the text length thresholds respectively corresponding to the N entities as the target length threshold;
dividing the N entities into the M groups according to the text length threshold corresponding to each entity and the target length threshold, to obtain the grouping information.
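Claim 10 fixes the target length threshold (the maximum per-entity threshold) but not the packing strategy used to form the M groups. A first-fit packing is one plausible reading, sketched below; `group_entities` is an illustrative name, and the strategy itself is an assumption.

```python
def group_entities(thresholds):
    """thresholds: list of per-entity text length thresholds.

    Returns (groups, target): each group is a list of entity indices whose
    threshold sum does not exceed target = max(thresholds), per claim 10.
    """
    target = max(thresholds)
    groups, sums = [], []
    for idx, thr in enumerate(thresholds):
        for g, s in enumerate(sums):
            if s + thr <= target:        # fits in an existing group
                groups[g].append(idx)
                sums[g] += thr
                break
        else:                            # no group fits: open a new one
            groups.append([idx])
            sums.append(thr)
    return groups, target
```

For thresholds [5, 2, 3, 5] the target is 5, and first-fit yields groups [[0], [1, 2], [3]], each summing to at most 5.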
11. An apparatus for extracting entity content, comprising:
an obtaining module, configured to obtain grouping information of N entities, wherein the grouping information indicates that the N entities are divided into M groups and indicates the entities included in each group, M and N are positive integers, 2 ≤ M < N, and the sum of the text length thresholds corresponding to the entities belonging to the same group is not greater than a target length threshold;
an order determining module, configured to determine the extraction order corresponding to each group according to the grouping information;
a splicing module, configured to splice the target text with the identification texts corresponding to the entities among the N entities to obtain an input text;
an encoding module, configured to encode the input text to obtain an encoded hidden state sequence corresponding to the input text;
a parallel decoding module, configured to perform, based on a mask attention mechanism, M-path parallel decoding on the encoded hidden state sequence corresponding to the input text according to the extraction order to obtain M paths of output text, wherein each path of output text comprises the entity content, in the target text, corresponding to each entity belonging to one group, and the text length of the entity content corresponding to each entity does not exceed the text length threshold corresponding to that entity.
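The M-path parallel decoding of claim 11 can be outlined as M decoding paths advancing in lockstep over the time steps. In the sketch below, `decode_step` is a placeholder for one group's single-step decoding; the patent leaves its internals to the decoder network, so everything here is an illustrative skeleton.

```python
def parallel_decode(decode_step, num_groups, total_steps):
    """decode_step(group, t) -> token emitted for that group at time step t.

    Returns one output token list per group: the M paths of output text.
    The inner loop is written sequentially for clarity; in the claimed
    method the M paths decode in parallel at each time step.
    """
    outputs = [[] for _ in range(num_groups)]
    for t in range(total_steps):
        for g in range(num_groups):      # all M paths advance in lockstep
            outputs[g].append(decode_step(g, t))
    return outputs
```

With a toy `decode_step` that just records its arguments, two groups over three steps each accumulate three tokens.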
12. An electronic device, comprising:
a processor;
a memory having computer-readable instructions stored thereon which, when executed by the processor, implement the method of any one of claims 1-10.
13. A computer readable storage medium having computer readable instructions stored thereon which, when executed by a processor, implement the method of any one of claims 1-10.
14. A computer program product comprising computer instructions, characterized in that the computer instructions, when executed by a processor, implement the method of any of claims 1-10.
CN202210435626.XA 2022-04-24 2022-04-24 Method and device for extracting entity content, electronic equipment and storage medium Pending CN115114906A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210435626.XA CN115114906A (en) 2022-04-24 2022-04-24 Method and device for extracting entity content, electronic equipment and storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202210435626.XA CN115114906A (en) 2022-04-24 2022-04-24 Method and device for extracting entity content, electronic equipment and storage medium

Publications (1)

Publication Number Publication Date
CN115114906A true CN115114906A (en) 2022-09-27

Family

ID=83324717

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210435626.XA Pending CN115114906A (en) 2022-04-24 2022-04-24 Method and device for extracting entity content, electronic equipment and storage medium

Country Status (1)

Country Link
CN (1) CN115114906A (en)

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115357719A (en) * 2022-10-20 2022-11-18 国网天津市电力公司培训中心 Power audit text classification method and device based on improved BERT model
CN116757254A (en) * 2023-08-16 2023-09-15 阿里巴巴(中国)有限公司 Task processing method, electronic device and storage medium
CN116757254B (en) * 2023-08-16 2023-11-14 阿里巴巴(中国)有限公司 Task processing method, electronic device and storage medium

Similar Documents

Publication Publication Date Title
CN115114906A (en) Method and device for extracting entity content, electronic equipment and storage medium
CN114330312B (en) Title text processing method, title text processing device, title text processing program, and recording medium
CN110019216B (en) Intellectual property data storage method, medium and computer device based on block chain
CN111026858B (en) Project information processing method and device based on project recommendation model
CN111241291A (en) Method and device for generating countermeasure sample by utilizing countermeasure generation network
CN113177538B (en) Video cycle identification method and device, computer equipment and storage medium
CN111767697B (en) Text processing method and device, computer equipment and storage medium
CN113821587A (en) Text relevance determination method, model training method, device and storage medium
CN114282059A (en) Video retrieval method, device, equipment and storage medium
CN113362852A (en) User attribute identification method and device
CN112800339B (en) Information stream searching method, device and equipment
CN116127925B (en) Text data enhancement method and device based on destruction processing of text
CN114359650A (en) Training method, extracting method and device for image feature extraction network
CN113065027A (en) Video recommendation method and device, electronic equipment and storage medium
CN113032534A (en) Dialog text classification method and electronic equipment
CN113741759B (en) Comment information display method and device, computer equipment and storage medium
Wang et al. Practical and efficient out-of-domain detection with adversarial learning
Wu et al. General generative model‐based image compression method using an optimisation encoder
CN113946648A (en) Structured information generation method and device, electronic equipment and medium
CN114329005A (en) Information processing method, information processing device, computer equipment and storage medium
CN112950222A (en) Resource processing abnormity detection method and device, electronic equipment and storage medium
CN113569567A (en) Text recognition method and device, computer readable medium and electronic equipment
CN113254635B (en) Data processing method, device and storage medium
CN112132367A (en) Modeling method and device for enterprise operation management risk identification
CN116721315B (en) Living body detection model training method, living body detection model training device, medium and electronic equipment

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
REG Reference to a national code

Ref country code: HK

Ref legal event code: DE

Ref document number: 40073938

Country of ref document: HK