CN113761920A - Word processing method and device based on double-task model

Word processing method and device based on double-task model

Info

Publication number
CN113761920A
Authority
CN
China
Prior art keywords
vectors
coding
vector
pruning
coreference resolution
Prior art date
Legal status
Pending
Application number
CN202010507257.1A
Other languages
Chinese (zh)
Inventor
白静
唐剑波
李长亮
Current Assignee
Beijing Kingsoft Digital Entertainment Co Ltd
Original Assignee
Beijing Kingsoft Digital Entertainment Co Ltd
Priority date
Filing date
Publication date
Application filed by Beijing Kingsoft Digital Entertainment Co Ltd
Priority to CN202010507257.1A
Publication of CN113761920A
Legal status: Pending

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/20 Natural language analysis
    • G06F40/279 Recognition of textual entities
    • G06F40/289 Phrasal analysis, e.g. finite state techniques or chunking
    • G06F40/295 Named entity recognition
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/35 Clustering; Classification
    • G06F16/353 Clustering; Classification into predefined classes
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/21 Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/211 Selection of the most significant subset of features
    • G06F18/2113 Selection of the most significant subset of features by ranking or filtering the set of features, e.g. using a measure of variance or of feature cross-correlation
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/30 Semantic analysis
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/045 Combinations of networks


Abstract

The application provides a word processing method and device based on a dual-task model, wherein the method comprises the following steps: acquiring candidate entity fragments, and generating candidate entity fragment coding vectors based on the candidate entity fragments; performing entity recognition processing and classification pruning processing on the candidate entity fragment coding vectors through an entity recognition model to obtain recognition pruning coding vectors; and inputting the recognition pruning coding vectors into a coreference resolution model for processing, and determining the coreference relations among the words in the candidate entity fragments. The method and device provided by the application can improve the accuracy and recall of coreference resolution and entity recognition, and improve the overall accuracy of word processing.

Description

Word processing method and device based on double-task model
Technical Field
The present application relates to the field of computer technologies, and in particular, to a word processing method and apparatus based on a dual task model, a training method and apparatus for the dual task model, a computing device, and a computer-readable storage medium.
Background
Entity recognition refers to identifying and extracting entities with specific meaning or strong referential force, such as person names, place names, organizations, dates and times, and proper nouns, in unstructured text.
A relation is an association between two or more entities, and relation extraction is to detect and identify semantic relations between entities from text. For example, in the sentence "Beijing is the capital, political center and cultural center of China", the expressed relations can be (China, capital, Beijing), (China, political center, Beijing) or (China, cultural center, Beijing).
Coreference resolution is a special case of relation extraction, in which one entity is usually a different expression of another entity in the current context; the relation between the two entities can be represented as (entity 1, coreference, entity 2).
At present, the entity recognition task and the coreference resolution task of a sentence are performed separately, so information cannot be shared between the tasks and the two tasks cannot constrain each other; the effects of both the entity recognition task and the coreference resolution task are therefore not ideal.
Disclosure of Invention
In view of this, embodiments of the present application provide a word processing method and apparatus based on a dual task model, a training method and apparatus of the dual task model, a computing device, and a computer-readable storage medium, so as to solve technical defects in the prior art.
The embodiment of the application provides a word processing method based on a double-task model, which comprises the following steps:
acquiring a candidate entity fragment, and generating a candidate entity fragment coding vector based on the candidate entity fragment;
carrying out entity identification processing and classification pruning processing on the candidate entity fragment coding vectors through an entity identification model to obtain identification pruning coding vectors;
and inputting the recognition pruning coding vectors into a coreference resolution model for processing, and determining the coreference relations among the words in the candidate entity segments.
Optionally, inputting the recognition pruning coding vectors into a coreference resolution model for processing and determining the coreference relations among the words in the candidate entity segments includes:
scoring the recognition pruning coding vectors through the coreference resolution model, and pruning the recognition pruning coding vectors based on the scores to obtain coreference resolution coding vectors;
generating second relation pair coding vectors based on the coreference resolution coding vectors, performing coreference resolution processing on the second relation pair coding vectors through the coreference resolution model to obtain a coreference resolution result, and determining the coreference relations among the words in the candidate entity fragments based on the coreference resolution result.
Optionally, the entity recognition model and the coreference resolution model share a feed-forward neural network for scoring;
the obtaining of the recognition pruning code vector by performing entity recognition processing and classification pruning processing on the candidate entity fragment code vector through an entity recognition model comprises:
inputting the candidate entity fragment coding vectors into an entity recognition model, and scoring the candidate entity fragment coding vectors through the feedforward neural network;
and classifying the candidate entity segment coding vectors based on the scores of the candidate entity segment coding vectors to obtain classification labels of the candidate entity segment coding vectors, and pruning the candidate entity segment coding vectors to obtain recognition pruning coding vectors.
Optionally, the coreference resolution model and the entity recognition model share a feed-forward neural network for scoring;
scoring the recognition pruning coding vectors through the coreference resolution model, and pruning the recognition pruning coding vectors based on the scores, comprising:
inputting the recognition pruning coding vectors into the coreference resolution model, and scoring the recognition pruning coding vectors through the feedforward neural network to obtain the scores of the recognition pruning coding vectors;
and taking the recognition pruning coding vectors with scores greater than or equal to a preset threshold value as coreference resolution coding vectors.
Optionally, the generating a second relation pair encoding vector based on the coreference resolution encoding vector comprises:
obtaining a second initial relation pair coding vector based on the coreference resolution coding vector and the classification label of the coreference resolution coding vector;
and performing classification prediction processing on the second initial relation pair coding vectors, and pruning the second initial relation pair coding vectors in a preset proportion based on the classification prediction result to obtain the second relation pair coding vectors.
Optionally, obtaining a second initial relationship pair code vector based on the coreference resolution code vector and the class label of the coreference resolution code vector, includes:
encoding the classification label of the coreference resolution encoding vector to generate a second label vector;
and obtaining a second initial relation pair encoding vector of any two coreference resolution encoding vectors based on any two coreference resolution encoding vectors and the corresponding second label vectors.
Optionally, obtaining a second initial relationship pair code vector based on the coreference resolution code vector and the class label of the coreference resolution code vector, includes:
encoding the classification label of the coreference resolution encoding vector to generate a second label vector;
determining a semantic vector between any two coreference resolution coding vectors based on the positions of the two coreference resolution coding vectors in the candidate entity fragment;
and obtaining a second initial relation pair encoding vector of any two coreference resolution encoding vectors based on any two coreference resolution encoding vectors, semantic vectors between any two coreference resolution encoding vectors and the second label vector corresponding to each coreference resolution encoding vector.
Optionally, determining a semantic vector between the any two coreference resolution coding vectors comprises:
determining a plurality of word vectors between the any two coreference resolution code vectors;
and performing pooling or attention processing on a plurality of word vectors between any two coreference resolution coding vectors to obtain corresponding semantic vectors.
Optionally, performing classification prediction processing on the second initial relation pair coding vectors, and pruning the second initial relation pair coding vectors in a preset proportion based on the classification prediction result to obtain the second relation pair coding vectors, includes:
scoring the second initial relation pair coding vectors through a feedforward neural network to obtain the scores of the second initial relation pair coding vectors;
performing classification prediction processing on the second initial relation pair coding vectors to obtain the categories of the second initial relation pair coding vectors;
and pruning the second initial relation pair coding vectors based on their categories and scores to obtain the second relation pair coding vectors.
Optionally, performing coreference resolution processing on the second relation pair coding vectors through the coreference resolution model to obtain a coreference resolution result, includes:
and scoring the second relation pair coding vectors through the coreference resolution model, and performing classification prediction processing on the second relation pair coding vectors based on the scoring result to obtain the coreference resolution result.
The embodiment of the application provides a training method of a double-task model, which comprises the following steps:
obtaining at least two sample candidate entity pairs and a classification label of each sample candidate entity pair, and generating a sample candidate entity fragment encoding vector based on the sample candidate entities of each sample candidate entity pair;
carrying out entity identification processing and classification pruning processing on the sample candidate entity fragment coding vector through an entity identification model to obtain a sample identification pruning coding vector;
inputting the sample identification pruning coding vector into a coreference resolution model for processing to obtain a second sample relation pair coding vector;
and determining loss values of the entity recognition model and the coreference resolution model respectively based on the sample recognition pruning coding vectors and the second sample relation pair coding vectors, and training the entity recognition model and the coreference resolution model.
Optionally, the step of inputting the sample identification pruning coding vectors into a coreference resolution model for processing to obtain a second sample relation pair coding vector includes:
inputting the sample identification pruning coding vector into the coreference resolution model, scoring the sample identification pruning coding vector through the coreference resolution model, pruning the sample identification pruning coding vector based on the score to obtain a sample coreference resolution coding vector, and generating a second sample relation pair coding vector based on the sample coreference resolution coding vector.
Optionally, determining the loss values of the entity recognition model and the coreference resolution model respectively based on the sample recognition pruning coding vectors and the second sample relation pair coding vectors, comprises:
calculating the loss value of the entity recognition model by using a cross entropy loss function based on the scores of the sample recognition pruning coding vectors and the classification labels of the sample recognition pruning coding vectors;
and calculating the loss value of the coreference resolution model by using a cross entropy loss function based on the scores of the second sample relation pair coding vectors and the classification labels of the second sample relation pair coding vectors.
The embodiment of the application provides a word processing device based on a double-task model, which comprises:
an entity fragment acquisition module configured to acquire candidate entity fragments and generate candidate entity fragment encoding vectors based on the candidate entity fragments;
the entity identification pruning module is configured to perform entity identification processing and classification pruning processing on the candidate entity fragment coding vectors through an entity identification model to obtain identification pruning coding vectors;
and the coreference resolution processing module is configured to input the recognition pruning coding vectors into a coreference resolution model for processing, and determine coreference relations among words in the candidate entity fragments.
The embodiment of the application provides a training device of a double-task model, which comprises:
a sample obtaining module configured to obtain at least two sample candidate entity pairs and a class label of each of the sample candidate entity pairs, and generate a sample candidate entity fragment encoding vector based on a sample candidate entity of each sample candidate entity pair;
the sample identification module is configured to perform entity identification processing and classification pruning processing on the sample candidate entity fragment coding vector through an entity identification model to obtain a sample identification pruning coding vector;
the sample processing module is configured to input the sample identification pruning coding vector into a coreference resolution model for processing to obtain a second sample relation pair coding vector;
a model training module configured to determine loss values of the entity recognition model and the coreference resolution model respectively based on the sample recognition pruning coding vectors and the second sample relation pair coding vectors, and train the entity recognition model and the coreference resolution model.
Embodiments of the present application provide a computing device, which includes a memory, a processor, and computer instructions stored on the memory and executable on the processor, and when the processor executes the instructions, the processor implements the steps of the word processing method based on the dual task model or the training method based on the dual task model as described above.
Embodiments of the present application provide a computer-readable storage medium storing computer instructions, which when executed by a processor, implement the steps of the word processing method based on a dual task model or the training method of the dual task model as described above.
According to the word processing method and device based on the dual-task model provided by the application, entity recognition processing and classification pruning processing are first performed on the candidate entity fragment coding vectors through the entity recognition model to obtain recognition pruning coding vectors, which reduces the negative examples among the candidate entity fragment coding vectors. The recognition pruning coding vectors are then processed through the coreference resolution model, which strengthens the coreference resolution model's understanding of the candidate entity fragments, provides a foundation for the execution of the coreference resolution task, and effectively improves the accuracy of word processing.
The word processing method and device based on the dual-task model provided by the application organically combine the coreference resolution model with the entity recognition model, and the coreference resolution task with the entity recognition task, so that information is shared between the two tasks. This effectively improves the accuracy and recall of both the coreference resolution task and the entity recognition task, and effectively improves the accuracy of word processing based on the dual-task model.
According to the training method and device for the dual-task model provided by the application, the coreference resolution task and the entity recognition task are first organically combined during training, and the coreference resolution model and the entity recognition model are then trained respectively, which effectively improves the model training effect and the performance of both the coreference resolution model and the entity recognition model.
Drawings
FIG. 1 is a schematic diagram of a dual-task model according to an embodiment of the present application;
FIG. 2 is a flowchart illustrating steps of a word processing method based on a dual-task model according to an embodiment of the present application;
FIG. 3 is a flowchart illustrating steps of a word processing method based on a dual-task model according to another embodiment of the present application;
FIG. 4 is a flowchart illustrating steps of a training method of a dual-task model according to an embodiment of the present application;
FIG. 5 is a schematic structural diagram of a word processing apparatus based on a dual-task model according to an embodiment of the present application;
FIG. 6 is a schematic structural diagram of a training apparatus of a dual-task model according to an embodiment of the present application;
FIG. 7 is a schematic structural diagram of a computing device according to an embodiment of the present application.
Detailed Description
In the following description, numerous specific details are set forth in order to provide a thorough understanding of the present application. The present application can be implemented in many ways other than those described herein, and those skilled in the art can make similar modifications without departing from the spirit of the application; the application is therefore not limited to the specific implementations disclosed below.
The terminology used in the description of the one or more embodiments is for the purpose of describing the particular embodiments only and is not intended to be limiting of the description of the one or more embodiments. As used in one or more embodiments of the present specification and the appended claims, the singular forms "a," "an," and "the" are intended to include the plural forms as well, unless the context clearly indicates otherwise. It should also be understood that the term "and/or" as used in one or more embodiments of the present specification refers to and encompasses any and all possible combinations of one or more of the associated listed items.
It will be understood that, although the terms first, second, etc. may be used herein in one or more embodiments to describe various information, these information should not be limited by these terms. These terms are only used to distinguish one type of information from another. For example, a first can also be referred to as a second and, similarly, a second can also be referred to as a first without departing from the scope of one or more embodiments of the present description.
First, the terms involved in one or more embodiments of the present application are explained.
Entity recognition: identifying and extracting entities with specific meaning or strong referential force, such as person names, place names, organizations, dates and times, and proper nouns, in unstructured text.
Entity recognition model: a model for performing the entity recognition task.
Relation extraction: detecting and identifying semantic relations between entities from text. For example, in the sentence "Beijing is the capital, political center and cultural center of China", the expressed relations can be (China, capital, Beijing), (China, political center, Beijing) or (China, cultural center, Beijing).
Coreference resolution: a special kind of relation extraction in which one entity is often a different expression of another entity in the current context; the relation between the two entities can be represented as (entity 1, coreference, entity 2).
Coreference resolution model: a model for performing the coreference resolution task.
Candidate entity fragment (span): a segment consisting of one or more words in a sentence.
Candidate entity fragment coding vector (span embedding): the vector generated by passing a candidate entity fragment through the encoder.
Recognition pruning coding vector: a coding vector remaining after pruning the candidate entity fragment coding vectors.
Pruning: screening according to a preset rule.
Coreference resolution coding vector: a coding vector remaining after pruning the recognition pruning coding vectors based on the result of the coreference resolution processing.
Second label vector: the vector obtained by encoding the classification label of a coreference resolution coding vector.
Second relation pair coding vector: the vector formed by combining two coreference resolution coding vectors, their second label vectors and a distance feature vector.
Feed-forward Neural Network (FFNN): the simplest type of neural network. Its neurons are arranged in layers; each neuron is connected only to neurons of the previous layer, receives the output of the previous layer and passes its own output to the next layer, with no feedback between layers. It is currently one of the most widely applied and fastest developing artificial neural networks. In the present application, the entity recognition model and the coreference resolution model share one feedforward neural network for scoring.
Convolutional Neural Network (CNN): a feedforward neural network that involves convolution computation and has a deep structure; it is one of the representative algorithms of deep learning.
Classification label: an identifier indicating the type of a coding vector.
Accuracy: the ratio of the number of correctly recognized entities to the number of recognized entities; it lies between 0 and 1, and a larger value indicates higher accuracy.
Recall: the ratio of the number of correctly recognized entities to the number of entities in the sample; it lies between 0 and 1, and a larger value indicates higher recall.
Weighted harmonic mean: also known as the F1 value, where F1 = (2 × accuracy × recall) / (accuracy + recall).
In the present application, a word processing method and apparatus based on a dual task model, a training method and apparatus of the dual task model, a computing device and a computer readable storage medium are provided, which are described in detail in the following embodiments one by one.
As shown in FIG. 1, the present embodiment provides a dual-task model used in the word processing method based on the dual-task model of the present application; it includes an encoder, an entity recognition model and a coreference resolution model, where the entity recognition model and the coreference resolution model share a feedforward neural network.
In this embodiment, the entity recognition model performs entity recognition processing and classification pruning processing on the candidate entity segment coding vectors to obtain recognition pruning coding vectors.
In this embodiment, the coreference resolution model scores the recognition pruning coding vectors and prunes them based on the scores to obtain coreference resolution coding vectors; it then generates second relation pair coding vectors based on the coreference resolution coding vectors, performs coreference resolution processing on the second relation pair coding vectors to obtain a coreference resolution result, and determines the relations among the words in the candidate entity fragments based on the coreference resolution result.
According to the dual-task model provided by this embodiment, the entity recognition model and the coreference resolution model are organically combined and share one feedforward neural network for scoring, which enables information sharing between the entity recognition model and the coreference resolution model and improves the accuracy and recall of both.
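To make the shared-scorer design concrete, the following is a minimal PyTorch sketch of a dual-task model in which both task heads reuse one feedforward neural network for scoring; the module names, dimensions and two-class coreference head are illustrative assumptions, not details from this application:

    import torch
    import torch.nn as nn

    class SharedFFNN(nn.Module):
        """The single feed-forward network shared by both tasks for scoring."""
        def __init__(self, input_dim: int, hidden_dim: int = 150):
            super().__init__()
            self.net = nn.Sequential(
                nn.Linear(input_dim, hidden_dim),
                nn.ReLU(),
                nn.Linear(hidden_dim, 1),
            )

        def forward(self, embeddings: torch.Tensor) -> torch.Tensor:
            # [n, input_dim] -> [n]: one score per coding vector
            return self.net(embeddings).squeeze(-1)

    class DualTaskModel(nn.Module):
        def __init__(self, span_dim: int, num_entity_classes: int):
            super().__init__()
            self.shared_scorer = SharedFFNN(span_dim)                    # shared between both tasks
            self.entity_head = nn.Linear(span_dim, num_entity_classes)  # entity recognition labels
            self.coref_head = nn.Linear(span_dim, 2)                     # coreference decision

        def score_spans_for_entity_task(self, span_embeddings):
            return self.shared_scorer(span_embeddings)  # entity recognition model scores spans here

        def score_spans_for_coref_task(self, span_embeddings):
            return self.shared_scorer(span_embeddings)  # coreference model reuses the same FFNN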
As shown in FIG. 2, FIG. 2 is a flowchart illustrating the steps of a word processing method based on a dual-task model according to an embodiment of the present application, including steps S210 to S230.
S210, obtaining candidate entity fragments, and generating candidate entity fragment coding vectors based on the candidate entity fragments.
The candidate entity segment is a word set formed by combining one or more words in a target sentence, a target paragraph or a target article, and each word represents an entity. Specifically, the candidate entity segment can be obtained by performing word segmentation on the target sentence, target paragraph or target article, and extracting one or more target words from the segmentation result to combine into a word set.
For example, assume that word segmentation of the target sentence yields 10 words a1-a10; a word set composed of a1-a6 is extracted from the segmentation results and used as a candidate entity segment.
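As an illustration of this step, here is a minimal Python sketch of enumerating candidate entity segments from the word segmentation results; the maximum span width is a hypothetical parameter not specified in this application:

    def enumerate_candidate_spans(words, max_width=6):
        """Enumerate contiguous word sub-sequences as candidate entity segments."""
        spans = []
        for start in range(len(words)):
            for end in range(start + 1, min(start + max_width, len(words)) + 1):
                spans.append(words[start:end])
        return spans

    # Example: the 10 segmentation results a1-a10; ["a1", ..., "a6"] is one candidate segment.
    words = ["a%d" % i for i in range(1, 11)]
    candidates = enumerate_candidate_spans(words)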
In practical application, the candidate entity fragment may be input to an encoder for encoding processing, so as to generate a candidate entity fragment encoding vector.
In this embodiment, the encoder may be a bidirectional LSTM, a pre-trained BERT model, a CNN network, or any combination thereof.
Preferably, a pre-trained BERT model is used to encode a sentence containing a plurality of candidate entity segments to obtain word-level feature vectors, a CNN network is used to obtain character-level feature vectors, the word-level and character-level feature vectors are spliced to obtain spliced vectors, the spliced vectors are encoded by a bidirectional LSTM network to obtain feature vectors with contextual features, and finally each candidate entity segment coding vector is computed with an attention mechanism over the extracted candidate entity segments. The coding vector of a candidate entity segment can be represented by the following formula:

g_i = [x^*_{\mathrm{START}(i)}, x^*_{\mathrm{END}(i)}, \hat{x}_i, \phi(i)]    (1)

where g_i denotes the coding vector of candidate entity segment i, x^*_{\mathrm{START}(i)} and x^*_{\mathrm{END}(i)} denote the vectors at the start and end positions of the segment, \phi(i) denotes additional features, and \hat{x}_i denotes the result of computing the words in the segment with the attention mechanism. The specific calculation process of \hat{x}_i is as follows:

x^*_t = [h_{t,1}, h_{t,-1}]    (2)

\alpha_t = w_\alpha \cdot \mathrm{FFNN}_\alpha(x^*_t)    (3)

a_{i,t} = \frac{\exp(\alpha_t)}{\sum_{k=\mathrm{START}(i)}^{\mathrm{END}(i)} \exp(\alpha_k)}    (4)

\hat{x}_i = \sum_{t=\mathrm{START}(i)}^{\mathrm{END}(i)} a_{i,t} \cdot x_t    (5)

Specifically, t indexes a word and i indexes a candidate entity segment. Formula (2) states that the coding vector x^*_t of each word is formed from the forward-propagation output h_{t,1} and the backward-propagation output h_{t,-1} of the bidirectional LSTM. Formula (3) states that the attention parameter \alpha_t is obtained by multiplying the parameter w_\alpha with the score assigned to the word by the feedforward neural network. Formula (4) states that the weight a_{i,t} of each word is obtained by normalizing \alpha_t over all words of the candidate entity segment in which it lies. Formula (5) states that \hat{x}_i is obtained by weighting each word coding vector x_t in the candidate entity segment with its weight a_{i,t}.
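The following is a minimal PyTorch sketch of formulas (1)-(5), assuming x_star holds the concatenated forward/backward LSTM states for a sentence; for simplicity the attention here weights the contextualized vectors x^*_t rather than separate word vectors x_t, which is an assumption of the sketch:

    import torch
    import torch.nn as nn

    class SpanEmbedder(nn.Module):
        def __init__(self, hidden_dim: int):
            super().__init__()
            # FFNN_alpha followed by w_alpha, collapsed into one linear head (formula 3)
            self.attn_score = nn.Linear(2 * hidden_dim, 1)

        def forward(self, x_star, start, end, phi):
            # x_star: [seq_len, 2*hidden_dim], per-word vectors from formula (2)
            span = x_star[start:end + 1]
            alpha = self.attn_score(span).squeeze(-1)      # formula (3)
            a = torch.softmax(alpha, dim=0)                # formula (4)
            x_hat = (a.unsqueeze(-1) * span).sum(dim=0)    # formula (5)
            # formula (1): [start vector; end vector; attention summary; extra features]
            return torch.cat([x_star[start], x_star[end], x_hat, phi], dim=-1)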
In this embodiment, the candidate entity fragments are obtained and encoded in preparation for the execution of subsequent tasks, which improves the efficiency of their execution.
S220, carrying out entity identification processing and classification pruning processing on the candidate entity fragment coding vectors through an entity identification model to obtain identification pruning coding vectors.
It should be noted that, in the present embodiment, the entity recognition model and the coreference resolution model share a feed-forward neural network for scoring.
Specifically, the step S220 may further include steps S221 to S222.
S221, inputting the candidate entity fragment coding vectors into an entity recognition model, and scoring the candidate entity fragment coding vectors through the feedforward neural network.
Wherein, the score of each candidate entity segment coding vector is composed of a basic score (transition score) and a classification score (classifier score), both obtained through scoring by the feedforward neural network. The score of a candidate entity segment coding vector may be the sum, average or weighted average of the basic score and the classification score, which is not limited in the present application.
The feedforward neural network scores the candidate entity segment coding vectors using the deep learning principle; specifically, it re-computes or re-encodes the candidate entity segment coding vectors and maps them to corresponding scores. It should be noted that the feedforward neural network's score mapping can be continuously adjusted through the execution of subsequent tasks, the calculation of loss values, the back-propagation of gradients, and so on. The scores of the candidate entity segment coding vectors may be on a ten-point, hundred-point or thousand-point scale, which is not limited in the present application.
In the embodiment, the candidate entity fragment coding vectors are scored, and then entity identification processing is performed, so that the accuracy of the entity identification processing can be improved, and the effect of the entity identification model can be improved.
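A minimal sketch of this scoring step, assuming the final score of a candidate entity segment coding vector is the sum of its basic score and classification score (the application also allows average or weighted average):

    import torch
    import torch.nn as nn

    class EntitySpanScorer(nn.Module):
        def __init__(self, span_dim: int, num_classes: int, hidden: int = 150):
            super().__init__()
            self.base_scorer = nn.Sequential(
                nn.Linear(span_dim, hidden), nn.ReLU(), nn.Linear(hidden, 1))
            self.class_scorer = nn.Sequential(
                nn.Linear(span_dim, hidden), nn.ReLU(), nn.Linear(hidden, num_classes))

        def forward(self, span_embeddings):
            base = self.base_scorer(span_embeddings)   # basic score, [n, 1]
            cls = self.class_scorer(span_embeddings)   # classification scores, [n, num_classes]
            return base + cls                          # sum is assumed here, [n, num_classes]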
S222, classifying the candidate entity segment coding vectors based on the scores of the candidate entity segment coding vectors to obtain classification labels of the candidate entity segment coding vectors, and pruning the candidate entity segment coding vectors to obtain recognition pruning coding vectors.
In practical application, the candidate entity segment coding vectors are classified based on their scores to obtain a classification label for each coding vector, from which the category of each coding vector can be determined. A certain proportion of the candidate entity segment coding vectors in one or more categories can then be pruned according to the classification result, with the remaining vectors used as recognition pruning coding vectors; alternatively, candidate entity segment coding vectors with scores below a preset threshold can be pruned, with the remaining vectors used as recognition pruning coding vectors. The present application is not limited in this respect.
For example, suppose the candidate entity segment coding vectors fall into n categories (a first category, a second category, ..., an nth category), and the nth category is the negative class, that is, all candidate entity segment coding vectors not belonging to the previous categories are classified into the nth category. Suppose that after classification, of the m candidate entity segment coding vectors, m_1 belong to the first category, m_2 to the second category, ..., and m_n to the nth category (m_1 + m_2 + ... + m_n = m). The candidate entity segment coding vectors may then be pruned in any of the following three ways: (1) prune p_1% of the first category, p_2% of the second category, ..., and p_n% of the nth category, and use the remaining candidate entity segment coding vectors as recognition pruning coding vectors, where the values of p_1, p_2, ..., p_n may be the same or different; (2) do not prune the first through (n-1)th categories, prune p_n% of the nth category, and use the remaining candidate entity segment coding vectors as recognition pruning coding vectors; (3) prune the candidate entity segment coding vectors of one or more categories whose scores are below a preset threshold, and use the remaining candidate entity segment coding vectors as recognition pruning coding vectors, where the preset threshold can be determined according to the specific situation; the present application is not limited in this respect.
In the embodiment, the candidate entity fragment coding vectors are subjected to classification processing and pruning processing, so that the quality of the coding vectors input by the subsequent coreference resolution model is improved, and a foundation is laid for execution of coreference resolution tasks.
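A sketch of pruning ways (2) and (3) above, assuming the last class index is the negative class and treating the keep ratio and score threshold as hypothetical values:

    import torch

    def prune_spans(scores, labels, neg_class, neg_keep_ratio=0.8, min_score=None):
        """Keep all positive-class spans; keep only the top-scoring share of negatives.

        scores: [n] span scores; labels: [n] predicted class ids; returns a [n] bool mask.
        """
        keep = labels != neg_class                       # way (2): positive classes are not pruned
        neg_idx = torch.nonzero(labels == neg_class).squeeze(-1)
        k = int(neg_idx.numel() * neg_keep_ratio)
        if k > 0:
            top_neg = neg_idx[scores[neg_idx].topk(k).indices]
            keep[top_neg] = True                         # retain the best-scoring negatives
        if min_score is not None:                        # way (3): optional score threshold
            keep &= scores >= min_score
        return keep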
S230, inputting the recognition pruning coding vectors into a coreference resolution model for processing, and determining the coreference relations among the words in the candidate entity fragments.
Specifically, the step S230 may further include steps S231 to S232.
S231, scoring the recognition pruning coding vectors through a coreference resolution model, and pruning the recognition pruning coding vectors based on the scores to obtain the coreference resolution coding vectors.
Specifically, the recognition pruning coding vectors may be input into the coreference resolution model and scored through the feedforward neural network to obtain their scores; the recognition pruning coding vectors with scores greater than or equal to a preset threshold are then used as coreference resolution coding vectors.
The score of each identification pruning code vector is composed of a basic score and a classification score, and the basic score and the classification score are obtained through the scoring of a feedforward neural network.
In the embodiment, the identification pruning coding vectors are scored and further processed on the basis of the entity identification task, so that the implementation of the subsequent coreference resolution task is facilitated to be laid.
S232, generating second relation pair coding vectors based on the coreference resolution coding vectors, carrying out coreference resolution processing on the second relation pair coding vectors through the coreference resolution model to obtain coreference resolution results, and determining the coreference relation among the words in the candidate entity fragments based on the coreference resolution results.
Specifically, in step S232, generating a second relation pair encoding vector based on the coreference resolution encoding vector, including S2321 to S2322:
s2321, obtaining a second initial relation pair coding vector based on the coreference resolution coding vector and the classification label of the coreference resolution coding vector.
In a specific embodiment, step S2321 includes: encoding the classification labels of the coreference resolution coding vectors to generate second label vectors; and obtaining the second initial relation pair coding vector of any two coreference resolution coding vectors based on the two coreference resolution coding vectors and their corresponding second label vectors.
In practical application, each second relation pair coding vector consists of two coreference resolution coding vectors and the classification label coding vectors corresponding to them; in other words, the coreference resolution coding vectors of two words in the candidate entity fragment and the second label vectors of the two words can be spliced to obtain a second relation pair coding vector, as shown below:
span_pair_embeddings = torch.cat([span1_embeddings, span2_embeddings, span1_embeddings * span2_embeddings, span1_label_embedding, span2_label_embedding], -1)
Wherein, torch.cat is a function for splicing two or more vectors together, span_pair_embeddings represents the second relation pair coding vector, span1_embeddings represents coreference resolution coding vector 1, span2_embeddings represents coreference resolution coding vector 2, span1_label_embedding represents the label vector of coreference resolution coding vector 1, and span2_label_embedding represents the label vector of coreference resolution coding vector 2.
It should be noted that the classification label belongs to one kind of feature information of the candidate entity fragment, and in addition, other types of feature information, such as distance, may be combined when generating the second relation pair encoding vector, which may be determined according to specific situations, and this is not limited in this application.
For example, the coreference resolution coding vectors of two words in the candidate entity segment, the second label vectors of the two words, and the distance feature vector between the two words may be spliced to obtain a second relation pair coding vector, as shown below:
span_pair_embeddings = torch.cat([span1_embeddings, span2_embeddings, span1_embeddings * span2_embeddings, antecedent_distance_embeddings, span1_label_embedding, span2_label_embedding], -1)
Wherein, torch.cat is a function for splicing two or more vectors together, span_pair_embeddings represents the second relation pair coding vector, span1_embeddings represents coreference resolution coding vector 1, span2_embeddings represents coreference resolution coding vector 2, antecedent_distance_embeddings represents the distance feature vector between coreference resolution coding vectors 1 and 2, span1_label_embedding represents the label vector of coreference resolution coding vector 1, and span2_label_embedding represents the label vector of coreference resolution coding vector 2.
The embodiment obtains the second relation pair coding vector based on the coreference resolution coding vector and the classification label of the coreference resolution coding vector, and is beneficial to improving the execution efficiency and the effect of coreference resolution tasks.
In another specific embodiment, step S2321 comprises: encoding the classification labels of the coreference resolution coding vectors to generate second label vectors; determining the semantic vector between any two coreference resolution coding vectors based on their positions in the candidate entity fragment; and obtaining the second initial relation pair coding vector of any two coreference resolution coding vectors based on the two coreference resolution coding vectors, the semantic vector between them, and the second label vector corresponding to each coreference resolution coding vector.
Specifically, determining a semantic vector between the any two coreference resolution coding vectors includes: determining a plurality of word vectors between the any two coreference resolution code vectors; and performing pooling or attention processing on a plurality of word vectors between any two coreference resolution coding vectors to obtain corresponding semantic vectors.
In a specific application, for a candidate entity fragment [z_1, z_2, ..., z_{x-1}, z_x], let [z_1, z_2] correspond to coreference resolution coding vector span3 and [z_{x-1}, z_x] to coreference resolution coding vector span4; then [z_3, ..., z_{x-2}] are the word vectors between the two coreference resolution coding vectors span3 and span4. Performing pooling or attention processing on the word vectors [z_3, ..., z_{x-2}] between span3 and span4 yields the corresponding semantic vector, which can enrich the semantic information of the second initial relation pair coding vector and enhance its expressive capability.
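A minimal sketch of computing that semantic vector with mean pooling (attention processing would be analogous); the tensor shapes are assumptions:

    import torch

    def segment_semantic_vector(token_vectors, span_a_end, span_b_start):
        """Mean-pool the word vectors strictly between two spans, e.g. [z3, ..., z_{x-2}]."""
        between = token_vectors[span_a_end + 1 : span_b_start]  # [num_between, dim]
        if between.numel() == 0:                                # adjacent spans: nothing between
            return token_vectors.new_zeros(token_vectors.size(-1))
        return between.mean(dim=0)                              # pooling; attention is an alternative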
Specifically, the calculation formula of the second initial relation pair coding vector is as follows:
span_pair_embeddings = torch.cat([span3_embeddings, span4_embeddings, span3_embeddings * span4_embeddings, segment_info, span3_label_embedding, span4_label_embedding], -1)
Wherein, torch.cat is a function for splicing two or more vectors together, span_pair_embeddings represents the second initial relation pair coding vector, span3_embeddings represents coreference resolution coding vector 1, span4_embeddings represents coreference resolution coding vector 2, segment_info represents the semantic vector computed from the word vectors between coreference resolution coding vectors 1 and 2, span3_label_embedding represents the label vector of coreference resolution coding vector 1, and span4_label_embedding represents the label vector of coreference resolution coding vector 2.
It should be noted that in the process of computing span_pair_embeddings, not all pairs can have a relationship; for example, in a passage of text, words that are far apart basically do not form a relation. Therefore, in this embodiment, a distance threshold is set, and if the distance between span3 and span4 exceeds the threshold, the span_pair_embeddings of span3 and span4 are pruned directly. The value of the distance threshold may be set according to actual requirements, for example, 60 word units.
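A small sketch of this distance-threshold pruning, using the 60-word-unit threshold as an example:

    def within_distance(span_a, span_b, max_distance=60):
        """span_a and span_b are (start, end) word indices; distance is the gap in word units."""
        distance = max(span_b[0] - span_a[1], span_a[0] - span_b[1], 0)
        return distance <= max_distance

    # Build span_pair_embeddings only for pairs that pass the check, pruning the rest directly:
    # pairs = [(s3, s4) for (s3, s4) in candidate_pairs if within_distance(s3, s4)]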
S2322, performing classification prediction processing on the second initial relation pair coding vectors, and pruning the second initial relation pair coding vectors in a preset proportion based on the classification prediction result to obtain the second relation pair coding vectors.
Specifically, step S2322 includes:
scoring the second initial relation pair coding vectors through a feedforward neural network to obtain the scores of the second initial relation pair coding vectors;
performing classification prediction processing on the second initial relation pair coding vectors to obtain the categories of the second initial relation pair coding vectors;
and pruning the second initial relation pair coding vectors based on their categories and scores to obtain the second relation pair coding vectors.
Specifically, the coreference resolution processing consists of scoring and classification prediction; in other words, the second relation pair coding vectors are scored by the coreference resolution model and then subjected to classification prediction based on the scoring result, which completes the coreference resolution processing, yields the coreference resolution result, and determines the coreference relations between the words.
In this embodiment, coreference resolution processing is performed on the second relation pair coding vectors and the coreference relations among the words are determined, which improves the accuracy of the coreference resolution task and effectively improves the accuracy of determining word coreference relations.
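A minimal sketch of this final step: the second relation pair coding vectors are scored by a feedforward network and the predicted class is taken as the coreference resolution result; the two-class ("not coreferent" / "coreferent") setup is an assumption of the sketch:

    import torch
    import torch.nn as nn

    class CorefHead(nn.Module):
        def __init__(self, pair_dim: int, hidden: int = 150, num_classes: int = 2):
            super().__init__()
            self.ffnn = nn.Sequential(
                nn.Linear(pair_dim, hidden), nn.ReLU(), nn.Linear(hidden, num_classes))

        def forward(self, pair_embeddings: torch.Tensor) -> torch.Tensor:
            logits = self.ffnn(pair_embeddings)  # scoring
            return logits.argmax(dim=-1)         # classification prediction = coreference result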
In the word processing method based on the dual-task model provided by this embodiment, entity recognition processing and classification pruning processing are first performed on the candidate entity segment coding vectors through the entity recognition model to obtain recognition pruning coding vectors, which reduces the negative examples among the candidate entity segment coding vectors. Coreference resolution processing is then performed through the coreference resolution model, and the recognition pruning coding vectors are pruned again based on its results to obtain coreference resolution coding vectors, further screening the recognition pruning coding vectors according to the requirements of the different tasks. The second relation pair coding vectors are generated from the coreference resolution coding vectors, which further deepens the coreference resolution model's understanding of the candidate entity segments and provides a basis for the execution of the coreference resolution task. Finally, coreference resolution processing is performed and the relations between the words are determined from the processing results, which effectively improves the accuracy of word processing.
The word processing method based on the dual-task model provided by this embodiment organically combines the coreference resolution model with the entity recognition model, and the coreference resolution task with the entity recognition task, so that information is shared between the two tasks. This effectively improves the accuracy and recall of both the coreference resolution task and the entity recognition task, and effectively improves the accuracy of word processing based on the dual-task model.
Referring to FIG. 3, the present embodiment provides a word processing method based on a dual-task model, including:
s310, obtaining candidate entity fragments and generating candidate entity fragment coding vectors based on the candidate entity fragments.
Step S310 is the same as step S210 in the foregoing embodiment, and for the specific explanation of step S310, reference is made to the detailed description of the foregoing embodiment, which is not repeated herein.
S311, inputting the candidate entity segment coding vectors into an entity recognition model, and scoring the candidate entity segment coding vectors through the feedforward neural network.
S312, classifying the candidate entity segment coding vectors based on the scores of the candidate entity segment coding vectors to obtain classification labels of the candidate entity segment coding vectors, and pruning the candidate entity segment coding vectors to obtain recognition pruning coding vectors.
Step S311 and step S312 are the same as steps S221 to S222 in the foregoing embodiment, and for the specific explanation of step S311 and step S312, refer to the detailed description of the foregoing embodiment, which is not repeated herein.
S313, scoring the recognition pruning coding vectors through the coreference resolution model, and pruning the recognition pruning coding vectors based on the scores to obtain coreference resolution coding vectors.
S314, obtaining a second initial relation pair encoding vector based on the coreference resolution encoding vector and the classification label of the coreference resolution encoding vector.
S315, performing classification prediction processing on the second initial relation pair coding vectors, and pruning the second initial relation pair coding vectors in a preset proportion based on the classification prediction result to obtain second relation pair coding vectors.
For a detailed explanation of steps S313 to S315, refer to the detailed content of step S230 in the foregoing embodiment, and are not repeated herein.
As shown in FIG. 4, the present embodiment provides a training method of a dual-task model, which includes steps S410 to S440.
S410, obtaining at least two sample candidate entity pairs and the classification label of each sample candidate entity pair, and generating a sample candidate entity fragment encoding vector based on the sample candidate entities of each sample candidate entity pair.
For details, reference may be made to the above embodiments, which are not described herein again.
And S420, carrying out entity identification processing and classification pruning processing on the sample candidate entity fragment coding vector through an entity identification model to obtain a sample identification pruning coding vector.
Specifically, assuming that there are n types of sample candidate entity segments (n is greater than or equal to 1, and n is an integer), all sample candidate entity segments not belonging to these n categories belong to the negative class. The sample candidate entity segment coding vectors are classified based on their scores to obtain a classification label for each coding vector, from which the category of each coding vector can be determined. After a portion of the sample candidate entity segment coding vectors in the negative class are pruned according to a preset proportion, the remaining sample candidate entity segment coding vectors are the sample recognition pruning coding vectors. The proportion of negative-class vectors pruned may be determined according to the specific situation, such as one sixth or one fifth, which is not limited in this application.
In this embodiment, the sample candidate entity segment coding vectors are classified and part of the coding vectors in the negative class are pruned, so that the model can learn from positive and negative examples in a suitable proportion, that is, from both correct and incorrect cases, which improves the model training effect.
And S430, inputting the sample identification pruning coding vector into a coreference resolution model for processing to obtain a second sample relation pair coding vector.
Specifically, the step S430 includes: inputting the sample identification pruning coding vector into the coreference resolution model, scoring the sample identification pruning coding vector through the coreference resolution model, pruning the sample identification pruning coding vector based on the score to obtain a sample coreference resolution coding vector, and generating a second sample relation pair coding vector based on the sample coreference resolution coding vector.
For details, reference may be made to the above embodiments, which are not described herein again.
S440, determining the loss values of the entity recognition model and the coreference resolution model respectively based on the sample recognition pruning coding vectors and the second sample relation pair coding vectors, and training the entity recognition model and the coreference resolution model.
Specifically, step S440 includes:
calculating the loss value of the entity recognition model by using a cross entropy loss function based on the scores of the sample recognition pruning coding vectors and the classification labels of the sample recognition pruning coding vectors;
and calculating the loss value of the coreference resolution model by using a cross entropy loss function based on the scores of the second sample relation pair coding vectors and the classification labels of the second sample relation pair coding vectors.
For example, in the entity recognition model training process, the set of values obtained in the cross entropy calculation may be [-0.0000, -6.8651, -9.8858, -9.3611, -9.4160, -8.8986, -10.0036], where the 7 values correspond to labels 0-6 respectively, and each position represents a classification label.
Applying softmax converts these values into the classification probabilities [9.9856e-01, 1.0421e-03, 5.0818e-05, 8.5878e-05, 8.1292e-05, 1.3638e-04, 4.5174e-05], and finally the maximum value is taken to determine the final loss value.
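These numbers can be verified directly; the snippet below (illustrative only) applies softmax to the seven log-scores and recovers the classification probabilities listed above:

```python
import torch
import torch.nn.functional as F

# The seven log-scores correspond to classification labels 0-6.
log_scores = torch.tensor(
    [-0.0000, -6.8651, -9.8858, -9.3611, -9.4160, -8.8986, -10.0036])
probs = F.softmax(log_scores, dim=-1)
print(probs)                   # ~[9.9856e-01, 1.0421e-03, 5.0818e-05, ...]
print(probs.argmax().item())   # 0: the maximum probability falls on label 0
```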
The formula for the cross entropy loss function is as follows:
$$H(p, q) = -\sum_{x} p(x) \log q(x)$$
Cross entropy characterizes the difference between two probability distributions p and q, where p represents the true distribution, i.e., the classification labels corresponding to the sample identification pruning coding vectors and the second sample relation pair coding vectors; q represents the non-true (predicted) distribution, i.e., the sample identification pruning coding vectors and the second sample relation pair coding vectors themselves; and H(p, q) represents the loss value.
Specifically, based on the loss values of the entity recognition model and the coreference resolution model, the weight values of the neuron nodes in each layer are adjusted backwards, layer by layer, from the output layer to the input layer of each model, thereby training the models.
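As a hedged sketch of this training procedure (the optimizer choice, the equal weighting of the two losses, and the model interfaces are assumptions not specified here), one joint update step might look like:

```python
import torch

def train_step(ner_model, coref_model, optimizer, batch):
    ner_scores, ner_labels = ner_model(batch)        # assumed interface
    coref_scores, coref_labels = coref_model(batch)  # assumed interface
    loss_fn = torch.nn.CrossEntropyLoss()
    loss = loss_fn(ner_scores, ner_labels) + loss_fn(coref_scores, coref_labels)
    optimizer.zero_grad()
    loss.backward()    # propagates gradients from the output layer back to the input layer
    optimizer.step()   # adjusts the weight values of each layer's neuron nodes
    return loss.item()
```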
According to the training method of the dual-task model, the coreference resolution task and the entity recognition task are organically combined in the training process, and then the coreference resolution model and the entity recognition model are trained respectively, so that the model training effect can be effectively improved, and the performance of the coreference resolution model and the performance of the entity recognition model are improved.
As shown in fig. 5, the present embodiment discloses a word processing apparatus based on a dual task model, which includes:
an entity fragment obtaining module 510 configured to obtain candidate entity fragments and generate candidate entity fragment encoding vectors based on the candidate entity fragments;
an entity identification pruning module 520, configured to perform entity identification processing and classification pruning processing on the candidate entity segment coding vectors through an entity identification model, so as to obtain identification pruning coding vectors;
and a coreference resolution processing module 530 configured to input the identified pruning code vectors into a coreference resolution model for processing, and determine coreference relationships among the words in the candidate entity segments.
Optionally, the coreference resolution processing module 530 is further configured to:
scoring the recognition pruning coded vectors through a coreference resolution model, and pruning the recognition pruning coded vectors based on the scores to obtain coreference resolution coded vectors;
generating second relation pair coding vectors based on the coreference resolution coding vectors, carrying out coreference resolution processing on the second relation pair coding vectors through the coreference resolution model to obtain coreference resolution results, and determining the relation among the words in the candidate entity fragments based on the coreference resolution results.
Optionally, the entity recognition model and the coreference resolution model share a feed-forward neural network for scoring;
the entity identification pruning module 520 is further configured to:
inputting the candidate entity fragment coding vectors into an entity recognition model, and scoring the candidate entity fragment coding vectors through the feedforward neural network;
and classifying the candidate entity segment coding vectors based on the scores of the candidate entity segment coding vectors, and pruning the candidate entity segment coding vectors based on the classification processing result to obtain an identification pruning coding vector.
Optionally, the coreference resolution model and the entity recognition model share a feed-forward neural network for scoring;
the coreference resolution processing module 530 is further configured to:
inputting the identification pruning coding vector into the coreference resolution model, and scoring the identification pruning coding vector through the feedforward neural network to obtain the score of the identification pruning coding vector;
and taking the identification pruning coded vector with the score larger than or equal to a preset threshold value as a coreference resolution coded vector.
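A minimal sketch of this thresholding step follows (the threshold value and tensor shapes are assumptions):

```python
import torch

def select_coref_vectors(vectors, scores, threshold=0.5):
    # vectors: [num_spans, hidden]; scores: [num_spans]
    keep = scores >= threshold    # keep vectors whose score reaches the threshold
    return vectors[keep]
```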
Optionally, the coreference resolution processing module 530 is further configured to:
obtaining a second initial relation pair coding vector based on the coreference resolution coding vector and the classification label of the coreference resolution coding vector;
and performing classification prediction processing on the second initial relation pair coding vectors, and pruning the second initial relation pair coding vectors according to a preset proportion based on the classification prediction result to obtain the second relation pair coding vectors.
Optionally, the coreference resolution processing module 530 is further configured to:
encoding the classification label of the coreference resolution encoding vector to generate a second label vector;
and obtaining a second initial relation pair encoding vector of any two coreference resolution encoding vectors based on any two coreference resolution encoding vectors and the corresponding second label vectors.
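For illustration, a second initial relation pair coding vector could be assembled as follows; the concatenation order is an assumption, the text above only requiring that the two coreference resolution coding vectors and their corresponding second label vectors be combined:

```python
import torch

def pair_vector(vec_i, vec_j, label_vec_i, label_vec_j):
    # concatenate two coreference resolution coding vectors with their
    # corresponding second label vectors
    return torch.cat([vec_i, label_vec_i, vec_j, label_vec_j], dim=-1)
```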
Optionally, the coreference resolution processing module 530 is further configured to:
encoding the classification label of the coreference resolution encoding vector to generate a second label vector;
determining a semantic vector between any two coreference resolution coding vectors based on the positions of the two coreference resolution coding vectors in the candidate entity fragment;
and obtaining a second initial relation pair encoding vector of any two coreference resolution encoding vectors based on any two coreference resolution encoding vectors, semantic vectors between any two coreference resolution encoding vectors and the second label vector corresponding to each coreference resolution encoding vector.
Optionally, the coreference resolution processing module 530 is further configured to:
determining a plurality of word vectors between the any two coreference resolution code vectors;
and performing pooling or attention processing on a plurality of word vectors between any two coreference resolution coding vectors to obtain corresponding semantic vectors.
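The following sketch shows the two options named above, mean pooling and a simple learned attention over the word vectors lying between the two spans; the attention parameterization is an assumption:

```python
import torch

def semantic_vector(word_vecs, attn_weight=None):
    # word_vecs: [num_words_between, hidden] word vectors between two spans
    if attn_weight is None:
        return word_vecs.mean(dim=0)                      # pooling
    scores = word_vecs @ attn_weight                      # [num_words_between]
    alpha = torch.softmax(scores, dim=0)                  # attention weights
    return (alpha.unsqueeze(-1) * word_vecs).sum(dim=0)   # weighted sum
```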
Optionally, the coreference resolution processing module 530 is further configured to:
scoring the second initial relation pair coding vectors through a feedforward neural network to obtain the score of each second initial relation pair coding vector;
performing classification prediction processing on the second initial relation pair coding vectors to obtain the category of each second initial relation pair coding vector;
and pruning the second initial relation pair coding vectors based on their categories and scores to obtain the second relation pair coding vectors.
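One plausible (assumed) realization of pruning by category and score is to keep, within each predicted category, a preset proportion of the highest-scoring pair vectors:

```python
import torch

def prune_pairs(pair_vecs, scores, categories, keep_ratio=0.8):
    # pair_vecs: [num_pairs, hidden]; scores, categories: [num_pairs]
    kept = []
    for c in categories.unique():
        idx = (categories == c).nonzero(as_tuple=True)[0]
        n_keep = max(1, int(len(idx) * keep_ratio))
        kept.append(idx[scores[idx].topk(n_keep).indices])
    kept = torch.cat(kept).sort().values
    return pair_vecs[kept]
```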
Optionally, the coreference resolution processing module 530 is further configured to: score the second relation pair coding vectors through the coreference resolution model, and perform classification prediction processing on the second relation pair coding vectors based on the scoring result to obtain a coreference resolution result.
The word processing apparatus based on the dual-task model organically combines the coreference resolution model and the entity recognition model, and thereby the coreference resolution task and the entity recognition task, so that information is shared between the two tasks. This can effectively improve the accuracy and recall of both the coreference resolution task and the entity recognition task, and effectively improves the accuracy with which word relationships are determined in word processing based on the dual-task model.
As shown in fig. 6, the present embodiment discloses a training apparatus for a dual task model, which includes:
a sample obtaining module 610 configured to obtain at least two sample candidate entity pairs and a classification label for each sample candidate entity pair, and generate sample candidate entity fragment encoding vectors based on the sample candidate entities;
a sample identification module 620 configured to perform entity identification processing and classification pruning processing on the sample candidate entity segment coding vector through an entity identification model to obtain a sample identification pruning coding vector;
a sample processing module 630, configured to input the sample identification pruning coding vector into a coreference resolution model for processing, so as to obtain a second sample relation pair coding vector;
a model training module 640 configured to determine loss values of the entity recognition model and the coreference resolution model based on the sample identification pruning coding vectors and the second sample relation pair coding vectors, respectively, and train the entity recognition model and the coreference resolution model.
Optionally, the sample processing module 630 is further configured to: inputting the sample identification pruning coding vector into the coreference resolution model, scoring the sample identification pruning coding vector through the coreference resolution model, pruning the sample identification pruning coding vector based on the score to obtain a sample coreference resolution coding vector, and generating a second sample relation pair coding vector based on the sample coreference resolution coding vector.
Optionally, the model training module 640 is further configured to:
calculating a loss value of the entity identification model by using a cross entropy loss function based on the score of the sample identification pruning coding vector and the classification label of the sample identification pruning coding vector;
and calculating the loss value of the coreference resolution model by using a cross entropy loss function based on the score of the second sample relation pair coding vector and the classification label of the second sample relation pair coding vector.
The training apparatus for the dual-task model provided by the present application first organically combines the coreference resolution task and the entity recognition task during training, and then trains the coreference resolution model and the entity recognition model respectively, which can effectively improve the model training effect and improve the performance of the coreference resolution model and the entity recognition model.
It should be noted that the components in the apparatus claims should be understood as the functional modules necessary to implement the steps of the program flow or the method; each functional module is not necessarily an actual functional division or a separate physical entity. An apparatus claim defined by such a set of functional modules should be understood as a functional-module framework that implements the solution mainly through the computer program described in the specification, not as a physical apparatus that implements the solution mainly through hardware.
As shown in fig. 7, which is a block diagram illustrating the structure of a computing device 700 according to an embodiment of the present specification, the components of the computing device 700 include, but are not limited to, a memory 710 and a processor 720. The processor 720 is coupled to the memory 710 via a bus 730, and a database 750 is used to store data.
Computing device 700 also includes an access device 740 that enables the computing device 700 to communicate via one or more networks 760. Examples of such networks include the Public Switched Telephone Network (PSTN), a Local Area Network (LAN), a Wide Area Network (WAN), a Personal Area Network (PAN), or a combination of communication networks such as the Internet. The access device 740 may include one or more of any type of network interface, wired or wireless, e.g., a Network Interface Card (NIC), an IEEE 802.11 Wireless Local Area Network (WLAN) wireless interface, a Worldwide Interoperability for Microwave Access (WiMAX) interface, an Ethernet interface, a Universal Serial Bus (USB) interface, a cellular network interface, a Bluetooth interface, a Near Field Communication (NFC) interface, and so forth.
In one embodiment of the present description, the above-described components of computing device 700, as well as other components not shown in FIG. 7, may also be connected to each other, such as by a bus. It should be understood that the block diagram of the computing device architecture shown in FIG. 7 is for purposes of example only and is not limiting as to the scope of the present description. Those skilled in the art may add or replace other components as desired.
Computing device 700 may be any type of stationary or mobile computing device, including a mobile computer or mobile computing device (e.g., tablet, personal digital assistant, laptop, notebook, netbook, etc.), mobile phone (e.g., smartphone), wearable computing device (e.g., smartwatch, smartglasses, etc.), or other type of mobile device, or a stationary computing device such as a desktop computer or PC. Computing device 700 may also be a mobile or stationary server. The computing device may perform the method of any of the embodiments described above.
An embodiment of the present application further provides a computer readable storage medium storing computer instructions, which when executed by a processor, implement the steps of the word processing method based on the dual task model or the training method of the dual task model as described above.
The above is an illustrative scheme of a computer-readable storage medium of the present embodiment. It should be noted that the technical solution of the storage medium is the same as the technical solution of the word processing method based on the dual task model or the training method based on the dual task model, and details of the technical solution of the storage medium, which are not described in detail, can be referred to the description of the technical solution of the word processing method based on the dual task model or the training method based on the dual task model.
The computer instructions comprise computer program code, which may be in the form of source code, object code, an executable file, some intermediate form, or the like. The computer-readable medium may include: any entity or device capable of carrying the computer program code, a recording medium, a USB flash drive, a removable hard disk, a magnetic disk, an optical disk, a computer memory, a read-only memory (ROM), a random access memory (RAM), an electrical carrier signal, a telecommunications signal, a software distribution medium, and the like. It should be noted that the content contained in the computer-readable medium may be appropriately increased or decreased as required by legislation and patent practice in a jurisdiction; for example, in some jurisdictions, computer-readable media do not include electrical carrier signals and telecommunications signals in accordance with legislation and patent practice.
It should be noted that, for the sake of simplicity, the above-mentioned method embodiments are described as a series of acts or combinations, but those skilled in the art should understand that the present application is not limited by the described order of acts, as some steps may be performed in other orders or simultaneously according to the present application. Further, those skilled in the art should also appreciate that the embodiments described in the specification are preferred embodiments and that the acts and modules referred to are not necessarily required in this application.
In the above embodiments, the descriptions of the respective embodiments have respective emphasis, and for parts that are not described in detail in a certain embodiment, reference may be made to related descriptions of other embodiments.
The preferred embodiments of the present application disclosed above are intended only to aid in the explanation of the application. Alternative embodiments are not exhaustive and do not limit the invention to the precise embodiments described. Obviously, many modifications and variations are possible in light of the above teaching. The embodiments were chosen and described in order to best explain the principles of the application and the practical application, to thereby enable others skilled in the art to best understand and utilize the application. The application is limited only by the claims and their full scope and equivalents.

Claims (17)

1. A word processing method based on a double-task model is characterized by comprising the following steps:
acquiring a candidate entity fragment, and generating a candidate entity fragment coding vector based on the candidate entity fragment;
carrying out entity identification processing and classification pruning processing on the candidate entity fragment coding vectors through an entity identification model to obtain identification pruning coding vectors;
and inputting the identified pruning code vectors into a coreference resolution model for processing, and determining the coreference relation among the words in the candidate entity segments.
2. The word processing method based on the double-task model according to claim 1, wherein inputting the identification pruning coding vectors into a coreference resolution model for processing and determining the coreference relation among the words in the candidate entity segments comprises:
scoring the recognition pruning coded vectors through a coreference resolution model, and pruning the recognition pruning coded vectors based on the scores to obtain coreference resolution coded vectors;
generating second relation pair coding vectors based on the coreference resolution coding vectors, carrying out coreference resolution processing on the second relation pair coding vectors through the coreference resolution model to obtain coreference resolution results, and determining the coreference relation among the words in the candidate entity fragments based on the coreference resolution results.
3. The dual task model-based word processing method of claim 1, wherein the entity recognition model and the coreference resolution model share a feed-forward neural network for scoring;
the obtaining of the recognition pruning code vector by performing entity recognition processing and classification pruning processing on the candidate entity fragment code vector through an entity recognition model comprises:
inputting the candidate entity fragment coding vectors into an entity recognition model, and scoring the candidate entity fragment coding vectors through the feedforward neural network;
and classifying the candidate entity segment coding vectors based on the scores of the candidate entity segment coding vectors to obtain classification labels of the candidate entity segment coding vectors, and pruning the candidate entity segment coding vectors to obtain recognition pruning coding vectors.
4. The dual task model-based word processing method of claim 2, wherein the coreference resolution model and the entity recognition model share a feed-forward neural network for scoring;
scoring the identified pruning code vectors through a coreference resolution model, and pruning the identified pruning code vectors based on the scores, comprising:
inputting the identification pruning coding vector into the coreference resolution model, and scoring the identification pruning coding vector through the feedforward neural network to obtain the score of the identification pruning coding vector;
and taking the identification pruning coded vector with the score larger than or equal to a preset threshold value as a coreference resolution coded vector.
5. The dual task model-based word processing method of claim 4, wherein generating a second relational pair code vector based on the coreference resolution code vector comprises:
obtaining a second initial relation pair coding vector based on the coreference resolution coding vector and the classification label of the coreference resolution coding vector;
and performing classification prediction processing on the second initial relation pair coding vectors, and pruning the second initial relation pair coding vectors according to a preset proportion based on the classification prediction result to obtain the second relation pair coding vectors.
6. The dual task model-based word processing method of claim 5, wherein obtaining a second initial relationship pair code vector based on coreference resolution code vectors and classification labels of the coreference resolution code vectors comprises:
encoding the classification label of the coreference resolution encoding vector to generate a second label vector;
and obtaining a second initial relation pair encoding vector of any two coreference resolution encoding vectors based on any two coreference resolution encoding vectors and the corresponding second label vectors.
7. The dual task model-based word processing method of claim 5, wherein obtaining a second initial relationship pair code vector based on coreference resolution code vectors and classification labels of the coreference resolution code vectors comprises:
encoding the classification label of the coreference resolution encoding vector to generate a second label vector;
determining a semantic vector between any two coreference resolution coding vectors based on the positions of the two coreference resolution coding vectors in the candidate entity fragment;
and obtaining a second initial relation pair encoding vector of any two coreference resolution encoding vectors based on any two coreference resolution encoding vectors, semantic vectors between any two coreference resolution encoding vectors and the second label vector corresponding to each coreference resolution encoding vector.
8. The dual task model-based word processing method of claim 7, wherein determining semantic vectors between the any two coreference resolution coding vectors comprises:
determining a plurality of word vectors between the any two coreference resolution code vectors;
and performing pooling or attention processing on a plurality of word vectors between any two coreference resolution coding vectors to obtain corresponding semantic vectors.
9. The word processing method based on the double-task model according to claim 6 or 7, wherein performing classification prediction processing on the second initial relation pair coding vectors and pruning the second initial relation pair coding vectors according to a preset proportion based on the classification prediction result to obtain the second relation pair coding vectors comprises:
scoring the second initial relation pair coding vectors through a feedforward neural network to obtain the score of each second initial relation pair coding vector;
performing classification prediction processing on the second initial relation pair coding vectors to obtain the category of each second initial relation pair coding vector;
and pruning the second initial relation pair coding vectors based on their categories and scores to obtain the second relation pair coding vectors.
10. The word processing method based on the double-task model according to claim 2, wherein performing coreference resolution processing on the second relation pair coding vectors through the coreference resolution model to obtain a coreference resolution result comprises:
and scoring the second relation pair coding vectors through the coreference resolution model, and performing classification prediction processing on the second relation pair coding vectors based on the scoring result to obtain the coreference resolution result.
11. A training method of a double-task model is characterized by comprising the following steps:
obtaining at least two sample candidate entity pairs and a classification label of each sample candidate entity pair, and generating a sample candidate entity fragment encoding vector based on the sample candidate entities of each sample candidate entity pair;
carrying out entity identification processing and classification pruning processing on the sample candidate entity fragment coding vector through an entity identification model to obtain a sample identification pruning coding vector;
inputting the sample identification pruning coding vector into a coreference resolution model for processing to obtain a second sample relation pair coding vector;
and determining loss values of the entity recognition model and the coreference resolution model based on the sample identification pruning coding vectors and the second sample relation pair coding vectors, respectively, and training the entity recognition model and the coreference resolution model.
12. The method for training the double-task model according to claim 11, wherein inputting the sample identification pruning coding vectors into the coreference resolution model for processing to obtain second sample relation pair coding vectors comprises:
inputting the sample identification pruning coding vector into the coreference resolution model, scoring the sample identification pruning coding vector through the coreference resolution model, pruning the sample identification pruning coding vector based on the score to obtain a sample coreference resolution coding vector, and generating a second sample relation pair coding vector based on the sample coreference resolution coding vector.
13. The method for training the dual task model according to claim 12, wherein determining the loss values of the entity recognition model and the coreference resolution model for the code vector based on the sample recognition pruning code vector and the second sample relationship, respectively, comprises:
calculating a loss value of the entity identification model by using a cross entropy loss function based on the score of the sample identification pruning coding vector and the classification label of the sample identification pruning coding vector;
and calculating the loss value of the coreference resolution model by using a cross entropy loss function based on the score of the second sample relation pair coding vector and the classification label of the second sample relation pair coding vector.
14. A word processing apparatus based on a dual task model, comprising:
an entity fragment acquisition module configured to acquire candidate entity fragments and generate candidate entity fragment encoding vectors based on the candidate entity fragments;
the entity identification pruning module is configured to perform entity identification processing and classification pruning processing on the candidate entity fragment coding vectors through an entity identification model to obtain identification pruning coding vectors;
and the coreference resolution processing module is configured to input the recognition pruning coding vectors into a coreference resolution model for processing, and determine coreference relations among words in the candidate entity fragments.
15. A training apparatus for a double-task model, comprising:
a sample obtaining module configured to obtain at least two sample candidate entity pairs and a class label of each of the sample candidate entity pairs, and generate a sample candidate entity fragment encoding vector based on a sample candidate entity of each sample candidate entity pair;
the sample identification module is configured to perform entity identification processing and classification pruning processing on the sample candidate entity fragment coding vector through an entity identification model to obtain a sample identification pruning coding vector;
the sample processing module is configured to input the sample identification pruning coding vector into a coreference resolution model for processing to obtain a second sample relation pair coding vector;
a model training module configured to determine loss values of the entity recognition model and the coreference resolution model based on the sample identification pruning coding vectors and the second sample relation pair coding vectors, respectively, and train the entity recognition model and the coreference resolution model.
16. A computing device comprising a memory, a processor, and computer instructions stored on the memory and executable on the processor, wherein the processor implements the steps of the method of any one of claims 1-13 when executing the instructions.
17. A computer-readable storage medium storing computer instructions, which when executed by a processor, perform the steps of the method of any one of claims 1 to 13.
CN202010507257.1A 2020-06-05 2020-06-05 Word processing method and device based on double-task model Pending CN113761920A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010507257.1A CN113761920A (en) 2020-06-05 2020-06-05 Word processing method and device based on double-task model

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202010507257.1A CN113761920A (en) 2020-06-05 2020-06-05 Word processing method and device based on double-task model

Publications (1)

Publication Number Publication Date
CN113761920A true CN113761920A (en) 2021-12-07

Family

ID=78785175

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010507257.1A Pending CN113761920A (en) 2020-06-05 2020-06-05 Word processing method and device based on double-task model

Country Status (1)

Country Link
CN (1) CN113761920A (en)


Patent Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN105260457A (en) * 2015-10-14 2016-01-20 南京大学 Coreference resolution-oriented multi-semantic web entity contrast table automatic generation method
CN110852107A (en) * 2019-11-08 2020-02-28 北京明略软件***有限公司 Relationship extraction method, device and storage medium
CN110909544A (en) * 2019-11-20 2020-03-24 北京香侬慧语科技有限责任公司 Data processing method and device

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
LIU Wenru (刘文如): "Introduction to Python Deep Learning from Scratch (零基础入门Python深度学习)", 31 May 2020, China Machine Press (机械工业出版社), page 157 *

Similar Documents

Publication Publication Date Title
CN111259127B (en) Long text answer selection method based on transfer learning sentence vector
CN111159485B (en) Tail entity linking method, device, server and storage medium
CN113127624B (en) Question-answer model training method and device
CN112668336B (en) Word processing method based on task model
CN110781663A (en) Training method and device of text analysis model and text analysis method and device
CN112800768A (en) Training method and device for nested named entity recognition model
CN113159187B (en) Classification model training method and device and target text determining method and device
CN114077655A (en) Method and device for training answer extraction model
CN116737922A (en) Tourist online comment fine granularity emotion analysis method and system
CN115269836A (en) Intention identification method and device
CN114282555A (en) Translation model training method and device, and translation method and device
CN116932736A (en) Patent recommendation method based on combination of user requirements and inverted list
CN116956816A (en) Text processing method, model training method, device and electronic equipment
CN114417863A (en) Word weight generation model training method and device and word weight generation method and device
CN113761920A (en) Word processing method and device based on double-task model
CN114648017A (en) Document level relation extraction method based on heterogeneous graph attention network
CN114722817A (en) Event processing method and device
CN114647717A (en) Intelligent question and answer method and device
CN113761922A (en) Word processing method and device based on multitask model
CN113761921A (en) Word processing method and device based on double-task model
CN113886560A (en) Recommendation method and device for court trial problems
CN114138947A (en) Text processing method and device
CN114595693A (en) Text emotion analysis method based on deep learning
CN112328777A (en) Answer detection method and device
CN113869337A (en) Training method and device of image recognition model, and image recognition method and device

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination