CN110941700B - Multi-task joint learning-based argument mining system and working method thereof - Google Patents

Multi-task joint learning-based argument mining system and working method thereof

Info

Publication number
CN110941700B
Authority
CN
China
Prior art keywords
text
word
module
argument
input
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201911161645.2A
Other languages
Chinese (zh)
Other versions
CN110941700A (en)
Inventor
廖祥文
倪继昌
叶锴
张铭洲
苏锦河
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Fuzhou University
Original Assignee
Fuzhou University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Fuzhou University filed Critical Fuzhou University
Priority to CN201911161645.2A
Publication of CN110941700A
Application granted
Publication of CN110941700B
Legal status: Active
Anticipated expiration

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F 16/00: Information retrieval; Database structures therefor; File system structures therefor
    • G06F 16/30: Information retrieval of unstructured textual data
    • G06F 16/33: Querying
    • G06F 16/3331: Query processing
    • G06F 16/334: Query execution
    • G06F 16/3344: Query execution using natural language analysis
    • G06N: COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00: Computing arrangements based on biological models
    • G06N 3/02: Neural networks
    • G06N 3/04: Architecture, e.g. interconnection topology
    • G06N 3/044: Recurrent networks, e.g. Hopfield networks
    • G06N 3/045: Combinations of networks
    • G06N 3/08: Learning methods
    • G06N 3/084: Backpropagation, e.g. using gradient descent
    • Y: GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02: TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D: CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D 10/00: Energy efficient computing, e.g. low power processors, power management or thermal management

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • General Physics & Mathematics (AREA)
  • Artificial Intelligence (AREA)
  • General Engineering & Computer Science (AREA)
  • Computational Linguistics (AREA)
  • General Health & Medical Sciences (AREA)
  • Evolutionary Computation (AREA)
  • Biophysics (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • Biomedical Technology (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Health & Medical Sciences (AREA)
  • Databases & Information Systems (AREA)
  • Machine Translation (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention relates to an argument mining system based on multi-task joint learning and a working method thereof. The argument mining system comprises a data preprocessing module, a text embedding module, a joint learning module and an argument mining tag prediction output module. The data preprocessing module is used for preprocessing data; the text embedding module extracts feature representations at the word level and the character level with CNN networks and combines task-specific features as the vector input of the next module; the joint learning module adopts the hard parameter sharing mechanism of multi-task learning, in which multiple tasks share a hidden layer of the model and are trained in parallel, and a stacked bidirectional long short-term memory (LSTM) neural network learns the context information of the text to complete the sequence labeling task; the argument mining tag prediction output module completes the prediction of argument types and outputs the types of the arguments in the text. The invention can learn high-quality text features from argumentative text data and finally detect the types of arguments contained in the text.

Description

Multi-task joint learning-based argument mining system and working method thereof
Technical Field
The invention relates to the field of natural language processing, in particular to a multi-task joint learning-based argument mining system and a working method thereof.
Background
Currently, many methods can be applied to the argument mining task. Early work mostly started from a single argument mining subtask, modelling and solving that subtask alone while ignoring the correlation information among the three subtasks, which results in low system performance.
In addition, some work jointly models the three subtasks with a pipeline model in order to connect them. In the pipeline approach, errors in argument component type identification propagate into argument relation extraction, so the problem of error propagation exists. Moreover, this approach pairs the identified argument components pairwise and classifies their relations, which generates redundant argument relation pairs.
Therefore, other work adopts a Bi-LSTM-CRF labeling model that splices the three subtasks into a whole and trains a neural network to predict the overall label distribution. However, this method cannot fully take into account the diversity of label schemes across argument mining corpora: it can only handle data sets with sufficient data and complete labels and still suffers from data sparsity, so the generalization performance of the model is poor and it is difficult to apply to new data sets. Faced with these problems, a more refined method is desired that fully considers the inconsistent labels of different data sets, thereby alleviating the data sparsity problem and improving the generalization capability of the argument mining model.
Disclosure of Invention
In view of this, the present invention provides an argument mining system based on multi-task joint learning and a working method thereof, which can learn high-quality text vector features from existing argumentative text data, improve generalization capability so as to apply to new data sets for prediction, and finally detect the argument structure of a text.
The invention is realized by adopting the following scheme: the system comprises a data preprocessing module, a text embedding module, a joint learning module and an argument mining tag output module; the data preprocessing module is used for preprocessing data; the text embedding module is used for extracting representations of semantics, part of speech and argument types from the input text at the word level and the character level respectively, and expressing these features as vectors that form the input of the joint learning module; the joint learning module is used for adopting the hard parameter sharing mechanism of multi-task learning, regarding different argument mining data sets as different tasks that share the hidden layer of the joint learning module and are trained in parallel, and using a stacked bidirectional long short-term memory neural network to learn the context information of the text and complete the sequence labeling task; the argument mining tag output module is used for completing argument type prediction and outputting the types of arguments in the text.
Further, the invention provides a working method of the argument mining system based on multi-task joint learning, which comprises the following steps:
step S1: inputting an argumentative document whose argument structure, namely the argument types, is to be mined into the data preprocessing module for preprocessing to obtain a preprocessed text, and inputting the preprocessed text into the text embedding module;
step S2: the text embedding module adopts a word-level CNN and a character-level CNNs-highway to extract word-level and character-level features of the preprocessed text respectively, where the features include the proportion of numeric characters in a word and its capitalization information; the features are expressed as vectors and input into the joint learning module;
step S3: according to the text data output by the text embedding module, the joint learning module adopts a multi-task hard parameter sharing mechanism, treats each data set as a different argument mining task, uses a bidirectional long short-term memory neural network to learn the hidden layer representation, trains the parameters in parallel and learns the context information of the text data, so as to complete the detection of argument boundaries and the subsequent prediction of argument component types;
step S4: the argument mining tag output module outputs the obtained argument type tags.
Further, the preprocessing performed by the data preprocessing module comprises the following steps:
step SA: removing web page links, special characters and punctuation marks from the document;
step SB: performing word segmentation on the document;
step SC: performing stemming on the English data;
step SD: filtering out the stop words contained in the data set according to the Chinese and English stop word lists, respectively.
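For illustration, a minimal sketch of steps SA to SD is given below in Python; the regular expressions, the toy stop-word list and the crude suffix-stripping stemmer are placeholders standing in for the actual tokenizer, stemmer and stop-word lists used by the module.

```python
import re

# Illustrative stop-word list; in practice full Chinese and English lists would be loaded.
STOP_WORDS = {"the", "a", "an", "of", "to", "is", "and"}

def preprocess(document: str) -> list:
    # Step SA: remove web page links, special characters and punctuation.
    text = re.sub(r"https?://\S+", " ", document)
    text = re.sub(r"[^\w\s]", " ", text)
    # Step SB: word segmentation (whitespace split stands in for a real tokenizer).
    tokens = text.lower().split()
    # Step SC: crude stemming for English (a real stemmer such as Porter's would be used).
    tokens = [t[:-1] if t.endswith("s") and len(t) > 3 else t for t in tokens]
    # Step SD: filter out stop words.
    return [t for t in tokens if t not in STOP_WORDS]

print(preprocess("The system removes links such as http://example.com, punctuation, and stop words."))
```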
Further, the step S2 specifically includes the following steps:
step S21: let the input of the CNN be a preprocessed text of length n, x = [x_1, x_2, x_3, ..., x_n]; according to the definition of word vectors, each word is expressed as a d-dimensional vector;
step S22: invoking a narrow convolution with a convolution kernel of window size k on the sequence x, using padding vectors at the beginning and end of the sequence so that it remains of fixed length;
step S23: according to
c_i = f(W · x_{i:i+k-1} + b)
calculating the sequence C = [c_1, c_2, c_3, ..., c_n] of the text x after CNN feature extraction, where f denotes a nonlinear activation function, x_{i:j} denotes the words from position i to position j, W denotes the convolution kernel and b denotes the bias;
step S23: obtaining local context feature representations of the argument mining text by adopting convolution kernel windows of four different lengths, k = 1, 2, 3 and 4;
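A sketch of this word-level convolution (steps S21 to S23) is given below, assuming PyTorch as the framework; the embedding dimension, the number of channels and the ReLU activation are illustrative choices rather than values fixed by the invention.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class WordLevelCNN(nn.Module):
    """Convolutions with window sizes k = 1, 2, 3, 4 over a word-vector sequence."""
    def __init__(self, word_dim=100, channels=50, windows=(1, 2, 3, 4)):
        super().__init__()
        # Padding keeps the convolved sequence at the input length n.
        self.convs = nn.ModuleList(
            nn.Conv1d(word_dim, channels, k, padding=k // 2) for k in windows
        )
        self.windows = windows

    def forward(self, x):                        # x: (batch, n, word_dim)
        x = x.transpose(1, 2)                    # (batch, word_dim, n)
        feats = []
        for k, conv in zip(self.windows, self.convs):
            c = F.relu(conv(x))                  # c_i = f(W · x_{i:i+k-1} + b)
            feats.append(c[:, :, : x.size(2)])   # trim to length n for even k
        return torch.cat(feats, dim=1).transpose(1, 2)   # (batch, n, 4 * channels)

words = torch.randn(2, 20, 100)                  # a batch of 2 sentences, 20 words each
print(WordLevelCNN()(words).shape)               # torch.Size([2, 20, 200])
```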
step S24: let the word K ∈ V be composed of the character sequence [c_1, ..., c_l], where l denotes the length of the word K; the character-level representation of the word K is given by the matrix C^K ∈ R^{d×l}, whose j-th column is the character vector c_j; a narrow convolution is then applied between C^K and a filter H ∈ R^{d×w} of width w;
Step S25: then adding a bias b and applying a nonlinear function to obtain the feature map f^K ∈ R^{l-w+1}; for the i-th element of the feature map: f^K[i] = tanh(⟨C^K[*, i:i+w-1], H⟩ + b), where C^K[*, i:i+w-1] denotes the i-th to (i+w-1)-th columns of C^K and ⟨A, B⟩ = Tr(AB^T) denotes the Frobenius inner product;
step S26: introducing the max-over-time pooling function y^K = max_i f^K[i], which is applied to the word as the feature corresponding to the filter H.
Step S27: inputting the CNN extraction result into a highway network layer, whose transform gate filters out the valuable features, and inputting the obtained result
z = t ⊙ g(W_H y + b_H) + (1 - t) ⊙ y
into the joint learning module, where g denotes a nonlinear function, t = σ(W_T y + b_T) is called the transform gate and (1 - t) is called the carry gate.
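Steps S24 to S27 can likewise be sketched as follows, again assuming PyTorch; the character-set size, the character-vector dimension d, the filter width w and the number of filters are illustrative, and the highway layer follows the transform-gate and carry-gate form given above.

```python
import torch
import torch.nn as nn

class CharCNNHighway(nn.Module):
    """Character-level CNN with max-over-time pooling followed by a highway layer."""
    def __init__(self, n_chars=80, d=16, w=3, n_filters=30):
        super().__init__()
        self.char_emb = nn.Embedding(n_chars, d)           # Q in R^{d x |C|}
        self.conv = nn.Conv1d(d, n_filters, w)              # filter H of width w
        self.transform = nn.Linear(n_filters, n_filters)    # transform gate t
        self.linear = nn.Linear(n_filters, n_filters)       # g(W_H y + b_H)

    def forward(self, chars):                     # chars: (n_words, l) character ids
        c = self.char_emb(chars).transpose(1, 2)  # (n_words, d, l) = C^K
        f = torch.tanh(self.conv(c))              # f^K[i] = tanh(<C^K[*, i:i+w-1], H> + b)
        y = f.max(dim=2).values                   # y^K = max_i f^K[i]
        t = torch.sigmoid(self.transform(y))      # transform gate
        return t * torch.relu(self.linear(y)) + (1 - t) * y   # carry gate (1 - t)

word_chars = torch.randint(0, 80, (20, 12))       # 20 words, 12 characters each
print(CharCNNHighway()(word_chars).shape)         # torch.Size([20, 30])
```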
Further, the step S3 specifically includes the following steps:
step S31, the joint learning module takes as input the shared feature parameter vectors output by the text embedding module, including the word vector V^w, the character vector V^c and the part-of-speech vector V^p, which are used to learn the context information of the text and identify argument component entities;
step S32, constructing a bidirectional stacked LSTM and calculating sentence vectors; at the t-th word each LSTM unit operates on n-dimensional vectors and comprises an input gate i_t, a forget gate f_t, an output gate o_t, a memory cell c_t and a hidden unit h_t; each LSTM unit receives an n-dimensional vector as input, the previous hidden state being h_{t-1} and the previous memory cell being c_{t-1};
The parameters are updated according to the following formulas:
i_t = σ(W^(i) x_t + I^(i) h_{t-1} + b_i)
f_t = σ(W^(f) x_t + I^(f) h_{t-1} + b_f)
o_t = σ(W^(o) x_t + I^(o) h_{t-1} + b_o)
u_t = tanh(W^(u) x_t + I^(u) h_{t-1} + b_u)
c_t = i_t ⊙ u_t + f_t ⊙ c_{t-1}
h_t = o_t ⊙ tanh(c_t)
where σ denotes the logistic sigmoid activation function, ⊙ denotes the element-wise product of vectors, W and I denote weight matrices, and b denotes a bias vector; the input of the LSTM unit at the t-th word is the concatenation x_t = [V_t^w ; V_t^c ; V_t^p] of the word vector V_t^w, the character vector V_t^c and the part-of-speech vector V_t^p of the t-th word, and the hidden units of the forward and backward LSTMs are concatenated into h_t = [h_t^forward ; h_t^backward] as the output;
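A sketch of the shared stacked bidirectional LSTM of step S32 is shown below, assuming PyTorch; the two-layer stacking and the concatenated word, character and part-of-speech input follow the description, while the hidden size and feature dimensions are illustrative.

```python
import torch
import torch.nn as nn

class SharedStackedBiLSTM(nn.Module):
    """Shared hidden layer: a two-layer bidirectional LSTM over concatenated features."""
    def __init__(self, word_dim=200, char_dim=30, pos_dim=20, hidden=128):
        super().__init__()
        self.lstm = nn.LSTM(word_dim + char_dim + pos_dim, hidden,
                            num_layers=2, bidirectional=True, batch_first=True)

    def forward(self, v_word, v_char, v_pos):
        # x_t = [V_t^w ; V_t^c ; V_t^p], the concatenation for the t-th word.
        x = torch.cat([v_word, v_char, v_pos], dim=-1)
        h, _ = self.lstm(x)            # h_t = [h_t^forward ; h_t^backward]
        return h                       # (batch, n, 2 * hidden)

enc = SharedStackedBiLSTM()
out = enc(torch.randn(2, 20, 200), torch.randn(2, 20, 30), torch.randn(2, 20, 20))
print(out.shape)                       # torch.Size([2, 20, 256])
```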
step S33, marking each word of the input sentence with a BIO label and then with its argument type, forming a tag of the "BIO-argument type" form;
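For illustration, a "BIO-argument type" tagging of step S33 might look as follows; the component names Claim and Premise are assumed examples and not a label set fixed by the invention.

```python
# One "BIO-argument type" tag per word of the input sentence.
sentence = ["Smoking", "should", "be", "banned", "because", "it", "harms", "health"]
labels   = ["B-Claim", "I-Claim", "I-Claim", "I-Claim", "O",
            "B-Premise", "I-Premise", "I-Premise"]
```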
step S34, constructing a two-layer neural network consisting of a dense (DenseNet) layer and a Softmax layer:
s_t = tanh(W^(1) h_t + b^(1))
p_t = softmax(W^(2) s_t + b^(2))
where W is a weight matrix and b is a bias vector;
step S35, taking s_t and the vector e_{i-1} of the previous word as input, passing them through one neural network layer into the Softmax layer to obtain the type label of the argument component entity, and mapping the obtained output to the vector e_i.
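Steps S34 and S35 can be sketched as below, assuming PyTorch; the exact form of the dense layer, the label-embedding dimension and the number of argument type labels are assumptions reconstructed from the description rather than the precise formulation of the invention.

```python
import torch
import torch.nn as nn

class ArgumentTypeHead(nn.Module):
    """Dense + Softmax head that also consumes the previous word's label vector e_{i-1}."""
    def __init__(self, hidden=256, label_dim=25, n_labels=5):
        super().__init__()
        self.dense = nn.Linear(hidden + label_dim, hidden)
        self.out = nn.Linear(hidden, n_labels)
        self.label_emb = nn.Embedding(n_labels, label_dim)   # maps a label back to a vector e_i

    def forward(self, s_t, prev_label):
        e_prev = self.label_emb(prev_label)                   # e_{i-1}
        d = torch.tanh(self.dense(torch.cat([s_t, e_prev], dim=-1)))
        probs = torch.softmax(self.out(d), dim=-1)            # distribution over argument type labels
        e_i = self.label_emb(probs.argmax(dim=-1))            # vector e_i for the next word
        return probs, e_i

head = ArgumentTypeHead()
probs, e_i = head(torch.randn(4, 256), torch.zeros(4, dtype=torch.long))
print(probs.shape, e_i.shape)          # torch.Size([4, 5]) torch.Size([4, 25])
```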
Further, the step S4 specifically includes the following steps:
step S41: adopting a conditional random field (CRF) to model the label dependencies at the output level and learn the relevant context information;
step S42: during the training process, the loss function is given as follows:
loss(y, x) = -E(y, x) + log Z
where y denotes the true tag sequence, x is the input sequence, Z is the normalization constant, and E(y, x) is the path energy of the input x with the tag sequence y, expressed as
E(y, x) = Σ_k ( h(y_k; x) + g(y_k, y_{k+1}) )
where h(y_k; x) denotes the scoring function of the label and g(y_k, y_{k+1}) is a transition parameter matrix to be trained;
step S43: the predicted dataset tag result is output according to the score of step S42.
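The loss of step S42 can be sketched as a manual negative log-likelihood computation for a single sequence, as below; this is a simplified illustration written against the formula above (emission scores h(y_k; x) and transition scores g(y_k, y_{k+1})), not the exact implementation of the invention.

```python
import torch

def crf_loss(emissions, transitions, tags):
    """loss(y, x) = -E(y, x) + log Z for one sequence.
    emissions: (n, n_tags) label scores h(y_k; x); transitions: (n_tags, n_tags) g(y_k, y_{k+1})."""
    n, n_tags = emissions.shape
    # Path energy E(y, x) of the gold tag sequence.
    energy = emissions[torch.arange(n), tags].sum()
    energy = energy + transitions[tags[:-1], tags[1:]].sum()
    # log Z via the forward algorithm over all possible tag paths.
    alpha = emissions[0]
    for i in range(1, n):
        alpha = torch.logsumexp(alpha.unsqueeze(1) + transitions, dim=0) + emissions[i]
    log_z = torch.logsumexp(alpha, dim=0)
    return -energy + log_z

emissions = torch.randn(8, 5)            # 8 words, 5 "BIO-argument type" tags
transitions = torch.randn(5, 5)
tags = torch.randint(0, 5, (8,))
print(crf_loss(emissions, transitions, tags))
```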
Compared with the prior art, the invention has the following beneficial effects:
the invention can combine the multi-task learning method to do the debate mining from the debate text with high quality.
Drawings
Fig. 1 is a schematic configuration diagram of a system according to an embodiment of the present invention.
Detailed Description
The invention is further explained below with reference to the drawings and the embodiments.
It should be noted that the following detailed description is exemplary and is intended to provide further explanation of the disclosure. Unless defined otherwise, all technical and scientific terms used herein have the same meaning as commonly understood by one of ordinary skill in the art to which this application belongs.
It is noted that the terminology used herein is for the purpose of describing particular embodiments only and is not intended to be limiting of example embodiments according to the present application. As used herein, the singular forms "a", "an" and "the" are intended to include the plural forms as well, and it should be understood that when the terms "comprises" and/or "comprising" are used in this specification, they specify the presence of stated features, steps, operations, devices, components, and/or combinations thereof, unless the context clearly indicates otherwise.
As shown in fig. 1, the embodiment provides an argument mining system based on multi-task joint learning, which includes a data preprocessing module, a text embedding module, a joint learning module and an argument mining tag output module; the data preprocessing module is used for preprocessing data; the text embedding module is used for extracting representations of semantics, part of speech and argument types from the input text at the word level and the character level respectively, and expressing these features as vectors that form the input of the joint learning module; the joint learning module is used for adopting the hard parameter sharing mechanism of multi-task learning, regarding different argument mining data sets as different tasks that share the hidden layer of the joint learning module and are trained in parallel, and using a stacked bidirectional long short-term memory neural network to learn the context information of the text and complete the sequence labeling task; the argument mining tag output module is used for completing argument type prediction and outputting the types of arguments in the text.
Preferably, the embodiment provides a working method of the argument mining system based on multi-task joint learning, comprising the following steps:
step S1: inputting an argumentative document whose argument structure, namely the argument types, is to be mined into the data preprocessing module for preprocessing to obtain a preprocessed text, and inputting the preprocessed text into the text embedding module;
step S2: the text embedding module adopts a word-level CNN and a character-level CNNs-highway to extract word-level and character-level features of the preprocessed text respectively, where the features include the proportion of numeric characters in a word and its capitalization information; the features are expressed as vectors and input into the joint learning module;
step S3: according to the text data output by the text embedding module, the joint learning module adopts a multi-task hard parameter sharing mechanism, treats each data set as a different argument mining task, uses the bidirectional long short-term memory neural network to learn the hidden layer representation, and trains the parameters in parallel; in addition, the bidirectional long short-term memory neural network learns the context information of the text data, which is used to complete the detection of argument boundaries and the subsequent prediction of argument component types;
step S4: the argument mining tag output module outputs the obtained argument type tags.
Preferably, in this embodiment, the text embedding module mainly includes two parts, i.e., word-level CNN and character-level CNNs-highway.
In this embodiment, the input online argumentative documents contain rich information but are also accompanied by some noise. Therefore, the data is preprocessed first, and the preprocessing performed by the data preprocessing module comprises the following steps:
step SA: removing web page links, special characters and punctuation marks from the document;
step SB: performing word segmentation on the document;
step SC: performing stemming on the English data;
step SD: filtering out the stop words contained in the data set according to the Chinese and English stop word lists, respectively.
In this embodiment, the text embedding module uses neural networks to extract representations of semantics, part of speech and argument types from the input text at the word level and the character level, in the following two parts:
CNN-based word-level representation:
step S21: let the input of the CNN be a preprocessed text of length n, x = [x_1, x_2, x_3, ..., x_n]; according to the definition of word vectors, each word is expressed as a d-dimensional vector;
step S22: invoking a narrow convolution with a convolution kernel of window size k on the sequence x, using padding vectors at the beginning and end of the sequence so that it remains of fixed length;
step S23: according to
c_i = f(W · x_{i:i+k-1} + b)
calculating the sequence C = [c_1, c_2, c_3, ..., c_n] of the text x after CNN feature extraction, where f denotes a nonlinear activation function, x_{i:j} denotes the words from position i to position j, W denotes the convolution kernel and b denotes the bias;
step S23: obtaining local context feature representations of the argument mining text by adopting convolution kernel windows of four different lengths, k = 1, 2, 3 and 4;
character level representation based on CNNs-Highway:
the embodiment expands the application of the CNN model to solve the sequence PTB. A plurality of filters of different widths are used to obtain the characteristics of the word k. For a given word k we stack its characters to form a matrix C k To C k Convolution operations with multiple filters followed by maximum pooling to obtain a fixed dimensional representation for word k as high speedInput of the neural network.
Step S24: let C be the character set, d the dimension of the character vectors, and Q ∈ R^{d×|C|} the matrix of character vectors; let the word K ∈ V be composed of the character sequence [c_1, ..., c_l], where l denotes the length of the word K; the character-level representation of the word K is given by the matrix C^K ∈ R^{d×l}, whose j-th column is the character vector c_j; a narrow convolution is then applied between C^K and a filter H ∈ R^{d×w} of width w;
Step S25: then adding a bias b and applying a nonlinear function to obtain the feature map f^K ∈ R^{l-w+1}; for the i-th element of the feature map: f^K[i] = tanh(⟨C^K[*, i:i+w-1], H⟩ + b), where C^K[*, i:i+w-1] denotes the i-th to (i+w-1)-th columns of C^K and ⟨A, B⟩ = Tr(AB^T) denotes the Frobenius inner product;
step S26: introducing the max-over-time pooling function y^K = max_i f^K[i], which is applied to the word as the feature corresponding to the filter H.
Step S27: inputting the CNN extraction result into a highway network layer, whose transform gate filters out the valuable features, and inputting the obtained result
z = t ⊙ g(W_H y + b_H) + (1 - t) ⊙ y
into the joint learning module, where g denotes a nonlinear function, t = σ(W_T y + b_T) is called the transform gate and (1 - t) is called the carry gate.
In this embodiment, the joint learning module adopts the hard parameter sharing mechanism of multi-task learning and regards different argument mining data sets as different tasks. During training, the model is not trained for each task separately; instead, multiple tasks share the hidden layer of the model, which effectively reduces the risk of overfitting. In addition, a stacked bidirectional long short-term memory neural network is added to the model to learn the context information of the text and complete the sequence labeling task.
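The hard parameter sharing described here can be sketched as a training loop in which batches from different argument mining data sets alternate over one shared encoder with task-specific output layers, as below; the optimizer, the dimensions, the alternation scheme and the cross-entropy loss (which stands in for the CRF output module of step S4) are illustrative assumptions.

```python
import torch
import torch.nn as nn

class MultiTaskTagger(nn.Module):
    """Hard parameter sharing: one shared stacked BiLSTM, one tag head per data set."""
    def __init__(self, feat_dim=250, hidden=128, tags_per_task=(5, 7)):
        super().__init__()
        self.shared = nn.LSTM(feat_dim, hidden, num_layers=2,
                              bidirectional=True, batch_first=True)
        self.heads = nn.ModuleList(nn.Linear(2 * hidden, t) for t in tags_per_task)

    def forward(self, x, task_id):
        h, _ = self.shared(x)                 # hidden layer shared by all tasks
        return self.heads[task_id](h)         # task-specific tag scores

model = MultiTaskTagger()
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
criterion = nn.CrossEntropyLoss()

# Toy batches: (task id, features, gold tags) for two data sets treated as two tasks.
batches = [(0, torch.randn(2, 20, 250), torch.randint(0, 5, (2, 20))),
           (1, torch.randn(2, 20, 250), torch.randint(0, 7, (2, 20)))]

for task_id, feats, tags in batches:          # tasks trained in parallel by alternating batches
    scores = model(feats, task_id)
    loss = criterion(scores.reshape(-1, scores.size(-1)), tags.reshape(-1))
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
```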
The step S3 specifically includes the following steps:
step S31, the joint learning module takes as input the shared feature parameter vectors output by the text embedding module, including the word vector V^w, the character vector V^c and the part-of-speech vector V^p, which are used to learn the context information of the text and identify argument component entities;
step S32, the input of the stacked LSTM neural network is the shared feature parameter vector output by the text embedding layer, which is used to learn the context information of the text and identify argument component entities; a bidirectional stacked LSTM is constructed and sentence vectors are calculated; at the t-th word each LSTM unit operates on n-dimensional vectors and comprises an input gate i_t, a forget gate f_t, an output gate o_t, a memory cell c_t and a hidden unit h_t; each LSTM unit receives an n-dimensional vector as input, the previous hidden state being h_{t-1} and the previous memory cell being c_{t-1};
The parameters are updated according to the following formulas:
i_t = σ(W^(i) x_t + I^(i) h_{t-1} + b_i)
f_t = σ(W^(f) x_t + I^(f) h_{t-1} + b_f)
o_t = σ(W^(o) x_t + I^(o) h_{t-1} + b_o)
u_t = tanh(W^(u) x_t + I^(u) h_{t-1} + b_u)
c_t = i_t ⊙ u_t + f_t ⊙ c_{t-1}
h_t = o_t ⊙ tanh(c_t)
where σ denotes the logistic sigmoid activation function, ⊙ denotes the element-wise product of vectors, W and I denote weight matrices, and b denotes a bias vector; the input of the LSTM unit at the t-th word is the concatenation x_t = [V_t^w ; V_t^c ; V_t^p] of the word vector V_t^w, the character vector V_t^c and the part-of-speech vector V_t^p of the t-th word, and the hidden units of the forward and backward LSTMs are concatenated into h_t = [h_t^forward ; h_t^backward] as the output;
step S33, marking each word of the input sentence with a BIO label and then with its argument type, forming a tag of the "BIO-argument type" form;
step S34, regarding the boundary detection of task one and the type identification of task two as a sequence labeling problem: each word of the input sentence is first marked with a BIO label and then with its argument type, i.e., each word receives a tag of the form "BIO-argument type", and this labeling scheme provides the labels for task one and task two. Both tasks are completed on top of the sequence encoding layer, where a two-layer neural network consisting of a dense (DenseNet) layer and a Softmax layer is constructed:
s_t = tanh(W^(1) h_t + b^(1))
p_t = softmax(W^(2) s_t + b^(2))
where W is a weight matrix and b is a bias vector;
step S35, taking s_t and the vector e_{i-1} of the previous word as input, passing them through one neural network layer into the Softmax layer to obtain the type label of the argument component entity, and mapping the obtained output to the vector e_i.
The stacked LSTM (Stacked LSTM) can learn feature representations over longer time spans: by fusing the hidden state of the current LSTM layer with that of the previous LSTM layer, it can fully learn the context information in the text and obtain deeper features. During training, the stacked bidirectional LSTM uses its hidden states to capture the "past" and "future" information of the argument mining text through forward and backward passes, and finally combines the two states into the output of the network.
In this embodiment, a conditional random field (CRF) is adopted as the final predicted-tag output module of the system; compared with ordinary Softmax classification, the CRF can model the dependencies between output labels, learns the relevant context information more easily, and achieves better results.
The step S4 specifically includes the following steps:
step S41: adopting a conditional random field (CRF) to model the label dependencies at the output level and learn the relevant context information;
step S42: during the training process, the loss function is given as follows:
loss(y, x) = -E(y, x) + log Z
where y denotes the true tag sequence, x is the input sequence, Z is the normalization constant, and E(y, x) is the path energy of the input x with the tag sequence y, expressed as
E(y, x) = Σ_k ( h(y_k; x) + g(y_k, y_{k+1}) )
where h(y_k; x) denotes the scoring function of the label and g(y_k, y_{k+1}) is a transition parameter matrix to be trained;
step S43: the predicted data set tag results are output according to the scores of step S42.
The above description is only a preferred embodiment of the present invention, and all equivalent changes and modifications made in accordance with the claims of the present invention shall be covered by the present invention.

Claims (3)

1. An argument mining system based on multi-task joint learning, characterized in that: the system comprises a data preprocessing module, a text embedding module, a joint learning module and an argument mining tag output module; the data preprocessing module is used for preprocessing data; the text embedding module is used for extracting representations of semantics, part of speech and argument types from the input text at the word level and the character level respectively, and expressing these features as vectors that form the input of the joint learning module; the joint learning module is used for adopting the hard parameter sharing mechanism of multi-task learning, regarding different argument mining data sets as different tasks that share the hidden layer of the joint learning module and are trained in parallel, and learning the context information of the text with a stacked bidirectional long short-term memory neural network to complete the sequence labeling task; the argument mining tag output module is used for completing argument type prediction and outputting the types of arguments in the text;
the joint learning module specifically takes as input the shared feature parameter vectors output by the text embedding module, including the word vector V^w, the character vector V^c and the part-of-speech vector V^p, which are used to learn the context information of the text and identify argument component entities;
a bidirectional stacked LSTM is constructed and sentence vectors are calculated; at the t-th word each LSTM unit operates on n-dimensional vectors and comprises an input gate i_t, a forget gate f_t, an output gate o_t, a memory cell c_t and a hidden unit h_t; each LSTM unit receives an n-dimensional vector as input, the previous hidden state being h_{t-1} and the previous memory cell being c_{t-1};
The parameters are updated according to the following formulas:
i_t = σ(W^(i) x_t + I^(i) h_{t-1} + b_i)
f_t = σ(W^(f) x_t + I^(f) h_{t-1} + b_f)
o_t = σ(W^(o) x_t + I^(o) h_{t-1} + b_o)
u_t = tanh(W^(u) x_t + I^(u) h_{t-1} + b_u)
c_t = i_t ⊙ u_t + f_t ⊙ c_{t-1}
h_t = o_t ⊙ tanh(c_t)
where σ denotes the logistic sigmoid activation function, ⊙ denotes the element-wise product of vectors, W and I denote weight matrices, and b denotes a bias vector; the input of the LSTM unit at the t-th word is the concatenation x_t = [V_t^w ; V_t^c ; V_t^p] of the word vector V_t^w, the character vector V_t^c and the part-of-speech vector V_t^p of the t-th word, and the hidden units of the forward and backward LSTMs are concatenated into h_t = [h_t^forward ; h_t^backward] as the output;
marking each word of the input sentence with a BIO label and then with its argument type, forming a tag of the "BIO-argument type" form;
constructing a two-layer neural network consisting of a dense (DenseNet) layer and a Softmax layer:
s_t = tanh(W^(1) h_t + b^(1))
p_t = softmax(W^(2) s_t + b^(2))
where W is a weight matrix and b is a bias vector;
taking s_t and the vector e_{i-1} of the previous word as input, passing them through one neural network layer into the Softmax layer to obtain the type label of the argument component entity, and mapping the obtained output to the vector e_i;
The text embedding module comprises:
let the input of the CNN be a preprocessed text of length n, x = [x_1, x_2, x_3, ..., x_n]; according to the definition of word vectors, each word is expressed as a d-dimensional vector;
invoking a narrow convolution with a convolution kernel of window size k on the sequence x, using padding vectors at the beginning and end of the sequence so that it remains of fixed length;
according to c_i = f(W · x_{i:i+k-1} + b), calculating the sequence C = [c_1, c_2, c_3, ..., c_n] of the text x after CNN feature extraction, where f denotes a nonlinear activation function, x_{i:j} denotes the words from position i to position j, W denotes the convolution kernel and b denotes the bias;
obtaining local context feature representations of the argument mining text by adopting convolution kernel windows of four different lengths, k = 1, 2, 3 and 4;
let the word K ∈ V be composed of the character sequence [c_1, ..., c_l], where l denotes the length of the word K; the character-level representation of the word K is given by the matrix C^K ∈ R^{d×l}, whose j-th column is the character vector c_j; a narrow convolution is then applied between C^K and a filter H ∈ R^{d×w} of width w;
then adding a bias b and applying a nonlinear function to obtain the feature map f^K ∈ R^{l-w+1}; for the i-th element of the feature map: f^K[i] = tanh(⟨C^K[*, i:i+w-1], H⟩ + b), where C^K[*, i:i+w-1] denotes the i-th to (i+w-1)-th columns of C^K and ⟨A, B⟩ = Tr(AB^T) denotes the Frobenius inner product;
introducing the max-over-time pooling function y^K = max_i f^K[i] and applying it to the word as the feature corresponding to the filter H;
inputting the CNN extraction result into a highway network layer, whose transform gate filters out the valuable features, and inputting the obtained result
z = t ⊙ g(W_H y + b_H) + (1 - t) ⊙ y
into the joint learning module, where g denotes a nonlinear function, t = σ(W_T y + b_T) is called the transform gate and (1 - t) is called the carry gate;
the argument mining tag output module comprises:
adopting a conditional random field to model the label dependencies at the output level and learn the relevant context information;
during the training process, the loss function is given as follows: loss(y, x) = -E(y, x) + log Z, where y denotes the true tag sequence, x is the input sequence, Z is the normalization constant, and E(y, x) is the path energy of the input x with the tag sequence y, expressed as E(y, x) = Σ_k ( h(y_k; x) + g(y_k, y_{k+1}) ), where h(y_k; x) denotes the scoring function of the label and g(y_k, y_{k+1}) is a transition parameter matrix to be trained;
and outputting the predicted data set label result according to the score.
2. A method of operating a multi-task joint learning based argument mining system of claim 1, characterized by: the method comprises the following steps:
step S1: inputting an argumentative document whose argument structure, namely the argument types, is to be mined into the data preprocessing module for preprocessing to obtain a preprocessed text, and inputting the preprocessed text into the text embedding module;
step S2: the text embedding module adopts a word-level CNN and a character-level CNNs-highway to extract word-level and character-level features of the preprocessed text respectively, where the features include the proportion of numeric characters in a word and its capitalization information; the features are expressed as vectors and input into the joint learning module;
step S3: according to the text data output by the text embedding module, the joint learning module adopts a multi-task hard parameter sharing mechanism, treats each data set as a different argument mining task, uses a bidirectional long short-term memory neural network to learn the hidden layer representation, trains the parameters in parallel and learns the context information of the text data, so as to complete the detection of argument boundaries and the subsequent prediction of argument component types;
step S4: the argument mining tag output module outputs the obtained argument type tags.
3. The working method of the argument mining system based on multitask joint learning according to claim 2, characterized in that: the data preprocessing module for preprocessing the data comprises the following steps:
step SA: removing web page links, special characters and punctuation marks from the document;
step SB: performing word segmentation on the document;
step SC: performing stemming on the English data;
step SD: filtering out the stop words contained in the data set according to the Chinese and English stop word lists, respectively.
CN201911161645.2A 2019-11-22 2019-11-22 Multi-task joint learning-based argument mining system and working method thereof Active CN110941700B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201911161645.2A CN110941700B (en) 2019-11-22 2019-11-22 Multi-task joint learning-based argument mining system and working method thereof

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201911161645.2A CN110941700B (en) 2019-11-22 2019-11-22 Multi-task joint learning-based argument mining system and working method thereof

Publications (2)

Publication Number Publication Date
CN110941700A CN110941700A (en) 2020-03-31
CN110941700B true CN110941700B (en) 2022-08-09

Family

ID=69907634

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201911161645.2A Active CN110941700B (en) 2019-11-22 2019-11-22 Multi-task joint learning-based argument mining system and working method thereof

Country Status (1)

Country Link
CN (1) CN110941700B (en)

Families Citing this family (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112347269A (en) * 2020-11-11 2021-02-09 重庆邮电大学 Method for recognizing argument pairs based on BERT and Att-BilSTM
CN112613316B (en) * 2020-12-31 2023-06-20 北京师范大学 Method and system for generating ancient Chinese labeling model
US11527074B1 (en) * 2021-11-24 2022-12-13 Continental Automotive Technologies GmbH Systems and methods for deep multi-task learning for embedded machine vision applications

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109255119A (en) * 2018-07-18 2019-01-22 五邑大学 A kind of sentence trunk analysis method and system based on the multitask deep neural network for segmenting and naming Entity recognition
CN109635109A (en) * 2018-11-28 2019-04-16 华南理工大学 Sentence classification method based on LSTM and combination part of speech and more attention mechanism
CN109710946A (en) * 2019-01-15 2019-05-03 福州大学 A kind of joint debate digging system and method based on dependence analytic tree

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
EP3537346A1 (en) * 2018-03-09 2019-09-11 Tata Consultancy Services Limited Method and system for incorporating regression into stacked auto encoder (sae)

Patent Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109255119A (en) * 2018-07-18 2019-01-22 五邑大学 A kind of sentence trunk analysis method and system based on the multitask deep neural network for segmenting and naming Entity recognition
CN109635109A (en) * 2018-11-28 2019-04-16 华南理工大学 Sentence classification method based on LSTM and combination part of speech and more attention mechanism
CN109710946A (en) * 2019-01-15 2019-05-03 福州大学 A kind of joint debate digging system and method based on dependence analytic tree

Non-Patent Citations (3)

* Cited by examiner, † Cited by third party
Title
Xiuyi Fan et al. "A first step towards explained activity recognition with ..." 2016 IEEE International Conference on Multisensor Fusion and ..., 2017. *
廖祥文 et al. "基于多任务迭代学习的论辩挖掘方法" (Argumentation mining method based on multi-task iterative learning). 计算机学报 (Chinese Journal of Computers), 2018. *
鄂海红 et al. "深度学习实体关系抽取研究综述" (Survey of entity relation extraction based on deep learning). 软件学报 (Journal of Software), 2019, Vol. 30, No. 6. *

Also Published As

Publication number Publication date
CN110941700A (en) 2020-03-31

Similar Documents

Publication Publication Date Title
Qiu et al. DGeoSegmenter: A dictionary-based Chinese word segmenter for the geoscience domain
CN106469560B (en) Voice emotion recognition method based on unsupervised domain adaptation
CN110941700B (en) Multi-task joint learning-based argument mining system and working method thereof
CN109858041B (en) Named entity recognition method combining semi-supervised learning with user-defined dictionary
Li et al. Improving convolutional neural network for text classification by recursive data pruning
CN107729311B (en) Chinese text feature extraction method fusing text moods
Sun et al. Deep LSTM networks for online Chinese handwriting recognition
CN113344206A (en) Knowledge distillation method, device and equipment integrating channel and relation feature learning
CN113255294A (en) Named entity recognition model training method, recognition method and device
CN111222318A (en) Trigger word recognition method based on two-channel bidirectional LSTM-CRF network
CN109710946A (en) A kind of joint debate digging system and method based on dependence analytic tree
CN114239574A (en) Miner violation knowledge extraction method based on entity and relationship joint learning
CN112287106A (en) Online comment emotion classification method based on dual-channel hybrid neural network
CN115630156A (en) Mongolian emotion analysis method and system fusing Prompt and SRU
Gajurel et al. A fine-grained visual attention approach for fingerspelling recognition in the wild
Reyes-Nava et al. Performance analysis of deep neural networks for classification of gene-expression microarrays
El-Demerdash et al. Psychological human traits detection based on universal language modeling
CN115795044A (en) Knowledge injection-based user relationship mining method and device
CN111078881A (en) Fine-grained emotion analysis method and system, electronic equipment and storage medium
CN114818718A (en) Contract text recognition method and device
Elleuch et al. The Effectiveness of Transfer Learning for Arabic Handwriting Recognition using Deep CNN.
CN113779966A (en) Mongolian emotion analysis method of bidirectional CNN-RNN depth model based on attention
CN111309849A (en) Fine-grained numerical information extraction method based on joint learning model
Nouhaila et al. Arabic sentiment analysis based on 1-D convolutional neural network
CN113761106B (en) Self-attention-strengthening bond transaction intention recognition system

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant