CN110941700B - Multi-task joint learning-based argument mining system and working method thereof - Google Patents

Multi-task joint learning-based argument mining system and working method thereof

Info

Publication number
CN110941700B
Authority
CN
China
Prior art keywords
text
word
module
argument
input
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201911161645.2A
Other languages
Chinese (zh)
Other versions
CN110941700A (en)
Inventor
廖祥文
倪继昌
叶锴
张铭洲
苏锦河
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Fuzhou University
Original Assignee
Fuzhou University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Fuzhou University filed Critical Fuzhou University
Priority to CN201911161645.2A
Publication of CN110941700A
Application granted
Publication of CN110941700B
Legal status: Active
Anticipated expiration

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F 16/00: Information retrieval; Database structures therefor; File system structures therefor
    • G06F 16/30: Information retrieval of unstructured textual data
    • G06F 16/33: Querying
    • G06F 16/3331: Query processing
    • G06F 16/334: Query execution
    • G06F 16/3344: Query execution using natural language analysis
    • G06N: COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00: Computing arrangements based on biological models
    • G06N 3/02: Neural networks
    • G06N 3/04: Architecture, e.g. interconnection topology
    • G06N 3/044: Recurrent networks, e.g. Hopfield networks
    • G06N 3/045: Combinations of networks
    • G06N 3/08: Learning methods
    • G06N 3/084: Backpropagation, e.g. using gradient descent
    • Y: GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02: TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D: CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D 10/00: Energy efficient computing, e.g. low power processors, power management or thermal management

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • General Physics & Mathematics (AREA)
  • Artificial Intelligence (AREA)
  • General Engineering & Computer Science (AREA)
  • Computational Linguistics (AREA)
  • General Health & Medical Sciences (AREA)
  • Evolutionary Computation (AREA)
  • Biophysics (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • Biomedical Technology (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Health & Medical Sciences (AREA)
  • Databases & Information Systems (AREA)
  • Machine Translation (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention relates to an argument mining system based on multi-task joint learning and a working method thereof. The argument mining system comprises a data preprocessing module, a text embedding module, a joint learning module and an argument mining tag prediction output module. The data preprocessing module is used for preprocessing data; the text embedding module extracts feature representations at the word level and the character level with CNN networks and combines task-specific features as the vector input of the next module; the joint learning module adopts the hard parameter sharing mechanism of multi-task learning, in which multiple tasks share a hidden layer of the model and are trained in parallel, and a stacked bidirectional long short-term memory (LSTM) neural network learns the context information of the text to complete the sequence labeling task; the argument mining tag prediction output module completes the prediction of argument types and outputs the types of the arguments in the text. The invention can learn high-quality text features from argumentative text data and finally detect the types of arguments contained in the text.

Description

Multi-task joint learning-based argument mining system and working method thereof
Technical Field
The invention relates to the field of natural language processing, in particular to a multi-task joint learning-based argument mining system and a working method thereof.
Background
Currently, many methods can be applied to the argument mining task. Early work mostly started from a single argument mining subtask, modelling and solving that subtask alone while ignoring the correlation information among the three subtasks, which results in low system performance.
In addition, some work jointly models the three subtasks with a pipeline model in order to connect them. In the pipeline approach, errors in argument component type identification propagate into argument relation extraction, so the problem of error propagation exists. Moreover, this approach pairs the identified argument components pairwise and classifies their relations, which generates redundant argument relation pairs.
Therefore, other work adopts a Bi-LSTM-CRF labeling model that splices the three subtasks into a whole and trains a neural network to predict the overall label distribution. However, this method cannot fully take into account the diversity of label schemes across argument mining corpora: it can only handle data sets with sufficient data and complete labels and still suffers from data sparsity, so the generalization performance of the model is poor and it is difficult to apply to new data sets. Faced with these problems, a more refined method is desired that fully considers the inconsistent labels of different data sets, thereby alleviating the data sparsity problem and improving the generalization capability of the argument mining model.
Disclosure of Invention
In view of this, the present invention provides an argument mining system based on multi-task joint learning and a working method thereof, which can learn high-quality text vector features from existing argumentative text data, improve generalization capability so as to apply to new data sets for prediction, and finally detect the argument structure of a text.
The invention is realized by adopting the following scheme: the system comprises a data preprocessing module, a text embedding module, a joint learning module and an argument mining tag output module; the data preprocessing module is used for preprocessing data; the text embedding module is used for extracting representations of semantics, part of speech and argument types from the input text at the word level and the character level respectively, and expressing these features as vectors that form the input of the joint learning module; the joint learning module is used for adopting the hard parameter sharing mechanism of multi-task learning, regarding different argument mining data sets as different tasks that share the hidden layer of the joint learning module and are trained in parallel, and using a stacked bidirectional long short-term memory neural network to learn the context information of the text and complete the sequence labeling task; the argument mining tag output module is used for completing argument type prediction and outputting the types of arguments in the text.
Further, the invention provides a working method of the argument mining system based on multi-task joint learning, which comprises the following steps:
step S1: inputting an argumentative document whose argument structure, namely the argument types, is to be mined into the data preprocessing module for preprocessing to obtain a preprocessed text, and inputting the preprocessed text into the text embedding module;
step S2: the text embedding module adopts a word-level CNN and a character-level CNNs-highway to extract word-level and character-level features of the preprocessed text respectively, where the features include the proportion of numeric characters in a word and its capitalization information; the features are expressed as vectors and input into the joint learning module;
step S3: according to the text data output by the text embedding module, the joint learning module adopts a multi-task hard parameter sharing mechanism, treats each data set as a different argument mining task, uses a bidirectional long short-term memory neural network to learn the hidden layer representation, trains the parameters in parallel and learns the context information of the text data, so as to complete the detection of argument boundaries and the subsequent prediction of argument component types;
step S4: the argument mining tag output module outputs the obtained argument type tags.
Further, the preprocessing performed by the data preprocessing module comprises the following steps:
step SA: removing web page links, special characters and punctuation marks from the document;
step SB: performing word segmentation on the document;
step SC: performing stemming on the English data;
step SD: filtering out the stop words contained in the data set according to the Chinese and English stop word lists, respectively.
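For illustration, a minimal sketch of steps SA to SD is given below in Python; the regular expressions, the toy stop-word list and the crude suffix-stripping stemmer are placeholders standing in for the actual tokenizer, stemmer and stop-word lists used by the module.

```python
import re

# Illustrative stop-word list; in practice full Chinese and English lists would be loaded.
STOP_WORDS = {"the", "a", "an", "of", "to", "is", "and"}

def preprocess(document: str) -> list:
    # Step SA: remove web page links, special characters and punctuation.
    text = re.sub(r"https?://\S+", " ", document)
    text = re.sub(r"[^\w\s]", " ", text)
    # Step SB: word segmentation (whitespace split stands in for a real tokenizer).
    tokens = text.lower().split()
    # Step SC: crude stemming for English (a real stemmer such as Porter's would be used).
    tokens = [t[:-1] if t.endswith("s") and len(t) > 3 else t for t in tokens]
    # Step SD: filter out stop words.
    return [t for t in tokens if t not in STOP_WORDS]

print(preprocess("The system removes links such as http://example.com, punctuation, and stop words."))
```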
Further, the step S2 specifically includes the following steps:
step S21: let the input of the CNN be a preprocessed text of length n, x = [x_1, x_2, x_3, ..., x_n]; according to the definition of word vectors, each word is expressed as a d-dimensional vector;
step S22: invoking a narrow convolution with a convolution kernel of window size k on the sequence x, using padding vectors at the beginning and end of the sequence so that it remains of fixed length;
step S23: according to
c_i = f(W · x_{i:i+k-1} + b)
calculating the sequence C = [c_1, c_2, c_3, ..., c_n] of the text x after CNN feature extraction, where f denotes a nonlinear activation function, x_{i:j} denotes the words from position i to position j, W denotes the convolution kernel and b denotes the bias;
step S23: obtaining local context feature representations of the argument mining text by adopting convolution kernel windows of four different lengths, k = 1, 2, 3 and 4;
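A sketch of this word-level convolution (steps S21 to S23) is given below, assuming PyTorch as the framework; the embedding dimension, the number of channels and the ReLU activation are illustrative choices rather than values fixed by the invention.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class WordLevelCNN(nn.Module):
    """Convolutions with window sizes k = 1, 2, 3, 4 over a word-vector sequence."""
    def __init__(self, word_dim=100, channels=50, windows=(1, 2, 3, 4)):
        super().__init__()
        # Padding keeps the convolved sequence at the input length n.
        self.convs = nn.ModuleList(
            nn.Conv1d(word_dim, channels, k, padding=k // 2) for k in windows
        )
        self.windows = windows

    def forward(self, x):                        # x: (batch, n, word_dim)
        x = x.transpose(1, 2)                    # (batch, word_dim, n)
        feats = []
        for k, conv in zip(self.windows, self.convs):
            c = F.relu(conv(x))                  # c_i = f(W · x_{i:i+k-1} + b)
            feats.append(c[:, :, : x.size(2)])   # trim to length n for even k
        return torch.cat(feats, dim=1).transpose(1, 2)   # (batch, n, 4 * channels)

words = torch.randn(2, 20, 100)                  # a batch of 2 sentences, 20 words each
print(WordLevelCNN()(words).shape)               # torch.Size([2, 20, 200])
```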
step S24: let the word K ∈ V be composed of the character sequence [c_1, ..., c_l], where l denotes the length of the word K; the character-level representation of the word K is given by the matrix C^K ∈ R^{d×l}, whose j-th column is the character vector c_j; a narrow convolution is then applied between C^K and a filter H ∈ R^{d×w} of width w;
Step S25: then adding a bias b and applying a nonlinear function to obtain the feature map f^K ∈ R^{l-w+1}; for the i-th element of the feature map: f^K[i] = tanh(⟨C^K[*, i:i+w-1], H⟩ + b), where C^K[*, i:i+w-1] denotes the i-th to (i+w-1)-th columns of C^K and ⟨A, B⟩ = Tr(AB^T) denotes the Frobenius inner product;
step S26: introducing the max-over-time pooling function y^K = max_i f^K[i], which is applied to the word as the feature corresponding to the filter H.
Step S27: inputting the CNN extraction result into a highway network layer, whose transform gate filters out the valuable features, and inputting the obtained result
z = t ⊙ g(W_H y + b_H) + (1 - t) ⊙ y
into the joint learning module, where g denotes a nonlinear function, t = σ(W_T y + b_T) is called the transform gate and (1 - t) is called the carry gate.
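Steps S24 to S27 can likewise be sketched as follows, again assuming PyTorch; the character-set size, the character-vector dimension d, the filter width w and the number of filters are illustrative, and the highway layer follows the transform-gate and carry-gate form given above.

```python
import torch
import torch.nn as nn

class CharCNNHighway(nn.Module):
    """Character-level CNN with max-over-time pooling followed by a highway layer."""
    def __init__(self, n_chars=80, d=16, w=3, n_filters=30):
        super().__init__()
        self.char_emb = nn.Embedding(n_chars, d)           # Q in R^{d x |C|}
        self.conv = nn.Conv1d(d, n_filters, w)              # filter H of width w
        self.transform = nn.Linear(n_filters, n_filters)    # transform gate t
        self.linear = nn.Linear(n_filters, n_filters)       # g(W_H y + b_H)

    def forward(self, chars):                     # chars: (n_words, l) character ids
        c = self.char_emb(chars).transpose(1, 2)  # (n_words, d, l) = C^K
        f = torch.tanh(self.conv(c))              # f^K[i] = tanh(<C^K[*, i:i+w-1], H> + b)
        y = f.max(dim=2).values                   # y^K = max_i f^K[i]
        t = torch.sigmoid(self.transform(y))      # transform gate
        return t * torch.relu(self.linear(y)) + (1 - t) * y   # carry gate (1 - t)

word_chars = torch.randint(0, 80, (20, 12))       # 20 words, 12 characters each
print(CharCNNHighway()(word_chars).shape)         # torch.Size([20, 30])
```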
Further, the step S3 specifically includes the following steps:
step S31, the joint learning module takes as input the shared feature parameter vectors output by the text embedding module, including the word vector V^w, the character vector V^c and the part-of-speech vector V^p, which are used to learn the context information of the text and identify argument component entities;
step S32, constructing a bidirectional stacked LSTM and calculating sentence vectors; at the t-th word each LSTM unit operates on n-dimensional vectors and comprises an input gate i_t, a forget gate f_t, an output gate o_t, a memory cell c_t and a hidden unit h_t; each LSTM unit receives an n-dimensional vector as input, the previous hidden state being h_{t-1} and the previous memory cell being c_{t-1};
The parameters are updated according to the following formulas:
i_t = σ(W^(i) x_t + I^(i) h_{t-1} + b_i)
f_t = σ(W^(f) x_t + I^(f) h_{t-1} + b_f)
o_t = σ(W^(o) x_t + I^(o) h_{t-1} + b_o)
u_t = tanh(W^(u) x_t + I^(u) h_{t-1} + b_u)
c_t = i_t ⊙ u_t + f_t ⊙ c_{t-1}
h_t = o_t ⊙ tanh(c_t)
where σ denotes the logistic sigmoid activation function, ⊙ denotes the element-wise product of vectors, W and I denote weight matrices, and b denotes a bias vector; the input of the LSTM unit at the t-th word is the concatenation x_t = [V_t^w ; V_t^c ; V_t^p] of the word vector V_t^w, the character vector V_t^c and the part-of-speech vector V_t^p of the t-th word, and the hidden units of the forward and backward LSTMs are concatenated into h_t = [h_t^forward ; h_t^backward] as the output;
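A sketch of the shared stacked bidirectional LSTM of step S32 is shown below, assuming PyTorch; the two-layer stacking and the concatenated word, character and part-of-speech input follow the description, while the hidden size and feature dimensions are illustrative.

```python
import torch
import torch.nn as nn

class SharedStackedBiLSTM(nn.Module):
    """Shared hidden layer: a two-layer bidirectional LSTM over concatenated features."""
    def __init__(self, word_dim=200, char_dim=30, pos_dim=20, hidden=128):
        super().__init__()
        self.lstm = nn.LSTM(word_dim + char_dim + pos_dim, hidden,
                            num_layers=2, bidirectional=True, batch_first=True)

    def forward(self, v_word, v_char, v_pos):
        # x_t = [V_t^w ; V_t^c ; V_t^p], the concatenation for the t-th word.
        x = torch.cat([v_word, v_char, v_pos], dim=-1)
        h, _ = self.lstm(x)            # h_t = [h_t^forward ; h_t^backward]
        return h                       # (batch, n, 2 * hidden)

enc = SharedStackedBiLSTM()
out = enc(torch.randn(2, 20, 200), torch.randn(2, 20, 30), torch.randn(2, 20, 20))
print(out.shape)                       # torch.Size([2, 20, 256])
```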
step S33, marking each word of the input sentence with a BIO label and then with its argument type, forming a tag of the "BIO-argument type" form;
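For illustration, a "BIO-argument type" tagging of step S33 might look as follows; the component names Claim and Premise are assumed examples and not a label set fixed by the invention.

```python
# One "BIO-argument type" tag per word of the input sentence.
sentence = ["Smoking", "should", "be", "banned", "because", "it", "harms", "health"]
labels   = ["B-Claim", "I-Claim", "I-Claim", "I-Claim", "O",
            "B-Premise", "I-Premise", "I-Premise"]
```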
step S34, constructing a two-layer neural network consisting of a dense (DenseNet) layer and a Softmax layer:
s_t = tanh(W^(1) h_t + b^(1))
p_t = softmax(W^(2) s_t + b^(2))
where W is a weight matrix and b is a bias vector;
step S35, taking s_t and the vector e_{i-1} of the previous word as input, passing them through one neural network layer into the Softmax layer to obtain the type label of the argument component entity, and mapping the obtained output to the vector e_i.
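Steps S34 and S35 can be sketched as below, assuming PyTorch; the exact form of the dense layer, the label-embedding dimension and the number of argument type labels are assumptions reconstructed from the description rather than the precise formulation of the invention.

```python
import torch
import torch.nn as nn

class ArgumentTypeHead(nn.Module):
    """Dense + Softmax head that also consumes the previous word's label vector e_{i-1}."""
    def __init__(self, hidden=256, label_dim=25, n_labels=5):
        super().__init__()
        self.dense = nn.Linear(hidden + label_dim, hidden)
        self.out = nn.Linear(hidden, n_labels)
        self.label_emb = nn.Embedding(n_labels, label_dim)   # maps a label back to a vector e_i

    def forward(self, s_t, prev_label):
        e_prev = self.label_emb(prev_label)                   # e_{i-1}
        d = torch.tanh(self.dense(torch.cat([s_t, e_prev], dim=-1)))
        probs = torch.softmax(self.out(d), dim=-1)            # distribution over argument type labels
        e_i = self.label_emb(probs.argmax(dim=-1))            # vector e_i for the next word
        return probs, e_i

head = ArgumentTypeHead()
probs, e_i = head(torch.randn(4, 256), torch.zeros(4, dtype=torch.long))
print(probs.shape, e_i.shape)          # torch.Size([4, 5]) torch.Size([4, 25])
```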
Further, the step S4 specifically includes the following steps:
step S41: adopting a conditional random field (CRF) to model the label dependencies at the output level and learn the relevant context information;
step S42: during the training process, the loss function is given as follows:
loss(y, x) = -E(y, x) + log Z
where y denotes the true tag sequence, x is the input sequence, Z is the normalization constant, and E(y, x) is the path energy of the input x with the tag sequence y, expressed as
E(y, x) = Σ_k ( h(y_k; x) + g(y_k, y_{k+1}) )
where h(y_k; x) denotes the scoring function of the label and g(y_k, y_{k+1}) is a transition parameter matrix to be trained;
step S43: the predicted dataset tag result is output according to the score of step S42.
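The loss of step S42 can be sketched as a manual negative log-likelihood computation for a single sequence, as below; this is a simplified illustration written against the formula above (emission scores h(y_k; x) and transition scores g(y_k, y_{k+1})), not the exact implementation of the invention.

```python
import torch

def crf_loss(emissions, transitions, tags):
    """loss(y, x) = -E(y, x) + log Z for one sequence.
    emissions: (n, n_tags) label scores h(y_k; x); transitions: (n_tags, n_tags) g(y_k, y_{k+1})."""
    n, n_tags = emissions.shape
    # Path energy E(y, x) of the gold tag sequence.
    energy = emissions[torch.arange(n), tags].sum()
    energy = energy + transitions[tags[:-1], tags[1:]].sum()
    # log Z via the forward algorithm over all possible tag paths.
    alpha = emissions[0]
    for i in range(1, n):
        alpha = torch.logsumexp(alpha.unsqueeze(1) + transitions, dim=0) + emissions[i]
    log_z = torch.logsumexp(alpha, dim=0)
    return -energy + log_z

emissions = torch.randn(8, 5)            # 8 words, 5 "BIO-argument type" tags
transitions = torch.randn(5, 5)
tags = torch.randint(0, 5, (8,))
print(crf_loss(emissions, transitions, tags))
```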
Compared with the prior art, the invention has the following beneficial effects:
the invention can combine the multi-task learning method to do the debate mining from the debate text with high quality.
Drawings
Fig. 1 is a schematic configuration diagram of a system according to an embodiment of the present invention.
Detailed Description
The invention is further explained below with reference to the drawings and the embodiments.
It should be noted that the following detailed description is exemplary and is intended to provide further explanation of the disclosure. Unless defined otherwise, all technical and scientific terms used herein have the same meaning as commonly understood by one of ordinary skill in the art to which this application belongs.
It is noted that the terminology used herein is for the purpose of describing particular embodiments only and is not intended to be limiting of example embodiments according to the present application. As used herein, the singular forms "a", "an" and "the" are intended to include the plural forms as well, and it should be understood that when the terms "comprises" and/or "comprising" are used in this specification, they specify the presence of stated features, steps, operations, devices, components, and/or combinations thereof, unless the context clearly indicates otherwise.
As shown in fig. 1, the embodiment provides an argument mining system based on multi-task joint learning, which includes a data preprocessing module, a text embedding module, a joint learning module and an argument mining tag output module; the data preprocessing module is used for preprocessing data; the text embedding module is used for extracting representations of semantics, part of speech and argument types from the input text at the word level and the character level respectively, and expressing these features as vectors that form the input of the joint learning module; the joint learning module is used for adopting the hard parameter sharing mechanism of multi-task learning, regarding different argument mining data sets as different tasks that share the hidden layer of the joint learning module and are trained in parallel, and using a stacked bidirectional long short-term memory neural network to learn the context information of the text and complete the sequence labeling task; the argument mining tag output module is used for completing argument type prediction and outputting the types of arguments in the text.
Preferably, the embodiment provides a working method of the argument mining system based on multi-task joint learning, comprising the following steps:
step S1: inputting an argumentative document whose argument structure, namely the argument types, is to be mined into the data preprocessing module for preprocessing to obtain a preprocessed text, and inputting the preprocessed text into the text embedding module;
step S2: the text embedding module adopts a word-level CNN and a character-level CNNs-highway to extract word-level and character-level features of the preprocessed text respectively, where the features include the proportion of numeric characters in a word and its capitalization information; the features are expressed as vectors and input into the joint learning module;
step S3: according to the text data output by the text embedding module, the joint learning module adopts a multi-task hard parameter sharing mechanism, treats each data set as a different argument mining task, uses the bidirectional long short-term memory neural network to learn the hidden layer representation, and trains the parameters in parallel; in addition, the bidirectional long short-term memory neural network learns the context information of the text data, which is used to complete the detection of argument boundaries and the subsequent prediction of argument component types;
step S4: the argument mining tag output module outputs the obtained argument type tags.
Preferably, in this embodiment, the text embedding module mainly includes two parts, i.e., word-level CNN and character-level CNNs-highway.
In this embodiment, the input online argumentative documents contain rich information but are also accompanied by some noise. Therefore, the data is preprocessed first, and the preprocessing performed by the data preprocessing module comprises the following steps:
step SA: removing web page links, special characters and punctuation marks from the document;
step SB: performing word segmentation on the document;
step SC: performing stemming on the English data;
step SD: filtering out the stop words contained in the data set according to the Chinese and English stop word lists, respectively.
In this embodiment, the text embedding module uses neural networks to extract representations of semantics, part of speech and argument types from the input text at the word level and the character level, in the following two parts:
CNN-based word-level representation:
step S21: let the input of the CNN be a preprocessed text of length n, x = [x_1, x_2, x_3, ..., x_n]; according to the definition of word vectors, each word is expressed as a d-dimensional vector;
step S22: invoking a narrow convolution with a convolution kernel of window size k on the sequence x, using padding vectors at the beginning and end of the sequence so that it remains of fixed length;
step S23: according to
c_i = f(W · x_{i:i+k-1} + b)
calculating the sequence C = [c_1, c_2, c_3, ..., c_n] of the text x after CNN feature extraction, where f denotes a nonlinear activation function, x_{i:j} denotes the words from position i to position j, W denotes the convolution kernel and b denotes the bias;
step S23: obtaining local context feature representations of the argument mining text by adopting convolution kernel windows of four different lengths, k = 1, 2, 3 and 4;
character level representation based on CNNs-Highway:
the embodiment expands the application of the CNN model to solve the sequence PTB. A plurality of filters of different widths are used to obtain the characteristics of the word k. For a given word k we stack its characters to form a matrix C k To C k Convolution operations with multiple filters followed by maximum pooling to obtain a fixed dimensional representation for word k as high speedInput of the neural network.
Step S24: let C be the character set, d the dimension of the character vectors, and Q ∈ R^{d×|C|} the matrix of character vectors; let the word K ∈ V be composed of the character sequence [c_1, ..., c_l], where l denotes the length of the word K; the character-level representation of the word K is given by the matrix C^K ∈ R^{d×l}, whose j-th column is the character vector c_j; a narrow convolution is then applied between C^K and a filter H ∈ R^{d×w} of width w;
Step S25: then adding a bias b and applying a nonlinear function to obtain the feature map f^K ∈ R^{l-w+1}; for the i-th element of the feature map: f^K[i] = tanh(⟨C^K[*, i:i+w-1], H⟩ + b), where C^K[*, i:i+w-1] denotes the i-th to (i+w-1)-th columns of C^K and ⟨A, B⟩ = Tr(AB^T) denotes the Frobenius inner product;
step S26: introducing the max-over-time pooling function y^K = max_i f^K[i], which is applied to the word as the feature corresponding to the filter H.
Step S27: inputting the CNN extraction result into a highway network layer, whose transform gate filters out the valuable features, and inputting the obtained result
z = t ⊙ g(W_H y + b_H) + (1 - t) ⊙ y
into the joint learning module, where g denotes a nonlinear function, t = σ(W_T y + b_T) is called the transform gate and (1 - t) is called the carry gate.
In this embodiment, the joint learning module adopts the hard parameter sharing mechanism of multi-task learning and regards different argument mining data sets as different tasks. During training, the model is not trained for each task separately; instead, multiple tasks share the hidden layer of the model, which effectively reduces the risk of overfitting. In addition, a stacked bidirectional long short-term memory neural network is added to the model to learn the context information of the text and complete the sequence labeling task.
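The hard parameter sharing described here can be sketched as a training loop in which batches from different argument mining data sets alternate over one shared encoder with task-specific output layers, as below; the optimizer, the dimensions, the alternation scheme and the cross-entropy loss (which stands in for the CRF output module of step S4) are illustrative assumptions.

```python
import torch
import torch.nn as nn

class MultiTaskTagger(nn.Module):
    """Hard parameter sharing: one shared stacked BiLSTM, one tag head per data set."""
    def __init__(self, feat_dim=250, hidden=128, tags_per_task=(5, 7)):
        super().__init__()
        self.shared = nn.LSTM(feat_dim, hidden, num_layers=2,
                              bidirectional=True, batch_first=True)
        self.heads = nn.ModuleList(nn.Linear(2 * hidden, t) for t in tags_per_task)

    def forward(self, x, task_id):
        h, _ = self.shared(x)                 # hidden layer shared by all tasks
        return self.heads[task_id](h)         # task-specific tag scores

model = MultiTaskTagger()
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
criterion = nn.CrossEntropyLoss()

# Toy batches: (task id, features, gold tags) for two data sets treated as two tasks.
batches = [(0, torch.randn(2, 20, 250), torch.randint(0, 5, (2, 20))),
           (1, torch.randn(2, 20, 250), torch.randint(0, 7, (2, 20)))]

for task_id, feats, tags in batches:          # tasks trained in parallel by alternating batches
    scores = model(feats, task_id)
    loss = criterion(scores.reshape(-1, scores.size(-1)), tags.reshape(-1))
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
```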
The step S3 specifically includes the following steps:
step S31, the joint learning module takes as input the shared feature parameter vectors output by the text embedding module, including the word vector V^w, the character vector V^c and the part-of-speech vector V^p, which are used to learn the context information of the text and identify argument component entities;
step S32, the input of the stacked LSTM neural network is the shared feature parameter vector output by the text embedding layer, which is used to learn the context information of the text and identify argument component entities; a bidirectional stacked LSTM is constructed and sentence vectors are calculated; at the t-th word each LSTM unit operates on n-dimensional vectors and comprises an input gate i_t, a forget gate f_t, an output gate o_t, a memory cell c_t and a hidden unit h_t; each LSTM unit receives an n-dimensional vector as input, the previous hidden state being h_{t-1} and the previous memory cell being c_{t-1};
The parameters are updated according to the following formulas:
i_t = σ(W^(i) x_t + I^(i) h_{t-1} + b_i)
f_t = σ(W^(f) x_t + I^(f) h_{t-1} + b_f)
o_t = σ(W^(o) x_t + I^(o) h_{t-1} + b_o)
u_t = tanh(W^(u) x_t + I^(u) h_{t-1} + b_u)
c_t = i_t ⊙ u_t + f_t ⊙ c_{t-1}
h_t = o_t ⊙ tanh(c_t)
where σ denotes the logistic sigmoid activation function, ⊙ denotes the element-wise product of vectors, W and I denote weight matrices, and b denotes a bias vector; the input of the LSTM unit at the t-th word is the concatenation x_t = [V_t^w ; V_t^c ; V_t^p] of the word vector V_t^w, the character vector V_t^c and the part-of-speech vector V_t^p of the t-th word, and the hidden units of the forward and backward LSTMs are concatenated into h_t = [h_t^forward ; h_t^backward] as the output;
step S33, marking each word of the input sentence with a BIO label and then with its argument type, forming a tag of the "BIO-argument type" form;
step S34, regarding the boundary detection of task one and the type identification of task two as a sequence labeling problem: each word of the input sentence is first marked with a BIO label and then with its argument type, i.e., each word receives a tag of the form "BIO-argument type", and this labeling scheme provides the labels for task one and task two. Both tasks are completed on top of the sequence encoding layer, where a two-layer neural network consisting of a dense (DenseNet) layer and a Softmax layer is constructed:
s_t = tanh(W^(1) h_t + b^(1))
p_t = softmax(W^(2) s_t + b^(2))
where W is a weight matrix and b is a bias vector;
step S35, taking s_t and the vector e_{i-1} of the previous word as input, passing them through one neural network layer into the Softmax layer to obtain the type label of the argument component entity, and mapping the obtained output to the vector e_i.
The stacked LSTM (Stacked LSTM) can learn feature representations over longer time spans: by fusing the hidden state of the current LSTM layer with that of the previous LSTM layer, it can fully learn the context information in the text and obtain deeper features. During training, the stacked bidirectional LSTM uses its hidden states to capture the "past" and "future" information of the argument mining text through forward and backward passes, and finally combines the two states into the output of the network.
In this embodiment, a conditional random field (CRF) is adopted as the final predicted-tag output module of the system; compared with ordinary Softmax classification, the CRF can model the dependencies between output labels, learns the relevant context information more easily, and achieves better results.
The step S4 specifically includes the following steps:
step S41: adopting a conditional random field (CRF) to model the label dependencies at the output level and learn the relevant context information;
step S42: during the training process, the loss function is given as follows:
loss(y, x) = -E(y, x) + log Z
where y denotes the true tag sequence, x is the input sequence, Z is the normalization constant, and E(y, x) is the path energy of the input x with the tag sequence y, expressed as
E(y, x) = Σ_k ( h(y_k; x) + g(y_k, y_{k+1}) )
where h(y_k; x) denotes the scoring function of the label and g(y_k, y_{k+1}) is a transition parameter matrix to be trained;
step S43: the predicted data set tag results are output according to the scores of step S42.
The above description is only a preferred embodiment of the present invention, and all equivalent changes and modifications made in accordance with the claims of the present invention shall be covered by the present invention.

Claims (3)

1. An argument mining system based on multi-task joint learning, characterized in that: the system comprises a data preprocessing module, a text embedding module, a joint learning module and an argument mining tag output module; the data preprocessing module is used for preprocessing data; the text embedding module is used for extracting representations of semantics, part of speech and argument types from the input text at the word level and the character level respectively, and expressing these features as vectors that form the input of the joint learning module; the joint learning module is used for adopting the hard parameter sharing mechanism of multi-task learning, regarding different argument mining data sets as different tasks that share the hidden layer of the joint learning module and are trained in parallel, and learning the context information of the text with a stacked bidirectional long short-term memory neural network to complete the sequence labeling task; the argument mining tag output module is used for completing argument type prediction and outputting the types of arguments in the text;
the joint learning module specifically takes as input the shared feature parameter vectors output by the text embedding module, including the word vector V^w, the character vector V^c and the part-of-speech vector V^p, which are used to learn the context information of the text and identify argument component entities;
a bidirectional stacked LSTM is constructed and sentence vectors are calculated; at the t-th word each LSTM unit operates on n-dimensional vectors and comprises an input gate i_t, a forget gate f_t, an output gate o_t, a memory cell c_t and a hidden unit h_t; each LSTM unit receives an n-dimensional vector as input, the previous hidden state being h_{t-1} and the previous memory cell being c_{t-1};
The parameters are updated according to the following formulas:
i_t = σ(W^(i) x_t + I^(i) h_{t-1} + b_i)
f_t = σ(W^(f) x_t + I^(f) h_{t-1} + b_f)
o_t = σ(W^(o) x_t + I^(o) h_{t-1} + b_o)
u_t = tanh(W^(u) x_t + I^(u) h_{t-1} + b_u)
c_t = i_t ⊙ u_t + f_t ⊙ c_{t-1}
h_t = o_t ⊙ tanh(c_t)
where σ denotes the logistic sigmoid activation function, ⊙ denotes the element-wise product of vectors, W and I denote weight matrices, and b denotes a bias vector; the input of the LSTM unit at the t-th word is the concatenation x_t = [V_t^w ; V_t^c ; V_t^p] of the word vector V_t^w, the character vector V_t^c and the part-of-speech vector V_t^p of the t-th word, and the hidden units of the forward and backward LSTMs are concatenated into h_t = [h_t^forward ; h_t^backward] as the output;
marking each word of the input sentence with a BIO label and then with its argument type, forming a tag of the "BIO-argument type" form;
constructing a two-layer neural network consisting of a dense (DenseNet) layer and a Softmax layer:
s_t = tanh(W^(1) h_t + b^(1))
p_t = softmax(W^(2) s_t + b^(2))
where W is a weight matrix and b is a bias vector;
taking s_t and the vector e_{i-1} of the previous word as input, passing them through one neural network layer into the Softmax layer to obtain the type label of the argument component entity, and mapping the obtained output to the vector e_i;
The text embedding module comprises:
let the input of the CNN be a preprocessed text of length n, x = [x_1, x_2, x_3, ..., x_n]; according to the definition of word vectors, each word is expressed as a d-dimensional vector;
invoking a narrow convolution with a convolution kernel of window size k on the sequence x, using padding vectors at the beginning and end of the sequence so that it remains of fixed length;
according to c_i = f(W · x_{i:i+k-1} + b), calculating the sequence C = [c_1, c_2, c_3, ..., c_n] of the text x after CNN feature extraction, where f denotes a nonlinear activation function, x_{i:j} denotes the words from position i to position j, W denotes the convolution kernel and b denotes the bias;
obtaining local context feature representations of the argument mining text by adopting convolution kernel windows of four different lengths, k = 1, 2, 3 and 4;
let the word K ∈ V be composed of the character sequence [c_1, ..., c_l], where l denotes the length of the word K; the character-level representation of the word K is given by the matrix C^K ∈ R^{d×l}, whose j-th column is the character vector c_j; a narrow convolution is then applied between C^K and a filter H ∈ R^{d×w} of width w;
then adding a bias b and applying a nonlinear function to obtain the feature map f^K ∈ R^{l-w+1}; for the i-th element of the feature map: f^K[i] = tanh(⟨C^K[*, i:i+w-1], H⟩ + b), where C^K[*, i:i+w-1] denotes the i-th to (i+w-1)-th columns of C^K and ⟨A, B⟩ = Tr(AB^T) denotes the Frobenius inner product;
introducing the max-over-time pooling function y^K = max_i f^K[i] and applying it to the word as the feature corresponding to the filter H;
inputting the CNN extraction result into a highway network layer, whose transform gate filters out the valuable features, and inputting the obtained result
z = t ⊙ g(W_H y + b_H) + (1 - t) ⊙ y
into the joint learning module, where g denotes a nonlinear function, t = σ(W_T y + b_T) is called the transform gate and (1 - t) is called the carry gate;
the argument mining tag output module comprises:
adopting a conditional random field to model the label dependencies at the output level and learn the relevant context information;
during the training process, the loss function is given as follows: loss(y, x) = -E(y, x) + log Z, where y denotes the true tag sequence, x is the input sequence, Z is the normalization constant, and E(y, x) is the path energy of the input x with the tag sequence y, expressed as E(y, x) = Σ_k ( h(y_k; x) + g(y_k, y_{k+1}) ), where h(y_k; x) denotes the scoring function of the label and g(y_k, y_{k+1}) is a transition parameter matrix to be trained;
and outputting the predicted data set label result according to the score.
2. A method of operating a multi-task joint learning based argument mining system of claim 1, characterized by: the method comprises the following steps:
step S1: inputting an argumentative document whose argument structure, namely the argument types, is to be mined into the data preprocessing module for preprocessing to obtain a preprocessed text, and inputting the preprocessed text into the text embedding module;
step S2: the text embedding module adopts a word-level CNN and a character-level CNNs-highway to extract word-level and character-level features of the preprocessed text respectively, where the features include the proportion of numeric characters in a word and its capitalization information; the features are expressed as vectors and input into the joint learning module;
step S3: according to the text data output by the text embedding module, the joint learning module adopts a multi-task hard parameter sharing mechanism, treats each data set as a different argument mining task, uses a bidirectional long short-term memory neural network to learn the hidden layer representation, trains the parameters in parallel and learns the context information of the text data, so as to complete the detection of argument boundaries and the subsequent prediction of argument component types;
step S4: the argument mining tag output module outputs the obtained argument type tags.
3. The working method of the argument mining system based on multitask joint learning according to claim 2, characterized in that: the data preprocessing module for preprocessing the data comprises the following steps:
step SA: removing web page links, special characters and punctuation marks from the document;
step SB: performing word segmentation on the document;
step SC: performing stemming on the English data;
step SD: filtering out the stop words contained in the data set according to the Chinese and English stop word lists, respectively.
CN201911161645.2A 2019-11-22 2019-11-22 Multi-task joint learning-based argument mining system and working method thereof Active CN110941700B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201911161645.2A CN110941700B (en) 2019-11-22 2019-11-22 Multi-task joint learning-based argument mining system and working method thereof

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201911161645.2A CN110941700B (en) 2019-11-22 2019-11-22 Multi-task joint learning-based argument mining system and working method thereof

Publications (2)

Publication Number Publication Date
CN110941700A CN110941700A (en) 2020-03-31
CN110941700B true CN110941700B (en) 2022-08-09

Family

ID=69907634

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201911161645.2A Active CN110941700B (en) 2019-11-22 2019-11-22 Multi-task joint learning-based argument mining system and working method thereof

Country Status (1)

Country Link
CN (1) CN110941700B (en)

Families Citing this family (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112347269A (en) * 2020-11-11 2021-02-09 重庆邮电大学 Method for recognizing argument pairs based on BERT and Att-BilSTM
CN112613316B (en) * 2020-12-31 2023-06-20 北京师范大学 Method and system for generating ancient Chinese labeling model
US11527074B1 (en) * 2021-11-24 2022-12-13 Continental Automotive Technologies GmbH Systems and methods for deep multi-task learning for embedded machine vision applications

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109255119A (en) * 2018-07-18 2019-01-22 五邑大学 A kind of sentence trunk analysis method and system based on the multitask deep neural network for segmenting and naming Entity recognition
CN109635109A (en) * 2018-11-28 2019-04-16 华南理工大学 Sentence classification method based on LSTM and combination part of speech and more attention mechanism
CN109710946A (en) * 2019-01-15 2019-05-03 福州大学 A kind of joint debate digging system and method based on dependence analytic tree

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
EP3537346A1 (en) * 2018-03-09 2019-09-11 Tata Consultancy Services Limited Method and system for incorporating regression into stacked auto encoder (sae)

Patent Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109255119A (en) * 2018-07-18 2019-01-22 五邑大学 A kind of sentence trunk analysis method and system based on the multitask deep neural network for segmenting and naming Entity recognition
CN109635109A (en) * 2018-11-28 2019-04-16 华南理工大学 Sentence classification method based on LSTM and combination part of speech and more attention mechanism
CN109710946A (en) * 2019-01-15 2019-05-03 福州大学 A kind of joint debate digging system and method based on dependence analytic tree

Non-Patent Citations (3)

* Cited by examiner, † Cited by third party
Title
Xiuyi Fan et al. "A first step towards explained activity recognition with ..." 2016 IEEE International Conference on Multisensor Fusion and ..., 2017. *
廖祥文 et al. "基于多任务迭代学习的论辩挖掘方法" (Argumentation mining method based on multi-task iterative learning). 计算机学报 (Chinese Journal of Computers), 2018. *
鄂海红 et al. "深度学习实体关系抽取研究综述" (Survey of entity relation extraction based on deep learning). 软件学报 (Journal of Software), 2019, Vol. 30, No. 6. *

Also Published As

Publication number Publication date
CN110941700A (en) 2020-03-31

Similar Documents

Publication Publication Date Title
Qiu et al. DGeoSegmenter: A dictionary-based Chinese word segmenter for the geoscience domain
CN106469560B (en) Voice emotion recognition method based on unsupervised domain adaptation
CN110941700B (en) Multi-task joint learning-based argument mining system and working method thereof
CN109858041B (en) Named entity recognition method combining semi-supervised learning with user-defined dictionary
Li et al. Improving convolutional neural network for text classification by recursive data pruning
CN107729311B (en) Chinese text feature extraction method fusing text moods
Sun et al. Deep LSTM networks for online Chinese handwriting recognition
CN113344206A (en) Knowledge distillation method, device and equipment integrating channel and relation feature learning
CN113255294A (en) Named entity recognition model training method, recognition method and device
CN111222318A (en) Trigger word recognition method based on two-channel bidirectional LSTM-CRF network
CN109710946A (en) A kind of joint debate digging system and method based on dependence analytic tree
CN114239574A (en) Miner violation knowledge extraction method based on entity and relationship joint learning
CN112287106A (en) Online comment emotion classification method based on dual-channel hybrid neural network
CN115630156A (en) Mongolian emotion analysis method and system fusing Prompt and SRU
Gajurel et al. A fine-grained visual attention approach for fingerspelling recognition in the wild
Reyes-Nava et al. Performance analysis of deep neural networks for classification of gene-expression microarrays
El-Demerdash et al. Psychological human traits detection based on universal language modeling
CN115795044A (en) Knowledge injection-based user relationship mining method and device
CN111078881A (en) Fine-grained emotion analysis method and system, electronic equipment and storage medium
CN114818718A (en) Contract text recognition method and device
Elleuch et al. The Effectiveness of Transfer Learning for Arabic Handwriting Recognition using Deep CNN.
CN113779966A (en) Mongolian emotion analysis method of bidirectional CNN-RNN depth model based on attention
CN111309849A (en) Fine-grained numerical information extraction method based on joint learning model
Nouhaila et al. Arabic sentiment analysis based on 1-D convolutional neural network
CN113761106B (en) Self-attention-strengthening bond transaction intention recognition system

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant