CN110941700B - Multi-task joint learning-based argument mining system and working method thereof - Google Patents
Multi-task joint learning-based argument mining system and working method thereof
- Publication number: CN110941700B (application CN201911161645.2A)
- Authority: CN (China)
- Prior art keywords: text, word, module, argument, input
- Legal status: Active (the legal status is an assumption and is not a legal conclusion; Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed)
Classifications
- G06F16/3344—Query execution using natural language analysis (G06F16/00 Information retrieval; G06F16/33 Querying)
- G06N3/044—Recurrent networks, e.g. Hopfield networks (G06N3/04 Architecture, e.g. interconnection topology)
- G06N3/045—Combinations of networks
- G06N3/084—Backpropagation, e.g. using gradient descent (G06N3/08 Learning methods)
- Y02D10/00—Energy efficient computing, e.g. low power processors, power management or thermal management
Abstract
The invention relates to an argument mining system based on multi-task joint learning and a working method thereof. The system comprises a data preprocessing module, a text embedding module, a joint learning module and an argument mining tag prediction output module. The data preprocessing module preprocesses the input data; the text embedding module extracts feature representations at the word level and the character level with CNN networks and combines them with task-specific features as the vector input of the next module; the joint learning module adopts the hard parameter sharing mechanism of multi-task learning, in which several tasks share the hidden layers of the model and are trained in parallel, and additionally uses a stacked bidirectional long short-term memory (LSTM) network to learn the context information of the text and complete the sequence labeling task; the argument mining tag prediction output module completes the argument-type prediction and outputs the argument types in the text. The invention can learn high-quality text features from argumentative text data and finally detect the types of arguments contained in the text.
Description
Technical Field
The invention relates to the field of natural language processing, in particular to a multi-task joint learning-based argument mining system and a working method thereof.
Background
Currently, many methods can be applied to the argument mining task. In early work, most research started from a single argument mining subtask, modeling and solving that subtask in isolation and ignoring the correlation among the three subtasks, so system performance was low.
In addition, some work jointly models the three subtasks with a pipeline model in order to connect them. In the pipeline method, however, errors in argument-type identification propagate into argument-relation extraction, so the approach suffers from error propagation. Moreover, the method pairs the identified arguments pairwise for relation classification, which generates redundant argument-relation pairs.
Therefore, later work adopted a Bi-LSTM-CRF labeling model, splicing the three subtasks into a whole and training a neural network to predict the overall label distribution. However, this method cannot fully account for the diversity of label schemes across argument mining corpora: it only handles data sets with sufficient data and complete labels and cannot cope with data sparsity, so the generalization of the model is poor and it is difficult to apply to new data sets. Facing these problems, one hopes to find a more refined method that fully considers the inconsistent labels of different data sets, alleviates the data sparsity problem, and improves the generalization ability of the argument mining model.
Disclosure of Invention
In view of this, the present invention provides an argument mining system based on multi-task joint learning and a working method thereof, which can learn high-quality text vector features from existing argumentative text data, improve generalization, apply to new data sets, and finally detect the argument structure of a text.
The invention is realized by the following scheme: the system comprises a data preprocessing module, a text embedding module, a joint learning module and an argument mining tag output module. The data preprocessing module preprocesses the data. The text embedding module extracts representations of the semantics, parts of speech and argument types in the input text at the word level and the character level respectively, and expresses these features as vectors serving as the vector input of the joint learning module. The joint learning module adopts the hard parameter sharing mechanism of multi-task learning: different argument mining data sets are regarded as different tasks, the tasks share the hidden layers of the module and are trained in parallel, and a stacked bidirectional long short-term memory (LSTM) network learns the context information of the text to complete the sequence labeling task. The argument mining tag output module completes the argument-type prediction and outputs the types of the arguments in the text.
Further, the invention provides a working method of the argument mining system based on multi-task joint learning, comprising the following steps:
step S1: inputting an argumentative document whose argument structure, i.e. the argument types, is to be mined into the data preprocessing module for preprocessing to obtain the preprocessed text, and inputting the preprocessed text into the text embedding module;
step S2: the text embedding module adopts a word-level CNN and a character-level CNNs-highway network to extract the word-level and character-level features of the preprocessed text respectively, including the proportion of numeric characters in a word and its capitalization information; the features are expressed as vectors and input into the joint learning module;
step S3: the joint learning module adopts a multi-task hard parameter sharing mechanism on the text data output by the text embedding module, treats each data set as a different argument mining task, uses a bidirectional long short-term memory network to learn the hidden-layer representation, trains the parameters in parallel and learns the context information of the text data, so as to complete the detection of argument boundaries and the subsequent prediction of argument component types;
step S4: the argument mining tag output module outputs the obtained argument-type tags.
Further, the preprocessing of the data by the data preprocessing module comprises the following steps:
step SA: removing web page links, special characters and punctuation marks from the document;
step SB: performing word segmentation on the document;
step SC: performing stemming on the English data;
step SD: filtering out the stop words contained in the data set according to the Chinese and English stop-word lists respectively.
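Steps SA through SD can be sketched as a small pipeline. This is only an illustrative sketch: the patent does not name its tokenizer, stemmer or stop-word lists, so the whitespace split, the crude plural-stripping stemmer and the tiny `STOP_WORDS` set below are stand-in assumptions for real components (e.g. a proper segmenter and a Porter stemmer).

```python
import re

STOP_WORDS = {"the", "a", "an", "is", "are", "of"}  # illustrative subset, not the patent's list

def preprocess(document: str) -> list:
    # SA: strip web page links, special characters and punctuation
    text = re.sub(r"https?://\S+", " ", document)
    text = re.sub(r"[^\w\s]", " ", text)
    # SB: word segmentation (whitespace split stands in for a real segmenter)
    tokens = text.lower().split()
    # SC: crude English stemming (a real system would use Porter/Snowball)
    tokens = [t[:-1] if t.endswith("s") and len(t) > 3 else t for t in tokens]
    # SD: filter stop words against the stop-word list
    return [t for t in tokens if t not in STOP_WORDS]
```

For Chinese data, the whitespace split in step SB would be replaced by a dedicated segmenter and step SC skipped.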
Further, the step S2 specifically includes the following steps:
step S21: let the input of the CNN be a preprocessed text of length n, x = [x_1, x_2, x_3, ..., x_n], where, according to the definition of word vectors, each word is expressed as a d-dimensional vector;
step S22: apply a narrow convolution with a convolution kernel of window size k to the sequence x, using padding vectors at both ends so that the sequence keeps a fixed length;
step S23: compute the sequence C = [c_1, c_2, c_3, ..., c_n] of the text x after CNN feature extraction according to c_i = f(W · x_{i:i+k−1} + b), where f denotes a nonlinear activation function, x_{i:j} denotes the words i to j and b denotes the bias;
step S24: obtain local context feature representations of the argument mining text with convolution kernel windows of four different lengths, k = 1, 2, 3, 4;
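The word-level convolution above can be sketched in numpy. This is a minimal illustration under stated assumptions: the kernel weights are random stand-ins, the nonlinearity f is taken to be tanh, and zero vectors are used as the padding so the output keeps length n, none of which is fixed by the text.

```python
import numpy as np

def word_cnn(x, k, rng):
    """x: (n, d) word vectors -> (n,) feature sequence c_1..c_n for window k."""
    n, d = x.shape
    W = rng.standard_normal((k, d))          # convolution kernel of window size k
    b = 0.1                                  # bias term
    left = (k - 1) // 2                      # zero padding keeps the length fixed
    padded = np.vstack([np.zeros((left, d)), x, np.zeros((k - 1 - left, d))])
    # c_i = f(W . x_{i:i+k-1} + b), here with f = tanh (narrow convolution)
    return np.array([np.tanh(np.sum(W * padded[i:i + k]) + b) for i in range(n)])

rng = np.random.default_rng(0)
x = rng.standard_normal((6, 8))              # n = 6 words, d = 8 dimensions
feats = [word_cnn(x, k, rng) for k in (1, 2, 3, 4)]  # the four window lengths
```

Each of the four window lengths yields one length-n feature sequence over the same sentence.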
step S25: let the word K ∈ V be composed of the character sequence [c_1, ..., c_l], where l is the length of the word K; the character-level representation of K is given by the matrix C^K ∈ R^{d×l}, whose j-th column is the character vector c_j; a narrow convolution is then applied between C^K and a filter H ∈ R^{d×w} of width w;
step S26: add a bias b and apply a nonlinear function to obtain the feature map f^K ∈ R^{l−w+1}; for the i-th element of the feature map: f^K[i] = tanh(⟨C^K[*, i:i+w−1], H⟩ + b), where C^K[*, i:i+w−1] is the sub-matrix of C^K from column i to column i+w−1 and ⟨A, B⟩ = Tr(AB^T) denotes the Frobenius inner product;
step S27: introduce the max-over-time pooling y^K = max_i f^K[i] and apply it to the word as the feature corresponding to filter H;
step S28: input the CNN extraction result y into a highway network layer z = t ⊙ g(W_H y + b_H) + (1 − t) ⊙ y, filter the valuable features through the transform gate of the highway network, and input the obtained result into the joint learning module; where g denotes a nonlinear function, t = σ(W_T y + b_T) is called the transform gate and (1 − t) the carry gate.
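The character-level convolution, max-over-time pooling and highway layer can be sketched as follows. This is an illustrative numpy sketch, not the patent's implementation: dimensions and weight matrices are random stand-ins, and the nonlinearity g of the highway layer is assumed to be tanh.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def char_feature(C, H, b):
    """C: (d, l) character matrix, H: (d, w) filter -> max-pooled scalar feature."""
    l, w = C.shape[1], H.shape[1]
    # f[i] = tanh(<C[:, i:i+w-1], H> + b), Frobenius inner product over the window
    f = [np.tanh(np.sum(C[:, i:i + w] * H) + b) for i in range(l - w + 1)]
    return max(f)                            # y = max_i f[i] (max-over-time)

def highway(y, WT, bT, WH, bH):
    t = sigmoid(WT @ y + bT)                 # transform gate t
    g = np.tanh(WH @ y + bH)                 # nonlinear transform g
    return t * g + (1.0 - t) * y             # carry gate (1 - t) passes y through

rng = np.random.default_rng(1)
C = rng.standard_normal((4, 7))              # d = 4 char dims, word length l = 7
H = rng.standard_normal((4, 3))              # one filter of width w = 3
y = np.array([char_feature(C, H, 0.0)] * 5)  # toy 5-dim word representation
z = highway(y, rng.standard_normal((5, 5)), np.zeros(5),
            rng.standard_normal((5, 5)), np.zeros(5))
```

In practice several filters of different widths would each contribute one pooled feature to y before the highway layer.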
Further, the step S3 specifically includes the following steps:
step S31: the joint learning module takes as input the shared feature parameter vectors output by the text embedding module, including the word vector V^w, the character vector V^c and the part-of-speech vector V^p, and uses them to learn the context information of the text and identify argument component entities;
step S32: construct a bidirectional stacked LSTM and compute the sentence vectors; at the t-th word, each LSTM unit consists of n-dimensional vectors comprising an input gate i_t, a forget gate f_t, an output gate o_t, a memory cell c_t and a hidden unit h_t; each LSTM unit receives an n-dimensional vector input, the previous hidden state being h_{t−1} and the previous memory cell c_{t−1};
The parameters are updated according to the following formulas:
i_t = σ(W^(i) x_t + I^(i) h_{t−1} + b_i)
f_t = σ(W^(f) x_t + I^(f) h_{t−1} + b_f)
o_t = σ(W^(o) x_t + I^(o) h_{t−1} + b_o)
u_t = tanh(W^(u) x_t + I^(u) h_{t−1} + b_u)
c_t = i_t ⊙ u_t + f_t ⊙ c_{t−1}
h_t = o_t ⊙ tanh(c_t)
where σ denotes the logistic activation function, ⊙ denotes the element-wise product of vectors, the W and I are weight matrices and the b are bias vectors; the input of the LSTM unit at the t-th word is the concatenation of the word vector V_t^w, the character vector V_t^c and the part-of-speech vector V_t^p of the t-th word, and the hidden units of the two opposite-direction LSTMs, h_t^→ and h_t^←, are concatenated into h_t = [h_t^→; h_t^←] as the output;
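The six update equations above transcribe directly into numpy. This is a single-cell sketch: the weight matrices W, I and biases b are random stand-ins and the hidden size n = 4 is arbitrary.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_step(x_t, h_prev, c_prev, W, I, b):
    i_t = sigmoid(W["i"] @ x_t + I["i"] @ h_prev + b["i"])   # input gate
    f_t = sigmoid(W["f"] @ x_t + I["f"] @ h_prev + b["f"])   # forget gate
    o_t = sigmoid(W["o"] @ x_t + I["o"] @ h_prev + b["o"])   # output gate
    u_t = np.tanh(W["u"] @ x_t + I["u"] @ h_prev + b["u"])   # candidate update
    c_t = i_t * u_t + f_t * c_prev                           # memory cell
    h_t = o_t * np.tanh(c_t)                                 # hidden unit
    return h_t, c_t

rng = np.random.default_rng(2)
n = 4                                        # hidden size (arbitrary for the sketch)
W = {k: rng.standard_normal((n, n)) for k in "ifou"}
I = {k: rng.standard_normal((n, n)) for k in "ifou"}
b = {k: np.zeros(n) for k in "ifou"}
h, c = lstm_step(rng.standard_normal(n), np.zeros(n), np.zeros(n), W, I, b)
```

A bidirectional layer would run this step forward and backward over the sentence and concatenate the two hidden states at each word.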
step S33: tag each word of the input sentence with a BIO label and then append its argument type, forming the "BIO-argument type" format;
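A toy illustration of the BIO-argument-type tagging of step S33; the sentence, the span boundaries and the type names (Claim, Premise) are hypothetical examples, not taken from the patent's corpus.

```python
tokens = ["Smoking", "should", "be", "banned", "because", "it", "harms", "health"]
spans = [(0, 4, "Claim"), (5, 8, "Premise")]     # (start, end, type), end exclusive

labels = ["O"] * len(tokens)                     # O marks words outside any argument
for start, end, arg_type in spans:
    labels[start] = f"B-{arg_type}"              # B- marks the span beginning
    for i in range(start + 1, end):
        labels[i] = f"I-{arg_type}"              # I- marks span continuation
```

Each word thus carries both its boundary position (B/I/O) and its argument component type in a single tag.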
step S34: construct a two-layer neural network consisting of DenseNet and Softmax, whose first layer computes s_t = f(W h_t + b) from the hidden state h_t, where W is a weight matrix and b is a bias vector;
step S35: take s_t and the vector e_{i−1} of the previous word as input, pass them through one neural network layer into the Softmax layer to obtain the type label of the argument component entity, and map the resulting output into the vector e_i.
Further, the step S4 specifically includes the following steps:
step S41: adopt a conditional random field (CRF), which directly models the dependencies between output labels and learns the relevant context information;
step S42: during training, the loss function is given as Loss(y, x) = E(y, x) + log(Z), where y is the true tag sequence, x the input sequence and Z the normalization constant; E(y, x) is the path energy of input x and tag sequence y, expressed as E(y, x) = −Σ_i h(y_i; x) − Σ_k g(y_k, y_{k+1}), where h(y_i; x) is the scoring function of the label at position i and g(y_k, y_{k+1}) is a transition parameter matrix to be trained;
step S43: output the predicted tag results of the data set according to the scores of step S42.
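The CRF loss of steps S41 and S42 can be sketched with random stand-in scores. The partition constant Z is computed here by brute-force enumeration over a tiny label space, which is only feasible for this toy size; a real CRF would use the forward algorithm.

```python
import numpy as np
from itertools import product

def path_energy(y, h, g):
    # E(y, x) = -sum_i h(y_i; x) - sum_k g(y_k, y_{k+1})
    unary = sum(h[i, y[i]] for i in range(len(y)))
    pairwise = sum(g[y[k], y[k + 1]] for k in range(len(y) - 1))
    return -(unary + pairwise)

def crf_loss(y, h, g):
    n, L = h.shape
    # log Z over every possible label path (exact for this toy label space)
    logZ = np.log(sum(np.exp(-path_energy(p, h, g))
                      for p in product(range(L), repeat=n)))
    return path_energy(y, h, g) + logZ       # Loss(y, x) = E(y, x) + log Z

rng = np.random.default_rng(3)
h = rng.standard_normal((3, 2))              # 3 positions, 2 labels: label scores
g = rng.standard_normal((2, 2))              # transition matrix to be trained
loss = crf_loss((0, 1, 1), h, g)
```

Since Loss(y, x) is the negative log-probability of the gold path, it is always non-negative and is minimized when the gold path carries all the probability mass.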
Compared with the prior art, the invention has the following beneficial effects:
the invention can combine the multi-task learning method to do the debate mining from the debate text with high quality.
Drawings
Fig. 1 is a schematic configuration diagram of a system according to an embodiment of the present invention.
Detailed Description
The invention is further explained below with reference to the drawings and the embodiments.
It should be noted that the following detailed description is exemplary and is intended to provide further explanation of the disclosure. Unless defined otherwise, all technical and scientific terms used herein have the same meaning as commonly understood by one of ordinary skill in the art to which this application belongs.
It is noted that the terminology used herein is for the purpose of describing particular embodiments only and is not intended to limit the example embodiments according to the present application. As used herein, the singular forms "a", "an" and "the" are intended to include the plural forms as well unless the context clearly indicates otherwise; it should also be understood that the terms "comprises" and/or "comprising", when used in this specification, specify the presence of the stated features, steps, operations, devices, components and/or combinations thereof.
As shown in fig. 1, this embodiment provides an argument mining system based on multi-task joint learning, comprising a data preprocessing module, a text embedding module, a joint learning module and an argument mining tag output module. The data preprocessing module preprocesses the data. The text embedding module extracts representations of the semantics, parts of speech and argument types in the input text at the word level and the character level respectively, and expresses these features as vectors serving as the vector input of the joint learning module. The joint learning module adopts the hard parameter sharing mechanism of multi-task learning: different argument mining data sets are regarded as different tasks, the tasks share the hidden layers of the module and are trained in parallel, and a stacked bidirectional long short-term memory network learns the context information of the text to complete the sequence labeling task. The argument mining tag output module completes the argument-type prediction and outputs the types of the arguments in the text.
Preferably, this embodiment provides a working method of the argument mining system based on multi-task joint learning, comprising the following steps:
step S1: inputting an argumentative document whose argument structure, i.e. the argument types, is to be mined into the data preprocessing module for preprocessing to obtain the preprocessed text, and inputting the preprocessed text into the text embedding module;
step S2: the text embedding module adopts a word-level CNN and a character-level CNNs-highway network to extract the word-level and character-level features of the preprocessed text respectively, including the proportion of numeric characters in a word and its capitalization information; the features are expressed as vectors and input into the joint learning module;
step S3: the joint learning module adopts a multi-task hard parameter sharing mechanism on the text data output by the text embedding module, treats each data set as a different argument mining task, uses a bidirectional long short-term memory network to learn the hidden-layer representation and trains the parameters in parallel; in addition, the module uses the bidirectional long short-term memory network to learn the context information of the text data, which is fed into the network to complete the detection of argument boundaries and the subsequent prediction of argument component types;
step S4: the argument mining tag output module outputs the obtained argument-type tags.
Preferably, in this embodiment, the text embedding module mainly consists of two parts: a word-level CNN and a character-level CNNs-highway network.
In this embodiment, the input online debate documents contain rich information but are also accompanied by some noise. Therefore the data is preprocessed first; the preprocessing of the data by the data preprocessing module comprises the following steps:
step SA: removing web page links, special characters and punctuation marks from the document;
step SB: performing word segmentation on the document;
step SC: performing stemming on the English data;
step SD: filtering out the stop words contained in the data set according to the Chinese and English stop-word lists respectively.
In this embodiment, the text embedding module uses a neural network to extract the expressions of semantics, parts of speech, and types of arguments in the input text at the word level and the character level, which are specifically divided into two parts as follows:
CNN-based word-level representation:
step S21: let the input of the CNN be a preprocessed text of length n, x = [x_1, x_2, x_3, ..., x_n], where, according to the definition of word vectors, each word is expressed as a d-dimensional vector;
step S22: apply a narrow convolution with a convolution kernel of window size k to the sequence x, using padding vectors at both ends so that the sequence keeps a fixed length;
step S23: compute the sequence C = [c_1, c_2, c_3, ..., c_n] of the text x after CNN feature extraction according to c_i = f(W · x_{i:i+k−1} + b), where f denotes a nonlinear activation function, x_{i:j} denotes the words i to j and b denotes the bias;
step S24: obtain local context feature representations of the argument mining text with convolution kernel windows of four different lengths, k = 1, 2, 3, 4;
Character-level representation based on CNNs-Highway:
This embodiment extends the application of the CNN model to the sequence problem. Multiple filters of different widths are used to obtain the features of a word k: for a given word k, its characters are stacked into a matrix C^k, convolution operations with multiple filters are applied to C^k, followed by max pooling, and the resulting fixed-dimensional representation of k is taken as the input of the highway network.
step S25: let C be the character set, d the dimension of a character vector and Q ∈ R^{d×|C|} the matrix of character vectors; let the word K ∈ V be composed of the character sequence [c_1, ..., c_l], where l is the length of the word K; the character-level representation of K is given by the matrix C^K ∈ R^{d×l}, whose j-th column is the character vector c_j; a narrow convolution is then applied between C^K and a filter H ∈ R^{d×w} of width w;
step S26: add a bias b and apply a nonlinear function to obtain the feature map f^K ∈ R^{l−w+1}; for the i-th element of the feature map: f^K[i] = tanh(⟨C^K[*, i:i+w−1], H⟩ + b), where C^K[*, i:i+w−1] is the sub-matrix of C^K from column i to column i+w−1 and ⟨A, B⟩ = Tr(AB^T) denotes the Frobenius inner product;
step S27: introduce the max-over-time pooling y^K = max_i f^K[i] and apply it to the word as the feature corresponding to filter H;
step S28: input the CNN extraction result y into a highway network layer z = t ⊙ g(W_H y + b_H) + (1 − t) ⊙ y, filter the valuable features through the transform gate of the highway network, and input the obtained result into the joint learning module; where g denotes a nonlinear function, t = σ(W_T y + b_T) is called the transform gate and (1 − t) the carry gate.
In this embodiment, the joint learning module adopts the hard parameter sharing mechanism of multi-task learning and regards different argument mining data sets as different tasks. During training, the model is not trained for each task independently; instead, multiple tasks share the hidden layers of the model, which effectively reduces the risk of overfitting. In addition, a stacked bidirectional long short-term memory network is added to the model to learn the context information of the text and complete the sequence labeling task.
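The hard-sharing arrangement can be sketched structurally: one shared encoder (the shared hidden layer) feeds a separate output head per data set. This is a minimal sketch under stated assumptions: a linear-tanh encoder stands in for the stacked BiLSTM, the task names "essays" and "web_discourse" and their tag counts are hypothetical, and all weights are random.

```python
import numpy as np

rng = np.random.default_rng(4)
W_shared = rng.standard_normal((16, 8))      # shared hidden layer used by all tasks
heads = {task: rng.standard_normal((n_tags, 16))
         for task, n_tags in [("essays", 5), ("web_discourse", 3)]}

def forward(x, task):
    hidden = np.tanh(W_shared @ x)           # hard-shared parameters: same for every task
    return heads[task] @ hidden              # task-specific prediction layer

x = rng.standard_normal(8)
out_a = forward(x, "essays")
out_b = forward(x, "web_discourse")
```

During training, gradients from every task's loss update W_shared, while each head only receives gradients from its own data set; this is what regularizes the shared representation.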
The step S3 specifically includes the following steps:
step S31: the joint learning module takes as input the shared feature parameter vectors output by the text embedding module, including the word vector V^w, the character vector V^c and the part-of-speech vector V^p, and uses them to learn the context information of the text and identify argument component entities;
step S32: the input of the stacked LSTM network is the shared feature parameter vector output by the text embedding layer, used to learn the context information of the text and identify argument component entities; a bidirectional stacked LSTM is constructed and the sentence vectors are computed; at the t-th word, each LSTM unit consists of n-dimensional vectors comprising an input gate i_t, a forget gate f_t, an output gate o_t, a memory cell c_t and a hidden unit h_t; each LSTM unit receives an n-dimensional vector input, the previous hidden state being h_{t−1} and the previous memory cell c_{t−1};
The parameters are updated according to the following formulas:
i_t = σ(W^(i) x_t + I^(i) h_{t−1} + b_i)
f_t = σ(W^(f) x_t + I^(f) h_{t−1} + b_f)
o_t = σ(W^(o) x_t + I^(o) h_{t−1} + b_o)
u_t = tanh(W^(u) x_t + I^(u) h_{t−1} + b_u)
c_t = i_t ⊙ u_t + f_t ⊙ c_{t−1}
h_t = o_t ⊙ tanh(c_t)
where σ denotes the logistic activation function, ⊙ denotes the element-wise product of vectors, the W and I are weight matrices and the b are bias vectors; the input of the LSTM unit at the t-th word is the concatenation of the word vector V_t^w, the character vector V_t^c and the part-of-speech vector V_t^p of the t-th word, and the hidden units of the two opposite-direction LSTMs, h_t^→ and h_t^←, are concatenated into h_t = [h_t^→; h_t^←] as the output;
step S33: tag each word of the input sentence with a BIO label and then append its argument type, forming the "BIO-argument type" format;
step S34: regard the argument-boundary detection of task one and the type identification of task two as sequence labeling problems: each word of the input sentence is first tagged with a BIO label and its argument type is then appended, so that each word takes the form "BIO-argument type"; this tagging scheme provides the labels for tasks one and two. Both tasks are completed on top of the sequence encoding layer, where a two-layer neural network consisting of DenseNet and Softmax is constructed, whose first layer computes s_t = f(W h_t + b), where W is a weight matrix and b is a bias vector;
step S35: take s_t and the vector e_{i−1} of the previous word as input, pass them through one neural network layer into the Softmax layer to obtain the type label of the argument component entity, and map the resulting output into the vector e_i.
The stacked LSTM (Stacked LSTM) can learn feature representations over longer time spans: by fusing the hidden state of the current LSTM layer with the hidden state of the previous LSTM layer, it can fully learn the context information in the text and obtain deeper features. During training, the stacked LSTM uses its hidden states to capture the "past" and "future" information of the argument mining text through forward and backward passes, and finally combines the two states into the output of the network.
In this embodiment, a conditional random field (CRF) is adopted as the final predicted tag output module of the system. Compared with ordinary softmax classification, the CRF directly models the dependencies between output labels, making the relevant context information easier to learn and giving better results.
The step S4 specifically includes the following steps:
step S41: adopt a conditional random field (CRF), which directly models the dependencies between output labels and learns the relevant context information;
step S42: during training, the loss function is given as Loss(y, x) = E(y, x) + log(Z), where y is the true tag sequence, x the input sequence and Z the normalization constant; E(y, x) is the path energy of input x and tag sequence y, expressed as E(y, x) = −Σ_i h(y_i; x) − Σ_k g(y_k, y_{k+1}), where h(y_i; x) is the scoring function of the label at position i and g(y_k, y_{k+1}) is a transition parameter matrix to be trained;
step S43: output the predicted tag results of the data set according to the scores of step S42. The above description is only a preferred embodiment of the present invention; all equivalent changes and modifications made in accordance with the claims of the present invention shall fall within the scope of the present invention.
Claims (3)
1. An argument mining system based on multi-task joint learning, characterized in that: the system comprises a data preprocessing module, a text embedding module, a joint learning module and an argument mining tag output module; the data preprocessing module is used for preprocessing the data; the text embedding module is used for extracting representations of the semantics, parts of speech and argument types in the input text at the word level and the character level respectively, and expressing the features as vectors serving as the vector input of the joint learning module; the joint learning module is used for adopting the hard parameter sharing mechanism of multi-task learning, regarding different argument mining data sets as different tasks, making the tasks share the hidden layers of the joint learning module, training in parallel, and learning the context information of the text with a stacked bidirectional long short-term memory network to complete the sequence labeling task; the argument mining tag output module is used for completing the argument-type prediction and outputting the types of the arguments in the text;
the joint learning module specifically takes as input the shared feature parameter vectors output by the text embedding module, including the word vector V^w, the character vector V^c and the part-of-speech vector V^p, and uses them to learn the context information of the text and identify argument component entities;
a bidirectional stacked LSTM is constructed and the sentence vectors are computed; at the t-th word, each LSTM unit consists of n-dimensional vectors comprising an input gate i_t, a forget gate f_t, an output gate o_t, a memory cell c_t and a hidden unit h_t; each LSTM unit receives an n-dimensional vector input, the previous hidden state being h_{t−1} and the previous memory cell c_{t−1};
the parameters are updated according to the following formulas:
i_t = σ(W^(i) x_t + I^(i) h_{t−1} + b_i)
f_t = σ(W^(f) x_t + I^(f) h_{t−1} + b_f)
o_t = σ(W^(o) x_t + I^(o) h_{t−1} + b_o)
u_t = tanh(W^(u) x_t + I^(u) h_{t−1} + b_u)
c_t = i_t ⊙ u_t + f_t ⊙ c_{t−1}
h_t = o_t ⊙ tanh(c_t)
where σ denotes the logistic activation function, ⊙ denotes the element-wise product of vectors, the W and I are weight matrices and the b are bias vectors; the input of the LSTM unit at the t-th word is the concatenation of the word vector V_t^w, the character vector V_t^c and the part-of-speech vector V_t^p of the t-th word, and the hidden units of the two opposite-direction LSTMs, h_t^→ and h_t^←, are concatenated into h_t = [h_t^→; h_t^←] as the output;
each word of the input sentence is tagged with a BIO label and its argument type is then appended, forming the "BIO-argument type" format;
a two-layer neural network consisting of DenseNet and Softmax is constructed, whose first layer computes s_t = f(W h_t + b), where W is a weight matrix and b is a bias vector;
s_t and the vector e_{i−1} of the previous word are taken as input and passed through one neural network layer into the Softmax layer to obtain the type label of the argument component entity, and the resulting output is mapped into the vector e_i;
The text embedding module comprises:
let the input of CNN be a preprocessed text with length n as x ═ x 1 ,x 2 ,x 3 ,...x n ]According to the definition of word vector, each word is expressed as d-dimensional vector;
invoking a narrow convolution and a convolution kernel of window size k on the sequence x toAndas a padding vector, so that the sequence remains fixed length;
according to the following steps:calculating a sequence C ═ C of the text x after CNN extraction features 1 ,c 2 ,c 3 ,...c n ](ii) a Wherein f represents a non-linear activation function, x i:j Represent the i to j words, b represents the deviation;
obtaining local context feature representation in a theory mining text by adopting convolution kernel windows with four different lengths of k being 1,2,3 and 4;
let the word K ∈ V be composed of the character sequence [c_1, ..., c_l], where l denotes the length of the word K; the character-level representation of the word K is given by the matrix C^K ∈ R^{d×l}, whose j-th column is the character vector c_j; a narrow convolution is then applied between C^K and a filter H ∈ R^{d×w} of width w;

then adding a bias variable b and applying a nonlinear function to obtain the feature map f^K ∈ R^{l-w+1}; for the i-th feature in particular: f^K[i] = tanh(⟨C^K[*, i:i+w-1], H⟩ + b), wherein C^K[*, i:i+w-1] denotes the i-th to (i+w-1)-th columns of C^K, and the symbol ⟨A, B⟩ = Tr(AB^T) denotes the Frobenius inner product;

introducing the max-over-time function y^K = max_i f^K[i], and taking the resulting value as the feature of the word corresponding to the filter H;
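The character-level convolution with max-over-time pooling in the clauses above can be sketched as follows (all dimensions are illustrative):

```python
import numpy as np

def char_cnn_feature(C_k, H, b):
    """f[i] = tanh(<C_k[:, i:i+w-1], H> + b) at each window position, then
    max-over-time; the Frobenius inner product <A, B> = Tr(A B^T) is
    computed here as the sum of the elementwise product."""
    w = H.shape[1]
    l = C_k.shape[1]
    f = np.array([np.tanh(np.sum(C_k[:, i:i + w] * H) + b)
                  for i in range(l - w + 1)])     # feature map, length l-w+1
    return f.max()                                # the word's feature for H

rng = np.random.default_rng(2)
C_k = rng.normal(size=(4, 7))   # word of length l=7, char vectors of dim d=4
H = rng.normal(size=(4, 3))     # one filter of width w=3
y = char_cnn_feature(C_k, H, 0.1)
```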
inputting the CNN extraction results into a highway network layer, filtering out the valuable features through the transform gate of the highway network, and inputting the obtained results into the joint learning module;
z = t ⊙ g(W_H · y + b_H) + (1 − t) ⊙ y, where g denotes a nonlinear function, t = σ(W_T · y + b_T) is called the transform gate, and (1 − t) is called the carry gate;
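A minimal sketch of the highway layer with transform and carry gates; tanh is assumed for the nonlinearity g, which the claim leaves unspecified.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def highway(y, W_H, b_H, W_T, b_T):
    """z = t * g(W_H y + b_H) + (1 - t) * y, with transform gate
    t = sigmoid(W_T y + b_T) and carry gate (1 - t); g is tanh here."""
    t = sigmoid(W_T @ y + b_T)
    return t * np.tanh(W_H @ y + b_H) + (1.0 - t) * y

rng = np.random.default_rng(3)
y = rng.normal(size=6)
z = highway(y, rng.normal(size=(6, 6)), np.zeros(6),
            rng.normal(size=(6, 6)), -2.0 * np.ones(6))  # negative gate bias favors carrying
```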
the argument mining tag output module comprises:
adopting a conditional random field to directly model the dependencies between output-level labels and learn the contextual information;
during training, the loss function is given as follows: Loss(y, x) = E(y, x) + log(Z),

wherein y denotes the true tag sequence, x is the input sequence, Z is the normalization constant, and E(y, x) denotes the path energy of the input x and tags y, expressed as E(y, x) = −(Σ_{k=1}^{n} h(y_k; x) + Σ_{k=1}^{n−1} g(y_k, y_{k+1})), where h(y_k; x) denotes the scoring function of the label and g(y_k, y_{k+1}) is a parameter matrix to be trained;
and outputting the predicted data set label result according to the score.
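The CRF path score and loss described in this claim can be sketched with a brute-force normalization constant at toy size; the sign convention (loss = −score + log Z) and the shapes of h and g are assumptions made for illustration.

```python
import numpy as np
from itertools import product

def path_score(y, h, g):
    """Sum of per-position label scores h(y_k; x) and transitions g(y_k, y_{k+1})."""
    s = sum(h[k, y[k]] for k in range(len(y)))
    return s + sum(g[y[k], y[k + 1]] for k in range(len(y) - 1))

def crf_loss(y, h, g):
    """Negative log-likelihood of tag path y: -score(y) + log Z, with Z
    summed exactly over all T^n paths (feasible only at toy size)."""
    n, T = h.shape
    logZ = np.log(sum(np.exp(path_score(list(p), h, g))
                      for p in product(range(T), repeat=n)))
    return -path_score(y, h, g) + logZ

rng = np.random.default_rng(4)
h, g = rng.normal(size=(4, 3)), rng.normal(size=(3, 3))  # n=4 positions, T=3 tags
loss = crf_loss([0, 1, 2, 1], h, g)
```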
2. A working method of the multi-task joint learning-based argument mining system according to claim 1, characterized by comprising the following steps:
step S1: inputting a document whose argument structure is to be mined, namely a debate-type document, into the data preprocessing module for preprocessing to obtain a preprocessed text, and inputting the preprocessed text into the text embedding module;
step S2: the text embedding module adopts a word-level CNN and a character-level CNN-highway network to extract word-level and character-level features of the preprocessed text respectively, wherein the features include the proportion of numeric characters in a word and the letter-case information of the word; the features are represented as vectors and input into the joint learning module;
step S3: the joint learning module adopts a multi-task hard parameter sharing mechanism on the text data output by the text embedding module, treats each different data set as a different argument mining task, uses a bidirectional long short-term memory (BiLSTM) neural network to learn hidden-layer representations, and trains parameters and learns the contextual information of the text data in parallel, so as to complete argument boundary detection and the subsequent prediction of argument-component types;
step S4: and the argument mining tag output module outputs the obtained type tag of the argument.
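Step S3's hard parameter sharing can be illustrated with a toy shared encoder and per-task output heads; a single tanh layer stands in for the BiLSTM, and the task names below are invented for the example.

```python
import numpy as np

def shared_encode(x, W_shared):
    """Shared hidden representation reused by every task (the stand-in for
    the shared BiLSTM hidden layer)."""
    return np.tanh(W_shared @ x)

def task_predict(x, W_shared, heads, task):
    """Hard parameter sharing: one shared encoder, one head per data set."""
    return heads[task] @ shared_encode(x, W_shared)

rng = np.random.default_rng(5)
W_shared = rng.normal(size=(8, 6))
heads = {"essays": rng.normal(size=(5, 8)),          # hypothetical task names
         "web_discourse": rng.normal(size=(3, 8))}   # and label-set sizes
x = rng.normal(size=6)
p1 = task_predict(x, W_shared, heads, "essays")
p2 = task_predict(x, W_shared, heads, "web_discourse")
```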
3. The working method of the multi-task joint learning-based argument mining system according to claim 2, characterized in that the preprocessing performed by the data preprocessing module comprises the following steps:
step SA: removing web page links, special characters and punctuation marks in the document;
step SB: performing word segmentation processing on the document;
step SC: performing stemming on the English data;
step SD: and respectively filtering out stop words contained in the data set according to the Chinese and English stop word list.
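Steps SA to SD can be sketched for the English case as below; the regexes, the tiny stop list, and the crude suffix-stripping stemmer are simplifications standing in for real word-segmentation and stemming components.

```python
import re

STOPWORDS = {"the", "a", "an", "of", "to"}    # tiny illustrative stop list

def preprocess(doc):
    # SA: remove web links, special characters and punctuation
    doc = re.sub(r"https?://\S+", " ", doc)
    doc = re.sub(r"[^A-Za-z0-9\s]", " ", doc)
    # SB: word segmentation (whitespace split suffices for English here)
    tokens = doc.lower().split()
    # SC: stemming (crude plural stripping, illustration only)
    tokens = [t[:-1] if t.endswith("s") and len(t) > 3 else t for t in tokens]
    # SD: stop-word filtering
    return [t for t in tokens if t not in STOPWORDS]

out = preprocess("The arguments of debaters: see https://example.com now!")
```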
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201911161645.2A CN110941700B (en) | 2019-11-22 | 2019-11-22 | Multi-task joint learning-based argument mining system and working method thereof |
Publications (2)
Publication Number | Publication Date |
---|---|
CN110941700A CN110941700A (en) | 2020-03-31 |
CN110941700B true CN110941700B (en) | 2022-08-09 |
Family
ID=69907634
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201911161645.2A Active CN110941700B (en) | 2019-11-22 | 2019-11-22 | Multi-task joint learning-based argument mining system and working method thereof |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN110941700B (en) |
Families Citing this family (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN112347269A (en) * | 2020-11-11 | 2021-02-09 | 重庆邮电大学 | Method for recognizing argument pairs based on BERT and Att-BilSTM |
CN112613316B (en) * | 2020-12-31 | 2023-06-20 | 北京师范大学 | Method and system for generating ancient Chinese labeling model |
US11527074B1 (en) * | 2021-11-24 | 2022-12-13 | Continental Automotive Technologies GmbH | Systems and methods for deep multi-task learning for embedded machine vision applications |
Citations (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN109255119A (en) * | 2018-07-18 | 2019-01-22 | 五邑大学 | A kind of sentence trunk analysis method and system based on the multitask deep neural network for segmenting and naming Entity recognition |
CN109635109A (en) * | 2018-11-28 | 2019-04-16 | 华南理工大学 | Sentence classification method based on LSTM and combination part of speech and more attention mechanism |
CN109710946A (en) * | 2019-01-15 | 2019-05-03 | 福州大学 | A kind of joint debate digging system and method based on dependence analytic tree |
Family Cites Families (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
EP3537346A1 (en) * | 2018-03-09 | 2019-09-11 | Tata Consultancy Services Limited | Method and system for incorporating regression into stacked auto encoder (sae) |
Non-Patent Citations (3)
Title |
---|
Xiuyi Fan et al., "A first step towards explained activity recognition with ", 2016 IEEE International Conference on Multisensor Fusion and , 2017 * |
Liao Xiangwen et al., "An Argumentation Mining Method Based on Multi-task Iterative Learning", Chinese Journal of Computers, 2018 * |
E Haihong et al., "Survey of Entity Relation Extraction Based on Deep Learning", Journal of Software, 2019, Vol. 30, No. 6 * |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
Qiu et al. | DGeoSegmenter: A dictionary-based Chinese word segmenter for the geoscience domain | |
CN106469560B (en) | Voice emotion recognition method based on unsupervised domain adaptation | |
CN110941700B (en) | Multi-task joint learning-based argument mining system and working method thereof | |
CN109858041B (en) | Named entity recognition method combining semi-supervised learning with user-defined dictionary | |
Li et al. | Improving convolutional neural network for text classification by recursive data pruning | |
CN107729311B (en) | Chinese text feature extraction method fusing text moods | |
Sun et al. | Deep LSTM networks for online Chinese handwriting recognition | |
CN113344206A (en) | Knowledge distillation method, device and equipment integrating channel and relation feature learning | |
CN113255294A (en) | Named entity recognition model training method, recognition method and device | |
CN111222318A (en) | Trigger word recognition method based on two-channel bidirectional LSTM-CRF network | |
CN109710946A (en) | A kind of joint debate digging system and method based on dependence analytic tree | |
CN114239574A (en) | Miner violation knowledge extraction method based on entity and relationship joint learning | |
CN112287106A (en) | Online comment emotion classification method based on dual-channel hybrid neural network | |
CN115630156A (en) | Mongolian emotion analysis method and system fusing Prompt and SRU | |
Gajurel et al. | A fine-grained visual attention approach for fingerspelling recognition in the wild | |
Reyes-Nava et al. | Performance analysis of deep neural networks for classification of gene-expression microarrays | |
El-Demerdash et al. | Psychological human traits detection based on universal language modeling | |
CN115795044A (en) | Knowledge injection-based user relationship mining method and device | |
CN111078881A (en) | Fine-grained emotion analysis method and system, electronic equipment and storage medium | |
CN114818718A (en) | Contract text recognition method and device | |
Elleuch et al. | The Effectiveness of Transfer Learning for Arabic Handwriting Recognition using Deep CNN. | |
CN113779966A (en) | Mongolian emotion analysis method of bidirectional CNN-RNN depth model based on attention | |
CN111309849A (en) | Fine-grained numerical information extraction method based on joint learning model | |
Nouhaila et al. | Arabic sentiment analysis based on 1-D convolutional neural network | |
CN113761106B (en) | Self-attention-strengthening bond transaction intention recognition system |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||