CN113609267B - Speech relation recognition method and system based on GCNDT-MacBERT neural network framework - Google Patents


Info

Publication number
CN113609267B
Authority
CN
China
Prior art keywords
word
sequence
neural network
module
word vector
Prior art date
Legal status
Active
Application number
CN202110826586.7A
Other languages
Chinese (zh)
Other versions
CN113609267A (en
Inventor
江浩
傅洛伊
Current Assignee
Shanghai Jiaotong University
Original Assignee
Shanghai Jiaotong University
Priority date
Filing date
Publication date
Application filed by Shanghai Jiaotong University filed Critical Shanghai Jiaotong University
Priority to CN202110826586.7A priority Critical patent/CN113609267B/en
Publication of CN113609267A publication Critical patent/CN113609267A/en
Application granted granted Critical
Publication of CN113609267B publication Critical patent/CN113609267B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical


Classifications

    • G06F16/3329 Natural language query formulation or dialogue systems
    • G06F16/3344 Query execution using natural language analysis
    • G06F16/35 Clustering; Classification of unstructured textual data
    • G06F40/154 Tree transformation for tree-structured or markup documents, e.g. XSLT, XSL-FO or stylesheets
    • G06F40/211 Syntactic parsing, e.g. based on context-free grammar [CFG] or unification grammars
    • G06F40/253 Grammatical analysis; Style critique
    • G06F40/279 Recognition of textual entities
    • G06F40/30 Semantic analysis
    • G06N3/045 Combinations of networks (neural network architectures)
    • G06N3/08 Learning methods (neural networks)
    • Y02D10/00 Energy efficient computing, e.g. low power processors, power management or thermal management


Abstract

The application provides a speech relation recognition method and a speech relation recognition system based on a GCNDT-MacBERT neural network framework, wherein the speech relation recognition method comprises the following steps. Step S1: obtain the word sequence and the syntactic dependency tree of the text. Step S2: tokenize the text to obtain its character sequence, and obtain a character vector sequence containing context information. Step S3: fuse the character vector sequence into a word vector sequence according to the word sequence. Step S4: convert the syntactic dependency tree into a graph structure to obtain a word vector sequence carrying more information. Step S5: perform vector feature fusion on the word vector sequence with a convolutional neural network and map it into a fixed-length vector. Step S6: input the resulting vector into a multi-layer perceptron to obtain the classification result. The application makes full use of the context information and grammatical relations in the text, and the pre-training model and the graph convolutional network greatly help the model learn the internal semantics of the text.

Description

Speech relation recognition method and system based on GCNDT-MacBERT neural network framework
Technical Field
The application relates to the technical field of speech relation recognition, in particular to a method and a system for speech relation recognition based on a GCNDT-MacBERT neural network framework.
Background
Utterance relation: the utterance relation describes how two adjacent text units (e.g., clauses, sentences, and longer sentence groups) are logically connected to each other. Chapter analysis is one of the basic tasks of natural language processing (Natural Language Processing) and can help other downstream natural language processing tasks work better, and speech relationship recognition is an important component of chapter analysis and has received increasing attention from researchers in recent years. Emphasis is placed on determining how two adjacent phonetic units (e.g., clauses, sentences, and sentence groups) are semantically connected to each other. Obviously, recognizing the utterance relation can help many downstream NLP applications, such as automatic summarization, information extraction and problem solving. In principle, the speech connection between two clauses is important to recognize the relationship between them. For implicit speech relationship recognition, it is more challenging due to the lack of precise connective, often depending on the understanding of the text. It is a descriptive language (formal cohesive) compared to english, whereas chinese is a semantic language (semantic cohesive), tending to join words from sentences.
Discourse parsing is a basic task in natural language processing and a key step for subsequent tasks such as machine translation, question answering, text generation, and text summarization. Shallow discourse parsing, as in Penn Discourse Treebank 2.0 (PDTB 2.0) and the CoNLL 2015 shared task, focuses on identifying the relation between two text spans (i.e., clauses or sentences) called discourse units. In the different subtasks of the PDTB and CoNLL shared tasks, implicit relation classification between two sentences is considered the most difficult problem, because there is no discourse cue or connective (e.g., "however" and "because") between the two sentences.
Utterance relation data sets: the currently prevailing Chinese speech relation data sets are SUDA-CDTB and CoNLL-CDTB (from the CoNLL shared task), planned and annotated by Soochow University and the LDC (Linguistic Data Consortium), respectively. CoNLL-CDTB: the CoNLL 2016 shared task is a competition initiated in 2016 by SIGNLL (ACL's Special Interest Group on Natural Language Learning), aiming to draw researchers' attention to the speech relation recognition task and to improve its recognition accuracy. It comprises a Chinese data set and an English data set: the English data set is built from PDTB 2.0, and the Chinese data set CoNLL-CDTB is obtained by expanding CDTB 0.5. SUDA-CDTB: a Connective-driven Dependency Tree (CDT) scheme was proposed by Soochow University to represent Chinese utterance structure, in which the elementary discourse units are leaf nodes and the connectives are non-leaf nodes; the scheme mainly draws on the discourse treebank of the University of Pennsylvania and on rhetorical structure theory. In particular, the connectives are used to directly represent the hierarchical structure of the tree and the speech relations of the utterances, while the head of each discourse unit is determined with reference to dependency theory. Under the guidance of the CDT scheme, they manually annotated a Chinese Discourse Treebank (CDTB) of 500 documents.
Development of pre-training models: machine learning is widely used in implicit speech relation classification. Traditional research focused mainly on the generation and selection of linguistic features; recently, contextualized word representation learning, such as ELMo, BERT, and RoBERTa, has been shown to significantly improve many downstream tasks. From Word2Vec and GloVe, to ELMo and GPT, to the currently most widely used BERT, pre-training models have come a long way and have brought great vitality and breakthroughs to the field of natural language processing. In real life there is a large amount of text data that lacks manual annotation; pre-training models can exploit exactly this characteristic by pre-training on unsupervised tasks, and after pre-training on a very large-scale text corpus a model learns a general language representation very well. Pre-training models also offer better generalization, speed up the convergence of downstream-task models, and help prevent overfitting on small-scale data sets.
BERT, proposed by Google researchers, uses the Encoder of the Transformer. BERT is pre-trained with the MLM (Masked Language Model) task and the NSP (Next Sentence Prediction) task. For the MLM task, BERT proposes: each word in the text is selected for masking with a probability of 15%; among the selected words, 80% are replaced by [MASK], 10% are replaced by a random word, and the remaining 10% keep the original word.
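The 15%/80%/10%/10% masking rule can be made concrete with a short sketch. The following Python snippet is illustrative only; the function name, the vocabulary argument, and the use of Python's random module are choices made here, not part of BERT's published code.

import random

def bert_mlm_mask(tokens, vocab, mask_token="[MASK]", select_prob=0.15):
    # Implements the corruption rule described above: 15% of tokens are
    # selected for prediction; of those, 80% become [MASK], 10% become a
    # random vocabulary word, and 10% keep the original word.
    corrupted, targets = [], []
    for tok in tokens:
        if random.random() < select_prob:
            targets.append(tok)                      # this position must be predicted
            r = random.random()
            if r < 0.8:
                corrupted.append(mask_token)             # 80%: replace with [MASK]
            elif r < 0.9:
                corrupted.append(random.choice(vocab))   # 10%: replace with a random word
            else:
                corrupted.append(tok)                    # 10%: keep the original word
        else:
            targets.append(None)                     # not selected, not predicted
            corrupted.append(tok)
    return corrupted, targets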
Following BERT, a large number of pre-trained models have emerged, among which RoBERTa and MacBERT have attracted considerable attention. RoBERTa makes two major improvements over BERT: 1. the static masking of BERT is changed to dynamic masking; 2. larger-scale corpora are used.
MacBERT makes three major improvements over BERT: 1. candidate words are selected for masking using Whole Word Masking (WWM) and an N-gram masking strategy, with masking percentages of 40%, 30%, 20%, and 10% from single-character up to four-character spans; 2. synonyms are used instead of [MASK]: a big problem with BERT is that many words in the input sentence are masked with [MASK] during the MLM task, yet the [MASK] token never appears in the fine-tuning stage, which causes an inconsistency between the pre-training task and the downstream fine-tuning task; MacBERT therefore proposes to replace the masked word with a synonym, and a random word is used when no synonym is available; 3. the Sentence Order Prediction (SOP) task proposed by the ALBERT model replaces the original NSP task of BERT; the SOP task creates a negative sample by swapping the original order of two consecutive sentences.
The Chinese patent document with publication number CN111382569A discloses a method, an apparatus, and computer equipment for identifying entities in dialogue corpora, comprising the following steps: acquiring the corpus text whose entities are to be identified; segmenting the corpus text into words to obtain a word segmentation result comprising a plurality of words; acquiring the word vector corresponding to each word in the segmentation result, and combining these word vectors to obtain a text matrix corresponding to the corpus text; and taking the text matrix as the input of an entity recognition model and acquiring the entities in the corpus text output by the entity recognition model.
In view of the above related art, the inventors consider that the above method only extracts entities from the text, makes little use of the context information and grammatical relations in the text, and has difficulty learning the inherent semantics of the text, so the model performs poorly.
Disclosure of Invention
Aiming at the defects in the prior art, the application aims to provide a speech relation recognition method and a speech relation recognition system based on a GCNDT-MacBERT neural network framework.
The application provides a speech relation recognition method based on a GCNDT-MacBERT neural network framework, which is characterized by comprising the following steps:
step S1: preprocessing data to obtain a word sequence and a syntactic dependency tree of a text;
step S2: establishing a pre-training model, tokenizing the text with the pre-training model to obtain the character sequence of the text, and inputting the character sequence of the text into the pre-training model to obtain a character vector sequence containing context information;
step S3: fusing the character vector sequence of the step S2 according to the word sequence obtained in the step S1 to obtain a word vector sequence;
step S4: establishing a graph convolutional neural network module, converting the syntactic dependency tree in the step S1 into a graph structure with words as nodes, and then using the graph convolutional neural network module to obtain a word vector sequence carrying more information;
step S5: establishing a CNN-Highway module comprising a convolutional neural network and a Highway network, inputting the word vector sequence into the convolutional neural network, using the convolutional neural network to perform vector feature fusion on the word vector sequence and map it into a fixed-length vector, and using the Highway network to extract more information from the word vector sequence;
step S6: adopting a multi-layer perceptron, inputting the vector obtained in the step S5 into the multi-layer perceptron, obtaining a classification result with a Softmax function, and knowing the logical relationship between the texts from the classification result.
Preferably, in the step S1, the text is processed by a natural language processing tool kit to obtain a word sequence and a syntactic dependency tree of the text.
Preferably, the step S2 includes the steps of:
step S2.1: the pre-training model comprises a tokenizer and a vocabulary; the tokenizer of the pre-training model segments the plurality of texts separately to obtain their character sequences, a position sequence and a segment sequence of the same length as the character sequence are generated, and the character sequences, position sequences and segment sequences of the texts are then spliced in order;
step S2.2: the pre-training model comprises a plurality of embedding layers; the character sequence, position sequence and segment sequence of the step S2.1 are respectively input into the corresponding embedding layers to obtain the corresponding embedding vector sequences, and the embedding vector sequences are added to obtain the final input vector sequence;
step S2.3: the pre-training model comprises a Transformer Encoder neural network; the input vector sequence is input into the Transformer Encoder neural network of the pre-training model, and a character vector sequence containing context information is obtained through calculation.
Preferably, the step S3 includes the steps of:
step S3.1: after the character vector sequence in the step S2.3 is obtained, splitting it into a plurality of independent character vector sequences according to the original sequence lengths of the plurality of texts;
step S3.2: according to the word sequence generated by the natural language processing toolkit in the step S1, fusing each of the plurality of independent character vector sequences in the step S3.1 into word vectors to obtain a word vector sequence.
Preferably, the step S4 further includes: obtaining an adjacency matrix with words as nodes according to the syntactic dependency tree generated in the step S1, and then using the graph convolutional neural network module to fuse syntactic features into the word vector sequence.
Preferably, the step S5 performs vector feature fusion through the CNN-Highway module and maps the result to a fixed-length vector; the word vector sequence is input into the convolutional neural network, a plurality of convolutional layers with convolution kernels are set up, the convolution kernels fuse the information in the text, the vector sequence is mapped into a fixed-length vector through a pooling layer in the convolutional neural network, and finally more information is extracted with the Highway network.
The application provides an utterance relation recognition system based on a GCNDT-MacBERT neural network framework, which comprises the following modules:
module M1: preprocessing data to obtain a word sequence and a syntactic dependency tree of a text;
module M2: establishing a pre-training model, tokenizing the text with the pre-training model to obtain the character sequence of the text, and inputting the character sequence of the text into the pre-training model to obtain a character vector sequence containing context information;
module M3: fusing the character vector sequence of the module M2 according to the word sequence obtained by the module M1 to obtain a word vector sequence;
module M4: establishing a graph convolutional neural network module, converting the syntactic dependency tree of the module M1 into a graph structure with words as nodes, and then using the graph convolutional neural network module to obtain a word vector sequence carrying more information;
module M5: establishing a CNN-Highway module comprising a convolutional neural network and a Highway network, inputting the word vector sequence into the convolutional neural network, using the convolutional neural network to perform vector feature fusion on the word vector sequence and map it into a fixed-length vector, and using the Highway network to extract more information from the word vector sequence;
module M6: adopting a multi-layer perceptron, inputting the vector obtained by the module M5 into the multi-layer perceptron, obtaining a classification result with a Softmax function, and knowing the logical relationship between the texts from the classification result.
Preferably, the module M1 processes the text through a natural language processing tool kit to obtain a word sequence and a syntactic dependency tree of the text.
Preferably, the module M2 comprises the following steps:
module M2.1: the pre-training model comprises a tokenizer and a vocabulary; the tokenizer of the pre-training model segments the plurality of texts separately to obtain their character sequences, a position sequence and a segment sequence of the same length as the character sequence are generated, and the character sequences, position sequences and segment sequences of the texts are then spliced in order;
module M2.2: the pre-training model comprises a plurality of embedding layers; the character sequence, position sequence and segment sequence of the module M2.1 are respectively input into the corresponding embedding layers to obtain the corresponding embedding vector sequences, and the embedding vector sequences are added to obtain the final input vector sequence;
module M2.3: the pre-training model comprises a Transformer Encoder neural network; the input vector sequence is input into the Transformer Encoder neural network of the pre-training model, and a character vector sequence containing context information is obtained through calculation.
Preferably, the module M3 comprises the following steps:
module M3.1: after the character vector sequence of the module M2.3 is obtained, splitting it into a plurality of independent character vector sequences according to the original sequence lengths of the plurality of texts;
module M3.2: according to the word sequence generated by the natural language processing toolkit in the module M1, fusing each of the plurality of independent character vector sequences of the module M3.1 into word vectors to obtain a word vector sequence.
Compared with the prior art, the application has the following beneficial effects:
1. the method makes full use of the context information of the text and extracts context features with the latest pre-training models;
2. the application obtains Chinese word vectors by fusing Chinese character vectors, and overcomes the limitation that pre-training models can only represent Chinese text character by character;
3. the method extracts the syntactic dependency tree with the CoreNLP natural language processing toolkit and then extracts grammatical features with a graph convolutional neural network;
4. the application makes full use of the context information and grammatical relations in the text, and using the pre-training model and the graph convolutional network greatly helps the model learn the internal semantics of the text;
5. the method uses pre-training models such as MacBERT and RoBERTa to obtain the representation vectors of the context, then fuses the character vectors into word vectors (this step is not needed for English text), and learns the grammatical features of the text through a graph convolutional network.
Drawings
Other features, objects and advantages of the present application will become more apparent upon reading of the detailed description of non-limiting embodiments, given with reference to the accompanying drawings in which:
FIG. 1 is a diagram of the overall architecture of a speech relationship recognition system based on a GCNDT-MacBERT neural network framework in the present application;
FIG. 2 is a syntactic dependency tree generated using the CoreNLP natural language processing toolkit in the present application.
Detailed Description
The present application will be described in detail with reference to specific examples. The following examples will assist those skilled in the art in further understanding the present application, but are not intended to limit the application in any way. It should be noted that variations and modifications could be made by those skilled in the art without departing from the inventive concept. These are all within the scope of the present application.
The embodiment of the application discloses a method and a system for speech relation recognition based on a GCNDT-MacBERT neural network framework, as shown in fig. 1 and 2, comprising the following steps. Step S1: preprocess the data (each sample consists of two argument texts and the corresponding category) to obtain the word sequence and syntactic dependency tree of each text. The texts are processed with the Stanford CoreNLP natural language processing toolkit: for the two argument texts of each sample, CoreNLP produces their respective word sequences and syntactic dependency tree structures.
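As an illustration of this preprocessing step, the sketch below uses the Stanza Python interface to Stanford's NLP pipelines to obtain the word sequence and the dependency edges; the description above names CoreNLP, so treating Stanza as a stand-in, and the helper name preprocess, are assumptions made here.

import stanza

# stanza.download("zh")  # one-time model download for Chinese
nlp = stanza.Pipeline(lang="zh", processors="tokenize,pos,lemma,depparse")

def preprocess(text):
    doc = nlp(text)
    words, edges = [], []
    for sent in doc.sentences:
        offset = len(words)
        for w in sent.words:
            words.append(w.text)
            if w.head > 0:  # head index 0 is the artificial root node
                edges.append((offset + w.head - 1, offset + int(w.id) - 1))
    return words, edges  # the word sequence and the (head, dependent) dependency arcs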
Experiments were performed on two data sets, Suda-CDTB and CoNLL-CDTB. The raw text sources of the two data sets are shown in Table 1, and the sample type statistics of the Suda-CDTB and CoNLL-CDTB data sets are shown in Tables 2 and 3. The two argument texts of each sample are parsed with the CoreNLP natural language processing toolkit to obtain the word sequence and syntactic dependency tree structure of each text.
Table 1 Total data set sample sizes and domains
Table 2 Suda-CDTB data set sample types and quantity statistics
Table 3 CoNLL-CDTB data set sample types and quantity statistics
Step S2: establish a pre-training model, tokenize the text with the pre-training model (MacBERT and the like) to obtain the character sequence of the text, and input the character sequence of the text into the pre-training model to obtain a character vector sequence containing context information. The tokenizer of the pre-training model (MacBERT, RoBERTa and the like) processes the two argument texts to obtain their character sequences; the character sequences of the two arguments are then spliced and input into the pre-training model to obtain a character vector sequence containing context information.
Step S2 includes the following steps. Step S2.1: the pre-training model comprises a tokenizer and a vocabulary; the tokenizer of the pre-training model segments the texts separately to obtain their character sequences, a position sequence and a segment sequence of the same length as the character sequence are generated, and the character sequences, position sequences and segment sequences of the texts are then spliced in order.
Concretely, the two argument texts are tokenized separately by the tokenizer of the pre-training model to obtain their character sequences (for English text the result is the subword segmentation). A position sequence and a segment sequence of the same length as the two character sequences are generated: the position sequence of argument 1 increases from 0 in steps of 1, and the position sequence of argument 2 continues to increase in steps of 1 from the last element of argument 1; for the segment sequence, the segment sequence of argument 1 is set to 0 and that of argument 2 is set to 1. The three sequences of the two arguments are then spliced in the order of the formula below.
Arg_1 = [a_1^(1), a_2^(1), …, a_M^(1)], Arg_2 = [a_1^(2), a_2^(2), …, a_N^(2)]
[e_0, e_1, …, e_{M+N+3}] = [CLS] ⊕ Arg_1 ⊕ [SEP] ⊕ Arg_2 ⊕ [EOS]
Arg_1 denotes the character vector sequence of argument 1; [CLS] denotes the head tag added by the pre-training model; [SEP] denotes the separator tag added by the pre-training model; [EOS] denotes the end tag added by the pre-training model; a_i^(1) denotes a character vector in argument 1; Arg_2 denotes the character vector sequence of argument 2; a_j^(2) denotes a character vector in argument 2; the subscript M denotes the number of characters in argument 1; the subscript N denotes the number of characters in argument 2; e_0 ~ e_{M+N+3} denote the new vectors after splicing.
Step S2.2: the pre-training model comprises a plurality of embedding layers; the character sequence, position sequence and segment sequence from step S2.1 are respectively input into the corresponding embedding layers to obtain the corresponding embedding vector sequences, and the embedding vector sequences are added to obtain the final input vector sequence. The pre-training model has three embedding layers, namely Word Embedding, Position Embedding, and Segment Embedding. The three sequences from step S2.1 are input into the corresponding embedding layers to obtain the corresponding embedding vector sequences (the character vector sequence, the position vector sequence, and the segment vector sequence), and the three embedding vector sequences are added position by position to obtain the final input vector sequence. The character vector sequence is generated by Word Embedding, the position vector sequence by Position Embedding, and the segment vector sequence by Segment Embedding.
Step S2.3: the pre-training model comprises a Transformer Encoder neural network; the input vector sequence is input into the Transformer Encoder neural network of the pre-training model, and a character vector sequence containing context information is obtained through calculation. Concretely, the input vector sequence is fed into the Transformer Encoder module of the pre-training model, and after the calculation of its multiple layers a context-aware character vector sequence [h_0, h_1, …, h_{M+N+2}, h_{M+N+3}] is obtained; h_0 ~ h_{M+N+3} denote the vectors obtained from e_0 ~ e_{M+N+3} after the pre-training model fuses the context information.
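A minimal sketch of how such contextual character vectors can be obtained with the HuggingFace transformers library is given below; the checkpoint name "hfl/chinese-macbert-base" and the use of BertTokenizerFast/BertModel are assumptions, since the description above only names the MacBERT/RoBERTa family of models.

import torch
from transformers import BertTokenizerFast, BertModel

tokenizer = BertTokenizerFast.from_pretrained("hfl/chinese-macbert-base")
encoder = BertModel.from_pretrained("hfl/chinese-macbert-base")

arg1, arg2 = "他昨天生病了", "所以没有来上课"   # the two argument texts (example input)
# Tokenizes both arguments (character-level for Chinese), adds the special
# tags, and builds the position and segment (token_type) sequences.
inputs = tokenizer(arg1, arg2, return_tensors="pt")
with torch.no_grad():
    outputs = encoder(**inputs)
# One context-aware vector per input token, i.e. the sequence h_0, h_1, ...
char_vectors = outputs.last_hidden_state.squeeze(0)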
Step S3: fuse the character vector sequence from step S2 into a word vector sequence according to the word sequence obtained in step S1. Concretely, according to the word sequence generated by the CoreNLP natural language processing toolkit in step S1, the character vector sequences of the two arguments obtained in step S2 are fused separately to obtain the word vector sequences of the two arguments.
Step S3 includes the following steps. Step S3.1: after the character vector sequence of step S2.3 is obtained, split it into several independent character vector sequences according to the original sequence lengths of the texts. Concretely, the character vector sequence is split into two independent character vector sequences according to the original sequence lengths of the two arguments, namely the character vector sequence of argument 1 and the character vector sequence of argument 2;
h_i^(1) denotes a character vector of argument 1 after splitting; h_j^(2) denotes a character vector of argument 2 after splitting; h_{M+1} denotes a character vector before splitting; M denotes the number of characters of argument 1; N denotes the number of characters of argument 2.
Step S3.2: according to the word sequence generated by the natural language processing toolkit in step S1, each of the independent character vector sequences from step S3.1 is fused into word vectors to obtain the word vector sequences. Concretely, according to the word sequence generated by the CoreNLP natural language processing toolkit in step S1, the character vectors belonging to each word are fused into one word vector, yielding the word vector sequence of each argument.
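One way to realize this fusion is mean pooling over the character vectors of each word, as sketched below; the description above does not specify the fusion operator, so averaging is an assumption, as is the helper name chars_to_words.

import torch

def chars_to_words(char_vectors, words):
    # char_vectors: (num_chars, dim) tensor of contextual character vectors of
    # one argument (special tags already removed); words: the CoreNLP word
    # sequence whose characters, concatenated, equal the original text.
    word_vectors, pos = [], 0
    for w in words:
        span = char_vectors[pos:pos + len(w)]   # character vectors belonging to this word
        word_vectors.append(span.mean(dim=0))   # fuse by averaging (assumed operator)
        pos += len(w)
    return torch.stack(word_vectors)            # (num_words, dim) word vector sequence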
Step S4: establish a graph convolutional neural network module, convert the syntactic dependency tree from step S1 into a graph structure whose nodes are words, and then use the graph convolutional neural network module to obtain a word vector sequence carrying more information. An adjacency matrix with words as nodes is obtained from the syntactic dependency tree generated in step S1, and the graph convolutional neural network module is then used to fuse syntactic features into the word vector sequence.
Step S4 includes the following steps. Step S4.1: the syntactic dependency tree generated by the natural language processing toolkit in step S1 is a tree structure whose nodes are words; a tree is a special graph and is in essence an undirected graph structure, so an adjacency matrix with words as nodes is obtained from the syntactic dependency tree.
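The conversion from dependency tree to adjacency matrix can be sketched as follows; treating the tree as an undirected graph follows the description above, while adding self-loops is a common GCN convention assumed here.

import torch

def build_adjacency(num_words, edges):
    # edges: (head, dependent) index pairs taken from the syntactic dependency tree
    A = torch.zeros(num_words, num_words)
    for head, dep in edges:
        A[head, dep] = 1.0
        A[dep, head] = 1.0          # undirected graph structure
    A += torch.eye(num_words)       # self-loops so each word keeps its own features
    return A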
Step S4.2: the word vector sequence and the adjacency matrix are input into the graph convolutional neural network module, which outputs a word vector sequence containing the syntactic relation features.
H^(l+1) = f(H^(l), A)
H^(0) = X is the input of the first layer; X denotes the word vector sequence, with X ∈ R^(N'×D), where R denotes the real number space, N' is the number of nodes of the graph, and D is the dimension of the feature vector of each node; A is the adjacency matrix; f is the graph convolution function; H^(l) denotes the input of the l-th layer.
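The formula above leaves the concrete form of f open; a common choice is the propagation rule of Kipf and Welling, f(H, A) = ReLU(D^(-1/2) A D^(-1/2) H W), which the sketch below adopts as an assumption.

import torch
import torch.nn as nn

class GCNLayer(nn.Module):
    def __init__(self, dim):
        super().__init__()
        self.linear = nn.Linear(dim, dim)   # the weight matrix W of the layer

    def forward(self, H, A):
        deg = A.sum(dim=-1)                                  # node degrees
        d_inv_sqrt = torch.diag(deg.clamp(min=1e-8) ** -0.5)
        A_norm = d_inv_sqrt @ A @ d_inv_sqrt                 # symmetric normalization
        return torch.relu(self.linear(A_norm @ H))           # fuse neighbouring word features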
Step S5: establish a CNN-Highway module, comprising a convolutional neural network and a Highway network. The word vector sequence is input into the convolutional neural network, which performs vector feature fusion on the word vector sequence and maps it into a fixed-length vector; the Highway network then extracts more information from the word vector sequence (information/feature is an abstract notion here: in essence, the Highway network lets the vector represent and contain more information).
The CNN-Highway module contains two sub-modules, CNN (convolutional neural network) and Highway. It uses the convolutional neural network and a max pooling layer to perform vector feature fusion on the word vector sequence and map it into a fixed-length vector, and then extracts more features through the Highway sub-module. Concretely, the word vector sequence is first input into the convolutional neural network (CNN), the convolution kernels fuse the n-gram information in the text, the max pooling layer maps the sequence to a fixed-length vector, and finally the Highway sub-module extracts more features.
Step S5 comprises the following concrete steps. Step S5.1: the word vector sequence is fused with the convolutional neural network (CNN); several convolutional layers with different kernel sizes are set up, the word vector sequence is first input into the convolutional neural network, the convolution operation fuses the n-gram information in the text, and the max pooling layer maps the word vector sequence to a fixed-length vector, as in the following formula:
u_c = MaxPool(ReLU(conv_c(w_i, …, w_{i+c})))
u_c denotes the vector obtained after the convolution function with kernel size c and the pooling layer; the subscript c denotes the size of the convolution kernel; ReLU denotes a nonlinear function; conv_c denotes the convolution function with kernel size c; w_i denotes the i-th word vector; w_{i+c} denotes the (i+c)-th word vector.
Step S5.2: finally, more features are extracted with the Highway module, as in the following formulas:
u_1 = Flatten([u_c]) (the concatenation of the pooled vectors u_c over all kernel sizes c)
u_1' = ReLU(W_h^T u_1)
g_1 = sigmoid(W_g^T u_1)
o_1 = g_1 ⊙ u_1' + (1 - g_1) ⊙ u_1
u_c denotes the vector obtained after the convolution function with kernel size c and the pooling layer; Flatten merges the vector sequence into a single vector; W_h and W_g denote parameters in the module; the superscript T denotes the matrix transpose; sigmoid and ReLU denote nonlinear functions; o_1 denotes the final fixed-length argument vector; the symbol ⊙ denotes the element-wise product.
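A compact sketch of the CNN-Highway module described above is given below; the kernel sizes (2, 3, 4) and the single highway layer are assumptions, since the text only says that several convolution kernels are used.

import torch
import torch.nn as nn

class CNNHighway(nn.Module):
    def __init__(self, dim, kernel_sizes=(2, 3, 4)):
        super().__init__()
        self.convs = nn.ModuleList([nn.Conv1d(dim, dim, k) for k in kernel_sizes])
        hidden = dim * len(kernel_sizes)
        self.W_h = nn.Linear(hidden, hidden)   # transform branch of the highway layer
        self.W_g = nn.Linear(hidden, hidden)   # gate branch of the highway layer

    def forward(self, word_vectors):           # (num_words, dim) word vector sequence
        x = word_vectors.t().unsqueeze(0)      # (1, dim, num_words) layout for Conv1d
        pooled = [torch.relu(conv(x)).max(dim=-1).values for conv in self.convs]
        u = torch.cat(pooled, dim=-1).squeeze(0)          # fixed-length fused vector u_1
        g = torch.sigmoid(self.W_g(u))                    # highway gate g_1
        return g * torch.relu(self.W_h(u)) + (1 - g) * u  # o_1 as in the formula above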
Step S6: the vector obtained in step S5 is input into a multi-layer perceptron, a Softmax function is used to obtain the classification result, and the logical relation between the texts is obtained from the classification result.
The multi-layer perceptron is a basic module in the field of deep learning. The characterization vectors of the two arguments are directly concatenated and input into the multi-layer perceptron; the final length of the output vector is 4, it is normalized with a Softmax function to give the score of each category, and the category with the highest score is the result of the model's prediction. The logical relation between the two texts can be known from the predicted category, which provides help for downstream tasks. The output is one of the defined relation types, e.g., causal, adversative, and so on; the task of this work is to obtain this category.
Step S6 comprises the following concrete steps. Step S6.1: the fixed-length characterization vectors of the two arguments are obtained through step S5, the two vectors are concatenated and input into the multi-layer perceptron, and the final length of the output vector is 4. The characterization vector is a fixed-length vector.
Step S6.2: normalization is then performed with a Softmax function to obtain the score of each category, and the category with the highest score is the result of the model's prediction. Each data set defines its own categories.
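The classification head of step S6 can be sketched as a small PyTorch module; the output size of 4 and the Softmax normalization follow the description above, while the hidden size of 256 and the single hidden layer are assumptions.

import torch
import torch.nn as nn

class RelationClassifier(nn.Module):
    def __init__(self, arg_dim, hidden=256, num_classes=4):
        super().__init__()
        self.mlp = nn.Sequential(
            nn.Linear(arg_dim * 2, hidden),   # the two argument vectors are concatenated
            nn.ReLU(),
            nn.Linear(hidden, num_classes))   # final vector of length 4

    def forward(self, o1, o2):
        logits = self.mlp(torch.cat([o1, o2], dim=-1))
        return torch.softmax(logits, dim=-1)  # score of each speech relation category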
In addition to the deep-learning GCNDT-MacBERT neural network framework, a GCNDT-MacBERT model without the GCN (plain MacBERT) is added as a comparison. Experiments are carried out on the CoNLL-CDTB and Suda-CDTB data sets together with previous models, and the performance differences between the models are compared through the Acc and F1 indices.
Acc denotes the prediction accuracy, i.e., the proportion of samples whose predicted category matches the true category.
F1 is a measure of test accuracy. It is calculated from the precision and the recall: in binary classification statistical analysis, the precision is the number of true positive results divided by the number of all results predicted positive, including those incorrectly identified, and the recall is the number of true positive results divided by the number of all samples that should have been identified as positive. F1 can likewise be extended to multi-class problems.
Table 4 summarizes the results of GCNDT-MacBERT and all baseline methods, with the best results highlighted in bold. Clearly, the proposed GCNDT-MacBERT achieves the best performance on every indicator of each data set. GCNDT-MacBERT learns the context information of the text better through the pre-training model and extracts the grammatical information of the text through the syntactic dependency tree structure and the graph convolutional neural network module; compared with the second-best TTN, the increase in Acc on the CoNLL-CDTB and Suda-CDTB data sets is 5.4% and 13.96%, respectively.
Table 4 Performance of the models on the data sets
The application provides a speech relation recognition model, the GCNDT-MacBERT neural network framework, built on the Transformer and GCN neural networks. It uses the MacBERT or RoBERTa pre-training model to obtain contextual representation vectors, then fuses the character vectors into word vectors (this step is not needed for English text), and learns the grammatical features of the text through a graph convolutional network. The application uses the CoreNLP natural language processing toolkit to extract the syntactic dependency tree and then uses the graph convolutional neural network to extract grammatical features. The application belongs to the field of text classification and concerns the speech relation between two texts; it makes full use of the context information and grammatical relations in the texts, and using the pre-training model and the graph convolutional network greatly helps the model learn the internal semantics of the texts.
Those skilled in the art will appreciate that the application provides a system and its individual devices, modules, units, etc. that can be implemented entirely by logic programming of method steps, in addition to being implemented as pure computer readable program code, in the form of logic gates, switches, application specific integrated circuits, programmable logic controllers, embedded microcontrollers, etc. Therefore, the system and various devices, modules and units thereof provided by the application can be regarded as a hardware component, and the devices, modules and units for realizing various functions included in the system can also be regarded as structures in the hardware component; means, modules, and units for implementing the various functions may also be considered as either software modules for implementing the methods or structures within hardware components.
The foregoing describes specific embodiments of the present application. It is to be understood that the application is not limited to the particular embodiments described above, and that various changes or modifications may be made by those skilled in the art within the scope of the appended claims without affecting the spirit of the application. The embodiments of the application and the features of the embodiments may be combined with each other arbitrarily without conflict.

Claims (10)

1. A speech relation recognition method based on a GCNDT-MacBERT neural network framework is characterized by comprising the following steps:
step S1: preprocessing data to obtain a word sequence and a syntactic dependency tree of a text;
step S2: establishing a pre-training model, tokenizing the text with the pre-training model to obtain the character sequence of the text, and inputting the character sequence of the text into the pre-training model to obtain a character vector sequence containing context information;
step S3: fusing the character vector sequence of the step S2 according to the word sequence obtained in the step S1 to obtain a word vector sequence;
step S4: establishing a graph convolutional neural network module, converting the syntactic dependency tree in the step S1 into a graph structure with words as nodes, and then using the graph convolutional neural network module to obtain a word vector sequence carrying more information;
step S5: establishing a CNN-Highway module comprising a convolutional neural network and a Highway network, inputting the word vector sequence into the convolutional neural network, using the convolutional neural network to perform vector feature fusion on the word vector sequence and map it into a fixed-length vector, and using the Highway network to extract more information from the word vector sequence;
step S6: adopting a multi-layer perceptron, inputting the vector obtained in the step S5 into the multi-layer perceptron, obtaining a classification result with a Softmax function, and knowing the logical relationship between the texts from the classification result.
2. The method for recognizing speech relation based on GCNDT-MacBERT neural network framework according to claim 1, wherein in step S1, the text is processed by a natural language processing tool kit to obtain word sequence and syntactic dependency tree of the text.
3. The speech relationship recognition method based on GCNDT-MacBERT neural network framework according to claim 2, wherein the step S2 comprises the steps of:
step S2.1: the pre-training model comprises a tokenizer and a vocabulary; the tokenizer of the pre-training model segments the plurality of texts separately to obtain their character sequences, a position sequence and a segment sequence of the same length as the character sequence are generated, and the character sequences, position sequences and segment sequences of the texts are then spliced in order;
step S2.2: the pre-training model comprises a plurality of embedding layers; the character sequence, position sequence and segment sequence of the step S2.1 are respectively input into the corresponding embedding layers to obtain the corresponding embedding vector sequences, and the embedding vector sequences are added to obtain the final input vector sequence;
step S2.3: the pre-training model comprises a Transformer Encoder neural network; the input vector sequence is input into the Transformer Encoder neural network of the pre-training model, and a character vector sequence containing context information is obtained through calculation.
4. The speech relationship recognition method based on GCNDT-MacBERT neural network framework of claim 3, wherein said step S3 comprises the steps of:
step S3.1: after the character vector sequence in the step S2.3 is obtained, splitting it into a plurality of independent character vector sequences according to the original sequence lengths of the plurality of texts;
step S3.2: according to the word sequence generated by the natural language processing toolkit in the step S1, fusing each of the plurality of independent character vector sequences in the step S3.1 into word vectors to obtain a word vector sequence.
5. The speech relation recognition method based on the GCNDT-MacBERT neural network framework according to claim 1, wherein the step S4 further includes: obtaining an adjacency matrix with words as nodes according to the syntactic dependency tree generated in the step S1, and then using the graph convolutional neural network module to fuse syntactic features into the word vector sequence.
6. The speech relation recognition method based on the GCNDT-MacBERT neural network framework according to claim 1, wherein the step S5 performs vector feature fusion through the CNN-Highway module and maps the result to a fixed-length vector; the word vector sequence is input into the convolutional neural network, a plurality of convolutional layers with convolution kernels are set up, the convolution kernels fuse the information in the text, the vector sequence is mapped into a fixed-length vector through a pooling layer in the convolutional neural network, and finally more information is extracted with the Highway network.
7. A GCNDT-MacBERT neural network framework-based speech relationship recognition system, characterized in that a GCNDT-MacBERT neural network framework-based speech relationship recognition method according to any one of claims 1 to 6 is applied, comprising the following modules:
module M1: preprocessing data to obtain a word sequence and a syntactic dependency tree of a text;
module M2: establishing a pre-training model, tokenizing the text with the pre-training model to obtain the character sequence of the text, and inputting the character sequence of the text into the pre-training model to obtain a character vector sequence containing context information;
module M3: fusing the character vector sequence of the module M2 according to the word sequence obtained by the module M1 to obtain a word vector sequence;
module M4: establishing a graph convolutional neural network module, converting the syntactic dependency tree of the module M1 into a graph structure with words as nodes, and then using the graph convolutional neural network module to obtain a word vector sequence carrying more information;
module M5: establishing a CNN-Highway module comprising a convolutional neural network and a Highway network, inputting the word vector sequence into the convolutional neural network, using the convolutional neural network to perform vector feature fusion on the word vector sequence and map it into a fixed-length vector, and using the Highway network to extract more information from the word vector sequence;
module M6: adopting a multi-layer perceptron, inputting the vector obtained by the module M5 into the multi-layer perceptron, obtaining a classification result with a Softmax function, and knowing the logical relationship between the texts from the classification result.
8. The speech relationship recognition system based on GCNDT-MacBERT neural network framework of claim 7, wherein the module M1 processes the text by a natural language processing tool kit to obtain a word sequence and a syntactic dependency tree of the text.
9. The speech relationship recognition system based on GCNDT-MacBERT neural network framework of claim 8, wherein the module M2 comprises the steps of:
module M2.1: the pre-training model comprises a tokenizer and a vocabulary; the tokenizer of the pre-training model segments the plurality of texts separately to obtain their character sequences, a position sequence and a segment sequence of the same length as the character sequence are generated, and the character sequences, position sequences and segment sequences of the texts are then spliced in order;
module M2.2: the pre-training model comprises a plurality of embedding layers; the character sequence, position sequence and segment sequence of the module M2.1 are respectively input into the corresponding embedding layers to obtain the corresponding embedding vector sequences, and the embedding vector sequences are added to obtain the final input vector sequence;
module M2.3: the pre-training model comprises a Transformer Encoder neural network; the input vector sequence is input into the Transformer Encoder neural network of the pre-training model, and a character vector sequence containing context information is obtained through calculation.
10. The speech relationship recognition system based on GCNDT-MacBERT neural network framework of claim 9, wherein the module M3 comprises the steps of:
module M3.1: after the character vector sequence of the module M2.3 is obtained, splitting it into a plurality of independent character vector sequences according to the original sequence lengths of the plurality of texts;
module M3.2: according to the word sequence generated by the natural language processing toolkit in the module M1, fusing each of the plurality of independent character vector sequences of the module M3.1 into word vectors to obtain a word vector sequence.
CN202110826586.7A 2021-07-21 2021-07-21 Speech relation recognition method and system based on GCNDT-MacBERT neural network framework Active CN113609267B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110826586.7A CN113609267B (en) 2021-07-21 2021-07-21 Speech relation recognition method and system based on GCNDT-MacBERT neural network framework

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202110826586.7A CN113609267B (en) 2021-07-21 2021-07-21 Speech relation recognition method and system based on GCNDT-MacBERT neural network framework

Publications (2)

Publication Number Publication Date
CN113609267A CN113609267A (en) 2021-11-05
CN113609267B true CN113609267B (en) 2023-11-07

Family

ID=78305072

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110826586.7A Active CN113609267B (en) 2021-07-21 2021-07-21 Speech relation recognition method and system based on GCNDT-MacBERT neural network framework

Country Status (1)

Country Link
CN (1) CN113609267B (en)

Families Citing this family (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116187163B (en) * 2022-12-20 2024-02-20 北京知呱呱科技有限公司 Construction method and system of pre-training model for patent document processing
CN116738359A (en) * 2023-05-23 2023-09-12 内蒙古工业大学 Mongolian multi-mode emotion analysis method based on pre-training model and high-resolution network

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109299262A (en) * 2018-10-09 2019-02-01 中山大学 A kind of text implication relation recognition methods for merging more granular informations
CN111259142A (en) * 2020-01-14 2020-06-09 华南师范大学 Specific target emotion classification method based on attention coding and graph convolution network
CN111695341A (en) * 2020-06-16 2020-09-22 北京理工大学 Implicit discourse relation analysis method and system based on discourse structure diagram convolution
WO2020211275A1 (en) * 2019-04-18 2020-10-22 五邑大学 Pre-trained model and fine-tuning technology-based medical text relationship extraction method
CN112613273A (en) * 2020-12-16 2021-04-06 上海交通大学 Compression method and system of multi-language BERT sequence labeling model


Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
Text sentiment classification method based on the BLSTM algorithm and character-word fusion; 陈欣; 杨小兵; 姚雨虹; Journal of China Jiliang University (Issue 002); 225-232 *

Also Published As

Publication number Publication date
CN113609267A (en) 2021-11-05

Similar Documents

Publication Publication Date Title
Jung Semantic vector learning for natural language understanding
CN108984526B (en) Document theme vector extraction method based on deep learning
JP2021096812A (en) Method, apparatus, electronic device and storage medium for processing semantic representation model
TWI662425B (en) A method of automatically generating semantic similar sentence samples
WO2022141878A1 (en) End-to-end language model pretraining method and system, and device and storage medium
CN112101041B (en) Entity relationship extraction method, device, equipment and medium based on semantic similarity
CN112347268A (en) Text-enhanced knowledge graph joint representation learning method and device
CN108090070B (en) Chinese entity attribute extraction method
Suleiman et al. The use of hidden Markov model in natural ARABIC language processing: a survey
CN110532328B (en) Text concept graph construction method
CN113591483A (en) Document-level event argument extraction method based on sequence labeling
CN111832293B (en) Entity and relation joint extraction method based on head entity prediction
CN113609267B (en) Speech relation recognition method and system based on GCNDT-MacBERT neural network framework
WO2023108991A1 (en) Model training method and apparatus, knowledge classification method and apparatus, and device and medium
CN110717045A (en) Letter element automatic extraction method based on letter overview
CN112328800A (en) System and method for automatically generating programming specification question answers
CN113157859A (en) Event detection method based on upper concept information
CN112000802A (en) Software defect positioning method based on similarity integration
CN112541337A (en) Document template automatic generation method and system based on recurrent neural network language model
CN115098706A (en) Network information extraction method and device
CN113806493A (en) Entity relationship joint extraction method and device for Internet text data
CN114417851A (en) Emotion analysis method based on keyword weighted information
CN112925904A (en) Lightweight text classification method based on Tucker decomposition
CN114064901B (en) Book comment text classification method based on knowledge graph word meaning disambiguation
CN111178080A (en) Named entity identification method and system based on structured information

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant