CN113609267B - Speech relation recognition method and system based on GCNDT-MacBERT neural network framework - Google Patents


Info

Publication number
CN113609267B
Authority
CN
China
Prior art keywords
word
sequence
neural network
module
word vector
Prior art date
Legal status
Active
Application number
CN202110826586.7A
Other languages
Chinese (zh)
Other versions
CN113609267A (en
Inventor
江浩
傅洛伊
Current Assignee
Shanghai Jiaotong University
Original Assignee
Shanghai Jiaotong University
Priority date
Filing date
Publication date
Application filed by Shanghai Jiaotong University filed Critical Shanghai Jiaotong University
Priority to CN202110826586.7A priority Critical patent/CN113609267B/en
Publication of CN113609267A publication Critical patent/CN113609267A/en
Application granted granted Critical
Publication of CN113609267B publication Critical patent/CN113609267B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical


Classifications

    • G06F16/3329 Natural language query formulation or dialogue systems
    • G06F16/3344 Query execution using natural language analysis
    • G06F16/35 Clustering; Classification of unstructured textual data
    • G06F40/154 Tree transformation for tree-structured or markup documents, e.g. XSLT, XSL-FO or stylesheets
    • G06F40/211 Syntactic parsing, e.g. based on context-free grammar [CFG] or unification grammars
    • G06F40/253 Grammatical analysis; Style critique
    • G06F40/279 Recognition of textual entities
    • G06F40/30 Semantic analysis
    • G06N3/045 Combinations of networks (neural network architectures)
    • G06N3/08 Learning methods (neural networks)
    • Y02D10/00 Energy efficient computing, e.g. low power processors, power management or thermal management


Abstract

The application provides a speech relation recognition method and a speech relation recognition system based on a GCNDT-MacBERT neural network framework, wherein the speech relation recognition method comprises the following steps. Step S1: obtain the word sequence and the syntactic dependency tree of the text. Step S2: tokenize the text to obtain its character sequence, and obtain a character vector sequence containing context information. Step S3: fuse the character vector sequence into a word vector sequence according to the word sequence. Step S4: convert the syntactic dependency tree into a graph structure to obtain a word vector sequence carrying more information. Step S5: perform vector feature fusion on the word vector sequence with a convolutional neural network and map it into a fixed-length vector. Step S6: input the resulting vector into a multi-layer perceptron to obtain the classification result. The application makes full use of the context information and grammatical relations in the text, and the pre-training model and the graph convolutional network greatly help the model learn the internal semantics of the text.

Description

Speech relation recognition method and system based on GCNDT-MacBERT neural network framework
Technical Field
The application relates to the technical field of speech relation recognition, in particular to a method and a system for speech relation recognition based on a GCNDT-MacBERT neural network framework.
Background
Utterance relation: the utterance relation describes how two adjacent text units (e.g., clauses, sentences, and longer sentence groups) are logically connected to each other. Chapter analysis is one of the basic tasks of natural language processing (Natural Language Processing) and can help other downstream natural language processing tasks work better, and speech relationship recognition is an important component of chapter analysis and has received increasing attention from researchers in recent years. Emphasis is placed on determining how two adjacent phonetic units (e.g., clauses, sentences, and sentence groups) are semantically connected to each other. Obviously, recognizing the utterance relation can help many downstream NLP applications, such as automatic summarization, information extraction and problem solving. In principle, the speech connection between two clauses is important to recognize the relationship between them. For implicit speech relationship recognition, it is more challenging due to the lack of precise connective, often depending on the understanding of the text. It is a descriptive language (formal cohesive) compared to english, whereas chinese is a semantic language (semantic cohesive), tending to join words from sentences.
Discourse parsing is a basic task in natural language processing and a key step for subsequent tasks such as machine translation, question answering, text generation, and text summarization. Shallow discourse parsing, as in Penn Discourse Treebank 2.0 (PDTB 2.0) and the CoNLL 2015 shared task, focuses on identifying the relation between two text spans (i.e., clauses or sentences) called discourse units. In the different subtasks of the PDTB and CoNLL shared tasks, implicit relation classification between two sentences is considered the most difficult problem, because there is no discourse cue or connective (e.g., "however" and "because") between the two sentences.
Utterance relation data sets: the currently prevailing Chinese speech relation data sets are SUDA-CDTB and CoNLL-CDTB (from the CoNLL shared task), planned and annotated by Soochow University and the LDC (Linguistic Data Consortium), respectively. CoNLL-CDTB: the CoNLL 2016 shared task is a competition initiated in 2016 by SIGNLL (ACL's Special Interest Group on Natural Language Learning), aiming to draw researchers' attention to the speech relation recognition task and to improve its recognition accuracy. It comprises a Chinese data set and an English data set: the English data set is built from PDTB 2.0, and the Chinese data set CoNLL-CDTB is obtained by expanding CDTB 0.5. SUDA-CDTB: a Connective-driven Dependency Tree (CDT) scheme was proposed by Soochow University to represent Chinese utterance structure, in which the elementary discourse units are leaf nodes and the connectives are non-leaf nodes; the scheme mainly draws on the discourse treebank of the University of Pennsylvania and on rhetorical structure theory. In particular, the connectives are used to directly represent the hierarchical structure of the tree and the speech relations of the utterances, while the head of each discourse unit is determined with reference to dependency theory. Under the guidance of the CDT scheme, they manually annotated a Chinese Discourse Treebank (CDTB) of 500 documents.
Development of pre-training models: machine learning is widely used in implicit speech relation classification. Traditional research focused mainly on the generation and selection of linguistic features; recently, contextualized word representation learning, such as ELMo, BERT, and RoBERTa, has been shown to significantly improve many downstream tasks. From Word2Vec and GloVe, to ELMo and GPT, to the currently most widely used BERT, pre-training models have come a long way and have brought great vitality and breakthroughs to the field of natural language processing. In real life there is a large amount of text data that lacks manual annotation; pre-training models can exploit exactly this characteristic by pre-training on unsupervised tasks, and after pre-training on a very large-scale text corpus a model learns a general language representation very well. Pre-training models also offer better generalization, speed up the convergence of downstream-task models, and help prevent overfitting on small-scale data sets.
BERT, proposed by Google researchers, uses the Encoder of the Transformer. BERT is pre-trained with the MLM (Masked Language Model) task and the NSP (Next Sentence Prediction) task. For the MLM task, BERT proposes: each word in the text is selected for masking with a probability of 15%; among the selected words, 80% are replaced by [MASK], 10% are replaced by a random word, and the remaining 10% keep the original word.
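The 15%/80%/10%/10% masking rule can be made concrete with a short sketch. The following Python snippet is illustrative only; the function name, the vocabulary argument, and the use of Python's random module are choices made here, not part of BERT's published code.

import random

def bert_mlm_mask(tokens, vocab, mask_token="[MASK]", select_prob=0.15):
    # Implements the corruption rule described above: 15% of tokens are
    # selected for prediction; of those, 80% become [MASK], 10% become a
    # random vocabulary word, and 10% keep the original word.
    corrupted, targets = [], []
    for tok in tokens:
        if random.random() < select_prob:
            targets.append(tok)                      # this position must be predicted
            r = random.random()
            if r < 0.8:
                corrupted.append(mask_token)             # 80%: replace with [MASK]
            elif r < 0.9:
                corrupted.append(random.choice(vocab))   # 10%: replace with a random word
            else:
                corrupted.append(tok)                    # 10%: keep the original word
        else:
            targets.append(None)                     # not selected, not predicted
            corrupted.append(tok)
    return corrupted, targets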
Following BERT, a large number of pre-trained models have emerged, among which RoBERTa and MacBERT have attracted considerable attention. RoBERTa makes two major improvements over BERT: 1. the static masking of BERT is changed to dynamic masking; 2. larger-scale corpora are used.
MacBERT makes three major improvements over BERT: 1. candidate words are selected for masking using Whole Word Masking (WWM) and an N-gram masking strategy, with masking percentages of 40%, 30%, 20%, and 10% from single-character up to four-character spans; 2. synonyms are used instead of [MASK]: a big problem with BERT is that many words in the input sentence are masked with [MASK] during the MLM task, yet the [MASK] token never appears in the fine-tuning stage, which causes an inconsistency between the pre-training task and the downstream fine-tuning task; MacBERT therefore proposes to replace the masked word with a synonym, and a random word is used when no synonym is available; 3. the Sentence Order Prediction (SOP) task proposed by the ALBERT model replaces the original NSP task of BERT; the SOP task creates a negative sample by swapping the original order of two consecutive sentences.
The Chinese patent document with publication number CN111382569A discloses a method, an apparatus, and computer equipment for identifying entities in dialogue corpora, comprising the following steps: acquiring the corpus text whose entities are to be identified; segmenting the corpus text into words to obtain a word segmentation result comprising a plurality of words; acquiring the word vector corresponding to each word in the segmentation result, and combining these word vectors to obtain a text matrix corresponding to the corpus text; and taking the text matrix as the input of an entity recognition model and acquiring the entities in the corpus text output by the entity recognition model.
In view of the above related art, the inventors consider that the above method only extracts entities from the text, makes little use of the context information and grammatical relations in the text, and has difficulty learning the inherent semantics of the text, so the model performs poorly.
Disclosure of Invention
Aiming at the defects in the prior art, the application aims to provide a speech relation recognition method and a speech relation recognition system based on a GCNDT-MacBERT neural network framework.
The application provides a speech relation recognition method based on a GCNDT-MacBERT neural network framework, which is characterized by comprising the following steps:
step S1: preprocessing data to obtain a word sequence and a syntactic dependency tree of a text;
step S2: establishing a pre-training model, tokenizing the text with the pre-training model to obtain the character sequence of the text, and inputting the character sequence of the text into the pre-training model to obtain a character vector sequence containing context information;
step S3: fusing the character vector sequence of the step S2 according to the word sequence obtained in the step S1 to obtain a word vector sequence;
step S4: establishing a graph convolutional neural network module, converting the syntactic dependency tree in the step S1 into a graph structure with words as nodes, and then using the graph convolutional neural network module to obtain a word vector sequence carrying more information;
step S5: establishing a CNN-Highway module comprising a convolutional neural network and a Highway network, inputting the word vector sequence into the convolutional neural network, using the convolutional neural network to perform vector feature fusion on the word vector sequence and map it into a fixed-length vector, and using the Highway network to extract more information from the word vector sequence;
step S6: adopting a multi-layer perceptron, inputting the vector obtained in the step S5 into the multi-layer perceptron, obtaining a classification result with a Softmax function, and knowing the logical relationship between the texts from the classification result.
Preferably, in the step S1, the text is processed by a natural language processing tool kit to obtain a word sequence and a syntactic dependency tree of the text.
Preferably, the step S2 includes the steps of:
step S2.1: the pre-training model comprises a tokenizer and a vocabulary; the tokenizer of the pre-training model segments the plurality of texts separately to obtain their character sequences, a position sequence and a segment sequence of the same length as the character sequence are generated, and the character sequences, position sequences and segment sequences of the texts are then spliced in order;
step S2.2: the pre-training model comprises a plurality of embedding layers; the character sequence, position sequence and segment sequence of the step S2.1 are respectively input into the corresponding embedding layers to obtain the corresponding embedding vector sequences, and the embedding vector sequences are added to obtain the final input vector sequence;
step S2.3: the pre-training model comprises a Transformer Encoder neural network; the input vector sequence is input into the Transformer Encoder neural network of the pre-training model, and a character vector sequence containing context information is obtained through calculation.
Preferably, the step S3 includes the steps of:
step S3.1: after the character vector sequence in the step S2.3 is obtained, splitting it into a plurality of independent character vector sequences according to the original sequence lengths of the plurality of texts;
step S3.2: according to the word sequence generated by the natural language processing toolkit in the step S1, fusing each of the plurality of independent character vector sequences in the step S3.1 into word vectors to obtain a word vector sequence.
Preferably, the step S4 further includes: obtaining an adjacency matrix with words as nodes according to the syntactic dependency tree generated in the step S1, and then using the graph convolutional neural network module to fuse syntactic features into the word vector sequence.
Preferably, the step S5 performs vector feature fusion through the CNN-Highway module and maps the result to a fixed-length vector; the word vector sequence is input into the convolutional neural network, a plurality of convolutional layers with convolution kernels are set up, the convolution kernels fuse the information in the text, the vector sequence is mapped into a fixed-length vector through a pooling layer in the convolutional neural network, and finally more information is extracted with the Highway network.
The application provides an utterance relation recognition system based on a GCNDT-MacBERT neural network framework, which comprises the following modules:
module M1: preprocessing data to obtain a word sequence and a syntactic dependency tree of a text;
module M2: establishing a pre-training model, tokenizing the text with the pre-training model to obtain the character sequence of the text, and inputting the character sequence of the text into the pre-training model to obtain a character vector sequence containing context information;
module M3: fusing the character vector sequence of the module M2 according to the word sequence obtained by the module M1 to obtain a word vector sequence;
module M4: establishing a graph convolutional neural network module, converting the syntactic dependency tree of the module M1 into a graph structure with words as nodes, and then using the graph convolutional neural network module to obtain a word vector sequence carrying more information;
module M5: establishing a CNN-Highway module comprising a convolutional neural network and a Highway network, inputting the word vector sequence into the convolutional neural network, using the convolutional neural network to perform vector feature fusion on the word vector sequence and map it into a fixed-length vector, and using the Highway network to extract more information from the word vector sequence;
module M6: adopting a multi-layer perceptron, inputting the vector obtained by the module M5 into the multi-layer perceptron, obtaining a classification result with a Softmax function, and knowing the logical relationship between the texts from the classification result.
Preferably, the module M1 processes the text through a natural language processing tool kit to obtain a word sequence and a syntactic dependency tree of the text.
Preferably, the module M2 comprises the following steps:
module M2.1: the pre-training model comprises a tokenizer and a vocabulary; the tokenizer of the pre-training model segments the plurality of texts separately to obtain their character sequences, a position sequence and a segment sequence of the same length as the character sequence are generated, and the character sequences, position sequences and segment sequences of the texts are then spliced in order;
module M2.2: the pre-training model comprises a plurality of embedding layers; the character sequence, position sequence and segment sequence of the module M2.1 are respectively input into the corresponding embedding layers to obtain the corresponding embedding vector sequences, and the embedding vector sequences are added to obtain the final input vector sequence;
module M2.3: the pre-training model comprises a Transformer Encoder neural network; the input vector sequence is input into the Transformer Encoder neural network of the pre-training model, and a character vector sequence containing context information is obtained through calculation.
Preferably, the module M3 comprises the following steps:
module M3.1: after the character vector sequence of the module M2.3 is obtained, splitting it into a plurality of independent character vector sequences according to the original sequence lengths of the plurality of texts;
module M3.2: according to the word sequence generated by the natural language processing toolkit in the module M1, fusing each of the plurality of independent character vector sequences of the module M3.1 into word vectors to obtain a word vector sequence.
Compared with the prior art, the application has the following beneficial effects:
1. the method makes full use of the context information of the text and extracts context features with the latest pre-training models;
2. the application obtains Chinese word vectors by fusing Chinese character vectors, and overcomes the limitation that pre-training models can only represent Chinese text character by character;
3. the method extracts the syntactic dependency tree with the CoreNLP natural language processing toolkit and then extracts grammatical features with a graph convolutional neural network;
4. the application makes full use of the context information and grammatical relations in the text, and using the pre-training model and the graph convolutional network greatly helps the model learn the internal semantics of the text;
5. the method uses pre-training models such as MacBERT and RoBERTa to obtain the representation vectors of the context, then fuses the character vectors into word vectors (this step is not needed for English text), and learns the grammatical features of the text through a graph convolutional network.
Drawings
Other features, objects and advantages of the present application will become more apparent upon reading of the detailed description of non-limiting embodiments, given with reference to the accompanying drawings in which:
FIG. 1 is a diagram of the overall architecture of a speech relationship recognition system based on a GCNDT-MacBERT neural network framework in the present application;
FIG. 2 is a syntactic dependency tree generated using the CoreNLP natural language processing toolkit in the present application.
Detailed Description
The present application will be described in detail with reference to specific examples. The following examples will assist those skilled in the art in further understanding the present application, but are not intended to limit the application in any way. It should be noted that variations and modifications could be made by those skilled in the art without departing from the inventive concept. These are all within the scope of the present application.
The embodiment of the application discloses a method and a system for speech relation recognition based on a GCNDT-MacBERT neural network framework, as shown in fig. 1 and 2, comprising the following steps. Step S1: preprocess the data (each sample consists of two argument texts and the corresponding category) to obtain the word sequence and syntactic dependency tree of each text. The texts are processed with the Stanford CoreNLP natural language processing toolkit: for the two argument texts of each sample, CoreNLP produces their respective word sequences and syntactic dependency tree structures.
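As an illustration of this preprocessing step, the sketch below uses the Stanza Python interface to Stanford's NLP pipelines to obtain the word sequence and the dependency edges; the description above names CoreNLP, so treating Stanza as a stand-in, and the helper name preprocess, are assumptions made here.

import stanza

# stanza.download("zh")  # one-time model download for Chinese
nlp = stanza.Pipeline(lang="zh", processors="tokenize,pos,lemma,depparse")

def preprocess(text):
    doc = nlp(text)
    words, edges = [], []
    for sent in doc.sentences:
        offset = len(words)
        for w in sent.words:
            words.append(w.text)
            if w.head > 0:  # head index 0 is the artificial root node
                edges.append((offset + w.head - 1, offset + int(w.id) - 1))
    return words, edges  # the word sequence and the (head, dependent) dependency arcs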
Experiments were performed on two data sets, Suda-CDTB and CoNLL-CDTB. The raw text sources of the two data sets are shown in Table 1, and the sample type statistics of the Suda-CDTB and CoNLL-CDTB data sets are shown in Tables 2 and 3. The two argument texts of each sample are parsed with the CoreNLP natural language processing toolkit to obtain the word sequence and syntactic dependency tree structure of each text.
Table 1 Total data set sample sizes and domains
Table 2 Suda-CDTB data set sample types and quantity statistics
Table 3 CoNLL-CDTB data set sample types and quantity statistics
Step S2: establish a pre-training model, tokenize the text with the pre-training model (MacBERT and the like) to obtain the character sequence of the text, and input the character sequence of the text into the pre-training model to obtain a character vector sequence containing context information. The tokenizer of the pre-training model (MacBERT, RoBERTa and the like) processes the two argument texts to obtain their character sequences; the character sequences of the two arguments are then spliced and input into the pre-training model to obtain a character vector sequence containing context information.
Step S2 includes the following steps. Step S2.1: the pre-training model comprises a tokenizer and a vocabulary; the tokenizer of the pre-training model segments the texts separately to obtain their character sequences, a position sequence and a segment sequence of the same length as the character sequence are generated, and the character sequences, position sequences and segment sequences of the texts are then spliced in order.
Concretely, the two argument texts are tokenized separately by the tokenizer of the pre-training model to obtain their character sequences (for English text the result is the subword segmentation). A position sequence and a segment sequence of the same length as the two character sequences are generated: the position sequence of argument 1 increases from 0 in steps of 1, and the position sequence of argument 2 continues to increase in steps of 1 from the last element of argument 1; for the segment sequence, the segment sequence of argument 1 is set to 0 and that of argument 2 is set to 1. The three sequences of the two arguments are then spliced in the order of the formula below.
Arg_1 = [a_1^(1), a_2^(1), …, a_M^(1)], Arg_2 = [a_1^(2), a_2^(2), …, a_N^(2)]
[e_0, e_1, …, e_{M+N+3}] = [CLS] ⊕ Arg_1 ⊕ [SEP] ⊕ Arg_2 ⊕ [EOS]
Arg_1 denotes the character vector sequence of argument 1; [CLS] denotes the head tag added by the pre-training model; [SEP] denotes the separator tag added by the pre-training model; [EOS] denotes the end tag added by the pre-training model; a_i^(1) denotes a character vector in argument 1; Arg_2 denotes the character vector sequence of argument 2; a_j^(2) denotes a character vector in argument 2; the subscript M denotes the number of characters in argument 1; the subscript N denotes the number of characters in argument 2; e_0 ~ e_{M+N+3} denote the new vectors after splicing.
Step S2.2: the pre-training model comprises a plurality of embedding layers; the character sequence, position sequence and segment sequence from step S2.1 are respectively input into the corresponding embedding layers to obtain the corresponding embedding vector sequences, and the embedding vector sequences are added to obtain the final input vector sequence. The pre-training model has three embedding layers, namely Word Embedding, Position Embedding, and Segment Embedding. The three sequences from step S2.1 are input into the corresponding embedding layers to obtain the corresponding embedding vector sequences (the character vector sequence, the position vector sequence, and the segment vector sequence), and the three embedding vector sequences are added position by position to obtain the final input vector sequence. The character vector sequence is generated by Word Embedding, the position vector sequence by Position Embedding, and the segment vector sequence by Segment Embedding.
Step S2.3: the pre-training model comprises a Transformer Encoder neural network; the input vector sequence is input into the Transformer Encoder neural network of the pre-training model, and a character vector sequence containing context information is obtained through calculation. Concretely, the input vector sequence is fed into the Transformer Encoder module of the pre-training model, and after the calculation of its multiple layers a context-aware character vector sequence [h_0, h_1, …, h_{M+N+2}, h_{M+N+3}] is obtained; h_0 ~ h_{M+N+3} denote the vectors obtained from e_0 ~ e_{M+N+3} after the pre-training model fuses the context information.
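A minimal sketch of how such contextual character vectors can be obtained with the HuggingFace transformers library is given below; the checkpoint name "hfl/chinese-macbert-base" and the use of BertTokenizerFast/BertModel are assumptions, since the description above only names the MacBERT/RoBERTa family of models.

import torch
from transformers import BertTokenizerFast, BertModel

tokenizer = BertTokenizerFast.from_pretrained("hfl/chinese-macbert-base")
encoder = BertModel.from_pretrained("hfl/chinese-macbert-base")

arg1, arg2 = "他昨天生病了", "所以没有来上课"   # the two argument texts (example input)
# Tokenizes both arguments (character-level for Chinese), adds the special
# tags, and builds the position and segment (token_type) sequences.
inputs = tokenizer(arg1, arg2, return_tensors="pt")
with torch.no_grad():
    outputs = encoder(**inputs)
# One context-aware vector per input token, i.e. the sequence h_0, h_1, ...
char_vectors = outputs.last_hidden_state.squeeze(0)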
Step S3: fuse the character vector sequence from step S2 into a word vector sequence according to the word sequence obtained in step S1. Concretely, according to the word sequence generated by the CoreNLP natural language processing toolkit in step S1, the character vector sequences of the two arguments obtained in step S2 are fused separately to obtain the word vector sequences of the two arguments.
Step S3 includes the following steps. Step S3.1: after the character vector sequence of step S2.3 is obtained, split it into several independent character vector sequences according to the original sequence lengths of the texts. Concretely, the character vector sequence is split into two independent character vector sequences according to the original sequence lengths of the two arguments, namely the character vector sequence of argument 1 and the character vector sequence of argument 2;
h_i^(1) denotes a character vector of argument 1 after splitting; h_j^(2) denotes a character vector of argument 2 after splitting; h_{M+1} denotes a character vector before splitting; M denotes the number of characters of argument 1; N denotes the number of characters of argument 2.
Step S3.2: according to the word sequence generated by the natural language processing toolkit in step S1, each of the independent character vector sequences from step S3.1 is fused into word vectors to obtain the word vector sequences. Concretely, according to the word sequence generated by the CoreNLP natural language processing toolkit in step S1, the character vectors belonging to each word are fused into one word vector, yielding the word vector sequence of each argument.
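One way to realize this fusion is mean pooling over the character vectors of each word, as sketched below; the description above does not specify the fusion operator, so averaging is an assumption, as is the helper name chars_to_words.

import torch

def chars_to_words(char_vectors, words):
    # char_vectors: (num_chars, dim) tensor of contextual character vectors of
    # one argument (special tags already removed); words: the CoreNLP word
    # sequence whose characters, concatenated, equal the original text.
    word_vectors, pos = [], 0
    for w in words:
        span = char_vectors[pos:pos + len(w)]   # character vectors belonging to this word
        word_vectors.append(span.mean(dim=0))   # fuse by averaging (assumed operator)
        pos += len(w)
    return torch.stack(word_vectors)            # (num_words, dim) word vector sequence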
Step S4: establish a graph convolutional neural network module, convert the syntactic dependency tree from step S1 into a graph structure whose nodes are words, and then use the graph convolutional neural network module to obtain a word vector sequence carrying more information. An adjacency matrix with words as nodes is obtained from the syntactic dependency tree generated in step S1, and the graph convolutional neural network module is then used to fuse syntactic features into the word vector sequence.
Step S4 includes the following steps. Step S4.1: the syntactic dependency tree generated by the natural language processing toolkit in step S1 is a tree structure whose nodes are words; a tree is a special graph and is in essence an undirected graph structure, so an adjacency matrix with words as nodes is obtained from the syntactic dependency tree.
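The conversion from dependency tree to adjacency matrix can be sketched as follows; treating the tree as an undirected graph follows the description above, while adding self-loops is a common GCN convention assumed here.

import torch

def build_adjacency(num_words, edges):
    # edges: (head, dependent) index pairs taken from the syntactic dependency tree
    A = torch.zeros(num_words, num_words)
    for head, dep in edges:
        A[head, dep] = 1.0
        A[dep, head] = 1.0          # undirected graph structure
    A += torch.eye(num_words)       # self-loops so each word keeps its own features
    return A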
Step S4.2: the word vector sequence and the adjacency matrix are input into the graph convolutional neural network module, which outputs a word vector sequence containing the syntactic relation features.
H^(l+1) = f(H^(l), A)
H^(0) = X is the input of the first layer; X denotes the word vector sequence, with X ∈ R^(N'×D), where R denotes the real number space, N' is the number of nodes of the graph, and D is the dimension of the feature vector of each node; A is the adjacency matrix; f is the graph convolution function; H^(l) denotes the input of the l-th layer.
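The formula above leaves the concrete form of f open; a common choice is the propagation rule of Kipf and Welling, f(H, A) = ReLU(D^(-1/2) A D^(-1/2) H W), which the sketch below adopts as an assumption.

import torch
import torch.nn as nn

class GCNLayer(nn.Module):
    def __init__(self, dim):
        super().__init__()
        self.linear = nn.Linear(dim, dim)   # the weight matrix W of the layer

    def forward(self, H, A):
        deg = A.sum(dim=-1)                                  # node degrees
        d_inv_sqrt = torch.diag(deg.clamp(min=1e-8) ** -0.5)
        A_norm = d_inv_sqrt @ A @ d_inv_sqrt                 # symmetric normalization
        return torch.relu(self.linear(A_norm @ H))           # fuse neighbouring word features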
Step S5: establish a CNN-Highway module, comprising a convolutional neural network and a Highway network. The word vector sequence is input into the convolutional neural network, which performs vector feature fusion on the word vector sequence and maps it into a fixed-length vector; the Highway network then extracts more information from the word vector sequence (information/feature is an abstract notion here: in essence, the Highway network lets the vector represent and contain more information).
The CNN-Highway module contains two sub-modules, CNN (convolutional neural network) and Highway. It uses the convolutional neural network and a max pooling layer to perform vector feature fusion on the word vector sequence and map it into a fixed-length vector, and then extracts more features through the Highway sub-module. Concretely, the word vector sequence is first input into the convolutional neural network (CNN), the convolution kernels fuse the n-gram information in the text, the max pooling layer maps the sequence to a fixed-length vector, and finally the Highway sub-module extracts more features.
Step S5 comprises the following concrete steps. Step S5.1: the word vector sequence is fused with the convolutional neural network (CNN); several convolutional layers with different kernel sizes are set up, the word vector sequence is first input into the convolutional neural network, the convolution operation fuses the n-gram information in the text, and the max pooling layer maps the word vector sequence to a fixed-length vector, as in the following formula:
u_c = MaxPool(ReLU(conv_c(w_i, …, w_{i+c})))
u_c denotes the vector obtained after the convolution function with kernel size c and the pooling layer; the subscript c denotes the size of the convolution kernel; ReLU denotes a nonlinear function; conv_c denotes the convolution function with kernel size c; w_i denotes the i-th word vector; w_{i+c} denotes the (i+c)-th word vector.
Step S5.2: finally, more features are extracted with the Highway module, as in the following formulas:
u_1 = Flatten([u_c]) (the concatenation of the pooled vectors u_c over all kernel sizes c)
u_1' = ReLU(W_h^T u_1)
g_1 = sigmoid(W_g^T u_1)
o_1 = g_1 ⊙ u_1' + (1 - g_1) ⊙ u_1
u_c denotes the vector obtained after the convolution function with kernel size c and the pooling layer; Flatten merges the vector sequence into a single vector; W_h and W_g denote parameters in the module; the superscript T denotes the matrix transpose; sigmoid and ReLU denote nonlinear functions; o_1 denotes the final fixed-length argument vector; the symbol ⊙ denotes the element-wise product.
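A compact sketch of the CNN-Highway module described above is given below; the kernel sizes (2, 3, 4) and the single highway layer are assumptions, since the text only says that several convolution kernels are used.

import torch
import torch.nn as nn

class CNNHighway(nn.Module):
    def __init__(self, dim, kernel_sizes=(2, 3, 4)):
        super().__init__()
        self.convs = nn.ModuleList([nn.Conv1d(dim, dim, k) for k in kernel_sizes])
        hidden = dim * len(kernel_sizes)
        self.W_h = nn.Linear(hidden, hidden)   # transform branch of the highway layer
        self.W_g = nn.Linear(hidden, hidden)   # gate branch of the highway layer

    def forward(self, word_vectors):           # (num_words, dim) word vector sequence
        x = word_vectors.t().unsqueeze(0)      # (1, dim, num_words) layout for Conv1d
        pooled = [torch.relu(conv(x)).max(dim=-1).values for conv in self.convs]
        u = torch.cat(pooled, dim=-1).squeeze(0)          # fixed-length fused vector u_1
        g = torch.sigmoid(self.W_g(u))                    # highway gate g_1
        return g * torch.relu(self.W_h(u)) + (1 - g) * u  # o_1 as in the formula above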
Step S6: the vector obtained in step S5 is input into a multi-layer perceptron, a Softmax function is used to obtain the classification result, and the logical relation between the texts is obtained from the classification result.
The multi-layer perceptron is a basic module in the field of deep learning. The characterization vectors of the two arguments are directly concatenated and input into the multi-layer perceptron; the final length of the output vector is 4, it is normalized with a Softmax function to give the score of each category, and the category with the highest score is the result of the model's prediction. The logical relation between the two texts can be known from the predicted category, which provides help for downstream tasks. The output is one of the defined relation types, e.g., causal, adversative, and so on; the task of this work is to obtain this category.
Step S6 comprises the following concrete steps. Step S6.1: the fixed-length characterization vectors of the two arguments are obtained through step S5, the two vectors are concatenated and input into the multi-layer perceptron, and the final length of the output vector is 4. The characterization vector is a fixed-length vector.
Step S6.2: normalization is then performed with a Softmax function to obtain the score of each category, and the category with the highest score is the result of the model's prediction. Each data set defines its own categories.
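The classification head of step S6 can be sketched as a small PyTorch module; the output size of 4 and the Softmax normalization follow the description above, while the hidden size of 256 and the single hidden layer are assumptions.

import torch
import torch.nn as nn

class RelationClassifier(nn.Module):
    def __init__(self, arg_dim, hidden=256, num_classes=4):
        super().__init__()
        self.mlp = nn.Sequential(
            nn.Linear(arg_dim * 2, hidden),   # the two argument vectors are concatenated
            nn.ReLU(),
            nn.Linear(hidden, num_classes))   # final vector of length 4

    def forward(self, o1, o2):
        logits = self.mlp(torch.cat([o1, o2], dim=-1))
        return torch.softmax(logits, dim=-1)  # score of each speech relation category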
In addition to the deep-learning GCNDT-MacBERT neural network framework, a GCNDT-MacBERT model without the GCN (plain MacBERT) is added as a comparison. Experiments are carried out on the CoNLL-CDTB and Suda-CDTB data sets together with previous models, and the performance differences between the models are compared through the Acc and F1 indices.
Acc denotes the prediction accuracy, i.e., the proportion of samples whose predicted category matches the true category.
F1 is a measure of test accuracy. It is calculated from the precision and the recall: in binary classification statistical analysis, the precision is the number of true positive results divided by the number of all results predicted positive, including those incorrectly identified, and the recall is the number of true positive results divided by the number of all samples that should have been identified as positive. F1 can likewise be extended to multi-class problems.
Table 4 summarizes the results of GCNDT-MacBERT and all baseline methods, with the best results highlighted in bold. Clearly, the proposed GCNDT-MacBERT achieves the best performance on every indicator of each data set. GCNDT-MacBERT learns the context information of the text better through the pre-training model and extracts the grammatical information of the text through the syntactic dependency tree structure and the graph convolutional neural network module; compared with the second-best TTN, the increase in Acc on the CoNLL-CDTB and Suda-CDTB data sets is 5.4% and 13.96%, respectively.
Table 4 Performance of the models on the data sets
The application provides a speech relation recognition model, the GCNDT-MacBERT neural network framework, built on the Transformer and GCN neural networks. It uses the MacBERT or RoBERTa pre-training model to obtain contextual representation vectors, then fuses the character vectors into word vectors (this step is not needed for English text), and learns the grammatical features of the text through a graph convolutional network. The application uses the CoreNLP natural language processing toolkit to extract the syntactic dependency tree and then uses the graph convolutional neural network to extract grammatical features. The application belongs to the field of text classification and concerns the speech relation between two texts; it makes full use of the context information and grammatical relations in the texts, and using the pre-training model and the graph convolutional network greatly helps the model learn the internal semantics of the texts.
Those skilled in the art will appreciate that the application provides a system and its individual devices, modules, units, etc. that can be implemented entirely by logic programming of method steps, in addition to being implemented as pure computer readable program code, in the form of logic gates, switches, application specific integrated circuits, programmable logic controllers, embedded microcontrollers, etc. Therefore, the system and various devices, modules and units thereof provided by the application can be regarded as a hardware component, and the devices, modules and units for realizing various functions included in the system can also be regarded as structures in the hardware component; means, modules, and units for implementing the various functions may also be considered as either software modules for implementing the methods or structures within hardware components.
The foregoing describes specific embodiments of the present application. It is to be understood that the application is not limited to the particular embodiments described above, and that various changes or modifications may be made by those skilled in the art within the scope of the appended claims without affecting the spirit of the application. The embodiments of the application and the features of the embodiments may be combined with each other arbitrarily without conflict.

Claims (10)

1. A speech relation recognition method based on a GCNDT-MacBERT neural network framework is characterized by comprising the following steps:
step S1: preprocessing data to obtain a word sequence and a syntactic dependency tree of a text;
step S2: establishing a pre-training model, tokenizing the text with the pre-training model to obtain the character sequence of the text, and inputting the character sequence of the text into the pre-training model to obtain a character vector sequence containing context information;
step S3: fusing the character vector sequence of the step S2 according to the word sequence obtained in the step S1 to obtain a word vector sequence;
step S4: establishing a graph convolutional neural network module, converting the syntactic dependency tree in the step S1 into a graph structure with words as nodes, and then using the graph convolutional neural network module to obtain a word vector sequence carrying more information;
step S5: establishing a CNN-Highway module comprising a convolutional neural network and a Highway network, inputting the word vector sequence into the convolutional neural network, using the convolutional neural network to perform vector feature fusion on the word vector sequence and map it into a fixed-length vector, and using the Highway network to extract more information from the word vector sequence;
step S6: adopting a multi-layer perceptron, inputting the vector obtained in the step S5 into the multi-layer perceptron, obtaining a classification result with a Softmax function, and knowing the logical relationship between the texts from the classification result.
2. The method for recognizing speech relation based on GCNDT-MacBERT neural network framework according to claim 1, wherein in step S1, the text is processed by a natural language processing tool kit to obtain word sequence and syntactic dependency tree of the text.
3. The speech relationship recognition method based on GCNDT-MacBERT neural network framework according to claim 2, wherein the step S2 comprises the steps of:
step S2.1: the pre-training model comprises a tokenizer and a vocabulary; the tokenizer of the pre-training model segments the plurality of texts separately to obtain their character sequences, a position sequence and a segment sequence of the same length as the character sequence are generated, and the character sequences, position sequences and segment sequences of the texts are then spliced in order;
step S2.2: the pre-training model comprises a plurality of embedding layers; the character sequence, position sequence and segment sequence of the step S2.1 are respectively input into the corresponding embedding layers to obtain the corresponding embedding vector sequences, and the embedding vector sequences are added to obtain the final input vector sequence;
step S2.3: the pre-training model comprises a Transformer Encoder neural network; the input vector sequence is input into the Transformer Encoder neural network of the pre-training model, and a character vector sequence containing context information is obtained through calculation.
4. The speech relationship recognition method based on GCNDT-MacBERT neural network framework of claim 3, wherein said step S3 comprises the steps of:
step S3.1: after the character vector sequence in the step S2.3 is obtained, splitting it into a plurality of independent character vector sequences according to the original sequence lengths of the plurality of texts;
step S3.2: according to the word sequence generated by the natural language processing toolkit in the step S1, fusing each of the plurality of independent character vector sequences in the step S3.1 into word vectors to obtain a word vector sequence.
5. The speech relation recognition method based on the GCNDT-MacBERT neural network framework according to claim 1, wherein the step S4 further includes: obtaining an adjacency matrix with words as nodes according to the syntactic dependency tree generated in the step S1, and then using the graph convolutional neural network module to fuse syntactic features into the word vector sequence.
6. The speech relation recognition method based on the GCNDT-MacBERT neural network framework according to claim 1, wherein the step S5 performs vector feature fusion through the CNN-Highway module and maps the result to a fixed-length vector; the word vector sequence is input into the convolutional neural network, a plurality of convolutional layers with convolution kernels are set up, the convolution kernels fuse the information in the text, the vector sequence is mapped into a fixed-length vector through a pooling layer in the convolutional neural network, and finally more information is extracted with the Highway network.
7. A GCNDT-MacBERT neural network framework-based speech relationship recognition system, characterized in that a GCNDT-MacBERT neural network framework-based speech relationship recognition method according to any one of claims 1 to 6 is applied, comprising the following modules:
module M1: preprocessing data to obtain a word sequence and a syntactic dependency tree of a text;
module M2: establishing a pre-training model, tokenizing the text with the pre-training model to obtain the character sequence of the text, and inputting the character sequence of the text into the pre-training model to obtain a character vector sequence containing context information;
module M3: fusing the character vector sequence of the module M2 according to the word sequence obtained by the module M1 to obtain a word vector sequence;
module M4: establishing a graph convolutional neural network module, converting the syntactic dependency tree of the module M1 into a graph structure with words as nodes, and then using the graph convolutional neural network module to obtain a word vector sequence carrying more information;
module M5: establishing a CNN-Highway module comprising a convolutional neural network and a Highway network, inputting the word vector sequence into the convolutional neural network, using the convolutional neural network to perform vector feature fusion on the word vector sequence and map it into a fixed-length vector, and using the Highway network to extract more information from the word vector sequence;
module M6: adopting a multi-layer perceptron, inputting the vector obtained by the module M5 into the multi-layer perceptron, obtaining a classification result with a Softmax function, and knowing the logical relationship between the texts from the classification result.
8. The speech relationship recognition system based on GCNDT-MacBERT neural network framework of claim 7, wherein the module M1 processes the text by a natural language processing tool kit to obtain a word sequence and a syntactic dependency tree of the text.
9. The speech relationship recognition system based on GCNDT-MacBERT neural network framework of claim 8, wherein the module M2 comprises the steps of:
module M2.1: the pre-training model comprises a tokenizer and a vocabulary; the tokenizer of the pre-training model segments the plurality of texts separately to obtain their character sequences, a position sequence and a segment sequence of the same length as the character sequence are generated, and the character sequences, position sequences and segment sequences of the texts are then spliced in order;
module M2.2: the pre-training model comprises a plurality of embedding layers; the character sequence, position sequence and segment sequence of the module M2.1 are respectively input into the corresponding embedding layers to obtain the corresponding embedding vector sequences, and the embedding vector sequences are added to obtain the final input vector sequence;
module M2.3: the pre-training model comprises a Transformer Encoder neural network; the input vector sequence is input into the Transformer Encoder neural network of the pre-training model, and a character vector sequence containing context information is obtained through calculation.
10. The speech relationship recognition system based on GCNDT-MacBERT neural network framework of claim 9, wherein the module M3 comprises the steps of:
module M3.1: after the character vector sequence of the module M2.3 is obtained, splitting it into a plurality of independent character vector sequences according to the original sequence lengths of the plurality of texts;
module M3.2: according to the word sequence generated by the natural language processing toolkit in the module M1, fusing each of the plurality of independent character vector sequences of the module M3.1 into word vectors to obtain a word vector sequence.
CN202110826586.7A 2021-07-21 2021-07-21 Speech relation recognition method and system based on GCNDT-MacBERT neural network framework Active CN113609267B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110826586.7A CN113609267B (en) 2021-07-21 2021-07-21 Speech relation recognition method and system based on GCNDT-MacBERT neural network framework

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202110826586.7A CN113609267B (en) 2021-07-21 2021-07-21 Speech relation recognition method and system based on GCNDT-MacBERT neural network framework

Publications (2)

Publication Number Publication Date
CN113609267A CN113609267A (en) 2021-11-05
CN113609267B true CN113609267B (en) 2023-11-07

Family

ID=78305072

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110826586.7A Active CN113609267B (en) 2021-07-21 2021-07-21 Speech relation recognition method and system based on GCNDT-MacBERT neural network framework

Country Status (1)

Country Link
CN (1) CN113609267B (en)

Families Citing this family (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116187163B (en) * 2022-12-20 2024-02-20 北京知呱呱科技有限公司 Construction method and system of pre-training model for patent document processing
CN116738359A (en) * 2023-05-23 2023-09-12 内蒙古工业大学 Mongolian multi-mode emotion analysis method based on pre-training model and high-resolution network

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109299262A (en) * 2018-10-09 2019-02-01 中山大学 A kind of text implication relation recognition methods for merging more granular informations
CN111259142A (en) * 2020-01-14 2020-06-09 华南师范大学 Specific target emotion classification method based on attention coding and graph convolution network
CN111695341A (en) * 2020-06-16 2020-09-22 北京理工大学 Implicit discourse relation analysis method and system based on discourse structure diagram convolution
WO2020211275A1 (en) * 2019-04-18 2020-10-22 五邑大学 Pre-trained model and fine-tuning technology-based medical text relationship extraction method
CN112613273A (en) * 2020-12-16 2021-04-06 上海交通大学 Compression method and system of multi-language BERT sequence labeling model


Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
Text sentiment classification method based on the BLSTM algorithm and character-word fusion; 陈欣; 杨小兵; 姚雨虹; Journal of China Jiliang University (Issue 002); 225-232 *

Also Published As

Publication number Publication date
CN113609267A (en) 2021-11-05

Similar Documents

Publication Publication Date Title
Jung Semantic vector learning for natural language understanding
CN108984526B (en) Document theme vector extraction method based on deep learning
JP2021096812A (en) Method, apparatus, electronic device and storage medium for processing semantic representation model
TWI662425B (en) A method of automatically generating semantic similar sentence samples
WO2022141878A1 (en) End-to-end language model pretraining method and system, and device and storage medium
CN112101041B (en) Entity relationship extraction method, device, equipment and medium based on semantic similarity
CN112347268A (en) Text-enhanced knowledge graph joint representation learning method and device
CN108090070B (en) Chinese entity attribute extraction method
Suleiman et al. The use of hidden Markov model in natural ARABIC language processing: a survey
CN110532328B (en) Text concept graph construction method
CN113591483A (en) Document-level event argument extraction method based on sequence labeling
CN111832293B (en) Entity and relation joint extraction method based on head entity prediction
CN113609267B (en) Speech relation recognition method and system based on GCNDT-MacBERT neural network framework
WO2023108991A1 (en) Model training method and apparatus, knowledge classification method and apparatus, and device and medium
CN110717045A (en) Letter element automatic extraction method based on letter overview
CN112328800A (en) System and method for automatically generating programming specification question answers
CN113157859A (en) Event detection method based on upper concept information
CN112000802A (en) Software defect positioning method based on similarity integration
CN112541337A (en) Document template automatic generation method and system based on recurrent neural network language model
CN115098706A (en) Network information extraction method and device
CN113806493A (en) Entity relationship joint extraction method and device for Internet text data
CN114417851A (en) Emotion analysis method based on keyword weighted information
CN112925904A (en) Lightweight text classification method based on Tucker decomposition
CN114064901B (en) Book comment text classification method based on knowledge graph word meaning disambiguation
CN111178080A (en) Named entity identification method and system based on structured information

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant