CN114091432A - Method and device for extracting traffic quality inspection violation reasons based on multi-task learning - Google Patents

Method and device for extracting traffic quality inspection violation reasons based on multi-task learning

Info

Publication number
CN114091432A
Authority
CN
China
Prior art keywords
neural network
model
violation
convolution neural
graph convolution
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202111461750.5A
Other languages
Chinese (zh)
Inventor
华旭明
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Shanghai Chuang Frame Software Co ltd
Original Assignee
Shanghai Chuang Frame Software Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Shanghai Chuang Frame Software Co ltd filed Critical Shanghai Chuang Frame Software Co ltd
Priority to CN202111461750.5A priority Critical patent/CN114091432A/en
Publication of CN114091432A publication Critical patent/CN114091432A/en
Pending legal-status Critical Current

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/20 Natural language analysis
    • G06F40/205 Parsing
    • G06F40/211 Syntactic parsing, e.g. based on context-free grammar [CFG] or unification grammars
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/35 Clustering; Classification
    • G06F16/353 Clustering; Classification into predefined classes
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/20 Natural language analysis
    • G06F40/253 Grammatical analysis; Style critique
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/30 Semantic analysis
    • G06F40/35 Discourse or dialogue representation
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/045 Combinations of networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/047 Probabilistic or stochastic networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods
    • G06N3/084 Backpropagation, e.g. using gradient descent

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Health & Medical Sciences (AREA)
  • Health & Medical Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Computational Linguistics (AREA)
  • Data Mining & Analysis (AREA)
  • Biomedical Technology (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • Evolutionary Computation (AREA)
  • Biophysics (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Probability & Statistics with Applications (AREA)
  • Databases & Information Systems (AREA)
  • Machine Translation (AREA)

Abstract

The invention provides a method and a device for extracting traffic quality inspection violation reasons based on multi-task learning, wherein the method comprises the following steps: performing data preprocessing on the original sentence transcribed from the recording, turning it into a form that a pre-training model can process; restructuring the original sentence into the form of a dependency syntax tree, processing the tree structure into an adjacency matrix that a graph convolutional neural network can process, and processing the syntactic structure features through the graph convolutional neural network; and jointly training the syntactic structure processed by the graph convolutional neural network with the machine reading comprehension model, taking the obtained result as the start and end index positions of the violation reason predicted by the model. The invention greatly improves the model's information extraction capability, and the extracted violation reasons conform better to Chinese grammar rules.

Description

Method and device for extracting traffic quality inspection violation reasons based on multi-task learning
Technical Field
The invention relates to the technical field of natural language processing, in particular to a method and a device for extracting traffic quality inspection violation reasons based on multi-task learning.
Background
The telephone traffic quality inspection system detects whether manual calls, such as customer service and debt collection calls, violate content prohibited by the state or the enterprise. Traditional systems rely on keyword matching; in this mode, customer service agents and collectors can easily bypass the keywords and achieve their purpose through new forms of violation, which can cause irreparable losses to customers and companies. With the continuous development of natural language processing technology in artificial intelligence, this problem is gradually being solved: through natural language processing, a machine can understand the meaning of a Chinese sentence to a certain degree, check whether the sentence violates the rules, and extract the corresponding violating part.
The technology used to extract the violating part of a dialogue is the information extraction technology of natural language processing, which mainly uses machine learning methods to automatically extract factual information from free text. Currently, information extraction research mainly covers named entity recognition, relation extraction, event extraction, and the like. Event extraction studies how to extract events of interest to a user from unstructured text and describe them in a structured text form so that the user can further query, trace, and analyze them; it is a very important research direction in the field of natural language processing.
Event extraction, given a document, predicts the event description, the event trigger word, the elements corresponding to the event, and the role corresponding to each element. However, most existing Chinese event extraction models adopt a pipeline approach, i.e. the event trigger word is recognized first and then the event elements are recognized. In addition, the violation-reason extraction task has none of the event trigger words, event elements, and the like of the traditional event extraction paradigm.
Disclosure of Invention
In order to solve the problem that existing models cannot handle the violation-reason extraction task well, the invention aims to provide a method and a device for extracting telephone traffic quality inspection violation reasons based on multi-task learning, whose results are more grammatical and logical.
In order to solve the problems, the technical scheme of the invention is as follows:
A traffic quality inspection violation reason extraction method based on multi-task learning comprises the following steps:
performing data preprocessing on the original sentence transcribed from the recording, turning it into a form that a pre-training model can process;
restructuring the original sentence into the form of a dependency syntax tree, processing the tree structure into an adjacency matrix that a graph convolutional neural network can process, and processing the syntactic structure features through the graph convolutional neural network; and
jointly training the syntactic structure processed by the graph convolutional neural network with the machine reading comprehension model, and taking the obtained result as the start and end index positions of the violation reason predicted by the model.
Optionally, the data preprocessing of the original transcribed sentence specifically comprises: adopting two BERT models to process the Chinese characters and the pinyin respectively, splicing their outputs, connecting a softmax function, and finally calculating a cross entropy loss.
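The fusion head of this correction step (splice, softmax, cross entropy) can be sketched numerically as below. The random arrays stand in for the outputs of the two BERT encoders, and the function name, shapes, and weights are illustrative assumptions, not the patent's actual implementation:

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def correction_head(char_hidden, pinyin_hidden, W, b, target_ids):
    """Fuse character and pinyin encoder outputs, project to the
    vocabulary, and compute the mean cross-entropy loss per token.

    char_hidden, pinyin_hidden: (seq_len, hidden) arrays, stand-ins
    for the outputs of the two BERT encoders described in the text.
    W: (2*hidden, vocab) projection; b: (vocab,) bias.
    target_ids: (seq_len,) gold character indices.
    """
    fused = np.concatenate([char_hidden, pinyin_hidden], axis=-1)  # splice
    probs = softmax(fused @ W + b)                                 # softmax over vocab
    # cross-entropy: -log p(gold character), averaged over tokens
    loss = -np.log(probs[np.arange(len(target_ids)), target_ids]).mean()
    return probs, loss
```

In a real system the two hidden-state matrices would come from two separately encoded views (characters and standard pinyin) of the same sentence, so the head sees both orthographic and phonetic evidence for each position.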
Optionally, the step of jointly training the syntactic structure processed by the graph convolutional neural network with the machine reading comprehension model and taking the obtained result as the start and end index positions of the violation reason predicted by the model specifically comprises: processing the syntactic structure information through the graph convolutional neural network model, processing the violation type information through the machine reading comprehension model to obtain the related violation reason, and using a pre-training model that shares the pre-training network structure, its corresponding parameters, and the output hidden representation.
Optionally, the hidden representation is divided into two parts: one part undergoes a dimension transformation in the machine reading comprehension model to calculate its loss function; the other part is combined with the adjacency matrix and input into the graph convolutional neural network to obtain a new hidden representation, which then undergoes a dimension transformation to calculate its loss function.
Optionally, the weight parameters are dynamically updated with a gradient update formula using the GradNorm method, where the gradient loss function Grad Loss is defined as the sum, over tasks, of the absolute values of the differences between each task's actual gradient norm and its ideal gradient norm:

$$L_{grad}(t) = \sum_i \left| G_W^{(i)}(t) - \bar{G}_W(t)\,[r_i(t)]^{\alpha} \right|$$

wherein:

$G_W^{(i)}(t) = \lVert \nabla_W\, w_i(t) L_i(t) \rVert_2$ is the actual gradient norm: the L2 norm of the gradient, with respect to the neural network parameters $W$ to be updated, of the weighted loss $w_i(t)L_i(t)$ of task $i$;

$\bar{G}_W(t) = E_{task}\!\left[G_W^{(i)}(t)\right]$ is the ideal gradient norm, obtained as the average of $G_W^{(i)}(t)$ over all tasks;

$\tilde{L}_i(t) = L_i(t)/L_i(0)$ is the inverse training rate of task $i$: the larger $\tilde{L}_i(t)$, the slower the training; $r_i(t) = \tilde{L}_i(t)/E_{task}[\tilde{L}_i(t)]$ is the relative inverse training rate of task $i$.
Optionally, $G_W^{(i)}(t)$ and $\bar{G}_W(t)\,[r_i(t)]^{\alpha}$ balance the magnitudes of the losses: when one task's loss is too large, $G_W^{(i)}(t)$ becomes larger than $\bar{G}_W(t)\,[r_i(t)]^{\alpha}$, producing a large gradient and thus reducing that task's weight parameter; $[r_i(t)]^{\alpha}$ balances the training speed, i.e. when a task trains too fast, $[r_i(t)]^{\alpha}$ becomes smaller, producing a large gradient penalty and hence a smaller weight parameter.
Furthermore, the invention also provides a device for extracting telephone traffic quality inspection violation reasons based on multi-task learning, the device comprising:
a data preprocessing module, used for preprocessing the original sentence transcribed from the recording into a form that a pre-training model can process;
a graph convolutional neural network processing module, used for restructuring the original sentence into the form of a dependency syntax tree, processing the tree structure into an adjacency matrix that a graph convolutional neural network can process, and processing the syntactic structure features through the graph convolutional neural network; and
a joint training module, used for jointly training the syntactic structure processed by the graph convolutional neural network with the machine reading comprehension model, and taking the obtained result as the start and end index positions of the violation reason predicted by the model.
Compared with the prior art, the invention makes full use of the syntactic structure information of Chinese sentences and performs multi-task learning with a graph convolutional neural network and a machine reading comprehension task, fully exploiting the data of the different tasks. Compared with violation-reason extraction using the machine reading comprehension task alone, the evaluation index ROUGE improves from 0.43 to 0.54, greatly improving the model's information extraction capability, and the extracted violation reasons conform better to Chinese grammar rules.
Drawings
Other features, objects and advantages of the invention will become more apparent upon reading of the detailed description of non-limiting embodiments with reference to the following drawings:
fig. 1 is a flow chart of a method for extracting a traffic quality inspection violation cause based on multitask learning according to an embodiment of the present invention;
FIG. 2 is a diagram illustrating an overall structure of a multi-task learning model provided by an embodiment of the invention;
FIG. 3 is a schematic diagram of a machine reading understanding model according to an embodiment of the present invention;
FIG. 4 is a schematic diagram of a graph convolution neural network model structure provided by an embodiment of the present invention;
FIG. 5 is a diagram illustrating a result fusion of a machine reading understanding model and a graph convolution neural network model according to an embodiment of the present invention;
fig. 6 is a schematic structural diagram of a device for extracting traffic quality inspection violation causes based on multi-task learning according to an embodiment of the present invention.
Detailed Description
The present invention will be described in detail with reference to specific examples. The following examples will assist those skilled in the art in further understanding the invention, but are not intended to limit it in any way. It should be noted that various changes and modifications, obvious to those skilled in the art, can be made without departing from the spirit of the invention; all of these fall within the scope of the present invention.
Specifically, as shown in fig. 1, the present invention provides a method for extracting traffic quality inspection violation causes based on multitask learning, where the method includes the following steps:
s1: carrying out data preprocessing on the original sentence recording, and processing the original sentence recording into a form which can be processed by a pre-training model;
specifically, for example:
original sentence: i speak that you owe money with your relatives and friends to let your relatives and friends know that you owe money
Type of violation: threatening threat of scare
The violation causes: telling your money not to be paid with your relatives and friends
Because the original sentence is converted from a recording into text by an intelligent speech algorithm, it contains inaccuracies and requires a certain amount of correction by a text error correction algorithm. The text error correction model extracts not only Chinese character features but also uses the standard pinyin as features: the model adopts two BERT models to process the Chinese characters and the pinyin respectively, splices their outputs, connects a softmax function, and finally calculates a cross entropy loss. For the machine reading comprehension task, the original sentence must be constructed into question form: the violation type is spliced onto the original sentence, and the result serves as a machine reading comprehension sample. The new samples then need to be processed into a form the model can handle: the pre-training model's tokenizer maps each word to its unique index in the pre-training model's vocabulary, and the label information is mapped into numeric form, so that the pre-training model can process the input text.
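The question construction and label mapping described above can be sketched as follows. A toy character-level tokenization stands in for the pretrained model's tokenizer, and the helper name and special-token layout are illustrative assumptions, not the patent's actual code:

```python
def build_mrc_sample(violation_type, sentence, answer):
    """Splice the violation type onto the sentence as a machine-reading
    question, and map the gold answer span to start/end token indices.
    A character-level split stands in for the real tokenizer here."""
    tokens = ["[CLS]"] + list(violation_type) + ["[SEP]"] + list(sentence) + ["[SEP]"]
    offset = 2 + len(violation_type)   # sentence characters start here
    start = sentence.find(answer)      # answer is a contiguous span of the sentence
    if start < 0:
        raise ValueError("answer span not found in sentence")
    end = start + len(answer) - 1
    return tokens, offset + start, offset + end
```

With a real pretrained tokenizer, subword segmentation shifts character positions, so an offset mapping between characters and tokens would be needed instead of the simple arithmetic above.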
S2: reconstructing the original sentence into a form of a dependency syntax tree, processing the tree structure into an adjacent matrix which can be processed by a graph convolution neural network, and processing syntax structure characteristics through the graph convolution neural network;
for the neural network of the graph, it is first required to transform the original sentence into the form of a dependency syntax tree, the dependency syntax analysis is to determine the syntax structure of the sentence by analyzing the dependency relationship between words in the sentence, and the result of the DDParser processing by the hundred degree open source syntax analysis tool DDParser, as in the above example, is "[ { ' word ' [ ', ' follow ', ' of ', ' friends ', ' say ', ' you ', ' owen ', ' not ', ' also ', ' let ', ' you ', ' family ', ' friend ', ' know ' ]head ': 7,7,6,3,6,2,13,9,7,11,12,13,0,16,16,13,13], ' deprel ': SBV ', ' ATT ', ' POB ',
'SBV', 'VOB', 'ADV', 'ADV', 'HED', 'ATT', 'ATT', 'DBL', 'DBL' ] }, wherein word is followed by a word that reacts to the tree syntax structure of the whole sentence, and the word is interpreted from the tree structure, and the number n inside represents that the current word is a subtree node of the nth word, thus forming a complete tree structure.
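The head list can then be turned into the adjacency matrix that the graph convolutional network consumes. A minimal sketch, assuming 1-based head indices with 0 marking the root (as in the DDParser output above) and treating the tree as undirected with self-loops:

```python
import numpy as np

def heads_to_adjacency(heads):
    """Convert a DDParser 'head' list (1-based; 0 marks the root) into
    a symmetric adjacency matrix with self-loops for the GCN."""
    n = len(heads)
    A = np.eye(n)                       # self-loop for every word
    for child, head in enumerate(heads):
        if head > 0:                    # 0 is the virtual root: no edge
            A[child, head - 1] = 1.0
            A[head - 1, child] = 1.0    # treat the dependency as undirected
    return A
```

Whether to symmetrize the tree or keep directed head-to-child edges is a design choice; the symmetric form lets information flow both ways in a single GCN layer.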
For the standalone reading comprehension task model, the structure is relatively simple: the preprocessed result is input into the pre-training model Roberta to obtain the hidden representation of each word, and the multi-dimensional hidden representation undergoes a dimension transformation with the start position and end position of the reason as the respective classification targets. In the model training stage, a loss function is calculated between the transformed two-dimensional matrix and the two-dimensional matrix of the labeled data, and back propagation according to the loss updates the model parameters. In the prediction stage, the result obtained after processing each word's hidden representation with an argmax function is taken as the start and end index positions of the violation reason predicted by the model.
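The prediction-stage argmax step can be sketched as below. The projection vectors stand in for the learned dimension transformation, and constraining the end index to follow the start is one common decoding convention, assumed here rather than taken from the patent:

```python
import numpy as np

def predict_span(hidden, w_start, w_end):
    """Project each token's hidden representation to start/end scores
    and take argmax, as in the prediction stage described above.

    hidden: (seq_len, dim) per-token hidden representations.
    w_start, w_end: (dim,) illustrative projection vectors.
    """
    start_logits = hidden @ w_start            # (seq_len,)
    end_logits = hidden @ w_end
    start = int(np.argmax(start_logits))
    # decode the end index at or after the start so the span is valid
    end = start + int(np.argmax(end_logits[start:]))
    return start, end
```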
For the standalone graph neural network model, the hidden representations of the words obtained from the pre-training model Roberta and the adjacency matrix obtained in feature engineering serve as the input of the graph convolutional neural network, which outputs for each word a group of hidden representations combined with the syntactic structure. These hidden representations are updated by the graph convolutional network, whose formula is:

$$H^{(l+1)} = \sigma\big(\tilde{D}^{-1}\tilde{A}\,H^{(l)}W^{(l)}\big)$$

wherein $\tilde{A} = A + I$ is the adjacency matrix plus the self-loops of the graph vertices, and $\tilde{D}$ is the degree matrix of the graph, $\tilde{D}_{ii} = \sum_j \tilde{A}_{ij}$. Since the numbers of neighboring edges (degrees) of the graph vertices are not the same, regularization, i.e. division by the degree $d_i$, is required to reduce the variance. The hidden representation exists in the form of a multi-dimensional matrix, so it must undergo dimension transformation and pooling operations to be converted into the vector representation corresponding to each word. Similarly, in the model training stage, a loss function is calculated between the processed matrix vector and the two-dimensional matrix of the labeled data, and back propagation through the neural network updates the model parameters; in the prediction stage, the results obtained after argmax processing are taken as the start and end index positions of the violation reason predicted by the model.
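One graph-convolution layer following the formula above (self-loops, division by the degree d_i, then a nonlinearity) can be sketched as:

```python
import numpy as np

def gcn_layer(H, A, W):
    """One graph-convolution layer: add self-loops to A, regularize each
    row by its degree d_i, aggregate neighbor features, apply a ReLU.

    H: (n, dim) node features; A: (n, n) adjacency; W: (dim, out) weights.
    """
    A_hat = A + np.eye(len(A))             # adjacency plus self-loops
    d = A_hat.sum(axis=1, keepdims=True)   # degree of each vertex
    H_new = (A_hat / d) @ H @ W            # divide by d_i to reduce variance
    return np.maximum(H_new, 0.0)          # ReLU
```

Stacking such layers (as in FIG. 4) lets each word's representation absorb information from progressively larger neighborhoods of the dependency tree.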
S3: and performing joint training on the syntactic structure processed by the graph convolution neural network and the machine reading understanding model, and taking the obtained result as the positions of the initial and end point indexes of the violation reasons predicted by the model.
The syntactic structure information is processed by the graph convolutional neural network model, while the violation type information is processed by the machine reading comprehension model to obtain the related violation reason. On the network structure, a pre-training model is used, sharing the pre-training network structure, its corresponding parameters, and the output hidden representation. The hidden representation is divided into two parts: one part undergoes a dimension transformation in the machine reading comprehension model to calculate its loss function; the other part is combined with the adjacency matrix and input into the graph convolutional neural network to obtain a new hidden representation, which then undergoes a dimension transformation to calculate its loss function.
It must be noted in particular that the two calculated losses cannot simply be added, because their orders of magnitude differ greatly; adding them directly would unbalance the gradients and make the convergence speeds of the different tasks inconsistent, affecting the result of the whole model training. In this embodiment, the invention introduces weights to balance the gradients, using the GradNorm method, an optimization method that dynamically adjusts the weight parameters of the loss function according to the gradients. The weight parameters can be dynamically updated with a gradient update formula, where the gradient loss function Grad Loss is defined as the sum of the absolute values of the differences between each task's actual gradient norm and ideal gradient norm:
$$L_{grad}(t) = \sum_i \left| G_W^{(i)}(t) - \bar{G}_W(t)\,[r_i(t)]^{\alpha} \right|$$

wherein:

$G_W^{(i)}(t) = \lVert \nabla_W\, w_i(t) L_i(t) \rVert_2$ is the actual gradient norm: the L2 norm of the gradient, with respect to the neural network parameters $W$ to be updated, of the weighted loss $w_i(t)L_i(t)$ of task $i$;

$\bar{G}_W(t) = E_{task}\!\left[G_W^{(i)}(t)\right]$ is the ideal gradient norm, obtained as the average of $G_W^{(i)}(t)$ over all tasks;

$\tilde{L}_i(t) = L_i(t)/L_i(0)$ is the inverse training rate of task $i$: the larger $\tilde{L}_i(t)$, the slower the training; $r_i(t) = \tilde{L}_i(t)/E_{task}[\tilde{L}_i(t)]$ is the relative inverse training rate of task $i$.
$G_W^{(i)}(t)$ and $\bar{G}_W(t)\,[r_i(t)]^{\alpha}$ balance the magnitudes of the losses: when one task's loss is too large in magnitude, $G_W^{(i)}(t)$ becomes larger than $\bar{G}_W(t)\,[r_i(t)]^{\alpha}$, producing a large gradient of Grad Loss and thus reducing that task's weight parameter. $[r_i(t)]^{\alpha}$ balances the training speed: when a task trains too fast, $[r_i(t)]^{\alpha}$ becomes smaller, producing a large gradient penalty and hence a smaller weight parameter. Therefore, using the GradNorm method solves the problem of the large gap in loss magnitude between the multiple tasks.
Specifically, as shown in FIG. 2, Roberta is the pre-training model module; MRC is the machine reading comprehension module; GCN is the graph convolutional neural network module; and CONCAT is the module that fuses the results of the MRC and GCN branches. In the figure, the sentence text is mapped into word vectors, which are spliced and input into the Roberta pre-training model to obtain a new vector $H_i$ (the hidden representation) fusing the context information; the three input embeddings shown in the figure are the vectors corresponding to the word level, paragraph level, and position level, respectively.

As shown in FIG. 3, the new vector $H_i$ output by the pre-training model module passes through a multilayer neural network (MLP) to change its dimensionality, yielding a new hidden representation $H'_i$. As shown in FIG. 4, $A_{ij}$ and $D_{ij}$ are the adjacency matrix and the degree matrix constructed from the sentence's dependency syntax tree; together with the new vector $H_i$ output by the pre-training model module, they form the input of the graph convolutional neural network, and after multiple layers of convolution and ReLU operations a new hidden representation $H''_i$ is obtained. As shown in FIG. 5, the hidden representation obtained from the MRC module is $H'_i$ and the hidden representation obtained from the GCN module is $H''_i$; after a dimension change through a multilayer neural network (MLP), their splice is processed by an argmax function to obtain the output results start and end, i.e. the start position and end position of the extracted reason.
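The CONCAT fusion of FIG. 5 can be sketched as follows. The per-token hidden matrices and the projection weights are toy stand-ins for the MRC/GCN branch outputs and the MLP, with names chosen for illustration:

```python
import numpy as np

def fuse_and_predict(h_mrc, h_gcn, w_start, w_end):
    """Splice the MRC and GCN hidden representations per token, project
    with illustrative MLP weights, and take argmax for the start and
    end positions of the extracted reason.

    h_mrc, h_gcn: (seq_len, dim) branch outputs (H'_i and H''_i).
    w_start, w_end: (2*dim,) projection vectors standing in for the MLP.
    """
    fused = np.concatenate([h_mrc, h_gcn], axis=-1)  # (seq_len, 2*dim)
    start = int(np.argmax(fused @ w_start))
    end = int(np.argmax(fused @ w_end))
    return start, end
```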
As shown in fig. 6, an embodiment of the present invention discloses a device for extracting traffic quality inspection violation reasons based on multi-task learning, the device comprising:
a data preprocessing module 61, used for preprocessing the original sentence transcribed from the recording into a form that a pre-training model can process;
a graph convolutional neural network processing module 62, used for restructuring the original sentence into the form of a dependency syntax tree, processing the tree structure into an adjacency matrix that a graph convolutional neural network can process, and processing the syntactic structure features through the graph convolutional neural network; and
a joint training module 63, used for jointly training the syntactic structure processed by the graph convolutional neural network with the machine reading comprehension model, and taking the obtained result as the start and end index positions of the violation reason predicted by the model.
The device for extracting traffic quality inspection violation reasons based on multi-task learning is used for executing the above method for extracting traffic quality inspection violation reasons based on multi-task learning.
Compared with the prior art, the invention makes full use of the syntactic structure information of Chinese sentences and performs multi-task learning with a graph convolutional neural network and a machine reading comprehension task, fully exploiting the data of the different tasks. Compared with violation-reason extraction using the machine reading comprehension task alone, the evaluation index ROUGE improves from 0.43 to 0.54, greatly improving the model's information extraction capability, and the extracted violation reasons conform better to Chinese grammar rules.
The foregoing description of specific embodiments of the present invention has been presented. It is to be understood that the present invention is not limited to the specific embodiments described above, and that various changes or modifications may be made by one skilled in the art within the scope of the appended claims without departing from the spirit of the invention. The embodiments and features of the embodiments of the present application may be combined with each other arbitrarily without conflict.

Claims (7)

1. A traffic quality inspection violation reason extraction method based on multi-task learning, characterized by comprising the following steps:
performing data preprocessing on the original sentence transcribed from the recording, turning it into a form that a pre-training model can process;
restructuring the original sentence into the form of a dependency syntax tree, processing the tree structure into an adjacency matrix that a graph convolutional neural network can process, and processing the syntactic structure features through the graph convolutional neural network; and
jointly training the syntactic structure processed by the graph convolutional neural network with the machine reading comprehension model, and taking the obtained result as the start and end index positions of the violation reason predicted by the model.
2. The method for extracting traffic quality inspection violation reasons based on multi-task learning according to claim 1, wherein the data preprocessing of the original transcribed sentence specifically comprises: adopting two BERT models to process the Chinese characters and the pinyin respectively, splicing their outputs, connecting a softmax function, and finally calculating a cross entropy loss.
3. The method for extracting traffic quality inspection violation reasons based on multi-task learning according to claim 1, wherein the step of jointly training the syntactic structure processed by the graph convolutional neural network with the machine reading comprehension model and taking the obtained result as the start and end index positions of the violation reason predicted by the model specifically comprises: processing the syntactic structure information through the graph convolutional neural network model, processing the violation type information through the machine reading comprehension model to obtain the related violation reason, and using a pre-training model that shares the pre-training network structure, its corresponding parameters, and the output hidden representation.
4. The method for extracting traffic quality inspection violation reasons based on multi-task learning according to claim 3, wherein the hidden representation is divided into two parts: one part undergoes a dimension transformation in the machine reading comprehension model to calculate its loss function; the other part is combined with the adjacency matrix and input into the graph convolutional neural network to obtain a new hidden representation, which then undergoes a dimension transformation to calculate its loss function.
5. The method for extracting traffic quality inspection violation reasons based on multi-task learning according to claim 4, wherein the weight parameters are dynamically updated with a gradient update formula using the GradNorm method, and the gradient loss function Grad Loss is defined as the sum of the absolute values of the differences between each task's actual gradient norm and ideal gradient norm:

$$L_{grad}(t) = \sum_i \left| G_W^{(i)}(t) - \bar{G}_W(t)\,[r_i(t)]^{\alpha} \right|$$

wherein: $G_W^{(i)}(t) = \lVert \nabla_W\, w_i(t) L_i(t) \rVert_2$ is the actual gradient norm, the L2 norm of the gradient, with respect to the neural network parameters $W$ to be updated, of the weighted loss $w_i(t)L_i(t)$ of task $i$; $\bar{G}_W(t) = E_{task}\!\left[G_W^{(i)}(t)\right]$ is the ideal gradient norm, obtained as the average of $G_W^{(i)}(t)$ over all tasks; $\tilde{L}_i(t) = L_i(t)/L_i(0)$ is the inverse training rate of task $i$, the larger $\tilde{L}_i(t)$ the slower the training; and $r_i(t) = \tilde{L}_i(t)/E_{task}[\tilde{L}_i(t)]$ is the relative inverse training rate of task $i$.
6. The method for extracting traffic quality inspection violation causes based on multi-task learning according to claim 5, wherein $G_W^{(i)}(t)$ and $\bar{G}_W(t)\,[r_i(t)]^{\alpha}$ balance the magnitudes of the losses: when the loss of a task is too large, $G_W^{(i)}(t)$ becomes larger than $\bar{G}_W(t)\,[r_i(t)]^{\alpha}$, producing a large gradient and therefore reducing that task's weight parameter; $[r_i(t)]^{\alpha}$ balances the training speeds, i.e. when a task trains too fast, $[r_i(t)]^{\alpha}$ becomes smaller, likewise producing a large gradient penalty and hence a smaller weight parameter.
7. A device for extracting traffic quality inspection violation causes based on multi-task learning, the device comprising:
a data preprocessing module, configured to preprocess the original recorded sentences into a form that the pre-trained model can process;
a graph convolutional neural network processing module, configured to reconstruct the original sentence into a dependency syntax tree, convert the tree structure into an adjacency matrix that the graph convolutional neural network can process, and extract the syntactic structure features through the graph convolutional neural network; and
a joint training module, configured to jointly train the syntactic structure processed by the graph convolutional neural network with the machine reading comprehension model, and take the obtained result as the start and end index positions of the violation cause predicted by the model.
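As a hypothetical sketch of what the joint training module's output looks like, a machine-reading-comprehension pointer head scores every token as a span start or end and returns the highest-probability (start, end) pair; the one-hot hidden states and hand-picked pointer vectors below are toy values, not the patent's parameters:

```python
import numpy as np

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

def predict_span(H, w_start, w_end):
    """Score each token as a span start/end and return the best
    (start, end) pair with start <= end."""
    start_probs = softmax(H @ w_start)
    end_probs = softmax(H @ w_end)
    best, best_score = (0, 0), -1.0
    for s in range(len(start_probs)):
        for e in range(s, len(end_probs)):
            score = start_probs[s] * end_probs[e]
            if score > best_score:
                best, best_score = (s, e), score
    return best

# Toy demo: 4 tokens with one-hot hidden states
H = np.eye(4)
w_start = np.array([0., 0., 5., 0.])  # makes token 2 the most likely start
w_end = np.array([0., 0., 0., 5.])    # makes token 3 the most likely end
span = predict_span(H, w_start, w_end)  # -> (2, 3)
```

The returned indices play the role of the predicted start and end positions of the violation cause within the recorded sentence.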
CN202111461750.5A 2021-12-02 2021-12-02 Method and device for extracting traffic quality inspection violation reasons based on multi-task learning Pending CN114091432A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202111461750.5A CN114091432A (en) 2021-12-02 2021-12-02 Method and device for extracting traffic quality inspection violation reasons based on multi-task learning

Publications (1)

Publication Number Publication Date
CN114091432A true CN114091432A (en) 2022-02-25

Family

ID=80306552

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202111461750.5A Pending CN114091432A (en) 2021-12-02 2021-12-02 Method and device for extracting traffic quality inspection violation reasons based on multi-task learning

Country Status (1)

Country Link
CN (1) CN114091432A (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114925813A (en) * 2022-05-25 2022-08-19 支付宝(杭州)信息技术有限公司 Training method and device of target detection system

Similar Documents

Publication Publication Date Title
CN110633409B (en) Automobile news event extraction method integrating rules and deep learning
CN112528672A (en) Aspect-level emotion analysis method and device based on graph convolution neural network
WO2022141878A1 (en) End-to-end language model pretraining method and system, and device and storage medium
CN114547329A (en) Method for establishing pre-training language model, semantic analysis method and device
CN112446215B (en) Entity relation joint extraction method
CN115392259B (en) Microblog text sentiment analysis method and system based on confrontation training fusion BERT
Wu et al. Aspect-level sentiment classification based on location and hybrid multi attention mechanism
Zhang et al. Chatbot design method using hybrid word vector expression model based on real telemarketing data
CN113204624B (en) Multi-feature fusion text emotion analysis model and device
US11966700B2 (en) Neural tagger with deep multi-level model
CN114091432A (en) Method and device for extracting traffic quality inspection violation reasons based on multi-task learning
CN117473054A (en) Knowledge graph-based general intelligent question-answering method and device
Lamons et al. Python Deep Learning Projects: 9 projects demystifying neural network and deep learning models for building intelligent systems
CN113051886B (en) Test question duplicate checking method, device, storage medium and equipment
CN113051607B (en) Privacy policy information extraction method
Rojan et al. Natural Language Processing based Text Imputation for Malayalam Corpora
Sun et al. Chinese named entity recognition using the improved transformer encoder and the lexicon adapter
Cairang et al. Research on error correction method of Tibetan text based on deep learning
Nie et al. Graph neural net-based user simulator
CN109299442A (en) Chinese chapter primary-slave relation recognition methods and system
Zhang et al. A Chinese Document-level Event Extraction Method based on ERNIE
CN113869054B (en) Deep learning-based power field project feature recognition method
Pingan et al. Image caption description generation method based on reflective attention mechanism
Xu et al. LayoutLM-Critic: Multimodal Language Model for Text Error Correction of Optical Character Recognition
Li et al. STCP: An Efficient Model Combining Subject Triples and Constituency Parsing for Recognizing Textual Entailment

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination