CN111476020B - Text generation method based on meta reinforcement learning - Google Patents

Text generation method based on meta reinforcement learning

Info

Publication number
CN111476020B
CN111476020B (application CN202010156433.1A)
Authority
CN
China
Prior art keywords
text
text generation
model
reinforcement learning
generating
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202010156433.1A
Other languages
Chinese (zh)
Other versions
CN111476020A (en
Inventor
赵婷婷
宋亚静
王嫄
任德华
杨巨成
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Tianjin University of Science and Technology
Original Assignee
Tianjin University of Science and Technology
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Tianjin University of Science and Technology filed Critical Tianjin University of Science and Technology
Priority to CN202010156433.1A priority Critical patent/CN111476020B/en
Publication of CN111476020A publication Critical patent/CN111476020A/en
Application granted granted Critical
Publication of CN111476020B publication Critical patent/CN111476020B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06N: COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00: Computing arrangements based on biological models
    • G06N 3/02: Neural networks
    • G06N 3/04: Architecture, e.g. interconnection topology
    • G06N 3/044: Recurrent networks, e.g. Hopfield networks
    • G06N 3/045: Combinations of networks
    • G06N 3/08: Learning methods
    • G06N 20/00: Machine learning
    • Y: GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02: TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02A: TECHNOLOGIES FOR ADAPTATION TO CLIMATE CHANGE
    • Y02A 90/00: Technologies having an indirect contribution to adaptation to climate change
    • Y02A 90/10: Information and communication technologies [ICT] supporting adaptation to climate change, e.g. for weather forecasting or climate simulation

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Software Systems (AREA)
  • Computing Systems (AREA)
  • Artificial Intelligence (AREA)
  • Mathematical Physics (AREA)
  • General Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • Evolutionary Computation (AREA)
  • General Engineering & Computer Science (AREA)
  • Biomedical Technology (AREA)
  • Molecular Biology (AREA)
  • General Health & Medical Sciences (AREA)
  • Computational Linguistics (AREA)
  • Biophysics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Health & Medical Sciences (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Medical Informatics (AREA)
  • Machine Translation (AREA)

Abstract

The invention relates to a text generation method based on meta reinforcement learning, which is technically characterized by comprising the following steps: collecting different types of text data, each type serving as a different task; randomly sampling the data of one task from the collected text data; constructing a text generation model with a recurrent neural network for processing sequence data; generating K text trajectories; performing a small number of policy gradient updates on the text generation model with the text trajectories to obtain an updated text generation model; generating new text trajectories; updating and sampling the text generation model on a plurality of tasks respectively to obtain the performance errors of the text trajectories; and performing secondary gradient update training on the original text generation model parameters until convergence. The invention builds on text generation with a recurrent neural network in reinforcement learning and trains the agent with meta reinforcement learning so that the experience learned on a plurality of tasks is transferred to a target task, allowing text generation in different scenes or contexts to be realized quickly.

Description

Text generation method based on meta reinforcement learning
Technical Field
The invention belongs to the technical field of computer natural language processing, and in particular relates to a text generation method based on meta reinforcement learning.
Background
Natural Language Processing (NLP), and in particular Natural Language Generation (NLG), has long been considered one of the most challenging computing tasks. Natural language generation is a technology that gives a computer the same ability to express and write as a person: according to some key information and its form of expression inside the machine, the computer can automatically plan and generate a passage of high-quality natural language text. Text generation evolved from early pattern matching, in which text was organized and generated by simple syntactic and grammatical rules, to later approaches based on statistical probability models; with the rapid development of deep learning, natural language generation based on deep learning has made remarkable progress, and various neural networks have been proposed to generate accurate, natural and diverse text.
Reinforcement learning (RL) is an important research area of machine learning in which an agent interacts with the environment through trial and error and learns an optimal policy by maximizing cumulative reward. Generating text with a recurrent neural network can be viewed as a Markov decision process (MDP), and a locally optimal policy can be found by reinforcement learning, which has achieved good results in recent studies. However, existing text generation methods are typically developed for a particular domain, whereas natural language in the real world spans multiple domains, and text from different domains shares common rules of grammar, semantics, and so on. In addition, training neural networks usually requires a large amount of data, and labeling enough data for learning takes considerable time and money. Sample collection and scene adaptation are therefore important bottleneck problems in text generation applications.
Disclosure of Invention
The invention aims to overcome the defects of the prior art and provides a text generation method based on meta reinforcement learning, which addresses the bottleneck problems that a real-world language generation model must adapt rapidly to different scenes to generate text and that learning samples are difficult to collect in some individual scenes.
The invention solves the technical problems by adopting the following technical scheme:
a text generation method based on meta reinforcement learning comprises the following steps:
step 1, collecting text data of different types, each type serving as a different task;
step 2, randomly sampling the data of one task τ_i from the text data collected in step 1;
step 3, constructing a text generation model f_θ by adopting a recurrent neural network for processing sequence data;
step 4, generating K text trajectories D_i with the text generation model f_θ;
step 5, performing a small number of policy gradient updates on the text generation model f_θ with the text trajectories D_i to obtain an updated text generation model f_θ';
step 6, generating new text trajectories D_i' with the text generation model f_θ';
step 7, repeating steps 2 to 6, updating and sampling the text generation model on a plurality of tasks respectively to obtain the performance errors of the text trajectories;
step 8, performing secondary gradient update training on the original text generation model parameters with the performance errors of the text trajectories obtained in step 7, until convergence.
Further, the different types of text data collected in step 1 correspond to different scenes of natural language.
Further, the recurrent neural network in step 3 is the agent in reinforcement learning and outputs a probability density function p(y_t | Y_{1:t-1}), where Y_{1:t-1} is the state s_t of the text generation model at time t, representing the character sequence that has been generated, and y_t is the action a_t of the text generation model at time t, representing the currently selected character.
Further, in step 5, a small number of gradient updates are performed on the parameters by the REINFORCE method, and the reward function is set to the bilingual evaluation understudy (BLEU) score between the real text data and the generated text data.
Further, in step 8, the data sampled from the text generation model f_θ' is used to perform the secondary gradient update on the original generation model f_θ.
The invention has the advantages and positive effects that:
1. The invention is reasonably designed. The overall associations within the input information are analyzed by a recurrent neural network to process and generate text sequences; the original model is then updated across a plurality of scenes using the sampled trajectories of the updated models, and the model parameters are trained by meta learning, so that only a small number of gradient updates are needed to learn a new scene's text generation task quickly. Such a fast-learning agent is a necessary step toward a versatile agent that can continuously learn multiple new tasks. Therefore, the invention not only gives the agent the ability to learn quickly and adapt to new environments, but also remains fast and accurate when the number of given samples is small or the budget for collecting samples is limited.
2. Even with only a small number of text generation learning samples, the agent can adapt to a new scene through a small number of gradient updates of the text generation model. This removes the need of text generation applications for large numbers of learning samples and, to a certain extent, solves the bottleneck problem of insufficient data for the language generation model in some scenes.
3. The invention builds on text generation with a recurrent neural network in reinforcement learning and trains the agent with meta reinforcement learning so that the experience learned on a plurality of tasks is transferred to a target task. This solves, to a certain extent, the problems that a language generation model needs a large number of learning samples in practical applications and is difficult to adapt to different scenes, so that text generation in different scenes or contexts can be realized quickly.
Drawings
FIG. 1 is a diagram of the meta reinforcement learning text generation process of the present invention.
Detailed Description
The present invention will be described in further detail with reference to the accompanying drawings.
Meta reinforcement learning (Meta Reinforcement Learning, abbreviated as Meta RL) is a research direction that applies meta learning to reinforcement learning. Its core idea is that an agent should acquire enough prior knowledge while learning a large number of reinforcement learning tasks so that, when facing a new reinforcement learning task, it can learn faster and better and adapt quickly to the new learning environment.
As shown in FIG. 1, first, a task τ_i is selected from the database D_train as the current generation environment: a text generation model M is initialized, the generated text is compared with the real text to obtain a training error Loss_n, and the generation model M is updated to M'_n by a small number of internal gradient steps with the policy gradient method. The updated model M'_n is then used to continue sampling text trajectories, and their performance error Loss'_n is computed. The above steps are repeated to compute the errors Loss'_n for n different tasks. Finally, the n errors are summed, and an external gradient update is performed on the original generation model according to the performance of the plurality of updated models.
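The workflow of FIG. 1 can be read as a MAML-style double loop: an inner, per-task policy-gradient adaptation followed by an outer update driven by the adapted models' performance. The sketch below only illustrates that structure; the patent does not prescribe a framework, so PyTorch is assumed, and the helpers sample_trajectories and trajectory_loss (which must accept an explicit set of weights) are hypothetical names, not part of the invention.

```python
# Minimal MAML-style sketch of the FIG. 1 loop; helper names are assumptions.
import torch

def meta_train_step(model, tasks, meta_optimizer, inner_lr=0.01, inner_steps=1, K=8):
    outer_losses = []
    for task in tasks:                                    # n different tasks tau_i
        # Inner update: a few policy-gradient steps on trajectories D_i from f_theta
        weights = dict(model.named_parameters())
        for _ in range(inner_steps):
            traj = sample_trajectories(model, weights, task, K)        # D_i
            loss = trajectory_loss(model, weights, traj, task)         # Loss_n
            grads = torch.autograd.grad(loss, list(weights.values()), create_graph=True)
            weights = {name: w - inner_lr * g
                       for (name, w), g in zip(weights.items(), grads)}
        # Re-sample with the adapted model f_theta' and score the new trajectories D_i'
        new_traj = sample_trajectories(model, weights, task, K)        # D_i'
        outer_losses.append(trajectory_loss(model, weights, new_traj, task))  # Loss_n'
    # External (secondary) gradient update of the original parameters theta
    meta_optimizer.zero_grad()
    torch.stack(outer_losses).sum().backward()
    meta_optimizer.step()
```

The call to backward() on the summed Loss'_n differentiates through the inner updates, which corresponds to the secondary (external) gradient update described above.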
When reinforcement learning is used for text generation, a recurrent neural network is used. A recurrent neural network has a loop pointing to itself, indicating that the information processed at the current time can be passed on and used at the next time. The input of the recurrent neural network is an entire sequence, i.e., X = [x_1, …, x_{t-1}, x_t, x_{t+1}, …, x_T], where x_t is the input to the network at a given time. The hidden state h_t of the network at time t is a function of the hidden state h_{t-1} at the previous time and the input x_t at the current time, i.e., h_t combines the historical information with the current input information. The output of the network is related to h_t. By combining the historical information with the current input, the recurrent neural network handles sequence problems well and can predict the output of the next state as well as its own next hidden state. In reinforcement learning terms, the current state is the character string already generated at time t, s_t = Y_{1:t-1} = (y_1, …, y_{t-1}); the action is the character y_t currently selected at time t; and after a character y_t is determined, the state transitions deterministically from s_t = Y_{1:t-1} = (y_1, …, y_{t-1}) to s_t' = Y_{1:t} = (y_1, …, y_t).
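As a concrete illustration of this recurrent policy, the following is a minimal sketch assuming PyTorch and a character-level vocabulary; the class name TextPolicy and the hyper-parameters are illustrative assumptions, not taken from the patent. Given the generated prefix Y_{1:t-1}, it outputs the distribution p(y_t | Y_{1:t-1}) over the next character.

```python
# Minimal recurrent policy sketch: state = generated prefix, action = next character.
import torch
import torch.nn as nn

class TextPolicy(nn.Module):
    def __init__(self, vocab_size, embed_dim=128, hidden_dim=256):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, embed_dim)
        self.rnn = nn.GRU(embed_dim, hidden_dim, batch_first=True)
        self.out = nn.Linear(hidden_dim, vocab_size)

    def forward(self, prefix_ids, hidden=None):
        # prefix_ids: (batch, t-1) character ids of Y_{1:t-1}
        emb = self.embed(prefix_ids)
        output, hidden = self.rnn(emb, hidden)      # h_t = f(h_{t-1}, x_t)
        logits = self.out(output[:, -1, :])         # scores for the next character y_t
        return torch.softmax(logits, dim=-1), hidden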
On the basis of the above mathematical model and objective function, the invention applies the fast-adaptation ability of meta reinforcement learning to train the generation model under different scenes, thereby improving the model's ability to adapt to new scenes. By learning to generate text in different scenes, the method reduces the need of text generation applications for large numbers of learning samples, and thus addresses the bottleneck problems that a real-world language generation model must adapt quickly to text generation in different scenes and that learning samples are difficult to collect in some individual scenes.
The design idea of the invention is as follows: the overall generation model is divided into two parts, namely the external gradient update of meta learning and the internal update of reinforcement-learning text generation with a recurrent neural network. The recurrent neural network serves as the reinforcement learning agent and, combined with historical data, performs a small number of internal policy gradient updates, so that the agent is continuously trained to generate text data that better conforms to human natural language. With the meta learning approach, a secondary gradient update is then applied to the original generation model according to the performance of the updated models, to enhance the agent's adaptability to the environment and finally obtain a text generation model with fast learning ability. Because the invention adopts the meta reinforcement learning technique, it can adapt to new tasks quickly with only a small amount of training data, thereby solving the bottleneck problems that a real-world language generation model must adapt quickly to different scenes to generate text and that learning samples are difficult to collect in some individual scenes.
Based on this design idea, the invention first collects text data of different types; second, it trains the reinforcement learning agent on data of one type with a small number of internal policy gradient updates, so that the agent can generate text data that better conforms to human natural language; and finally, by repeating this training on data of multiple types, it performs a secondary gradient update on the original generation model according to the performance of the updated models, so that the model acquires fast learning ability. The specific method comprises the following steps:
Step 1, collecting different types of text data, each type serving as a different task.
The invention uses the ability of meta reinforcement learning to adapt quickly to new tasks with only a small amount of training data, together with the advantage of recurrent neural networks in handling sequence problems, to deal with the bottleneck problems that a real-world language generation model must adapt quickly to different scenes and that learning samples are difficult to collect in some individual scenes. In this step, different types of text data need to be collected as meta learning tasks, so that prior knowledge can be learned for quick adaptation to new scenarios; in addition, the data serve as the reference data for setting the reward function in reinforcement learning, helping to train the generation model.
The different types of text data collected in this step may come from different scenes of natural language, such as weather, science and technology, restaurants, basketball, and the like.
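As an illustrative assumption only (the file paths and helper names below are hypothetical and not part of the invention), the collected corpus could be organized as one text collection per scene, from which step 2 later samples a task:

```python
# Illustrative task partition of the collected corpus; paths are hypothetical.
import random

def load_lines(path):
    with open(path, encoding="utf-8") as f:
        return [line.strip() for line in f if line.strip()]

corpus_by_task = {
    "weather":    load_lines("data/weather.txt"),
    "technology": load_lines("data/technology.txt"),
    "restaurant": load_lines("data/restaurant.txt"),
    "basketball": load_lines("data/basketball.txt"),
}

def sample_task(rng=random):
    """Randomly pick one task tau_i and return its name and text data (step 2)."""
    name = rng.choice(list(corpus_by_task))
    return name, corpus_by_task[name]
```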
Step 2, randomly sampling the data of one task τ_i from the text data collected in step 1.
Step 3, constructing a text generation model f_θ by adopting a recurrent neural network for processing sequence data, i.e., a recurrent neural network model.
In this step, the recurrent neural network is used not only to identify individual inputs, but also to analyze the overall association between the input information, and is a neural network with a memory function.
A recurrent neural network (RNN) is regarded as the agent in reinforcement learning; it outputs a probability density function p(y_t | Y_{1:t-1}) rather than a deterministic prediction y_t. Here Y_{1:t-1} represents the state s_t of the text generation model at time t, i.e., the character sequence that has been generated, and y_t represents the action a_t of the text generation model at time t, i.e., the currently selected character.
Step 4, generating K text trajectories D_i by using the text generation model constructed in step 3.
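A minimal sketch of this sampling in step 4, assuming the TextPolicy sketched earlier and hypothetical begin/end-of-sequence token ids and maximum length:

```python
# Sample K text trajectories D_i from the current policy f_theta.
import torch

def generate_trajectories(policy, K=8, max_len=40, bos_id=1, eos_id=2):
    trajectories = []
    with torch.no_grad():
        for _ in range(K):
            ids = [bos_id]
            hidden = None
            for _ in range(max_len):
                last = torch.tensor([[ids[-1]]])              # feed the last character
                probs, hidden = policy(last, hidden)          # p(y_t | Y_{1:t-1})
                y_t = torch.multinomial(probs, 1).item()      # sample the action a_t
                ids.append(y_t)
                if y_t == eos_id:
                    break
            trajectories.append(ids)
    return trajectories
```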
Step 5, using the text trajectories D_i obtained in step 4 to perform a small number of policy gradient updates on the text generation model.
In this step, the policy gradient update performs a small number of gradient updates on the parameters with the REINFORCE method, where the reward function is set to the bilingual evaluation understudy score between the real text data and the generated text data, i.e., the BLEU (Bilingual Evaluation Understudy) score.
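For illustration only, an inner REINFORCE update with a BLEU reward could look like the sketch below; nltk's sentence_bleu stands in for the bilingual evaluation understudy score, and the tokenization, smoothing choice and reference handling are assumptions rather than the patent's prescription.

```python
# Inner policy-gradient step (REINFORCE) with a BLEU reward; details are assumptions.
import torch
from nltk.translate.bleu_score import sentence_bleu, SmoothingFunction

def reinforce_update(policy, optimizer, trajectories, references):
    # references: list of tokenized real sentences of the current task tau_i
    smooth = SmoothingFunction().method1
    loss = 0.0
    for ids in trajectories:
        tokens = [str(i) for i in ids[1:]]            # generated sequence (ids as tokens)
        reward = sentence_bleu(references, tokens, smoothing_function=smooth)
        log_prob = 0.0
        hidden = None
        for prev, y_t in zip(ids[:-1], ids[1:]):      # log p(y_t | Y_{1:t-1})
            probs, hidden = policy(torch.tensor([[prev]]), hidden)
            log_prob = log_prob + torch.log(probs[0, y_t] + 1e-8)
        loss = loss - reward * log_prob               # REINFORCE objective
    optimizer.zero_grad()
    (loss / len(trajectories)).backward()
    optimizer.step()
```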
Step 6, generating new text trajectories D_i' with the text generation model f_θ' obtained in step 5.
Step 7, repeating steps 2 to 6, updating and sampling the text generation model on a plurality of tasks respectively to obtain the performance errors of the text trajectories.
Step 8, performing secondary gradient update training on the original text generation model parameters with the performance errors of the text trajectories obtained in step 7, until convergence.
In this step, the secondary gradient update uses the data sampled from the text generation model f_θ' to perform a secondary gradient update on the original generation model f_θ.
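Isolating step 8 as a single function gives the sketch below, under the assumption that the performance errors Loss'_n were computed with the computation graph of the inner updates retained (e.g. create_graph=True in the earlier sketch) so that gradients can flow back to the original parameters θ:

```python
# Step 8 in isolation: sum the performance errors and take one external gradient step.
import torch

def secondary_gradient_update(meta_optimizer, outer_losses):
    meta_optimizer.zero_grad()
    total = torch.stack(outer_losses).sum()   # sum of Loss'_n over the n tasks
    total.backward()                          # gradients flow through the inner updates
    meta_optimizer.step()
```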
It should be emphasized that the examples described herein are illustrative rather than limiting; therefore, the invention includes, but is not limited to, the examples given in the detailed description, and other embodiments derived by a person skilled in the art from the technical solutions of the invention likewise fall within the protection scope of the invention.

Claims (4)

1. A text generation method based on meta reinforcement learning is characterized by comprising the following steps:
step 1, collecting different types of text data as different tasks, wherein the different types of text data are different scenes in natural language, and comprise weather, science and technology, restaurants and basketball data;
step 2, randomly taking the data of one task τ_i from the text data collected in step 1, specifically: selecting any task τ_i from the database D_train as the current generation environment;
step 3, constructing a text generation model f_θ by adopting a recurrent neural network, wherein the recurrent neural network is regarded as the agent in reinforcement learning and outputs a probability density function;
step 4, generating K text trajectories D_i with the text generation model f_θ, i.e., generating text sequences of a certain length with the text generation model;
step 5, comparing the generated text with the real text and taking the bilingual evaluation understudy score of the generated text trajectories as the training error Loss_n, and using the text trajectories D_i to perform a small number of policy gradient updates on the text generation model f_θ to obtain an updated text generation model f_θ';
step 6, generating new text trajectories D_i' with the text generation model f_θ';
step 7, repeating steps 2 to 6, updating and sampling the text generation model on a plurality of tasks respectively to obtain the performance errors of the text trajectories;
step 8, performing secondary gradient update training on the original text generation model parameters with the performance errors of the text trajectories obtained in step 7, until convergence, specifically: summing the performance errors Loss'_n of the plurality of tasks and performing an external gradient update on the original generation model according to the performance of the plurality of updated models.
2. The text generation method based on meta reinforcement learning according to claim 1, wherein: the recurrent neural network in step 3 is the agent in reinforcement learning and outputs a probability density function p(y_t | Y_{1:t-1}), where Y_{1:t-1} is the state s_t of the text generation model at time t, representing the character sequence that has been generated, and y_t is the action a_t of the text generation model at time t, representing the currently selected character.
3. The text generation method based on meta reinforcement learning according to claim 1, wherein: in step 5, a small number of gradient updates are performed on the parameters by the REINFORCE method, and the reward function is set to the bilingual evaluation understudy (BLEU) score between the real text data and the generated text data.
4. The text generation method based on meta reinforcement learning according to claim 1, wherein: in step 8, the performance error of the data sampled from the text generation model f_θ' is used to perform a secondary gradient update on the original generation model f_θ.
CN202010156433.1A 2020-03-09 2020-03-09 Text generation method based on meta reinforcement learning Active CN111476020B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010156433.1A CN111476020B (en) 2020-03-09 2020-03-09 Text generation method based on meta reinforcement learning

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202010156433.1A CN111476020B (en) 2020-03-09 2020-03-09 Text generation method based on meta reinforcement learning

Publications (2)

Publication Number Publication Date
CN111476020A CN111476020A (en) 2020-07-31
CN111476020B true CN111476020B (en) 2023-07-25

Family

ID=71748074

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010156433.1A Active CN111476020B (en) 2020-03-09 2020-03-09 Text generation method based on meta reinforcement learning

Country Status (1)

Country Link
CN (1) CN111476020B (en)

Family Cites Families (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US9679258B2 (en) * 2013-10-08 2017-06-13 Google Inc. Methods and apparatus for reinforcement learning
WO2019219965A1 (en) * 2018-05-18 2019-11-21 Deepmind Technologies Limited Meta-gradient updates for training return functions for reinforcement learning systems

Non-Patent Citations (3)

* Cited by examiner, † Cited by third party
Title
Natural language processing technology based on reinforcement learning; 冯少迪; 数码世界 (Digital World) (03); full text *
A survey of deep reinforcement learning research; 赵星宇, 丁世飞; 计算机科学 (Computer Science) (07); full text *
Progress in deep reinforcement learning: from AlphaGo to AlphaGo Zero; 唐振韬, 邵坤, 赵冬斌, 朱圆恒; 控制理论与应用 (Control Theory and Applications) (12); full text *

Also Published As

Publication number Publication date
CN111476020A (en) 2020-07-31

Similar Documents

Publication Publication Date Title
Liu et al. End-to-end optimization of task-oriented dialogue model with deep reinforcement learning
Stoyanov et al. Empirical risk minimization of graphical model parameters given approximate inference, decoding, and model structure
CN110309170B (en) Complex intention recognition method in task-based multi-turn conversation
JP2022550326A (en) Contrasted pre-training for verbal tasks
CN110837548A (en) Answer matching method and device, electronic equipment and storage medium
CN109840595B (en) Knowledge tracking method based on group learning behavior characteristics
CN113254675B (en) Knowledge graph construction method based on self-adaptive few-sample relation extraction
CN110298044A (en) A kind of entity-relationship recognition method
CN110795522A (en) Method and device for predicting track position of mobile user
CN112488147A (en) Redundancy removal active learning method based on countermeasure network
CN117648950A (en) Training method and device for neural network model, electronic equipment and storage medium
CN116881996B (en) Modeling intention prediction method based on mouse operation
KR20190134965A (en) A method and system for training of neural networks
CN111476020B (en) Text generation method based on meta reinforcement learning
CN117494760A (en) Semantic tag-rich data augmentation method based on ultra-large-scale language model
CN115795017A (en) Off-line and on-line fusion application method and system for conversation system
CN116362242A (en) Small sample slot value extraction method, device, equipment and storage medium
CN114692615A (en) Small sample semantic graph recognition method for small languages
CN115840884A (en) Sample selection method, device, equipment and medium
CN115062123A (en) Knowledge base question-answer pair generation method of conversation generation system
CN112297012B (en) Robot reinforcement learning method based on self-adaptive model
CN116029261A (en) Chinese text grammar error correction method and related equipment
CN114154582A (en) Deep reinforcement learning method based on environment dynamic decomposition model
CN114970714B (en) Track prediction method and system considering uncertain behavior mode of moving target
CN116562299B (en) Argument extraction method, device and equipment of text information and storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant