CN111476020B - Text generation method based on meta reinforcement learning - Google Patents

Text generation method based on meta reinforcement learning

Info

Publication number
CN111476020B
CN111476020B (application CN202010156433.1A)
Authority
CN
China
Prior art keywords
text
text generation
model
reinforcement learning
generating
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202010156433.1A
Other languages
Chinese (zh)
Other versions
CN111476020A (en
Inventor
赵婷婷
宋亚静
王嫄
任德华
杨巨成
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Tianjin University of Science and Technology
Original Assignee
Tianjin University of Science and Technology
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Tianjin University of Science and Technology filed Critical Tianjin University of Science and Technology
Priority to CN202010156433.1A priority Critical patent/CN111476020B/en
Publication of CN111476020A publication Critical patent/CN111476020A/en
Application granted granted Critical
Publication of CN111476020B publication Critical patent/CN111476020B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06N: COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00: Computing arrangements based on biological models
    • G06N 3/02: Neural networks
    • G06N 3/04: Architecture, e.g. interconnection topology
    • G06N 3/044: Recurrent networks, e.g. Hopfield networks
    • G06N 3/045: Combinations of networks
    • G06N 3/08: Learning methods
    • G06N 20/00: Machine learning
    • Y: GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02: TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02A: TECHNOLOGIES FOR ADAPTATION TO CLIMATE CHANGE
    • Y02A 90/00: Technologies having an indirect contribution to adaptation to climate change
    • Y02A 90/10: Information and communication technologies [ICT] supporting adaptation to climate change, e.g. for weather forecasting or climate simulation

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Software Systems (AREA)
  • Computing Systems (AREA)
  • Artificial Intelligence (AREA)
  • Mathematical Physics (AREA)
  • General Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • Evolutionary Computation (AREA)
  • General Engineering & Computer Science (AREA)
  • Biomedical Technology (AREA)
  • Molecular Biology (AREA)
  • General Health & Medical Sciences (AREA)
  • Computational Linguistics (AREA)
  • Biophysics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Health & Medical Sciences (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Medical Informatics (AREA)
  • Machine Translation (AREA)

Abstract

The invention relates to a text generation method based on meta reinforcement learning, which is technically characterized by comprising the following steps: collecting different types of text data, each type serving as a different task; randomly sampling the data of one task from the collected text data; constructing a text generation model with a recurrent neural network for processing sequence data; generating K text trajectories; performing a small number of policy gradient updates on the text generation model with the text trajectories to obtain an updated text generation model; generating new text trajectories; updating and sampling the text generation model on a plurality of tasks respectively to obtain the performance errors of the text trajectories; and performing secondary gradient update training on the original text generation model parameters until convergence. The invention builds on text generation with a recurrent neural network in reinforcement learning and trains the agent with meta reinforcement learning so that the experience learned on a plurality of tasks is transferred to a target task, allowing text generation in different scenes or contexts to be realized quickly.

Description

Text generation method based on meta reinforcement learning
Technical Field
The invention belongs to the technical field of computer natural language processing, and in particular relates to a text generation method based on meta reinforcement learning.
Background
Natural Language Processing (NLP), and in particular Natural Language Generation (NLG), has long been considered one of the most challenging computing tasks. Natural language generation is a technology that gives a computer the same ability to express and write as a person: according to some key information and its form of expression inside the machine, the computer can automatically plan and generate a passage of high-quality natural language text. Text generation evolved from early pattern matching, in which text was organized and generated by simple syntactic and grammatical rules, to later approaches based on statistical probability models; with the rapid development of deep learning, natural language generation based on deep learning has made remarkable progress, and various neural networks have been proposed to generate accurate, natural and diverse text.
Reinforcement learning (RL) is an important research area of machine learning in which an agent interacts with the environment through trial and error and learns an optimal policy by maximizing cumulative reward. Generating text with a recurrent neural network can be viewed as a Markov decision process (MDP), and a locally optimal policy can be found by reinforcement learning, which has achieved good results in recent studies. However, existing text generation methods are typically developed for a particular domain, whereas natural language in the real world spans multiple domains, and text from different domains shares common rules of grammar, semantics, and so on. In addition, training neural networks usually requires a large amount of data, and labeling enough data for learning takes considerable time and money. Sample collection and scene adaptation are therefore important bottleneck problems in text generation applications.
Disclosure of Invention
The invention aims to overcome the defects of the prior art and provides a text generation method based on meta reinforcement learning, which addresses the bottleneck problems that a real-world language generation model must adapt rapidly to different scenes to generate text and that learning samples are difficult to collect in some individual scenes.
The invention solves the technical problems by adopting the following technical scheme:
a text generation method based on meta reinforcement learning comprises the following steps:
step 1, collecting text data of different types, each type serving as a different task;
step 2, randomly sampling the data of one task τ_i from the text data collected in step 1;
step 3, constructing a text generation model f_θ by adopting a recurrent neural network for processing sequence data;
step 4, generating K text trajectories D_i with the text generation model f_θ;
step 5, performing a small number of policy gradient updates on the text generation model f_θ with the text trajectories D_i to obtain an updated text generation model f_θ';
step 6, generating new text trajectories D_i' with the text generation model f_θ';
step 7, repeating steps 2 to 6, updating and sampling the text generation model on a plurality of tasks respectively to obtain the performance errors of the text trajectories;
step 8, performing secondary gradient update training on the original text generation model parameters with the performance errors of the text trajectories obtained in step 7, until convergence.
Further, the different types of text data collected in step 1 correspond to different scenes of natural language.
Further, the recurrent neural network in step 3 is the agent in reinforcement learning and outputs a probability density function p(y_t | Y_{1:t-1}), where Y_{1:t-1} is the state s_t of the text generation model at time t, representing the character sequence that has been generated, and y_t is the action a_t of the text generation model at time t, representing the currently selected character.
Further, in step 5, a small number of gradient updates are performed on the parameters by the REINFORCE method, and the reward function is set to the bilingual evaluation understudy (BLEU) score between the real text data and the generated text data.
Further, in step 8, the data sampled from the text generation model f_θ' is used to perform the secondary gradient update on the original generation model f_θ.
The invention has the advantages and positive effects that:
1. The invention is reasonably designed. The overall associations within the input information are analyzed by a recurrent neural network to process and generate text sequences; the original model is then updated across a plurality of scenes using the sampled trajectories of the updated models, and the model parameters are trained by meta learning, so that only a small number of gradient updates are needed to learn a new scene's text generation task quickly. Such a fast-learning agent is a necessary step toward a versatile agent that can continuously learn multiple new tasks. Therefore, the invention not only gives the agent the ability to learn quickly and adapt to new environments, but also remains fast and accurate when the number of given samples is small or the budget for collecting samples is limited.
2. Even with only a small number of text generation learning samples, the agent can adapt to a new scene through a small number of gradient updates of the text generation model. This removes the need of text generation applications for large numbers of learning samples and, to a certain extent, solves the bottleneck problem of insufficient data for the language generation model in some scenes.
3. The invention builds on text generation with a recurrent neural network in reinforcement learning and trains the agent with meta reinforcement learning so that the experience learned on a plurality of tasks is transferred to a target task. This solves, to a certain extent, the problems that a language generation model needs a large number of learning samples in practical applications and is difficult to adapt to different scenes, so that text generation in different scenes or contexts can be realized quickly.
Drawings
FIG. 1 is a diagram of the meta reinforcement learning text generation process of the present invention.
Detailed Description
The present invention will be described in further detail with reference to the accompanying drawings.
Meta reinforcement learning (Meta Reinforcement Learning, abbreviated as Meta RL) is a research direction that applies meta learning to reinforcement learning. Its core idea is that an agent should acquire enough prior knowledge while learning a large number of reinforcement learning tasks so that, when facing a new reinforcement learning task, it can learn faster and better and adapt quickly to the new learning environment.
As shown in FIG. 1, first, a task τ_i is selected from the database D_train as the current generation environment: a text generation model M is initialized, the generated text is compared with the real text to obtain a training error Loss_n, and the generation model M is updated to M'_n by a small number of internal gradient steps with the policy gradient method. The updated model M'_n is then used to continue sampling text trajectories, and their performance error Loss'_n is computed. The above steps are repeated to compute the errors Loss'_n for n different tasks. Finally, the n errors are summed, and an external gradient update is performed on the original generation model according to the performance of the plurality of updated models.
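The workflow of FIG. 1 can be read as a MAML-style double loop: an inner, per-task policy-gradient adaptation followed by an outer update driven by the adapted models' performance. The sketch below only illustrates that structure; the patent does not prescribe a framework, so PyTorch is assumed, and the helpers sample_trajectories and trajectory_loss (which must accept an explicit set of weights) are hypothetical names, not part of the invention.

```python
# Minimal MAML-style sketch of the FIG. 1 loop; helper names are assumptions.
import torch

def meta_train_step(model, tasks, meta_optimizer, inner_lr=0.01, inner_steps=1, K=8):
    outer_losses = []
    for task in tasks:                                    # n different tasks tau_i
        # Inner update: a few policy-gradient steps on trajectories D_i from f_theta
        weights = dict(model.named_parameters())
        for _ in range(inner_steps):
            traj = sample_trajectories(model, weights, task, K)        # D_i
            loss = trajectory_loss(model, weights, traj, task)         # Loss_n
            grads = torch.autograd.grad(loss, list(weights.values()), create_graph=True)
            weights = {name: w - inner_lr * g
                       for (name, w), g in zip(weights.items(), grads)}
        # Re-sample with the adapted model f_theta' and score the new trajectories D_i'
        new_traj = sample_trajectories(model, weights, task, K)        # D_i'
        outer_losses.append(trajectory_loss(model, weights, new_traj, task))  # Loss_n'
    # External (secondary) gradient update of the original parameters theta
    meta_optimizer.zero_grad()
    torch.stack(outer_losses).sum().backward()
    meta_optimizer.step()
```

The call to backward() on the summed Loss'_n differentiates through the inner updates, which corresponds to the secondary (external) gradient update described above.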
When reinforcement learning is used for text generation, a recurrent neural network is used. A recurrent neural network has a loop pointing to itself, indicating that the information processed at the current time can be passed on and used at the next time. The input of the recurrent neural network is an entire sequence, i.e., X = [x_1, …, x_{t-1}, x_t, x_{t+1}, …, x_T], where x_t is the input to the network at a given time. The hidden state h_t of the network at time t is a function of the hidden state h_{t-1} at the previous time and the input x_t at the current time, i.e., h_t combines the historical information with the current input information. The output of the network is related to h_t. By combining the historical information with the current input, the recurrent neural network handles sequence problems well and can predict the output of the next state as well as its own next hidden state. In reinforcement learning terms, the current state is the character string already generated at time t, s_t = Y_{1:t-1} = (y_1, …, y_{t-1}); the action is the character y_t currently selected at time t; and after a character y_t is determined, the state transitions deterministically from s_t = Y_{1:t-1} = (y_1, …, y_{t-1}) to s_t' = Y_{1:t} = (y_1, …, y_t).
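As a concrete illustration of this recurrent policy, the following is a minimal sketch assuming PyTorch and a character-level vocabulary; the class name TextPolicy and the hyper-parameters are illustrative assumptions, not taken from the patent. Given the generated prefix Y_{1:t-1}, it outputs the distribution p(y_t | Y_{1:t-1}) over the next character.

```python
# Minimal recurrent policy sketch: state = generated prefix, action = next character.
import torch
import torch.nn as nn

class TextPolicy(nn.Module):
    def __init__(self, vocab_size, embed_dim=128, hidden_dim=256):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, embed_dim)
        self.rnn = nn.GRU(embed_dim, hidden_dim, batch_first=True)
        self.out = nn.Linear(hidden_dim, vocab_size)

    def forward(self, prefix_ids, hidden=None):
        # prefix_ids: (batch, t-1) character ids of Y_{1:t-1}
        emb = self.embed(prefix_ids)
        output, hidden = self.rnn(emb, hidden)      # h_t = f(h_{t-1}, x_t)
        logits = self.out(output[:, -1, :])         # scores for the next character y_t
        return torch.softmax(logits, dim=-1), hidden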
On the basis of the above mathematical model and objective function, the invention applies the fast-adaptation ability of meta reinforcement learning to train the generation model under different scenes, thereby improving the model's ability to adapt to new scenes. By learning to generate text in different scenes, the method reduces the need of text generation applications for large numbers of learning samples, and thus addresses the bottleneck problems that a real-world language generation model must adapt quickly to text generation in different scenes and that learning samples are difficult to collect in some individual scenes.
The design idea of the invention is as follows: the overall generation model is divided into two parts, namely the external gradient update of meta learning and the internal update of reinforcement-learning text generation with a recurrent neural network. The recurrent neural network serves as the reinforcement learning agent and, combined with historical data, performs a small number of internal policy gradient updates, so that the agent is continuously trained to generate text data that better conforms to human natural language. With the meta learning approach, a secondary gradient update is then applied to the original generation model according to the performance of the updated models, to enhance the agent's adaptability to the environment and finally obtain a text generation model with fast learning ability. Because the invention adopts the meta reinforcement learning technique, it can adapt to new tasks quickly with only a small amount of training data, thereby solving the bottleneck problems that a real-world language generation model must adapt quickly to different scenes to generate text and that learning samples are difficult to collect in some individual scenes.
Based on this design idea, the invention first collects text data of different types; second, it trains the reinforcement learning agent on data of one type with a small number of internal policy gradient updates, so that the agent can generate text data that better conforms to human natural language; and finally, by repeating this training on data of multiple types, it performs a secondary gradient update on the original generation model according to the performance of the updated models, so that the model acquires fast learning ability. The specific method comprises the following steps:
Step 1, collecting different types of text data, each type serving as a different task.
The invention uses the ability of meta reinforcement learning to adapt quickly to new tasks with only a small amount of training data, together with the advantage of recurrent neural networks in handling sequence problems, to deal with the bottleneck problems that a real-world language generation model must adapt quickly to different scenes and that learning samples are difficult to collect in some individual scenes. In this step, different types of text data need to be collected as meta learning tasks, so that prior knowledge can be learned for quick adaptation to new scenarios; in addition, the data serve as the reference data for setting the reward function in reinforcement learning, helping to train the generation model.
The different types of text data collected in this step may come from different scenes of natural language, such as weather, science and technology, restaurants, basketball, and the like.
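As an illustrative assumption only (the file paths and helper names below are hypothetical and not part of the invention), the collected corpus could be organized as one text collection per scene, from which step 2 later samples a task:

```python
# Illustrative task partition of the collected corpus; paths are hypothetical.
import random

def load_lines(path):
    with open(path, encoding="utf-8") as f:
        return [line.strip() for line in f if line.strip()]

corpus_by_task = {
    "weather":    load_lines("data/weather.txt"),
    "technology": load_lines("data/technology.txt"),
    "restaurant": load_lines("data/restaurant.txt"),
    "basketball": load_lines("data/basketball.txt"),
}

def sample_task(rng=random):
    """Randomly pick one task tau_i and return its name and text data (step 2)."""
    name = rng.choice(list(corpus_by_task))
    return name, corpus_by_task[name]
```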
Step 2, randomly sampling the data of one task τ_i from the text data collected in step 1.
Step 3, constructing a text generation model f_θ by adopting a recurrent neural network for processing sequence data, i.e., a recurrent neural network model.
In this step, the recurrent neural network is used not only to identify individual inputs, but also to analyze the overall association between the input information, and is a neural network with a memory function.
A recurrent neural network (RNN) is regarded as the agent in reinforcement learning; it outputs a probability density function p(y_t | Y_{1:t-1}) rather than a deterministic prediction y_t. Here Y_{1:t-1} represents the state s_t of the text generation model at time t, i.e., the character sequence that has been generated, and y_t represents the action a_t of the text generation model at time t, i.e., the currently selected character.
Step 4, generating K text trajectories D_i by using the text generation model constructed in step 3.
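A minimal sketch of this sampling in step 4, assuming the TextPolicy sketched earlier and hypothetical begin/end-of-sequence token ids and maximum length:

```python
# Sample K text trajectories D_i from the current policy f_theta.
import torch

def generate_trajectories(policy, K=8, max_len=40, bos_id=1, eos_id=2):
    trajectories = []
    with torch.no_grad():
        for _ in range(K):
            ids = [bos_id]
            hidden = None
            for _ in range(max_len):
                last = torch.tensor([[ids[-1]]])              # feed the last character
                probs, hidden = policy(last, hidden)          # p(y_t | Y_{1:t-1})
                y_t = torch.multinomial(probs, 1).item()      # sample the action a_t
                ids.append(y_t)
                if y_t == eos_id:
                    break
            trajectories.append(ids)
    return trajectories
```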
Step 5, using the text trajectories D_i obtained in step 4 to perform a small number of policy gradient updates on the text generation model.
In this step, the policy gradient update performs a small number of gradient updates on the parameters with the REINFORCE method, where the reward function is set to the bilingual evaluation understudy score between the real text data and the generated text data, i.e., the BLEU (Bilingual Evaluation Understudy) score.
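For illustration only, an inner REINFORCE update with a BLEU reward could look like the sketch below; nltk's sentence_bleu stands in for the bilingual evaluation understudy score, and the tokenization, smoothing choice and reference handling are assumptions rather than the patent's prescription.

```python
# Inner policy-gradient step (REINFORCE) with a BLEU reward; details are assumptions.
import torch
from nltk.translate.bleu_score import sentence_bleu, SmoothingFunction

def reinforce_update(policy, optimizer, trajectories, references):
    # references: list of tokenized real sentences of the current task tau_i
    smooth = SmoothingFunction().method1
    loss = 0.0
    for ids in trajectories:
        tokens = [str(i) for i in ids[1:]]            # generated sequence (ids as tokens)
        reward = sentence_bleu(references, tokens, smoothing_function=smooth)
        log_prob = 0.0
        hidden = None
        for prev, y_t in zip(ids[:-1], ids[1:]):      # log p(y_t | Y_{1:t-1})
            probs, hidden = policy(torch.tensor([[prev]]), hidden)
            log_prob = log_prob + torch.log(probs[0, y_t] + 1e-8)
        loss = loss - reward * log_prob               # REINFORCE objective
    optimizer.zero_grad()
    (loss / len(trajectories)).backward()
    optimizer.step()
```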
Step 6, generating new text trajectories D_i' with the text generation model f_θ' obtained in step 5.
Step 7, repeating steps 2 to 6, updating and sampling the text generation model on a plurality of tasks respectively to obtain the performance errors of the text trajectories.
Step 8, performing secondary gradient update training on the original text generation model parameters with the performance errors of the text trajectories obtained in step 7, until convergence.
In this step, the secondary gradient update uses the data sampled from the text generation model f_θ' to perform a secondary gradient update on the original generation model f_θ.
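Isolating step 8 as a single function gives the sketch below, under the assumption that the performance errors Loss'_n were computed with the computation graph of the inner updates retained (e.g. create_graph=True in the earlier sketch) so that gradients can flow back to the original parameters θ:

```python
# Step 8 in isolation: sum the performance errors and take one external gradient step.
import torch

def secondary_gradient_update(meta_optimizer, outer_losses):
    meta_optimizer.zero_grad()
    total = torch.stack(outer_losses).sum()   # sum of Loss'_n over the n tasks
    total.backward()                          # gradients flow through the inner updates
    meta_optimizer.step()
```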
It should be emphasized that the examples described herein are illustrative rather than limiting; therefore, the invention includes, but is not limited to, the examples given in the detailed description, and other embodiments derived by a person skilled in the art from the technical solutions of the invention likewise fall within the protection scope of the invention.

Claims (4)

1. A text generation method based on meta reinforcement learning is characterized by comprising the following steps:
step 1, collecting different types of text data as different tasks, wherein the different types of text data are different scenes in natural language, and comprise weather, science and technology, restaurants and basketball data;
step 2, randomly taking the data of one task τ_i from the text data collected in step 1, specifically: selecting any task τ_i from the database D_train as the current generation environment;
step 3, constructing a text generation model f_θ by adopting a recurrent neural network, wherein the recurrent neural network is regarded as the agent in reinforcement learning and outputs a probability density function;
step 4, generating K text trajectories D_i with the text generation model f_θ, i.e., generating text sequences of a certain length with the text generation model;
step 5, comparing the generated text with the real text and taking the bilingual evaluation understudy score of the generated text trajectories as the training error Loss_n, and using the text trajectories D_i to perform a small number of policy gradient updates on the text generation model f_θ to obtain an updated text generation model f_θ';
step 6, generating new text trajectories D_i' with the text generation model f_θ';
step 7, repeating steps 2 to 6, updating and sampling the text generation model on a plurality of tasks respectively to obtain the performance errors of the text trajectories;
step 8, performing secondary gradient update training on the original text generation model parameters with the performance errors of the text trajectories obtained in step 7, until convergence, specifically: summing the performance errors Loss'_n of the plurality of tasks and performing an external gradient update on the original generation model according to the performance of the plurality of updated models.
2. The text generation method based on meta reinforcement learning according to claim 1, wherein: the recurrent neural network in step 3 is the agent in reinforcement learning and outputs a probability density function p(y_t | Y_{1:t-1}), where Y_{1:t-1} is the state s_t of the text generation model at time t, representing the character sequence that has been generated, and y_t is the action a_t of the text generation model at time t, representing the currently selected character.
3. The text generation method based on meta reinforcement learning according to claim 1, wherein: in step 5, a small number of gradient updates are performed on the parameters by the REINFORCE method, and the reward function is set to the bilingual evaluation understudy (BLEU) score between the real text data and the generated text data.
4. The text generation method based on meta reinforcement learning according to claim 1, wherein: in step 8, the performance error of the data sampled from the text generation model f_θ' is used to perform a secondary gradient update on the original generation model f_θ.
CN202010156433.1A 2020-03-09 2020-03-09 Text generation method based on meta reinforcement learning Active CN111476020B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010156433.1A CN111476020B (en) 2020-03-09 2020-03-09 Text generation method based on meta reinforcement learning

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202010156433.1A CN111476020B (en) 2020-03-09 2020-03-09 Text generation method based on meta reinforcement learning

Publications (2)

Publication Number Publication Date
CN111476020A CN111476020A (en) 2020-07-31
CN111476020B true CN111476020B (en) 2023-07-25

Family

ID=71748074

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010156433.1A Active CN111476020B (en) 2020-03-09 2020-03-09 Text generation method based on meta reinforcement learning

Country Status (1)

Country Link
CN (1) CN111476020B (en)

Family Cites Families (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US9679258B2 (en) * 2013-10-08 2017-06-13 Google Inc. Methods and apparatus for reinforcement learning
WO2019219965A1 (en) * 2018-05-18 2019-11-21 Deepmind Technologies Limited Meta-gradient updates for training return functions for reinforcement learning systems

Non-Patent Citations (3)

* Cited by examiner, † Cited by third party
Title
Natural language processing technology based on reinforcement learning; 冯少迪; 数码世界 (Digital World) (03); full text *
A survey of deep reinforcement learning research; 赵星宇, 丁世飞; 计算机科学 (Computer Science) (07); full text *
Progress in deep reinforcement learning: from AlphaGo to AlphaGo Zero; 唐振韬, 邵坤, 赵冬斌, 朱圆恒; 控制理论与应用 (Control Theory and Applications) (12); full text *

Also Published As

Publication number Publication date
CN111476020A (en) 2020-07-31

Similar Documents

Publication Publication Date Title
Liu et al. End-to-end optimization of task-oriented dialogue model with deep reinforcement learning
Stoyanov et al. Empirical risk minimization of graphical model parameters given approximate inference, decoding, and model structure
CN110309170B (en) Complex intention recognition method in task-based multi-turn conversation
JP2022550326A (en) Contrasted pre-training for verbal tasks
CN110837548A (en) Answer matching method and device, electronic equipment and storage medium
CN109840595B (en) Knowledge tracking method based on group learning behavior characteristics
CN113254675B (en) Knowledge graph construction method based on self-adaptive few-sample relation extraction
CN110298044A (en) A kind of entity-relationship recognition method
CN110795522A (en) Method and device for predicting track position of mobile user
CN112488147A (en) Redundancy removal active learning method based on countermeasure network
CN117648950A (en) Training method and device for neural network model, electronic equipment and storage medium
CN116881996B (en) Modeling intention prediction method based on mouse operation
KR20190134965A (en) A method and system for training of neural networks
CN111476020B (en) Text generation method based on meta reinforcement learning
CN117494760A (en) Semantic tag-rich data augmentation method based on ultra-large-scale language model
CN115795017A (en) Off-line and on-line fusion application method and system for conversation system
CN116362242A (en) Small sample slot value extraction method, device, equipment and storage medium
CN114692615A (en) Small sample semantic graph recognition method for small languages
CN115840884A (en) Sample selection method, device, equipment and medium
CN115062123A (en) Knowledge base question-answer pair generation method of conversation generation system
CN112297012B (en) Robot reinforcement learning method based on self-adaptive model
CN116029261A (en) Chinese text grammar error correction method and related equipment
CN114154582A (en) Deep reinforcement learning method based on environment dynamic decomposition model
CN114970714B (en) Track prediction method and system considering uncertain behavior mode of moving target
CN116562299B (en) Argument extraction method, device and equipment of text information and storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant