CN111401515A - Method for constructing incremental LSTM by utilizing training process compression and memory consolidation - Google Patents

Method for constructing incremental LSTM by utilizing training process compression and memory consolidation

Info

Publication number
CN111401515A
CN111401515A CN202010092811.4A CN202010092811A CN111401515A CN 111401515 A CN111401515 A CN 111401515A CN 202010092811 A CN202010092811 A CN 202010092811A CN 111401515 A CN111401515 A CN 111401515A
Authority
CN
China
Prior art keywords
stm
training
memory
incremental
sub
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202010092811.4A
Other languages
Chinese (zh)
Inventor
牛德姣
夏政
蔡涛
周时颉
杨乐
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Jiangsu University
Original Assignee
Jiangsu University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Jiangsu University filed Critical Jiangsu University
Priority to CN202010092811.4A priority Critical patent/CN111401515A/en
Publication of CN111401515A publication Critical patent/CN111401515A/en
Pending legal-status Critical Current

Links

Images

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/044 Recurrent networks, e.g. Hopfield networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods
    • G06N3/084 Backpropagation, e.g. using gradient descent

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • General Health & Medical Sciences (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Evolutionary Computation (AREA)
  • Artificial Intelligence (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Health & Medical Sciences (AREA)
  • Compression, Expansion, Code Conversion, And Decoders (AREA)

Abstract

The invention discloses a method for constructing an incremental LSTM by utilizing compression of the training process and memory consolidation. The method uses the activity of the LSTM gate units to select the important input moments during training and compresses the training process into a historical memory; the compressed memory is then effectively fused with new data for training, and the historical information is used to consolidate the network memory, so as to meet the requirement of processing sequence data incrementally.

Description

Method for constructing incremental LSTM by utilizing training process compression and memory consolidation
Technical Field
The invention belongs to the field of artificial intelligence and deep learning, and particularly relates to a method for constructing an efficient incremental LSTM by utilizing compression of the training process and memory consolidation.
Background
In recent years, with the continuous development of new artificial intelligence technologies and the explosive growth of data, how to process and analyze data efficiently, accurately and quickly by means of these new technologies, and how to mine the huge value hidden in the data, has become a challenging task.
Compared with a general recurrent neural network, the LSTM can process longer sequence data. However, the LSTM generally uses a batch learning mode (Batch Learning) when processing sequence data: it is assumed that all training samples are available at once before training and that the samples do not change during training. This batch learning approach can achieve good performance on a given data set, but it cannot effectively handle the dynamic growth of data, cannot adapt to the situation in practical applications where sequence data grow continuously over time, and is still far from meeting the needs of massive data processing.
Disclosure of Invention
To address the deficiencies in the prior art, the application provides a method for constructing an incremental LSTM by utilizing training process compression and memory consolidation, so as to solve the problem that the current LSTM cannot effectively perform incremental learning on sequence data, reduce the space overhead of storing data, avoid the forgetting of historical information caused by learning new data, and improve the practicality of the LSTM.
The technical scheme adopted by the invention is as follows:
the method for constructing the incremental L STM by utilizing the compression and the memory consolidation of the training process comprises the following steps:
step 1, dividing the sequence data into a plurality of sub-sequence data sets, and performing incremental LSTM training on each sub-sequence data set batch by batch;
step 2, after the incremental LSTM training on the current batch of the sub-sequence data set $S_i$ is finished, retaining the important parameter information of the model, selecting the important moments according to the activity of the LSTM forgetting gate, and compressing the sequence data at the important moments to obtain a data set $\hat{S}_i$ used for training on the next batch of data;
step 3, in the training process on the next sub-sequence data set $S_{i+1}$, consolidating the LSTM memory by using a back-propagation algorithm that fuses the training history information, thereby ensuring the performance of the system on both new and old data sets and preventing the old data from being forgotten; repeating step 2 and step 3 until the training on all batches is completed, thereby obtaining the incremental LSTM.
Further, in step 2, incremental LSTM training is performed on the current batch of the sub-sequence data set $S_i$; the training on the current batch is stopped when the training converges or the number of iterations reaches a specified value, and the model $M_i$ is obtained.
Further, the important parameter information of the model includes the long-term memory $c_i^T$ and the short-term memory $h_i^T$ of the LSTM unit at the last time instant, and the network weight set $W_i = \{W_f^i, W_{in}^i, W_o^i, W_c^i\}$ obtained by training on the sub-data set of the current batch, where $W_f^i$, $W_{in}^i$, $W_o^i$ and $W_c^i$ are the weights of the LSTM forgetting gate, input gate, output gate and memory gate, respectively;
further, the back propagation algorithm process in step 3 is as follows: to what is obtained by compression
Figure RE-GDA0002479102310000026
Performing error back propagation calculation to respectively calculate L STM model network weight gradient
Figure RE-GDA0002479102310000027
The network weight comprises
Figure RE-GDA0002479102310000028
Furthermore, the method for consolidating the LSTM memory is as follows: on the basis of $M_i$, the LSTM model continues to be trained on the sub-sequence data set $S_{i+1}$; the long-term memory $c_i^T$ and the short-term memory $h_i^T$ of the LSTM unit at the last time instant of the previous batch are used to initialize the parameters $c_{i+1}^0$ and $h_{i+1}^0$, respectively, where $c_{i+1}^0$ and $h_{i+1}^0$ represent the initial values of the LSTM long-term memory and short-term memory at the beginning of training on the (i+1)-th batch of data.
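For illustration only (this sketch is not part of the patent text), the memory consolidation by state carry-over described above can be expressed with a standard LSTM implementation as follows; the use of PyTorch, the tensor shapes and the detach() calls are assumptions of this sketch rather than details taken from the disclosure.

```python
import torch
import torch.nn as nn

lstm = nn.LSTM(input_size=8, hidden_size=16, batch_first=True)

# Batch i: run over S_i and keep the final memory (h_i^T: short-term, c_i^T: long-term)
x_i = torch.randn(1, 40, 8)                      # stand-in for sub-sequence data set S_i
_, (h_T, c_T) = lstm(x_i)

# Batch i+1: initialize (h_{i+1}^0, c_{i+1}^0) from the previous batch's final memory,
# so that training on S_{i+1} starts from the consolidated memory of batch i
x_next = torch.randn(1, 30, 8)                   # stand-in for S_{i+1}
out, (h_next, c_next) = lstm(x_next, (h_T.detach(), c_T.detach()))
```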
Further, each time the parameters are updated, the obtained historical gradient $\nabla \hat{W}_i$ is fused with the current gradient $\nabla W_{i+1}$ to form the new gradient, i.e. $\nabla W = \alpha \nabla \hat{W}_i + (1-\alpha) \nabla W_{i+1}$, where $\alpha$ is a balance coefficient that strengthens the memory of the historical information; the gradient descent algorithm is then used to update the parameters to obtain the model $M_{i+1}$.
Further, the lengths of the sub-sequence data sets may be the same or different.
The invention has the beneficial effects that:
1. the invention designs a compression method for the historical training process of sequence data by utilizing the activation characteristics of the LSTM forgetting gate; by detecting the important moments in the training process of old sequences and retaining only those moments for subsequent training, the huge space overhead caused by storing complete sequences is reduced;
2. error back-propagation is carried out on the compressed historical sequence to extract the historical memory information of the old data, which is then fused into the training process of the new data; through repeated memorization and consolidation, the LSTM can remember new and old data simultaneously, which avoids the forgetting of old knowledge caused by learning new data, improves the training efficiency of the incremental LSTM, and ensures the practicality of the LSTM model.
Drawings
FIG. 1 is a flow chart of the present invention for building an incremental LSTM using training process compression and memory consolidation;
FIG. 2 is a flow chart of memory consolidation in the method of the present invention.
Detailed Description
In order to make the objects, technical solutions and advantages of the present invention more apparent, the present invention is described in further detail below with reference to the accompanying drawings and embodiments. It should be understood that the specific embodiments described herein are merely illustrative of the invention and are not intended to limit the invention.
The method for constructing the incremental LSTM by utilizing training process compression and memory consolidation, as shown in FIG. 1, comprises the following steps:
Step 1, in order to adapt to incremental learning, the sequence data are divided into a plurality of batches, and the incremental LSTM training is completed batch by batch to reduce the training overhead. The sequence data are divided into the sub-sequence data sets $S_1, S_2, S_3, \ldots, S_N$, where the $i$-th sub-sequence data set is $S_i = \{x_i^1, x_i^2, \ldots, x_i^T\}$ and $x_i^T$ is the $T$-th data item of the $i$-th sub-sequence data set $S_i$. The lengths of the sub-sequence data sets $S_i$ may be the same or different. The $N$ sub-sequences are used in order, one batch at a time, to train the incremental LSTM.
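As a minimal sketch of this batching step (illustrative only; the function name split_into_batches and the explicit length list are assumptions, since the patent only requires that the sub-sequence data sets may have equal or unequal lengths), the division could be written in Python as:

```python
from typing import List, Sequence

def split_into_batches(sequence: Sequence, lengths: List[int]) -> List[Sequence]:
    """Split a long data sequence into the sub-sequence data sets S_1 .. S_N."""
    batches, start = [], 0
    for n in lengths:                       # each n is the length of one sub-sequence data set
        batches.append(sequence[start:start + n])
        start += n
    return batches

# Example: three sub-sequence data sets of different lengths
data = list(range(100))                     # stand-in for real sequence data
sub_datasets = split_into_batches(data, [40, 30, 30])
assert len(sub_datasets) == 3
```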
Step 2, after the training on the current batch of the sub-sequence data set $S_i$ is finished, the important parameter information of the model is kept, the important moments are selected according to the activity of the LSTM forgetting gate, and the sequence data at the important moments are compressed for use in training on the next batch of data.
Step 2.1, the LSTM is iteratively trained on the sub-sequence data set $S_i$; the training on the current batch is stopped when the training converges or the number of iterations reaches the specified value, and the model $M_i$ is obtained. The long-term memory $c_i^T$ and the short-term memory $h_i^T$ of the LSTM unit at the last time instant are preserved, together with the network weight set $W_i = \{W_f^i, W_{in}^i, W_o^i, W_c^i\}$ obtained by training on the sub-data set, where $W_f^i$, $W_{in}^i$, $W_o^i$ and $W_c^i$ are, respectively, the weights of the forgetting gate, input gate, output gate and memory gate of the LSTM obtained after training on the $i$-th data set.
Step 2.2, the important moments are selected according to the LSTM forgetting gate $f_t$, and $S_i$ is compressed for training. That is, if $f_t$ is greater than the threshold $\theta$, the input and output data at the corresponding moment are retained and recorded as $\hat{S}_i = \{(x_i^t, y_i^t) \mid f_t > \theta\}$, where $x_i^t$ and $y_i^t$ are the input and output data of $S_i$ at moment $t$. The opening and closing of the forgetting gate in the LSTM model corresponds to whether historical information is allowed to propagate backwards through the LSTM unit, so the value of the forgetting gate reflects how strongly the data to be processed at the current moment depend on the historical information; the stronger the dependence, the more historical information is imported at that moment and the richer the information carried by the LSTM unit, so the data of such moments need to be kept.
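A minimal illustrative sketch of this selection rule is given below; it assumes that the recorded forget-gate activations are averaged over the hidden units before being compared with the threshold θ, which is one plausible reading of the gate's "activity" and not necessarily the patent's exact rule.

```python
import numpy as np

def compress_by_forget_gate(inputs, targets, forget_gates, theta=0.5):
    """Keep only the moments whose forget-gate activity exceeds theta.

    inputs, targets:  arrays of shape (T, ...) for one sub-sequence data set S_i
    forget_gates:     array of shape (T, hidden_size) with f_t recorded during training
    Returns the compressed data set S_i_hat as (inputs_kept, targets_kept).
    """
    activity = forget_gates.mean(axis=1)    # scalar activity per moment (assumption)
    keep = activity > theta                 # important moments: f_t > theta
    return inputs[keep], targets[keep]

# Example with random stand-in data
T, d, h = 50, 8, 16
x, y = np.random.randn(T, d), np.random.randn(T, 1)
f = np.random.rand(T, h)                    # forget-gate activations in (0, 1)
x_hat, y_hat = compress_by_forget_gate(x, y, f, theta=0.6)
print(x_hat.shape, y_hat.shape)             # shapes of the compressed S_i_hat
```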
Step 3, in the training process on the next sub-sequence data set $S_{i+1}$, the LSTM memory is consolidated by using a back-propagation algorithm that fuses the training history information; this ensures the performance of the system on both new and old data sets, avoids catastrophic forgetting of the old data, and realizes the incremental LSTM system. The specific process is as follows:
step 3.1, to the compression in step 2.2
Figure RE-GDA0002479102310000042
Performing an error Back Propagation (BPTT) calculation to respectively calculate L STM model network weight gradients, i.e. calculating
Figure RE-GDA0002479102310000043
Wherein the content of the first and second substances,
Figure RE-GDA0002479102310000044
is a loss function in the compressed sequence,
Figure RE-GDA0002479102310000045
Figure RE-GDA0002479102310000046
loss function values of the ith data set at the k moment;
Figure RE-GDA0002479102310000047
the weights of the forgetting gate, the input gate, the output gate and the memory gate in the L STM are respectively.
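Purely as an illustration of how such historical gradients could be collected with an off-the-shelf autograd framework (PyTorch here, which is an assumption; the patent names no framework, and the output head and mean-squared-error loss are placeholders), step 3.1 might be sketched as:

```python
import torch
import torch.nn as nn

def historical_gradients(lstm, head, x_hat, y_hat, loss_fn=nn.MSELoss()):
    """Run BPTT over the compressed set S_i_hat and collect the weight gradients.

    lstm: an nn.LSTM; head: an output layer mapping hidden states to targets.
    x_hat, y_hat: tensors of shape (1, m, input_size) and (1, m, 1) holding the
    m retained moments of the compressed data set.
    """
    lstm.zero_grad(); head.zero_grad()
    out, _ = lstm(x_hat)                    # forward pass over the compressed sequence
    loss = loss_fn(head(out), y_hat)        # loss over the retained moments
    loss.backward()                         # BPTT: gradients w.r.t. every LSTM/head weight
    return {name: p.grad.detach().clone()
            for name, p in list(lstm.named_parameters()) + list(head.named_parameters())}

# Example with random stand-in data (12 retained moments)
lstm, head = nn.LSTM(8, 16, batch_first=True), nn.Linear(16, 1)
x_hat, y_hat = torch.randn(1, 12, 8), torch.randn(1, 12, 1)
hist_grads = historical_gradients(lstm, head, x_hat, y_hat)
```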
Step 3.2, the next batch of the sub-sequence data set, $S_{i+1}$, is input. Starting from $M_i$, the LSTM model continues to be trained on the sub-sequence data set $S_{i+1}$. The long-term memory $c_i^T$ and the short-term memory $h_i^T$ of the LSTM unit at the last time instant of the previous batch are used to initialize the parameters $c_{i+1}^0$ and $h_{i+1}^0$, respectively, where $c_{i+1}^0$ and $h_{i+1}^0$ denote the initial values of the LSTM long-term memory and short-term memory at the start of training on the (i+1)-th batch of data; in this way the memory of the previous batch of training is consolidated. FIG. 2 shows a schematic diagram of this memory consolidation: the dotted line shows the gradient information coming from the historical memory, the bold solid line shows the gradient information coming from the current sequence data, and during the training of the current batch the two are merged and superimposed, as shown in the right part of FIG. 2. Meanwhile, each time the parameters are updated, the historical gradient $\nabla \hat{W}_i$ obtained in step 3.1 is fused with the current gradient $\nabla W_{i+1}$ to form the new gradient, i.e. $\nabla W = \alpha \nabla \hat{W}_i + (1-\alpha) \nabla W_{i+1}$, where $\alpha$ is a balance coefficient that strengthens the memory of the historical information. The gradient descent algorithm is then used to update the parameters to obtain the model $M_{i+1}$; the specific parameter update by gradient descent is $W \leftarrow W - \gamma \nabla W$, where $\gamma$ is the learning rate.
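Again as an illustrative sketch only: the gradient fusion and the gradient-descent update could be implemented as below. The convex combination α·∇Ŵ_i + (1−α)·∇W_{i+1} follows the reconstruction above, but the exact fusion formula, the framework and the plain (momentum-free) update rule are assumptions, not details confirmed by the original publication.

```python
import torch
import torch.nn as nn

def fused_update(named_params, hist_grads, alpha=0.3, lr=1e-2):
    """Fuse historical and current gradients, then take one gradient-descent step.

    named_params: list of (name, parameter) pairs whose .grad already holds the
                  current-batch gradient (i.e. loss.backward() was just called).
    hist_grads:   dict of historical gradients from the compressed set (step 3.1),
                  keyed by the same parameter names.
    """
    with torch.no_grad():
        for name, p in named_params:
            fused = alpha * hist_grads[name] + (1.0 - alpha) * p.grad  # gradient fusion
            p -= lr * fused                                            # W <- W - gamma * fused

# Minimal usage example with stand-in data
lstm, head = nn.LSTM(8, 16, batch_first=True), nn.Linear(16, 1)
named = list(lstm.named_parameters()) + list(head.named_parameters())

x_new, y_new = torch.randn(1, 30, 8), torch.randn(1, 30, 1)  # current batch S_{i+1}
out, _ = lstm(x_new)
nn.MSELoss()(head(out), y_new).backward()                    # current gradients now in .grad

hist = {n: torch.zeros_like(p) for n, p in named}            # placeholder historical gradients
fused_update(named, hist, alpha=0.3, lr=1e-2)
```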
Step 3.3, steps 2.1, 2.2, 3.1 and 3.2 are repeated until the training on batch $S_N$ is completed, and the model $M_N$, i.e. the incremental LSTM, is obtained.
The method for constructing an incremental LSTM by using training process compression and memory consolidation is further explained below with reference to the flow chart: the N sub-sequence data sets are trained in order according to the flow chart to realize the incremental LSTM; the training process on each sub-sequence data set is similar and comprises three stages: historical information recall, memory consolidation, and new memory generation.
[The complete training procedure appears as an algorithm image in the original publication.] In that procedure, $w_0$ is the initial value of the network weight set $W$, $w_N$ is the weight value after training on the $N$-th batch of data, $\theta$ is the threshold of the forgetting gate $f_t$, $\alpha$ is the balance coefficient in the gradient fusion, and $\gamma$ is the learning rate.
When the BPTT (Back-Propagation Through Time) algorithm is used to train an LSTM in practice, the back-propagation step length is limited, so historical gradient information cannot keep being propagated as the length of the sequence data increases; without the above fusion, the historical memory would therefore play no role in the training of new data.
The above embodiments are only used for illustrating the design idea and features of the present invention, and the purpose of the present invention is to enable those skilled in the art to understand the content of the present invention and implement the present invention accordingly, and the protection scope of the present invention is not limited to the above embodiments. Therefore, all equivalent changes and modifications made in accordance with the principles and concepts disclosed herein are intended to be included within the scope of the present invention.

Claims (7)

1. The method for constructing the incremental LSTM by utilizing the compression and the memory consolidation of the training process is characterized by comprising the following steps:
step 1, dividing the sequence data into a plurality of sub-sequence data sets, and performing incremental LSTM training on each sub-sequence data set batch by batch;
step 2, after the incremental LSTM training on the current batch of the sub-sequence data set $S_i$ is finished, retaining the important parameter information of the model, selecting the important moments according to the activity of the LSTM forgetting gate, and compressing the sequence data at the important moments to obtain a data set $\hat{S}_i$ used for training on the next batch of data;
step 3, in the training process on the next sub-sequence data set $S_{i+1}$, consolidating the LSTM memory by using a back-propagation algorithm that fuses the training history information, thereby ensuring the performance of the system on both new and old data sets and preventing the old data from being forgotten; repeating step 2 and step 3 until the training on all batches is completed, thereby obtaining the incremental LSTM.
2. The method for constructing an incremental LSTM by using training process compression and memory consolidation as claimed in claim 1, wherein in step 2, incremental LSTM training is performed on the current batch of the sub-sequence data set $S_i$, and the training on the current batch is stopped when the training converges or the number of iterations reaches a specified value, obtaining the model $M_i$.
3. The method for constructing the incremental LSTM by using training process compression and memory consolidation as claimed in claim 2, wherein the important parameter information of the model comprises the long-term memory $c_i^T$ and the short-term memory $h_i^T$ of the LSTM unit at the last time instant, and the network weight set $W_i = \{W_f^i, W_{in}^i, W_o^i, W_c^i\}$ obtained by training on the sub-data set of the current batch, where $W_f^i$, $W_{in}^i$, $W_o^i$ and $W_c^i$ are the weights of the LSTM forgetting gate, input gate, output gate and memory gate, respectively.
4. The method for constructing the incremental LSTM by using training process compression and memory consolidation as claimed in claim 3, wherein the back-propagation algorithm process in step 3 is to perform error back-propagation calculation on the compressed data set $\hat{S}_i$ to calculate the LSTM model network weight gradients $\nabla \hat{W}_i = \{\nabla \hat{W}_f^i, \nabla \hat{W}_{in}^i, \nabla \hat{W}_o^i, \nabla \hat{W}_c^i\}$.
5. The method for constructing the incremental LSTM by utilizing training process compression and memory consolidation as claimed in claim 4, wherein the method for consolidating the LSTM memory is: on the basis of $M_i$, continuing to train the LSTM model on the sub-sequence data set $S_{i+1}$, and using the long-term memory $c_i^T$ and the short-term memory $h_i^T$ of the LSTM unit at the last time instant of the previous batch to initialize the parameters $c_{i+1}^0$ and $h_{i+1}^0$, respectively, which represent the initial values of the LSTM long-term memory and short-term memory at the beginning of training on the (i+1)-th batch of data.
6. The method for constructing the incremental LSTM by using training process compression and memory consolidation as claimed in claim 5, wherein each time a parameter update is performed, the obtained historical gradient $\nabla \hat{W}_i$ is fused with the current gradient $\nabla W_{i+1}$ to form the new gradient, i.e. $\nabla W = \alpha \nabla \hat{W}_i + (1-\alpha) \nabla W_{i+1}$, where $\alpha$ is a balance coefficient that strengthens the memory of the historical information; the gradient descent algorithm is then used to update the parameters to obtain the model $M_{i+1}$.
7. The method for constructing the incremental LSTM by using training process compression and memory consolidation as claimed in any one of claims 1-6, wherein the lengths of the sub-sequence data sets are the same or different.
CN202010092811.4A 2020-02-14 2020-02-14 Method for constructing incremental LSTM by utilizing training process compression and memory consolidation Pending CN111401515A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010092811.4A CN111401515A (en) 2020-02-14 2020-02-14 Method for constructing incremental L STM by utilizing training process compression and memory consolidation

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202010092811.4A CN111401515A (en) 2020-02-14 2020-02-14 Method for constructing incremental L STM by utilizing training process compression and memory consolidation

Publications (1)

Publication Number Publication Date
CN111401515A true CN111401515A (en) 2020-07-10

Family

ID=71428424

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010092811.4A Pending CN111401515A (en) 2020-02-14 2020-02-14 Method for constructing incremental L STM by utilizing training process compression and memory consolidation

Country Status (1)

Country Link
CN (1) CN111401515A (en)

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112766501A (en) * 2021-02-26 2021-05-07 上海商汤智能科技有限公司 Incremental training method and related product
CN113537591A (en) * 2021-07-14 2021-10-22 北京琥珀创想科技有限公司 Long-term weather prediction method and device, computer equipment and storage medium
CN113657596A (en) * 2021-08-27 2021-11-16 京东科技信息技术有限公司 Method and device for training model and image recognition
CN113657596B (en) * 2021-08-27 2023-11-03 京东科技信息技术有限公司 Method and device for training model and image recognition

Similar Documents

Publication Publication Date Title
CN111401515A (en) Method for constructing incremental LSTM by utilizing training process compression and memory consolidation
CN111353582B (en) Particle swarm algorithm-based distributed deep learning parameter updating method
CN111260030B (en) A-TCN-based power load prediction method and device, computer equipment and storage medium
CN107239845B (en) Construction method of oil reservoir development effect prediction model
CN110969290A (en) Runoff probability prediction method and system based on deep learning
CN103927580A (en) Project constraint parameter optimizing method based on improved artificial bee colony algorithm
CN113821983B (en) Engineering design optimization method and device based on proxy model and electronic equipment
CN115829024B (en) Model training method, device, equipment and storage medium
CN114243799B (en) Deep reinforcement learning power distribution network fault recovery method based on distributed power supply
CN114117599A (en) Shield attitude position deviation prediction method
CN110390206A (en) Gradient under the cloud system frame of side with secret protection declines accelerating algorithm
CN109344960A (en) A kind of DGRU neural network and its prediction model method for building up preventing data information loss
CN114548591A (en) Time sequence data prediction method and system based on hybrid deep learning model and Stacking
CN109681165B (en) Water injection strategy optimization method and device for oil extraction in oil field
CN116205273A (en) Multi-agent reinforcement learning method for optimizing experience storage and experience reuse
CN113849910A (en) Dropout-based BiLSTM network wing resistance coefficient prediction method
CN113435128A (en) Oil and gas reservoir yield prediction method and device based on condition generation type countermeasure network
CN115577647B (en) Power grid fault type identification method and intelligent agent construction method
CN112381664A (en) Power grid short-term load prediction method, prediction device and storage medium
CN117079744A (en) Artificial intelligent design method for energetic molecule
CN116774089A (en) Convolutional neural network battery state of health estimation method and system based on feature fusion
CN115630316A (en) Ultrashort-term wind speed prediction method based on improved long-term and short-term memory network
CN112667394B (en) Computer resource utilization rate optimization method
CN115593264A (en) Charging optimization control method and device based on edge calculation and computer equipment
CN115061444A (en) Real-time optimization method for technological parameters integrating probability network and reinforcement learning

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination