CN111401515A - Method for constructing incremental LSTM by utilizing training process compression and memory consolidation - Google Patents

Method for constructing incremental LSTM by utilizing training process compression and memory consolidation

Info

Publication number
CN111401515A
CN111401515A CN202010092811.4A CN202010092811A CN111401515A CN 111401515 A CN111401515 A CN 111401515A CN 202010092811 A CN202010092811 A CN 202010092811A CN 111401515 A CN111401515 A CN 111401515A
Authority
CN
China
Prior art keywords
stm
training
memory
incremental
sub
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202010092811.4A
Other languages
Chinese (zh)
Inventor
牛德姣
夏政
蔡涛
周时颉
杨乐
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Jiangsu University
Original Assignee
Jiangsu University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Jiangsu University filed Critical Jiangsu University
Priority to CN202010092811.4A priority Critical patent/CN111401515A/en
Publication of CN111401515A publication Critical patent/CN111401515A/en
Pending legal-status Critical Current

Links

Images

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/044 Recurrent networks, e.g. Hopfield networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods
    • G06N3/084 Backpropagation, e.g. using gradient descent

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • General Health & Medical Sciences (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Evolutionary Computation (AREA)
  • Artificial Intelligence (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Health & Medical Sciences (AREA)
  • Compression, Expansion, Code Conversion, And Decoders (AREA)

Abstract

The invention discloses a method for constructing an incremental LSTM by utilizing compression of the training process and memory consolidation. The method uses the activity of the LSTM gate units to select the important input moments during training and compresses the training process into a historical memory; the compressed memory is then effectively fused with new data for training, and the historical information is used to consolidate the network memory, so as to meet the requirement of processing sequence data incrementally.

Description

Method for constructing incremental LSTM by utilizing training process compression and memory consolidation
Technical Field
The invention belongs to the field of artificial intelligence and deep learning, and particularly relates to a method for constructing an efficient incremental LSTM by utilizing compression of the training process and memory consolidation.
Background
In recent years, with the continuous development of new artificial intelligence technologies and the explosive growth of data, how to process and analyze data efficiently, accurately and quickly by means of these new technologies, and how to mine the huge value hidden in the data, has become a challenging task.
Compared with a general recurrent neural network, the LSTM can process longer sequence data. However, the LSTM generally uses a batch learning mode (Batch Learning) when processing sequence data: it is assumed that all training samples are available at once before training and that the samples do not change during training. This batch learning approach can achieve good performance on a given data set, but it cannot effectively handle the dynamic growth of data, cannot adapt to the situation in practical applications where sequence data grow continuously over time, and is still far from meeting the needs of massive data processing.
Disclosure of Invention
To address the deficiencies in the prior art, the application provides a method for constructing an incremental LSTM by utilizing training process compression and memory consolidation, so as to solve the problem that the current LSTM cannot effectively perform incremental learning on sequence data, reduce the space overhead of storing data, avoid the forgetting of historical information caused by learning new data, and improve the practicality of the LSTM.
The technical scheme adopted by the invention is as follows:
the method for constructing the incremental L STM by utilizing the compression and the memory consolidation of the training process comprises the following steps:
step 1, dividing the sequence data into a plurality of sub-sequence data sets, and performing incremental LSTM training on each sub-sequence data set batch by batch;
step 2, after the incremental LSTM training on the current batch of the sub-sequence data set $S_i$ is finished, retaining the important parameter information of the model, selecting the important moments according to the activity of the LSTM forgetting gate, and compressing the sequence data at the important moments to obtain a data set $\hat{S}_i$ used for training on the next batch of data;
step 3, in the training process on the next sub-sequence data set $S_{i+1}$, consolidating the LSTM memory by using a back-propagation algorithm that fuses the training history information, thereby ensuring the performance of the system on both new and old data sets and preventing the old data from being forgotten; repeating step 2 and step 3 until the training on all batches is completed, thereby obtaining the incremental LSTM.
Further, in step 2, incremental LSTM training is performed on the current batch of the sub-sequence data set $S_i$; the training on the current batch is stopped when the training converges or the number of iterations reaches a specified value, and the model $M_i$ is obtained.
Further, the important parameter information of the model includes the long-term memory $c_i^T$ and the short-term memory $h_i^T$ of the LSTM unit at the last time instant, and the network weight set $W_i = \{W_f^i, W_{in}^i, W_o^i, W_c^i\}$ obtained by training on the sub-data set of the current batch, where $W_f^i$, $W_{in}^i$, $W_o^i$ and $W_c^i$ are the weights of the LSTM forgetting gate, input gate, output gate and memory gate, respectively;
further, the back propagation algorithm process in step 3 is as follows: to what is obtained by compression
Figure RE-GDA0002479102310000026
Performing error back propagation calculation to respectively calculate L STM model network weight gradient
Figure RE-GDA0002479102310000027
The network weight comprises
Figure RE-GDA0002479102310000028
Furthermore, the method for consolidating the LSTM memory is as follows: on the basis of $M_i$, the LSTM model continues to be trained on the sub-sequence data set $S_{i+1}$; the long-term memory $c_i^T$ and the short-term memory $h_i^T$ of the LSTM unit at the last time instant of the previous batch are used to initialize the parameters $c_{i+1}^0$ and $h_{i+1}^0$, respectively, where $c_{i+1}^0$ and $h_{i+1}^0$ represent the initial values of the LSTM long-term memory and short-term memory at the beginning of training on the (i+1)-th batch of data.
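For illustration only (this sketch is not part of the patent text), the memory consolidation by state carry-over described above can be expressed with a standard LSTM implementation as follows; the use of PyTorch, the tensor shapes and the detach() calls are assumptions of this sketch rather than details taken from the disclosure.

```python
import torch
import torch.nn as nn

lstm = nn.LSTM(input_size=8, hidden_size=16, batch_first=True)

# Batch i: run over S_i and keep the final memory (h_i^T: short-term, c_i^T: long-term)
x_i = torch.randn(1, 40, 8)                      # stand-in for sub-sequence data set S_i
_, (h_T, c_T) = lstm(x_i)

# Batch i+1: initialize (h_{i+1}^0, c_{i+1}^0) from the previous batch's final memory,
# so that training on S_{i+1} starts from the consolidated memory of batch i
x_next = torch.randn(1, 30, 8)                   # stand-in for S_{i+1}
out, (h_next, c_next) = lstm(x_next, (h_T.detach(), c_T.detach()))
```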
Further, each time the parameters are updated, the obtained historical gradient $\nabla \hat{W}_i$ is fused with the current gradient $\nabla W_{i+1}$ to form the new gradient, i.e. $\nabla W = \alpha \nabla \hat{W}_i + (1-\alpha) \nabla W_{i+1}$, where $\alpha$ is a balance coefficient that strengthens the memory of the historical information; the gradient descent algorithm is then used to update the parameters to obtain the model $M_{i+1}$.
Further, the lengths of the sub-sequence data sets may be the same or different.
The invention has the beneficial effects that:
1. the invention designs a compression method for the historical training process of sequence data by utilizing the activation characteristics of the LSTM forgetting gate; by detecting the important moments in the training process of old sequences and retaining only those moments for subsequent training, the huge space overhead caused by storing complete sequences is reduced;
2. error back-propagation is carried out on the compressed historical sequence to extract the historical memory information of the old data, which is then fused into the training process of the new data; through repeated memorization and consolidation, the LSTM can remember new and old data simultaneously, which avoids the forgetting of old knowledge caused by learning new data, improves the training efficiency of the incremental LSTM, and ensures the practicality of the LSTM model.
Drawings
FIG. 1 is a flow chart of the present invention for building an incremental LSTM using training process compression and memory consolidation;
FIG. 2 is a flow chart of memory consolidation in the method of the present invention.
Detailed Description
In order to make the objects, technical solutions and advantages of the present invention more apparent, the present invention is described in further detail below with reference to the accompanying drawings and embodiments. It should be understood that the specific embodiments described herein are merely illustrative of the invention and are not intended to limit the invention.
The method for constructing the incremental LSTM by utilizing training process compression and memory consolidation, as shown in FIG. 1, comprises the following steps:
Step 1, in order to adapt to incremental learning, the sequence data are divided into a plurality of batches, and the incremental LSTM training is completed batch by batch to reduce the training overhead. The sequence data are divided into the sub-sequence data sets $S_1, S_2, S_3, \ldots, S_N$, where the $i$-th sub-sequence data set is $S_i = \{x_i^1, x_i^2, \ldots, x_i^T\}$ and $x_i^T$ is the $T$-th data item of the $i$-th sub-sequence data set $S_i$. The lengths of the sub-sequence data sets $S_i$ may be the same or different. The $N$ sub-sequences are used in order, one batch at a time, to train the incremental LSTM.
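As a minimal sketch of this batching step (illustrative only; the function name split_into_batches and the explicit length list are assumptions, since the patent only requires that the sub-sequence data sets may have equal or unequal lengths), the division could be written in Python as:

```python
from typing import List, Sequence

def split_into_batches(sequence: Sequence, lengths: List[int]) -> List[Sequence]:
    """Split a long data sequence into the sub-sequence data sets S_1 .. S_N."""
    batches, start = [], 0
    for n in lengths:                       # each n is the length of one sub-sequence data set
        batches.append(sequence[start:start + n])
        start += n
    return batches

# Example: three sub-sequence data sets of different lengths
data = list(range(100))                     # stand-in for real sequence data
sub_datasets = split_into_batches(data, [40, 30, 30])
assert len(sub_datasets) == 3
```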
Step 2, after the training on the current batch of the sub-sequence data set $S_i$ is finished, the important parameter information of the model is kept, the important moments are selected according to the activity of the LSTM forgetting gate, and the sequence data at the important moments are compressed for use in training on the next batch of data.
Step 2.1, the LSTM is iteratively trained on the sub-sequence data set $S_i$; the training on the current batch is stopped when the training converges or the number of iterations reaches the specified value, and the model $M_i$ is obtained. The long-term memory $c_i^T$ and the short-term memory $h_i^T$ of the LSTM unit at the last time instant are preserved, together with the network weight set $W_i = \{W_f^i, W_{in}^i, W_o^i, W_c^i\}$ obtained by training on the sub-data set, where $W_f^i$, $W_{in}^i$, $W_o^i$ and $W_c^i$ are, respectively, the weights of the forgetting gate, input gate, output gate and memory gate of the LSTM obtained after training on the $i$-th data set.
Step 2.2, the important moments are selected according to the LSTM forgetting gate $f_t$, and $S_i$ is compressed for training. That is, if $f_t$ is greater than the threshold $\theta$, the input and output data at the corresponding moment are retained and recorded as $\hat{S}_i = \{(x_i^t, y_i^t) \mid f_t > \theta\}$, where $x_i^t$ and $y_i^t$ are the input and output data of $S_i$ at moment $t$. The opening and closing of the forgetting gate in the LSTM model corresponds to whether historical information is allowed to propagate backwards through the LSTM unit, so the value of the forgetting gate reflects how strongly the data to be processed at the current moment depend on the historical information; the stronger the dependence, the more historical information is imported at that moment and the richer the information carried by the LSTM unit, so the data of such moments need to be kept.
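A minimal illustrative sketch of this selection rule is given below; it assumes that the recorded forget-gate activations are averaged over the hidden units before being compared with the threshold θ, which is one plausible reading of the gate's "activity" and not necessarily the patent's exact rule.

```python
import numpy as np

def compress_by_forget_gate(inputs, targets, forget_gates, theta=0.5):
    """Keep only the moments whose forget-gate activity exceeds theta.

    inputs, targets:  arrays of shape (T, ...) for one sub-sequence data set S_i
    forget_gates:     array of shape (T, hidden_size) with f_t recorded during training
    Returns the compressed data set S_i_hat as (inputs_kept, targets_kept).
    """
    activity = forget_gates.mean(axis=1)    # scalar activity per moment (assumption)
    keep = activity > theta                 # important moments: f_t > theta
    return inputs[keep], targets[keep]

# Example with random stand-in data
T, d, h = 50, 8, 16
x, y = np.random.randn(T, d), np.random.randn(T, 1)
f = np.random.rand(T, h)                    # forget-gate activations in (0, 1)
x_hat, y_hat = compress_by_forget_gate(x, y, f, theta=0.6)
print(x_hat.shape, y_hat.shape)             # shapes of the compressed S_i_hat
```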
Step 3, in the training process on the next sub-sequence data set $S_{i+1}$, the LSTM memory is consolidated by using a back-propagation algorithm that fuses the training history information; this ensures the performance of the system on both new and old data sets, avoids catastrophic forgetting of the old data, and realizes the incremental LSTM system. The specific process is as follows:
step 3.1, to the compression in step 2.2
Figure RE-GDA0002479102310000042
Performing an error Back Propagation (BPTT) calculation to respectively calculate L STM model network weight gradients, i.e. calculating
Figure RE-GDA0002479102310000043
Wherein the content of the first and second substances,
Figure RE-GDA0002479102310000044
is a loss function in the compressed sequence,
Figure RE-GDA0002479102310000045
Figure RE-GDA0002479102310000046
loss function values of the ith data set at the k moment;
Figure RE-GDA0002479102310000047
the weights of the forgetting gate, the input gate, the output gate and the memory gate in the L STM are respectively.
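Purely as an illustration of how such historical gradients could be collected with an off-the-shelf autograd framework (PyTorch here, which is an assumption; the patent names no framework, and the output head and mean-squared-error loss are placeholders), step 3.1 might be sketched as:

```python
import torch
import torch.nn as nn

def historical_gradients(lstm, head, x_hat, y_hat, loss_fn=nn.MSELoss()):
    """Run BPTT over the compressed set S_i_hat and collect the weight gradients.

    lstm: an nn.LSTM; head: an output layer mapping hidden states to targets.
    x_hat, y_hat: tensors of shape (1, m, input_size) and (1, m, 1) holding the
    m retained moments of the compressed data set.
    """
    lstm.zero_grad(); head.zero_grad()
    out, _ = lstm(x_hat)                    # forward pass over the compressed sequence
    loss = loss_fn(head(out), y_hat)        # loss over the retained moments
    loss.backward()                         # BPTT: gradients w.r.t. every LSTM/head weight
    return {name: p.grad.detach().clone()
            for name, p in list(lstm.named_parameters()) + list(head.named_parameters())}

# Example with random stand-in data (12 retained moments)
lstm, head = nn.LSTM(8, 16, batch_first=True), nn.Linear(16, 1)
x_hat, y_hat = torch.randn(1, 12, 8), torch.randn(1, 12, 1)
hist_grads = historical_gradients(lstm, head, x_hat, y_hat)
```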
Step 3.2, the next batch of the sub-sequence data set, $S_{i+1}$, is input. Starting from $M_i$, the LSTM model continues to be trained on the sub-sequence data set $S_{i+1}$. The long-term memory $c_i^T$ and the short-term memory $h_i^T$ of the LSTM unit at the last time instant of the previous batch are used to initialize the parameters $c_{i+1}^0$ and $h_{i+1}^0$, respectively, where $c_{i+1}^0$ and $h_{i+1}^0$ denote the initial values of the LSTM long-term memory and short-term memory at the start of training on the (i+1)-th batch of data; in this way the memory of the previous batch of training is consolidated. FIG. 2 shows a schematic diagram of this memory consolidation: the dotted line shows the gradient information coming from the historical memory, the bold solid line shows the gradient information coming from the current sequence data, and during the training of the current batch the two are merged and superimposed, as shown in the right part of FIG. 2. Meanwhile, each time the parameters are updated, the historical gradient $\nabla \hat{W}_i$ obtained in step 3.1 is fused with the current gradient $\nabla W_{i+1}$ to form the new gradient, i.e. $\nabla W = \alpha \nabla \hat{W}_i + (1-\alpha) \nabla W_{i+1}$, where $\alpha$ is a balance coefficient that strengthens the memory of the historical information. The gradient descent algorithm is then used to update the parameters to obtain the model $M_{i+1}$; the specific parameter update by gradient descent is $W \leftarrow W - \gamma \nabla W$, where $\gamma$ is the learning rate.
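Again as an illustrative sketch only: the gradient fusion and the gradient-descent update could be implemented as below. The convex combination α·∇Ŵ_i + (1−α)·∇W_{i+1} follows the reconstruction above, but the exact fusion formula, the framework and the plain (momentum-free) update rule are assumptions, not details confirmed by the original publication.

```python
import torch
import torch.nn as nn

def fused_update(named_params, hist_grads, alpha=0.3, lr=1e-2):
    """Fuse historical and current gradients, then take one gradient-descent step.

    named_params: list of (name, parameter) pairs whose .grad already holds the
                  current-batch gradient (i.e. loss.backward() was just called).
    hist_grads:   dict of historical gradients from the compressed set (step 3.1),
                  keyed by the same parameter names.
    """
    with torch.no_grad():
        for name, p in named_params:
            fused = alpha * hist_grads[name] + (1.0 - alpha) * p.grad  # gradient fusion
            p -= lr * fused                                            # W <- W - gamma * fused

# Minimal usage example with stand-in data
lstm, head = nn.LSTM(8, 16, batch_first=True), nn.Linear(16, 1)
named = list(lstm.named_parameters()) + list(head.named_parameters())

x_new, y_new = torch.randn(1, 30, 8), torch.randn(1, 30, 1)  # current batch S_{i+1}
out, _ = lstm(x_new)
nn.MSELoss()(head(out), y_new).backward()                    # current gradients now in .grad

hist = {n: torch.zeros_like(p) for n, p in named}            # placeholder historical gradients
fused_update(named, hist, alpha=0.3, lr=1e-2)
```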
Step 3.3, steps 2.1, 2.2, 3.1 and 3.2 are repeated until the training on batch $S_N$ is completed, and the model $M_N$, i.e. the incremental LSTM, is obtained.
The method for constructing an incremental LSTM by using training process compression and memory consolidation is further explained below with reference to the flow chart: the N sub-sequence data sets are trained in order according to the flow chart to realize the incremental LSTM; the training process on each sub-sequence data set is similar and comprises three stages: historical information recall, memory consolidation, and new memory generation.
[The complete training procedure appears as an algorithm image in the original publication.] In that procedure, $w_0$ is the initial value of the network weight set $W$, $w_N$ is the weight value after training on the $N$-th batch of data, $\theta$ is the threshold of the forgetting gate $f_t$, $\alpha$ is the balance coefficient in the gradient fusion, and $\gamma$ is the learning rate.
When the BPTT (Back-Propagation Through Time) algorithm is used to train an LSTM in practice, the back-propagation step length is limited, so historical gradient information cannot keep being propagated as the length of the sequence data increases; without the above fusion, the historical memory would therefore play no role in the training of new data.
The above embodiments are only used for illustrating the design idea and features of the present invention, and the purpose of the present invention is to enable those skilled in the art to understand the content of the present invention and implement the present invention accordingly, and the protection scope of the present invention is not limited to the above embodiments. Therefore, all equivalent changes and modifications made in accordance with the principles and concepts disclosed herein are intended to be included within the scope of the present invention.

Claims (7)

1. The method for constructing the incremental LSTM by utilizing the compression and the memory consolidation of the training process is characterized by comprising the following steps:
step 1, dividing the sequence data into a plurality of sub-sequence data sets, and performing incremental LSTM training on each sub-sequence data set batch by batch;
step 2, after the incremental LSTM training on the current batch of the sub-sequence data set $S_i$ is finished, retaining the important parameter information of the model, selecting the important moments according to the activity of the LSTM forgetting gate, and compressing the sequence data at the important moments to obtain a data set $\hat{S}_i$ used for training on the next batch of data;
step 3, in the training process on the next sub-sequence data set $S_{i+1}$, consolidating the LSTM memory by using a back-propagation algorithm that fuses the training history information, thereby ensuring the performance of the system on both new and old data sets and preventing the old data from being forgotten; repeating step 2 and step 3 until the training on all batches is completed, thereby obtaining the incremental LSTM.
2. The method for constructing an incremental LSTM by using training process compression and memory consolidation as claimed in claim 1, wherein in step 2, incremental LSTM training is performed on the current batch of the sub-sequence data set $S_i$, and the training on the current batch is stopped when the training converges or the number of iterations reaches a specified value, obtaining the model $M_i$.
3. The method for constructing the incremental LSTM by using training process compression and memory consolidation as claimed in claim 2, wherein the important parameter information of the model comprises the long-term memory $c_i^T$ and the short-term memory $h_i^T$ of the LSTM unit at the last time instant, and the network weight set $W_i = \{W_f^i, W_{in}^i, W_o^i, W_c^i\}$ obtained by training on the sub-data set of the current batch, where $W_f^i$, $W_{in}^i$, $W_o^i$ and $W_c^i$ are the weights of the LSTM forgetting gate, input gate, output gate and memory gate, respectively.
4. The method for constructing the incremental LSTM by using training process compression and memory consolidation as claimed in claim 3, wherein the back-propagation algorithm process in step 3 is to perform error back-propagation calculation on the compressed data set $\hat{S}_i$ to calculate the LSTM model network weight gradients $\nabla \hat{W}_i = \{\nabla \hat{W}_f^i, \nabla \hat{W}_{in}^i, \nabla \hat{W}_o^i, \nabla \hat{W}_c^i\}$.
5. The method for constructing the incremental LSTM by utilizing training process compression and memory consolidation as claimed in claim 4, wherein the method for consolidating the LSTM memory is: on the basis of $M_i$, continuing to train the LSTM model on the sub-sequence data set $S_{i+1}$, and using the long-term memory $c_i^T$ and the short-term memory $h_i^T$ of the LSTM unit at the last time instant of the previous batch to initialize the parameters $c_{i+1}^0$ and $h_{i+1}^0$, respectively, which represent the initial values of the LSTM long-term memory and short-term memory at the beginning of training on the (i+1)-th batch of data.
6. The method for constructing the incremental LSTM by using training process compression and memory consolidation as claimed in claim 5, wherein each time a parameter update is performed, the obtained historical gradient $\nabla \hat{W}_i$ is fused with the current gradient $\nabla W_{i+1}$ to form the new gradient, i.e. $\nabla W = \alpha \nabla \hat{W}_i + (1-\alpha) \nabla W_{i+1}$, where $\alpha$ is a balance coefficient that strengthens the memory of the historical information; the gradient descent algorithm is then used to update the parameters to obtain the model $M_{i+1}$.
7. The method for constructing the incremental LSTM by using training process compression and memory consolidation as claimed in any one of claims 1-6, wherein the lengths of the sub-sequence data sets are the same or different.
CN202010092811.4A 2020-02-14 2020-02-14 Method for constructing incremental LSTM by utilizing training process compression and memory consolidation Pending CN111401515A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010092811.4A CN111401515A (en) 2020-02-14 2020-02-14 Method for constructing incremental L STM by utilizing training process compression and memory consolidation

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202010092811.4A CN111401515A (en) 2020-02-14 2020-02-14 Method for constructing incremental L STM by utilizing training process compression and memory consolidation

Publications (1)

Publication Number Publication Date
CN111401515A true CN111401515A (en) 2020-07-10

Family

ID=71428424

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010092811.4A Pending CN111401515A (en) 2020-02-14 2020-02-14 Method for constructing incremental L STM by utilizing training process compression and memory consolidation

Country Status (1)

Country Link
CN (1) CN111401515A (en)

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112766501A (en) * 2021-02-26 2021-05-07 上海商汤智能科技有限公司 Incremental training method and related product
CN113537591A (en) * 2021-07-14 2021-10-22 北京琥珀创想科技有限公司 Long-term weather prediction method and device, computer equipment and storage medium
CN113657596A (en) * 2021-08-27 2021-11-16 京东科技信息技术有限公司 Method and device for training model and image recognition
CN113657596B (en) * 2021-08-27 2023-11-03 京东科技信息技术有限公司 Method and device for training model and image recognition

Similar Documents

Publication Publication Date Title
CN111401515A (en) Method for constructing incremental LSTM by utilizing training process compression and memory consolidation
CN111353582B (en) Particle swarm algorithm-based distributed deep learning parameter updating method
CN111260030B (en) A-TCN-based power load prediction method and device, computer equipment and storage medium
CN107239845B (en) Construction method of oil reservoir development effect prediction model
CN110969290A (en) Runoff probability prediction method and system based on deep learning
CN103927580A (en) Project constraint parameter optimizing method based on improved artificial bee colony algorithm
CN113821983B (en) Engineering design optimization method and device based on proxy model and electronic equipment
CN115829024B (en) Model training method, device, equipment and storage medium
CN114243799B (en) Deep reinforcement learning power distribution network fault recovery method based on distributed power supply
CN114117599A (en) Shield attitude position deviation prediction method
CN110390206A (en) Gradient under the cloud system frame of side with secret protection declines accelerating algorithm
CN109344960A (en) A kind of DGRU neural network and its prediction model method for building up preventing data information loss
CN114548591A (en) Time sequence data prediction method and system based on hybrid deep learning model and Stacking
CN109681165B (en) Water injection strategy optimization method and device for oil extraction in oil field
CN116205273A (en) Multi-agent reinforcement learning method for optimizing experience storage and experience reuse
CN113849910A (en) Dropout-based BiLSTM network wing resistance coefficient prediction method
CN113435128A (en) Oil and gas reservoir yield prediction method and device based on condition generation type countermeasure network
CN115577647B (en) Power grid fault type identification method and intelligent agent construction method
CN112381664A (en) Power grid short-term load prediction method, prediction device and storage medium
CN117079744A (en) Artificial intelligent design method for energetic molecule
CN116774089A (en) Convolutional neural network battery state of health estimation method and system based on feature fusion
CN115630316A (en) Ultrashort-term wind speed prediction method based on improved long-term and short-term memory network
CN112667394B (en) Computer resource utilization rate optimization method
CN115593264A (en) Charging optimization control method and device based on edge calculation and computer equipment
CN115061444A (en) Real-time optimization method for technological parameters integrating probability network and reinforcement learning

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination