CN111401515A - Method for constructing incremental LSTM by utilizing training process compression and memory consolidation - Google Patents
Method for constructing incremental LSTM by utilizing training process compression and memory consolidation
- Publication number
- CN111401515A (application number CN202010092811.4A)
- Authority
- CN
- China
- Prior art keywords
- lstm
- training
- memory
- incremental
- sub
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/044—Recurrent networks, e.g. Hopfield networks
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/08—Learning methods
- G06N3/084—Backpropagation, e.g. using gradient descent
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Physics & Mathematics (AREA)
- Data Mining & Analysis (AREA)
- General Health & Medical Sciences (AREA)
- Biomedical Technology (AREA)
- Biophysics (AREA)
- Computational Linguistics (AREA)
- Life Sciences & Earth Sciences (AREA)
- Evolutionary Computation (AREA)
- Artificial Intelligence (AREA)
- Molecular Biology (AREA)
- Computing Systems (AREA)
- General Engineering & Computer Science (AREA)
- General Physics & Mathematics (AREA)
- Mathematical Physics (AREA)
- Software Systems (AREA)
- Health & Medical Sciences (AREA)
- Compression, Expansion, Code Conversion, And Decoders (AREA)
Abstract
The invention discloses a method for constructing an incremental LSTM by utilizing training process compression and memory consolidation. The method uses the activity of the LSTM gate units to select the important input moments during training and compresses the training process into a historical memory; it then effectively fuses the compressed memory with new data for training, and uses the historical information to consolidate the network memory, so as to meet the requirement of incremental processing of sequence data.
Description
Technical Field
The invention belongs to the field of artificial intelligence and deep learning, and particularly relates to a method for constructing an efficient incremental LSTM by utilizing training process compression and memory consolidation.
Background
In recent years, with the continuous development of novel artificial intelligence technology and the explosive growth of massive data, how to use these novel technologies to process and analyze data efficiently, accurately and quickly, and to mine the huge value hidden in the data, has become a challenging task.
Compared with a general recurrent neural network, the LSTM is able to process longer sequence data. However, the current LSTM generally uses a batch learning mode when processing sequence data, i.e. it is assumed that all training samples are available at once before training and that the samples do not change during training. This batch learning approach can achieve good performance on a given data set, but it cannot effectively handle the dynamic growth of data, cannot adapt to the situation in practical applications where sequence data grows continuously over time, and falls far short of what the processing of massive data requires.
Disclosure of Invention
In view of the deficiencies in the prior art, the present application provides a method for constructing an incremental LSTM by utilizing training process compression and memory consolidation, so as to solve the problem that the current LSTM cannot effectively perform incremental learning on sequence data, to reduce the space overhead of storing data, to avoid the forgetting of historical information caused by the learning of new data, and to improve the practicality of the LSTM.
The technical scheme adopted by the invention is as follows:
The method for constructing the incremental LSTM by utilizing training process compression and memory consolidation comprises the following steps:
step 1, dividing the sequence data into a plurality of sub-sequence data sets, and performing incremental LSTM training on each sub-sequence data set batch by batch;
step 2, after the incremental LSTM training on the current batch of the sub-sequence data set S_i is finished, retaining the important parameter information of the model, selecting important moments according to the activity of the LSTM forgetting gate, and compressing the sequence data of the important moments to obtain a data set S̃_i for training with the next batch of data;
step 3, in the training process on the next sub-sequence data set S_{i+1}, consolidating the LSTM memory by using a back-propagation algorithm fused with training historical information, ensuring the performance of the system on both new and old data sets and avoiding forgetting of the old data; repeating step 2 and step 3 until the training of all batches is completed, thereby obtaining the incremental LSTM.
Further, in step 2, incremental LSTM training is performed on the current batch of the sub-sequence data set S_i; when the training converges or the number of iterations reaches a specified value, the training of the current batch is stopped to obtain the model M_i.
Further, the important parameter information of the model includes the long-term memory c_i of the LSTM unit at the last moment, the short-term memory h_i, and the network weight set W_i = {W_f, W_in, W_o, W_c} obtained by training on the sub data set of the current batch, where W_f, W_in, W_o and W_c are the weights of the LSTM forgetting gate, input gate, output gate and memory gate, respectively.
Further, the back-propagation algorithm process in step 3 is as follows: error back-propagation calculation is performed on the compressed data set S̃_i to calculate the LSTM model network weight gradients ∇W̃_f, ∇W̃_in, ∇W̃_o and ∇W̃_c respectively; the network weights comprise W_f, W_in, W_o and W_c.
Furthermore, the method for consolidating the LSTM memory is: on the basis of M_i, continue training the LSTM model on the sub-sequence data set S_{i+1}, and use the long-term memory c_i and the short-term memory h_i of the LSTM unit at the last moment of the previous batch to respectively initialize the parameters c_{i+1}^0 and h_{i+1}^0, which represent the initial values of the LSTM long-term memory and short-term memory at the beginning of training on the (i+1)-th batch of the data set.
Further, each time a parameter update is performed, the obtained historical gradient ∇W̃ is fused with the current gradient ∇W as the new gradient, i.e. ∇W ← α·∇W̃ + (1−α)·∇W, where α is a balance coefficient that strengthens the memory of historical information; the gradient descent algorithm is then used to update the parameters to obtain the model M_{i+1}.
Further, the lengths of the sub-sequence data sets may be the same or different.
The invention has the beneficial effects that:
1. The invention designs a compression method for the historical training process of sequence data by utilizing the activation characteristic of the LSTM forgetting gate; by detecting the important moments in the training process of old sequences and retaining only those moments for subsequent training, the huge space overhead caused by storing complete sequences is reduced.
2. Error back-propagation is carried out on the compressed historical sequence to extract the historical memory information of the old data, which is then fused into the training process of the new data; through repeated memory consolidation, the LSTM can remember new and old data simultaneously, the forgetting of old knowledge caused by the learning of new data is avoided, the training efficiency of the incremental LSTM is improved, and the practicality of the LSTM model is guaranteed.
Drawings
FIG. 1 is a flow chart of constructing the incremental LSTM using training process compression and memory consolidation according to the present invention;
FIG. 2 is a flow chart of memory consolidation in the method of the present invention.
Detailed Description
In order to make the objects, technical solutions and advantages of the present invention more apparent, the present invention is described in further detail below with reference to the accompanying drawings and embodiments. It should be understood that the specific embodiments described herein are merely illustrative of the invention and are not intended to limit the invention.
The method for constructing the incremental LSTM by utilizing training process compression and memory consolidation, as shown in FIG. 1, comprises the following steps:
Step 1, divide the sequence data into N sub-sequence data sets S_1, ..., S_N and perform incremental LSTM training on each sub-sequence data set batch by batch.
Step 2, after the training on the current batch of the sub-sequence data set S_i is finished, keep the important parameter information of the model, select important moments according to the activity of the LSTM forgetting gate, and compress the sequence data of the important moments for training with the next batch of data.
Step 2.1, iteratively train the LSTM on the sub-sequence data set S_i; when the training converges or the number of iterations reaches a specified value, stop the training of the current batch to obtain the model M_i. Preserve the long-term memory c_i of the LSTM unit at the last moment, the short-term memory h_i, and the network weight set W_i = {W_f, W_in, W_o, W_c} obtained by training on the sub data set, where W_f, W_in, W_o and W_c are the weights of the forgetting gate, input gate, output gate and memory gate, respectively, obtained after training on the i-th data set in the LSTM.
Step 2.2, select important moments according to the LSTM forgetting gate f_t and compress S_i for subsequent training. That is, if f_t is greater than the threshold θ, the input and output data at the corresponding moment are retained and recorded as S̃_i = {(x_k, y_k) | f_k > θ}. Because the opening and closing of the forgetting gate in the LSTM model controls whether historical information is allowed to propagate through the LSTM unit, the value of the forgetting gate reflects how strongly the data processed at the current moment depends on historical information: the stronger the dependence, the more historical information is imported into that moment and the richer the information carried by the LSTM unit, so these parameters need to be retained.
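The selection rule of step 2.2 can be sketched as follows — a minimal NumPy illustration, not the patented implementation; the sigmoid forget-gate activations, the `compress_sequence` helper, and the threshold θ = 0.5 are assumptions made for the example:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def compress_sequence(xs, ys, f_acts, theta=0.5):
    """Keep only the (x_k, y_k) pairs whose forget-gate activity
    f_k exceeds the threshold theta, mirroring step 2.2."""
    keep = [k for k, f in enumerate(f_acts) if f > theta]
    return [(xs[k], ys[k]) for k in keep]

# Toy example: forget-gate pre-activations for a 6-step sequence.
pre_acts = np.array([-2.0, 1.5, 0.2, 3.0, -0.5, 2.2])
f_acts = sigmoid(pre_acts)   # forget-gate openness per time step
xs = list(range(6))          # stand-in inputs x_k
ys = list(range(6))          # stand-in targets y_k
compressed = compress_sequence(xs, ys, f_acts, theta=0.5)
# Moments with a wide-open forget gate (strong dependence on history) survive.
```

Only the retained moments are stored for replay, which is where the space saving over keeping the full sequence comes from.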
Step 3, in the training process on the next sub-sequence data set S_{i+1}, consolidate the LSTM memory by using a back-propagation algorithm fused with training historical information, ensure the performance of the system on both new and old data sets, and avoid catastrophic forgetting of the old data, so as to realize the incremental LSTM system. The specific process is as follows:
Step 3.1, perform error back-propagation-through-time (BPTT) calculation on the compressed S̃_i obtained in step 2.2, and calculate the LSTM model network weight gradients respectively, i.e. calculate ∇W̃ = ∂L̃_i/∂W for each W ∈ {W_f, W_in, W_o, W_c}, where L̃_i = Σ_k ℓ_{i,k} is the loss function on the compressed sequence, ℓ_{i,k} being the loss value of the i-th data set at moment k, and W_f, W_in, W_o and W_c are the weights of the forgetting gate, input gate, output gate and memory gate in the LSTM, respectively.
Step 3.2, input the next batch of the sub-sequence data set S_{i+1} and, starting from M_i, continue training the LSTM model on S_{i+1}. Use the long-term memory c_i and the short-term memory h_i of the LSTM unit at the last moment of the previous batch to respectively initialize the parameters c_{i+1}^0 and h_{i+1}^0, which represent the initial values of the LSTM long-term memory and short-term memory at the beginning of training on the (i+1)-th batch of the data set, so as to consolidate the memory of the previous batch of training. FIG. 2 shows a schematic diagram of the above memory consolidation: the dotted lines show the gradient information from the historical memory, and the bold solid lines show the gradient information from the current sequence data; during the training of the current batch, the two are merged and superposed, as shown in the right part of FIG. 2. Meanwhile, each time the parameters are updated, the historical gradient ∇W̃ obtained in step 3.1 is fused with the current gradient ∇W as the new gradient, i.e. ∇W ← α·∇W̃ + (1−α)·∇W, where α is a balance coefficient that strengthens the memory of historical information. The gradient descent algorithm is then used to update the parameters to obtain the model M_{i+1}; the specific update is W ← W − γ·∇W, where γ is the learning rate.
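The fusion-and-update rule of step 3.2 can be sketched in NumPy as follows, under the assumption that the balance coefficient α forms a convex combination of the historical and current gradients; the `fused_update` helper and all numeric values are illustrative:

```python
import numpy as np

def fused_update(w, grad_cur, grad_hist, alpha=0.3, gamma=0.1):
    """One consolidated parameter update (step 3.2): fuse the
    historical gradient replayed from the compressed old batch with
    the current gradient, then apply one gradient-descent step."""
    grad_new = alpha * grad_hist + (1.0 - alpha) * grad_cur
    return w - gamma * grad_new

w = np.array([1.0, -2.0])
grad_cur = np.array([0.5, 0.5])    # gradient on the new data S_{i+1}
grad_hist = np.array([-0.5, 1.5])  # gradient replayed from compressed S̃_i
w_next = fused_update(w, grad_cur, grad_hist, alpha=0.4, gamma=0.1)
# fused gradient: 0.4*[-0.5, 1.5] + 0.6*[0.5, 0.5] = [0.1, 0.9]
```

A larger α weights the replayed history more heavily, trading plasticity on the new batch for retention of the old one.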
Step 3.3, repeat step 2.1, step 2.2, step 3.1 and step 3.2 until the training on all N batches S_1, ..., S_N is completed, obtaining the model M_N, i.e. the incremental LSTM.
The method for constructing the incremental LSTM by using training process compression and memory consolidation is further explained below in combination with the flow chart:
According to the flow chart, the N sub-sequence data sets are trained in sequence to realize incremental LSTM processing; the training process on each sub-sequence data set is similar and comprises three stages: historical information recall, memory consolidation, and new memory generation.
In the above procedure, w_0 is the initial value of the network weight set, w_N is the weight value after training on the N-th batch of data, θ is the threshold of the forgetting gate f_t, α is the balance coefficient in the gradient fusion, and γ is the learning rate.
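The three-stage loop (historical information recall, memory consolidation, new memory generation) can be sketched as follows. This is a toy least-squares model rather than an LSTM — `TinyIncrementalModel`, the keep-the-first-pairs "compression" rule, and all constants are illustrative assumptions; only the control flow of the three stages follows the description:

```python
class TinyIncrementalModel:
    """Minimal stand-in (not an LSTM) fitting y = w*x by gradient descent."""
    def __init__(self):
        self.w = 0.0

    def grad(self, data):
        # gradient of the summed squared error 0.5*(w*x - y)^2
        return sum((self.w * x - y) * x for x, y in data)

    def train_batch(self, data, hist_grad, alpha=0.3, gamma=0.01, steps=200):
        for _ in range(steps):
            g = self.grad(data)
            if hist_grad is not None:          # memory consolidation: fuse old and new
                g = alpha * hist_grad + (1 - alpha) * g
            self.w -= gamma * g                # gradient descent update

def incremental_train(batches, alpha=0.3):
    model = TinyIncrementalModel()
    compressed = None
    for batch in batches:
        # historical information recall: replay the compressed previous batch
        hist_grad = model.grad(compressed) if compressed else None
        model.train_batch(batch, hist_grad, alpha)   # steps 2.1 / 3.2
        # new memory generation: "compress" by keeping a few important pairs
        compressed = batch[:2]
    return model

batches = [[(1.0, 2.0), (2.0, 4.0)], [(3.0, 6.0), (4.0, 8.0)]]  # all fit y = 2x
model = incremental_train(batches)
```

Because both batches are consistent with y = 2x, the fused updates converge to the same weight a joint fit would find, illustrating how the replayed gradient keeps old and new data in agreement.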
When a BPTT (Back Propagation Through Time) algorithm is used to train an LSTM in practice, due to the limitation of the back-propagation step length, historical gradient information cannot be propagated all the way as the length of the sequence data grows, so that the historical memory would otherwise play no role in the training of new data.
The above embodiments are only used for illustrating the design idea and features of the present invention, and the purpose of the present invention is to enable those skilled in the art to understand the content of the present invention and implement the present invention accordingly, and the protection scope of the present invention is not limited to the above embodiments. Therefore, all equivalent changes and modifications made in accordance with the principles and concepts disclosed herein are intended to be included within the scope of the present invention.
Claims (7)
1. A method for constructing an incremental LSTM by utilizing training process compression and memory consolidation, characterized by comprising the following steps:
step 1, dividing sequence data into a plurality of sub-sequence data sets, and performing incremental LSTM training on each sub-sequence data set batch by batch;
step 2, after the incremental LSTM training on the current batch of the sub-sequence data set S_i is finished, retaining the important parameter information of the model, selecting important moments according to the activity of the LSTM forgetting gate, and compressing the sequence data of the important moments to obtain a data set S̃_i for training with the next batch of data;
step 3, in the training process on the next sub-sequence data set S_{i+1}, consolidating the LSTM memory by using a back-propagation algorithm fused with training historical information, ensuring the performance of the system on both new and old data sets and avoiding forgetting of the old data; repeating step 2 and step 3 until the training of all batches is completed, thereby obtaining the incremental LSTM.
2. The method for constructing an incremental LSTM by using training process compression and memory consolidation as claimed in claim 1, wherein in step 2, incremental LSTM training is performed on the current batch of the sub-sequence data set S_i; when the training converges or the number of iterations reaches a specified value, the training of the current batch is stopped to obtain the model M_i.
3. The method for constructing the incremental LSTM by using training process compression and memory consolidation as claimed in claim 2, wherein the important parameter information of the model comprises the long-term memory c_i of the LSTM unit at the last moment, the short-term memory h_i, and the network weight set W_i = {W_f, W_in, W_o, W_c} obtained by training on the sub data set of the current batch, where W_f, W_in, W_o and W_c are the weights of the LSTM forgetting gate, input gate, output gate and memory gate, respectively.
4. The method for constructing the incremental LSTM by using training process compression and memory consolidation as claimed in claim 3, wherein the back-propagation algorithm process in step 3 is: performing error back-propagation calculation on the compressed data set S̃_i, and calculating the LSTM model network weight gradients ∇W̃_f, ∇W̃_in, ∇W̃_o and ∇W̃_c respectively.
5. The method for constructing the incremental LSTM by utilizing training process compression and memory consolidation as claimed in claim 4, wherein the method for consolidating the LSTM memory is: on the basis of M_i, continuing to train the LSTM model on the sub-sequence data set S_{i+1}, and using the long-term memory c_i and the short-term memory h_i of the LSTM unit at the last moment of the previous batch to respectively initialize the parameters c_{i+1}^0 and h_{i+1}^0, which represent the initial values of the LSTM long-term memory and short-term memory at the beginning of training on the (i+1)-th batch of the data set.
6. The method for constructing the incremental LSTM by using training process compression and memory consolidation as claimed in claim 5, wherein each time a parameter update is performed, the obtained historical gradient ∇W̃ is fused with the current gradient ∇W as the new gradient, i.e. ∇W ← α·∇W̃ + (1−α)·∇W, where α is a balance coefficient that strengthens the memory of historical information; the gradient descent algorithm is then used to update the parameters to obtain the model M_{i+1}.
7. The method for constructing the incremental LSTM by using training process compression and memory consolidation as claimed in any one of claims 1-6, wherein the lengths of the sub-sequence data sets are the same or different.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202010092811.4A CN111401515A (en) | 2020-02-14 | 2020-02-14 | Method for constructing incremental LSTM by utilizing training process compression and memory consolidation |
Publications (1)
Publication Number | Publication Date |
---|---|
CN111401515A true CN111401515A (en) | 2020-07-10 |
Family
ID=71428424
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202010092811.4A Pending CN111401515A (en) | 2020-02-14 | 2020-02-14 | Method for constructing incremental LSTM by utilizing training process compression and memory consolidation |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN111401515A (en) |
Cited By (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN112766501A (en) * | 2021-02-26 | 2021-05-07 | 上海商汤智能科技有限公司 | Incremental training method and related product |
CN113537591A (en) * | 2021-07-14 | 2021-10-22 | 北京琥珀创想科技有限公司 | Long-term weather prediction method and device, computer equipment and storage medium |
CN113657596A (en) * | 2021-08-27 | 2021-11-16 | 京东科技信息技术有限公司 | Method and device for training model and image recognition |
CN113657596B (en) * | 2021-08-27 | 2023-11-03 | 京东科技信息技术有限公司 | Method and device for training model and image recognition |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN111401515A (en) | Method for constructing incremental LSTM by utilizing training process compression and memory consolidation | |
CN111353582B (en) | Particle swarm algorithm-based distributed deep learning parameter updating method | |
CN111260030B (en) | A-TCN-based power load prediction method and device, computer equipment and storage medium | |
CN107239845B (en) | Construction method of oil reservoir development effect prediction model | |
CN110969290A (en) | Runoff probability prediction method and system based on deep learning | |
CN103927580A (en) | Project constraint parameter optimizing method based on improved artificial bee colony algorithm | |
CN113821983B (en) | Engineering design optimization method and device based on proxy model and electronic equipment | |
CN115829024B (en) | Model training method, device, equipment and storage medium | |
CN114243799B (en) | Deep reinforcement learning power distribution network fault recovery method based on distributed power supply | |
CN114117599A (en) | Shield attitude position deviation prediction method | |
CN110390206A (en) | Gradient under the cloud system frame of side with secret protection declines accelerating algorithm | |
CN109344960A (en) | A kind of DGRU neural network and its prediction model method for building up preventing data information loss | |
CN114548591A (en) | Time sequence data prediction method and system based on hybrid deep learning model and Stacking | |
CN109681165B (en) | Water injection strategy optimization method and device for oil extraction in oil field | |
CN116205273A (en) | Multi-agent reinforcement learning method for optimizing experience storage and experience reuse | |
CN113849910A (en) | Dropout-based BiLSTM network wing resistance coefficient prediction method | |
CN113435128A (en) | Oil and gas reservoir yield prediction method and device based on condition generation type countermeasure network | |
CN115577647B (en) | Power grid fault type identification method and intelligent agent construction method | |
CN112381664A (en) | Power grid short-term load prediction method, prediction device and storage medium | |
CN117079744A (en) | Artificial intelligent design method for energetic molecule | |
CN116774089A (en) | Convolutional neural network battery state of health estimation method and system based on feature fusion | |
CN115630316A (en) | Ultrashort-term wind speed prediction method based on improved long-term and short-term memory network | |
CN112667394B (en) | Computer resource utilization rate optimization method | |
CN115593264A (en) | Charging optimization control method and device based on edge calculation and computer equipment | |
CN115061444A (en) | Real-time optimization method for technological parameters integrating probability network and reinforcement learning |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||