CN117369263A - Intelligent combustion control method of hot blast stove based on reinforcement learning and attention mechanism - Google Patents

Intelligent combustion control method of hot blast stove based on reinforcement learning and attention mechanism Download PDF

Info

Publication number
CN117369263A
CN117369263A
Authority
CN
China
Prior art keywords
network, state, combustion, control method, intelligent
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202311375874.0A
Other languages
Chinese (zh)
Other versions
CN117369263B (en)
Inventor
梁合兰
国宏伟
闫炳基
杨韬然
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Suzhou University
Original Assignee
Suzhou University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Suzhou University filed Critical Suzhou University
Priority claimed from CN202311375874.0A
Publication of CN117369263A
Application granted; publication of CN117369263B
Legal status: Active

Classifications

    • G — PHYSICS
    • G05 — CONTROLLING; REGULATING
    • G05B — CONTROL OR REGULATING SYSTEMS IN GENERAL; FUNCTIONAL ELEMENTS OF SUCH SYSTEMS; MONITORING OR TESTING ARRANGEMENTS FOR SUCH SYSTEMS OR ELEMENTS
    • G05B13/00 — Adaptive control systems, i.e. systems automatically adjusting themselves to have a performance which is optimum according to some preassigned criterion
    • G05B13/02 — Adaptive control systems as above, electric
    • G05B13/04 — Adaptive control systems as above, electric, involving the use of models or simulators
    • G05B13/042 — Adaptive control systems as above, in which a parameter or coefficient is automatically adjusted to optimise the performance

Landscapes

  • Engineering & Computer Science (AREA)
  • Health & Medical Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Evolutionary Computation (AREA)
  • Medical Informatics (AREA)
  • Software Systems (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Automation & Control Theory (AREA)
  • Feedback Control In General (AREA)

Abstract

The invention belongs to the technical field of hot blast stove combustion control and relates in particular to an intelligent combustion control method for a hot blast stove based on reinforcement learning and an attention mechanism, comprising the following steps. S1: acquiring historical combustion data of the stove and selecting continuous data from it with a moving time window as combustion state data; S2: obtaining a trained Attention-MLP model based on the combustion state data; S3: acquiring real-time combustion data and using the Attention-MLP model to control the adjustment direction of the gas valve position of the hot blast stove according to that data. The invention solves the problem that existing hot blast stove combustion control methods cannot satisfy both the control precision and real-time requirements; it achieves higher accuracy and can meet the intelligent combustion optimization control requirements of the blast furnace hot blast stove.

Description

Intelligent combustion control method of hot blast stove based on reinforcement learning and attention mechanism
Technical Field
The invention relates to the technical field of combustion control of hot blast stoves, in particular to an intelligent combustion control method of a hot blast stove based on reinforcement learning and attention mechanisms.
Background
Blast furnace smelting is a key link in the steel-making process, and its production cost accounts for more than 60% of the cost of steel products. The hot blast stove is a vital heat exchange device in the iron-making process; its function is to generate high-temperature hot blast and deliver it to the blast furnace to meet the heat requirements of the iron ore reduction process. Improving the combustion control precision of the hot blast stove is an important means of raising the blast temperature, lowering fuel consumption, and reducing emissions, and is also an effective means of prolonging the service life of the stove and reducing the labor intensity of workers.
The goal of hot blast stove combustion control is to achieve optimal control of the dome temperature and of the flue gas temperature rise rate during the combustion period by dynamically adjusting the air valve position and the gas valve position according to the combustion state. Combustion control methods for blast furnace hot blast stoves fall into three main categories: traditional control methods, mathematical model methods, and intelligent control methods. Traditional control methods suffer from control lag and excessively strong control actions, while mathematical model methods require a large investment because many thermal parameters must be monitored. In contrast, intelligent control methods can use intelligent knowledge to design the controller and have advantages such as a wide operating range and broad applicability, making them the mainstream direction of current development. However, in current intelligent control methods, rule induction and extraction remain difficult: in actual operation the rules are usually set manually and suffer from drawbacks such as a single form of expression and an overly simple structure.
Disclosure of Invention
Therefore, the technical problem the invention aims to solve is that prior-art methods for optimized hot blast stove combustion cannot satisfy the control precision and real-time requirements at the same time.
In order to solve the technical problems, the invention provides an intelligent combustion control method of a hot blast stove based on reinforcement learning and attention mechanism, which comprises the following steps:
s1: acquiring historical data of combustion of a combustion furnace, and selecting continuous data from the historical data as combustion state data by utilizing a moving time window;
s2: based on the combustion state data, obtaining a trained Attention-MLP model;
s3: acquiring real-time combustion data, and controlling the gas valve position adjusting direction of the hot blast stove according to the real-time combustion data by utilizing the Attention-MLP model;
the Attention-MLP model comprises a predefined action space, an experience pool, a Q network for outputting actions and a target network for outputting estimated values to guide the Q network, wherein initial parameters of the Q network and initial parameters of the target network are the same; after the combustion state data is obtained by the intelligent agent observing the environment, an output state is formed after the combustion state data is processed by the attention mechanism module, a state transfer record obtained by the interaction of the intelligent agent and the environment is stored in the experience pool, the Q network is trained by adopting an experience playback mechanism, the parameters of the Q network are updated, and the Q network synchronizes the updated parameters to the target network;
the combustion state data is a plurality of pieces, and any piece of combustion state data comprises values of gas pressure, air valve position, gas valve position and vault temperature.
In one embodiment of the invention, storing the state transition records obtained from the agent's interaction with the environment in the experience pool comprises: the agent selects an action from the action space according to the output state, executes it, obtains the corresponding feedback reward, and transitions to a new state; the combustion state data, the action, the feedback reward, and the new state are stored in the experience pool as a state transition record. The action space is defined as the set of the 3 adjustment directions of the gas valve position, namely: down, up, and unchanged.
In one embodiment of the present invention, the specific steps by which the attention mechanism module processes the combustion state data into the output state are:
S211: taking the combustion state data $s_t \in \mathbb{R}^{\omega \times m}$ as input, linear features $Z_t$ are obtained by a linear mapping:
$Z_t = \{z_1, z_2, \ldots, z_\omega\} = W_z s_t + b_z$
where $z_i \in \mathbb{R}^d$ is the linear feature at time $i$, $i = 1, \ldots, \omega$; $\omega$ is the time window length; $m$ is the length of each piece of furnace data; $W_z \in \mathbb{R}^{d \times m}$ is a neural network parameter; $b_z \in \mathbb{R}^d$ is a bias parameter; and $d$ is the dimension of the embedded features;
S212: taking the linear features $Z_t$ as input, the hidden states $H_t$ with positional encoding added are computed:
$H_t = \{h_1, h_2, \ldots, h_\omega\} = Z_t + C$
where $h_i \in \mathbb{R}^d$ is the hidden state at time $i$, $i = 1, \ldots, \omega$, and $C$ is a neural network parameter of the same size as $Z_t$;
S213: $H_t$ is mapped by linear projections to keys $K_t$ and values $V_t$:
$K_t = \{k_1, k_2, \ldots, k_\omega\} = W_k H_t + b_k$
$V_t = \{v_1, v_2, \ldots, v_\omega\} = W_v H_t + b_v$
where $k_i, v_i \in \mathbb{R}^d$ are the key and value vectors at time $i$, $i = 1, \ldots, \omega$; $W_k, W_v \in \mathbb{R}^{d \times d}$ are weight parameters of the embedding network; and $b_k, b_v \in \mathbb{R}^d$ are bias parameters;
S214: a query vector $q \in \mathbb{R}^d$ is input and its relevance $A_t$ to the keys is computed:
$A_t = (a_1, a_2, \ldots, a_\omega) = \mathrm{softmax}(q^{\top} K_t)$
where $a_i$ is the attention value between the query vector and the key at time $i$, $i = 1, \ldots, \omega$; $q$ is a randomly initialized learnable parameter; and softmax is the normalized exponential function;
S215: the relevance vector and the value matrix are combined to obtain the output state $e_t$ for the combustion state data $s_t$ at time $t$:
$e_t = V_t A_t^{\top} = \sum_{i=1}^{\omega} a_i v_i$
In one embodiment of the invention, training the Q network with the experience replay mechanism comprises:
S21: initializing the experience pool $D$, the parameters of the Q network and the target network, and the sampling priority of each state transition record, with the state transition record at time $i$ denoted $\tau_i$;
S22: computing the sampling probability of each state transition record from the sampling priorities:
$P(\tau_i) = \dfrac{p_i^{\beta}}{\sum_k p_k^{\beta}}$
where $\beta$ is a state transition sampling constant, $\tau_i$ is the state transition record at time $i$, and $p_i$ is the sampling priority of $\tau_i$;
S23: sampling state transition records according to the sampling probability, setting a loss function from the Q values corresponding to the output actions of the $B$ sampled state transition records, and updating the parameters of the Q network by gradient descent so as to minimize the loss function value.
In one embodiment of the invention, the loss function is expressed as:
$L(\theta) = L_{TD}^{2} + \lambda_1 L_E + \lambda_2 L_{L2}$
where $L_{TD}$ is the temporal difference error, $L_E$ is the large-margin classification loss, $L_{L2}$ is the L2 regularization loss, and $\lambda_1$, $\lambda_2$ are the weights of the corresponding loss terms.
In one embodiment of the present invention, the expression of the temporal difference error is:
$L_{TD} = Y_t - Q(s_t, a_t; \theta)$
where $Y_t$ is the cumulative expected return estimate at time $t$, and $Q(s_t, a_t; \theta)$ is the cumulative expected return estimate output by the Q network for input state $s_t$ and action $a_t$; and
$Y_t = r_{t+1} + \gamma\, Q\big(s_{t+1}, a_{t+1}^{\max}; \theta^{-}\big), \quad a_{t+1}^{\max} = \arg\max_{a} Q(s_{t+1}, a; \theta)$
where $r_{t+1}$ is the reward at time $t+1$; $\gamma \in (0, 1)$ is the discount coefficient; $Q(s_{t+1}, a_{t+1}^{\max}; \theta^{-})$ is the cumulative expected return estimate output by the target network with parameters $\theta^{-}$ for input $s_{t+1}$ and action $a_{t+1}^{\max}$; and $a_{t+1}^{\max}$ is the action whose cumulative expected return estimate is maximal when $s_{t+1}$ is input to the Q network.
In one embodiment of the present invention, the expression of the large-margin classification loss is:
$L_E = \max_{a \in A}\big[\, Q(s_t, a; \theta) + l(a_t^{E}, a) \,\big] - Q(s_t, a_t^{E}; \theta)$
where $A$ is the action space; $s_t$ is the combustion state of the hot blast stove at time $t$; $a$ is any action in the action space $A$; $Q(s_t, a; \theta)$ is the output of the Q network with parameters $\theta$ for input state $s_t$ and action $a$; $a_t^{E}$ is the expert action at time $t$, i.e. the valve position adjustment action recorded for state $s_t$ in the state transition record; and $l(a_t^{E}, a)$ is a margin function that penalizes model actions inconsistent with the expert action in the record, with $l(a_t^{E}, a) = 0$ when $a = a_t^{E}$ and $l(a_t^{E}, a) = b$ otherwise, $b$ being a hyperparameter greater than zero.
In one embodiment of the present invention, the expression of the L2 regularization loss is:
$L_{L2} = \lVert \theta \rVert_2^{2}$
where $\theta$ denotes the parameters of the Q network.
In one embodiment of the present invention, the Q network synchronizes its updated parameters to the target network as follows: when the number of sampling steps reaches the preset target network parameter update frequency, the parameters of the Q network are copied to the target network.
In one embodiment of the present invention, the gradient descent method is as follows: the importance sampling weight of each state transition record is computed from its sampling probability; the gradient value is updated using the average importance-weighted gradient of the $B$ sampled state transition records; and the sampling priority of each state transition record is updated according to its temporal difference error.
Compared with the prior art, the technical scheme of the invention has the following advantages:
the intelligent combustion control method for the hot blast stove based on the reinforcement learning and attention mechanism combines the reinforcement learning and supervision learning ideas, utilizes the burning history data set to guide the intelligent body to learn offline, particularly aims at the characteristics of history data, and uses the attention mechanism to better learn the state change rule of the hot blast stove contained in the history data, thereby outputting better combustion control actions of the hot blast stove, and solving the problem that the combustion of the hot blast stove cannot meet the requirements of control precision and real-time performance due to the characteristics of nonlinearity, continuity, hysteresis and the like.
Drawings
In order that the invention may be more readily understood, a more particular description of the invention will be rendered by reference to specific embodiments thereof that are illustrated in the appended drawings, in which
FIG. 1 is a flow chart of an intelligent combustion control method of a hot blast stove based on reinforcement learning and attention mechanism;
FIG. 2 is a generation map of the combustion state data in the embodiment;
FIG. 3 is a block diagram of the Q network or the target network in an embodiment;
FIG. 4 is a flow chart of the processing of state data into output states by the attention-based embedded network of FIG. 3;
FIGS. 5(a)-(b) show the accuracy of the Attention-MLP model and the MLP model on the training set and the test set, respectively;
FIG. 6 is a graph of a comparative change in loss function values for an Attention-MLP model and an MLP model;
FIG. 7 is a graph showing Q-value change curves of the Attention-MLP model and the MLP model.
Detailed Description
The present invention will be further described with reference to the accompanying drawings and specific examples, which are not intended to be limiting, so that those skilled in the art will better understand the invention and practice it.
Referring to FIG. 1, the invention provides an intelligent combustion control method of a hot blast stove based on reinforcement learning and attention mechanism, which comprises the following steps:
s1: acquiring historical data of combustion of a combustion furnace, and selecting continuous data from the historical data as combustion state data by utilizing a moving time window;
s2: based on the combustion state data, obtaining a trained Attention-MLP model;
s3: acquiring real-time combustion data, and controlling the gas valve position adjusting direction of the hot blast stove according to the real-time combustion data by utilizing the Attention-MLP model;
the Attention-MLP model comprises a predefined action space, an experience pool, a Q network for outputting actions and a target network for outputting estimated values to guide the Q network, wherein initial parameters of the Q network and initial parameters of the target network are the same; after the combustion state data is obtained by the intelligent agent observing the environment, an output state is formed after the combustion state data is processed by the attention mechanism module, a state transfer record obtained by the interaction of the intelligent agent and the environment is stored in the experience pool, the Q network is trained by adopting an experience playback mechanism, the parameters of the Q network are updated, and the Q network synchronizes the updated parameters to the target network;
the combustion state data is multiple, and any one combustion state data comprises values of gas pressure, air valve position, gas valve position and vault temperature.
Specifically, in S1 a moving time window strategy is adopted, and the continuous data within a window of size ω = 10 are selected to construct the state, i.e. s_t = X_{t−ω+1:t}, where x_t denotes the values of variables such as the gas pressure and the air pressure at time t and m is the length of each record. For the states before time ω (s_1 to s_{ω−1}), the missing earlier portion of the data is padded with 0. The generation of the state data is illustrated in FIG. 2: the upper part shows the collected furnace history data, and the lower part shows the state data of the reinforcement learning model, where the state data s_t at time t consists of the furnace history X_{t−ω+1:t} from time t−ω+1 to time t. The state data at the other times are obtained in the same way.
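As an illustrative sketch of this windowing scheme (not the patent's implementation; the array shapes and the zero-padding rule follow the description above, with m = 5 monitored variables assumed):

```python
import numpy as np

def build_states(history: np.ndarray, omega: int = 10) -> np.ndarray:
    """Build one omega-by-m state per time step from furnace history (T-by-m).

    States before time omega are zero-padded on the left, matching the
    rule that the missing earlier part of the data is filled with 0.
    """
    T, m = history.shape
    padded = np.vstack([np.zeros((omega - 1, m)), history])
    # s_t = X_{t-omega+1 : t}  (the omega most recent rows up to time t)
    return np.stack([padded[t:t + omega] for t in range(T)])

# Example: 12 time steps of 5 variables
history = np.arange(60, dtype=float).reshape(12, 5)
states = build_states(history)
print(states.shape)          # (12, 10, 5)
print(states[0, :9].sum())   # 0.0 -- the first state is mostly zero padding
```

The last state contains the final ten rows of the history unchanged, so each s_t is exactly the window X_{t−ω+1:t}.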
As shown in FIG. 3, the Q network and the target network each consist of an embedding network f and a fully connected network g. The attention-based embedding network f maps the two-dimensional time-series state data s_t to the output state e_t, which is then fed as input to the fully connected network g to obtain the Q value estimates for the 3 gas valve position actions.
Specifically, storing the state transition records obtained from the agent's interaction with the environment in the experience pool comprises: the agent selects an action from the action space according to the output state, executes it, obtains the corresponding feedback reward, and transitions to a new state; the combustion state data, the action, the feedback reward, and the new state are stored in the experience pool as a state transition record. The action space is defined as the set of the 3 adjustment directions of the gas valve position, namely: down, up, and unchanged.
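The record structure and the 3-action space described above can be sketched as follows (all names used here, such as `Transition` and `ACTIONS`, are illustrative assumptions, not terms from the patent):

```python
from collections import namedtuple

# Action space A: three gas-valve-position adjustment directions
ACTIONS = ("down", "up", "unchanged")

# One state transition record: (s_t, a_t, r_{t+1}, s_{t+1})
Transition = namedtuple("Transition", ["state", "action", "reward", "next_state"])

pool = []  # experience pool D
pool.append(Transition(state="s0", action=ACTIONS.index("up"),
                       reward=1.0, next_state="s1"))
print(len(pool), ACTIONS[pool[0].action])  # 1 up
```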
As shown in FIG. 4, the specific steps by which the attention mechanism module processes the combustion state data into the output state are:
S211: taking the combustion state data $s_t \in \mathbb{R}^{\omega \times m}$ as input, linear features $Z_t$ are obtained by a linear mapping:
$Z_t = \{z_1, z_2, \ldots, z_\omega\} = W_z s_t + b_z$
where $z_i \in \mathbb{R}^d$ is the linear feature at time $i$, $i = 1, \ldots, \omega$; $m$ is the length of each piece of combustion state data, with $m = 5$; $W_z \in \mathbb{R}^{d \times m}$ is a neural network parameter; $b_z \in \mathbb{R}^d$ is a bias parameter; and $d$ is the dimension of the embedded features, with $d = 64$;
S212: taking the linear features $Z_t$ as input, the hidden states $H_t$ with positional encoding added are computed:
$H_t = \{h_1, h_2, \ldots, h_\omega\} = Z_t + C$
where $h_i \in \mathbb{R}^d$ is the hidden state at time $i$, $i = 1, \ldots, \omega$, and $C$ is a neural network parameter of the same size as $Z_t$;
S213: $H_t$ is mapped by linear projections to keys $K_t$ and values $V_t$:
$K_t = \{k_1, k_2, \ldots, k_\omega\} = W_k H_t + b_k$
$V_t = \{v_1, v_2, \ldots, v_\omega\} = W_v H_t + b_v$
where $k_i, v_i \in \mathbb{R}^d$ are the key and value vectors at time $i$, $i = 1, \ldots, \omega$; $W_k, W_v \in \mathbb{R}^{d \times d}$ are weight parameters of the embedding network; and $b_k, b_v \in \mathbb{R}^d$ are bias parameters;
S214: a query vector $q \in \mathbb{R}^d$ is input and its relevance $A_t$ to the keys is computed:
$A_t = (a_1, a_2, \ldots, a_\omega) = \mathrm{softmax}(q^{\top} K_t)$
where $a_i$ is the attention value between the query vector and the key at time $i$, $i = 1, \ldots, \omega$; $q$ is a randomly initialized learnable parameter; and softmax is the normalized exponential function;
S215: the relevance vector and the value matrix are combined to obtain the output state $e_t$ for the combustion state data $s_t$ at time $t$:
$e_t = V_t A_t^{\top} = \sum_{i=1}^{\omega} a_i v_i$
The Q value estimates are obtained as follows: the output state $e_t$ is input to the fully connected network g, whose output layer has 3 nodes, giving the Q value estimates for the three actions of increasing the gas valve position, decreasing it, and keeping it unchanged, namely:
$Q = \{Q_1, Q_2, Q_3\} = \mathrm{MLP}(e_t)$
In this embodiment, training the Q network with the experience replay mechanism comprises:
S21: initializing the experience pool $D$, the parameters of the Q network and the target network, and the sampling priority of each state transition record, with the state transition record at time $i$ denoted $\tau_i$;
S22: computing the sampling probability of each state transition record from the sampling priorities:
$P(\tau_i) = \dfrac{p_i^{\beta}}{\sum_k p_k^{\beta}}$
where $\beta$ is a state transition sampling constant, with $\beta = 0.4$ in this embodiment; $\tau_i$ is the state transition record at time $i$; and $p_i$ is the sampling priority of $\tau_i$;
S23: sampling state transition records according to the sampling probability, setting a loss function from the Q values corresponding to the output actions of the $B$ sampled state transition records, and updating the parameters of the Q network by gradient descent so as to minimize the loss function value.
The expression of the loss function is:
$L(\theta) = L_{TD}^{2} + \lambda_1 L_E + \lambda_2 L_{L2}$
where $L_{TD}$ is the temporal difference error, $L_E$ is the large-margin classification loss, $L_{L2}$ is the L2 regularization loss, and $\lambda_1$, $\lambda_2$ are the weights of the corresponding loss terms.
The expression of the temporal difference error is:
$L_{TD} = Y_t - Q(s_t, a_t; \theta)$
where $Y_t$ is the cumulative expected return estimate at time $t$, and $Q(s_t, a_t; \theta)$ is the cumulative expected return estimate output by the Q network for input state $s_t$ and action $a_t$; and
$Y_t = r_{t+1} + \gamma\, Q\big(s_{t+1}, a_{t+1}^{\max}; \theta^{-}\big), \quad a_{t+1}^{\max} = \arg\max_{a} Q(s_{t+1}, a; \theta)$
where $r_{t+1}$ is the reward at time $t+1$; $\gamma \in (0, 1)$ is the discount coefficient; $Q(s_{t+1}, a_{t+1}^{\max}; \theta^{-})$ is the cumulative expected return estimate output by the target network with parameters $\theta^{-}$ for input $s_{t+1}$ and action $a_{t+1}^{\max}$; and $a_{t+1}^{\max}$ is the action whose cumulative expected return estimate is maximal when $s_{t+1}$ is input to the Q network.
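A toy numeric sketch of this target computation, in which the greedy action is chosen by the Q network and evaluated by the target network (all Q values and the reward are invented for illustration):

```python
import numpy as np

gamma = 0.9    # discount coefficient in (0, 1)
r_next = 1.0   # reward r_{t+1}

# Toy action values for s_{t+1} from the Q network (theta) and target network (theta-)
q_online = np.array([0.2, 0.8, 0.5])   # Q(s_{t+1}, . ; theta)
q_target = np.array([0.3, 0.6, 0.9])   # Q(s_{t+1}, . ; theta-)

a_max = int(np.argmax(q_online))        # action selected by the Q network
Y_t = r_next + gamma * q_target[a_max]  # value taken from the target network
print(a_max, round(Y_t, 3))             # 1 1.54
```

Note that the target network's own maximum (action 2 here) is not used; decoupling selection from evaluation is what reduces the overestimation bias of a single estimator.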
The expression of the large-margin classification loss is:
$L_E = \max_{a \in A}\big[\, Q(s_t, a; \theta) + l(a_t^{E}, a) \,\big] - Q(s_t, a_t^{E}; \theta)$
where $A$ is the action space; $s_t$ is the combustion state of the hot blast stove at time $t$; $a$ is any action in the action space $A$; $Q(s_t, a; \theta)$ is the output of the Q network with parameters $\theta$ for input state $s_t$ and action $a$; $a_t^{E}$ is the expert action at time $t$, i.e. the valve position adjustment action recorded for state $s_t$ in the state transition record; and $l(a_t^{E}, a)$ is a margin function that penalizes model actions inconsistent with the expert action in the record, with $l(a_t^{E}, a) = 0$ when $a = a_t^{E}$ and $l(a_t^{E}, a) = b$ otherwise, $b$ being a hyperparameter greater than zero.
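The large-margin classification loss can be illustrated numerically (the toy Q values and the margin b are assumptions; the margin function gives 0 for the expert action and b for every other action):

```python
import numpy as np

b = 0.8                                  # margin hyperparameter, b > 0
q_values = np.array([0.2, 0.5, 0.4])     # Q(s_t, a; theta) for the 3 actions
a_expert = 0                             # expert action a_t^E from the record

margin = np.full(3, b)
margin[a_expert] = 0.0                   # l(a_E, a) = 0 iff a == a_E
L_E = np.max(q_values + margin) - q_values[a_expert]
print(round(L_E, 3))                     # 1.1
```

The loss is zero only when the expert action's Q value exceeds every other action's Q value by at least the margin b, which pushes the network toward reproducing the recorded expert adjustments.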
The expression of the L2 regularization loss is:
$L_{L2} = \lVert \theta \rVert_2^{2}$
where $\theta$ denotes the parameters of the Q network.
In this embodiment, the Q network synchronizes its updated parameters to the target network as follows: when the number of sampling steps reaches the preset target network parameter update frequency, the parameters of the Q network are copied to the target network.
The parameters of the Q network are updated by gradient descent as follows.
The importance sampling weight of each state transition record is computed from its sampling probability:
$w_i = \big(N \cdot P(\tau_i)\big)^{-\eta} \,\big/\, \max_j \big(N \cdot P(\tau_j)\big)^{-\eta}$
where $N$ is the number of records in the experience pool and $\eta$ is the importance sampling constant.
The gradient value is then updated using the average importance-weighted gradient of the $B$ sampled state transition records: $\theta = \theta - \lambda \Delta_\theta L$, with $\Delta_\theta L = \frac{1}{B}\sum_{i=1}^{B} w_i\, \Delta_\theta L(\tau_i)$, where $\Delta_\theta L(\tau_i)$ is the gradient of the loss function for state transition record $\tau_i$ and $\lambda$ is the learning rate. Finally, the sampling priority of each sampled record is updated according to its temporal difference error:
$p_i = \lvert L_{TD}(\tau_i) \rvert + \epsilon$
where $\epsilon$ is a small positive constant.
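The priority-based sampling of steps S22-S23 and the priority refresh can be sketched as follows; β = 0.4 follows this embodiment, while the importance-sampling exponent η, the constant ε, the batch size, and the toy priorities are illustrative assumptions:

```python
import numpy as np

rng = np.random.default_rng(1)
beta, eta, eps = 0.4, 0.6, 1e-6
B = 4                                     # minibatch size

priorities = np.array([2.0, 0.5, 1.0, 3.0, 0.1, 1.5])
N = len(priorities)

# S22: sampling probability P(tau_i) = p_i^beta / sum_k p_k^beta
probs = priorities ** beta
probs /= probs.sum()

# S23: sample B records and compute max-normalized importance weights
idx = rng.choice(N, size=B, p=probs)
w = (1.0 / (N * probs[idx])) ** eta
w /= w.max()                              # normalize by the largest weight

# After the gradient step, refresh priorities from the new TD errors
td_errors = rng.normal(size=B)            # stand-in for L_TD of each sampled record
priorities[idx] = np.abs(td_errors) + eps

print(np.isclose(probs.sum(), 1.0), w.max() <= 1.0)  # True True
```

High-priority records are sampled more often, and the importance weights (all at most 1 after normalization) scale down their gradient contributions to correct the sampling bias.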
in order to evaluate the performance of the Attention-MLP model, the proposed model is realized by adopting software such as Python 3.11.3, JAX 0.4.8 and the like, a GNU/Linux operating system is installed, a computer with an Intel (R) Core (TM) i7-10700 CPU and a 16GB memory is used for completing calculation, and Wandb is used for recording experimental data. In terms of data, a set of stove combustion control data is obtained for training and testing. By dividing the data according to the burning period and eliminating the abnormal period, 27 burning periods are obtained, and the number of the burning periods is 7: the ratio of 3 randomly divides the training and testing sets (19 cycles for training phase and the remaining 8 for testing phase). Then 47739 state transitions are obtained in total according to the generation rules of the state data.
The parameters $\theta$ of the Q network and $\theta^{-}$ of the target network were initialized with four different random number seeds, and four experiments were performed for each network structure. FIGS. 5 to 7 show the variation of the statistical indices during training; the Attention-MLP model is the attention-based reinforcement learning model proposed by the present invention, and MLP is a reinforcement learning model using only a fully connected network.
As can be seen from the accuracy curves in FIGS. 5(a)-(b), the accuracy of both models increases as training proceeds, and the training set and test set accuracies are close at the same time step, indicating that the Attention-MLP model neither overfits nor underfits. FIG. 6 shows the change of the average loss value during training; the overall loss decreases continuously, indicating that the models can effectively learn the regularities in the data. FIG. 7 shows the average Q value during training: the average Q values of all models eventually converge to around 8, meaning all models reach similar Q value estimates. Compared with the MLP model, the Attention-MLP model can attend to the important parts of the input state sequence, and the added position embedding lets the model take sequence order into account, giving it the best solution quality.
According to the above embodiment, the intelligent optimized combustion control method for the blast furnace hot blast stove based on deep reinforcement learning requires neither monitoring instruments such as flow meters and residual oxygen meters nor an explicitly supplied correspondence between valve position and such detection equipment; instead, it autonomously learns the implicit relation between the stove combustion state and the valve position adjustment from historical stove operation records. Furthermore, considering the difficulty of representing the state features, an attention-based deep embedding network combined with a fully connected network is proposed. Experimental results show that the accuracy of the model reaches 86%, which can meet the intelligent combustion optimization control requirements of the blast furnace hot blast stove.
It will be appreciated by those skilled in the art that embodiments of the present application may be provided as a method, system, or computer program product. Accordingly, the present application may take the form of an entirely hardware embodiment, an entirely software embodiment, or an embodiment combining software and hardware aspects. Furthermore, the present application may take the form of a computer program product embodied on one or more computer-usable storage media (including, but not limited to, disk storage, CD-ROM, optical storage, and the like) having computer-usable program code embodied therein.
The present application is described with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems) and computer program products according to embodiments of the application. It will be understood that each flow and/or block of the flowchart illustrations and/or block diagrams, and combinations of flows and/or blocks in the flowchart illustrations and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general purpose computer, special purpose computer, embedded processor, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, create means for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be stored in a computer-readable memory that can direct a computer or other programmable data processing apparatus to function in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means which implement the function specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be loaded onto a computer or other programmable data processing apparatus to cause a series of operational steps to be performed on the computer or other programmable apparatus to produce a computer implemented process such that the instructions which execute on the computer or other programmable apparatus provide steps for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
It is apparent that the above examples are given by way of illustration only and are not limiting of the embodiments. Other variations and modifications of the present invention will be apparent to those of ordinary skill in the art in light of the foregoing description. It is neither necessary nor possible to exhaustively list all embodiments here, and obvious variations or modifications derived therefrom are contemplated as falling within the scope of the present invention.

Claims (10)

1. An intelligent combustion control method of a hot blast stove based on reinforcement learning and attention mechanism is characterized by comprising the following steps:
s1: acquiring historical data of combustion of a combustion furnace, and selecting continuous data from the historical data as combustion state data by utilizing a moving time window;
s2: based on the combustion state data, obtaining a trained Attention-MLP model;
s3: acquiring real-time combustion data, and controlling the adjustment direction of a gas valve position of the hot blast stove according to the real-time combustion data by utilizing the Attention-MLP model;
the Attention-MLP model comprises a predefined action space, an experience pool, a Q network for outputting actions and a target network for outputting estimated values to guide the Q network, wherein initial parameters of the Q network and initial parameters of the target network are the same; after the combustion state data is obtained by the intelligent agent observing the environment, an output state is formed after the combustion state data is processed by the attention mechanism module, a state transfer record obtained by the interaction of the intelligent agent and the environment is stored in the experience pool, the Q network is trained by adopting an experience playback mechanism, the parameters of the Q network are updated, and the Q network synchronizes the updated parameters to the target network;
the combustion state data is a plurality of pieces, and any piece of combustion state data comprises values of gas pressure, air valve position, gas valve position and vault temperature.
2. The intelligent combustion control method for the hot blast stove based on reinforcement learning and attention mechanism according to claim 1, characterized in that storing a state transition record obtained by interaction of the agent with the environment in the experience pool comprises: after the agent selects an action from the action space according to the output state and executes it, the agent obtains the feedback reward corresponding to the action and transfers to a new state, and the combustion state data, the action, the feedback reward and the new state are stored in the experience pool as a state transition record; the action space is defined as the set of 3 adjustment directions of the gas valve position, the 3 adjustment directions being specifically: downward adjustment, upward adjustment and unchanged.
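The claim-2 experience pool and 3-action space can be sketched as below; all names here are illustrative assumptions, and the stored tuple mirrors the (state, action, reward, new state) record described in the claim:

```python
import random
from collections import deque

# the 3 gas-valve adjustment directions of claim 2 (down, unchanged, up)
ACTIONS = ("down", "hold", "up")

class ExperiencePool:
    def __init__(self, capacity):
        # deque with maxlen discards the oldest transition once full
        self.buffer = deque(maxlen=capacity)

    def store(self, state, action, reward, next_state):
        assert action in ACTIONS
        self.buffer.append((state, action, reward, next_state))

    def sample(self, batch_size):
        # uniform sampling shown here; claims 4 and 10 replace this with
        # priority-based sampling
        return random.sample(self.buffer, min(batch_size, len(self.buffer)))

pool = ExperiencePool(capacity=10000)
# one synthetic record: gas pressure, air valve, gas valve, vault temperature
pool.store([1.2, 0.4, 0.7, 1250.0], "up", 0.5, [1.2, 0.4, 0.8, 1255.0])
```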
3. The intelligent combustion control method for the hot blast stove based on reinforcement learning and attention mechanism according to claim 1, characterized in that the specific steps of forming the output state after processing by the attention mechanism module are as follows:
S211: with the combustion state data s_t as input, obtain the linear features Z_t ∈ R^(d×ω) by linear mapping, as follows:
Z_t = {z_1, z_2, ..., z_ω} = W_z s_t + b_z
where z_i ∈ R^d represents the linear feature at time i, i = 1, ..., ω; ω represents the time window length; m is the length of each piece of furnace data; W_z ∈ R^(d×m) is a neural network parameter; b_z ∈ R^d is a bias parameter; d is the dimension of the embedded feature;
S212: with the linear features Z_t as input, compute the hidden states H_t ∈ R^(d×ω) with position encoding added, as follows:
H_t = {h_1, h_2, ..., h_ω} = Z_t + C
where h_i ∈ R^d indicates the hidden state at time i, i = 1, ..., ω; C ∈ R^(d×ω) is a neural network parameter whose size is the same as that of Z_t;
S213: map H_t by linear mapping into the keys K_t and the values V_t, respectively, as follows:
K_t = {k_1, k_2, ..., k_ω} = W_k H_t + b_k
V_t = {v_1, v_2, ..., v_ω} = W_v H_t + b_v
where k_i ∈ R^d and v_i ∈ R^d respectively represent the key vector and the value vector at the i-th moment, i = 1, ..., ω; W_k, W_v ∈ R^(d×d) are weight parameters of the embedded network; b_k, b_v ∈ R^d are bias parameters;
S214: input the query vector q ∈ R^d and calculate the relevance A_t between the query vector and the keys, as follows:
A_t = (a_1, a_2, ..., a_ω) = softmax(q^T K_t)
where a_i is the attention value between the query vector and the key at the i-th moment, i = 1, ..., ω; q is a randomly initialized learnable parameter; softmax is the normalized exponential function;
S215: substitute the relevance vector and the value matrix into the following formula to obtain the output state o_t of the combustion state data s_t at time t:
o_t = Σ_{i=1}^{ω} a_i v_i
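Steps S211–S215 can be sketched numerically as below. This is an illustrative NumPy sketch, not the patent's implementation; the S215 readout is assumed to be the standard attention-weighted sum of value vectors, since the rendered formula is not legible in this extraction:

```python
import numpy as np

def softmax(x):
    e = np.exp(x - np.max(x))  # shift for numerical stability
    return e / e.sum()

def attention_state(s_t, Wz, bz, C, Wk, bk, Wv, bv, q):
    # S211: linear mapping of the (m x omega) window, Z_t = Wz s_t + b_z
    Z = Wz @ s_t + bz[:, None]
    # S212: add the position encoding C (same shape as Z_t)
    H = Z + C
    # S213: map hidden states to keys and values
    K = Wk @ H + bk[:, None]
    V = Wv @ H + bv[:, None]
    # S214: attention weights between the query q and each key column
    A = softmax(q @ K)            # length-omega weight vector
    # S215 (assumed readout): weighted sum of value vectors
    return V @ A                  # d-dimensional output state

rng = np.random.default_rng(0)
m, omega, d = 4, 6, 8             # 4 fields per record, window length 6
s_t = rng.normal(size=(m, omega))
o_t = attention_state(
    s_t,
    rng.normal(size=(d, m)), rng.normal(size=d),   # Wz, bz
    rng.normal(size=(d, omega)),                   # C
    rng.normal(size=(d, d)), rng.normal(size=d),   # Wk, bk
    rng.normal(size=(d, d)), rng.normal(size=d),   # Wv, bv
    rng.normal(size=d),                            # q
)
```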
4. The intelligent combustion control method for the hot blast stove based on reinforcement learning and attention mechanism according to claim 1, characterized in that training the Q network by the experience playback mechanism comprises the following steps:
S21: initialize the experience pool D, the parameters of the Q network and the target network, and the sampling priority of each state transition record, denoting the state transition record at moment i as τ_i;
S22: according to the sampling priorities, calculate the sampling probability of each state transition record:
P(τ_i) = (p_{τ_i})^β / Σ_k (p_{τ_k})^β
where β is a state transition sampling constant, τ_i is the state transition record at moment i, and p_{τ_i} is the sampling priority of the record τ_i;
S23: sample state transition records according to the sampling probabilities, set the loss function according to the Q values corresponding to the output actions of the B sampled state transition records, and update the parameters of the Q network by gradient descent so as to minimize the loss function value.
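The priority-to-probability normalization of step S22 can be sketched as follows; the function name is an assumption for illustration:

```python
import numpy as np

def sampling_probabilities(priorities, beta):
    # P(tau_i) = p_i**beta / sum_k p_k**beta: records with higher priority
    # are drawn more often; beta is the sampling constant of step S22
    p = np.asarray(priorities, dtype=float) ** beta
    return p / p.sum()
```

A record with priority 4 is then sampled four times as often as one with priority 1 when beta = 1.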
5. The intelligent combustion control method for the hot blast stove based on reinforcement learning and attention mechanism according to claim 4, characterized in that the expression of the loss function is:
L = L_TD + λ_1 L_E + λ_2 L_2
where L_TD is the temporal-difference error, L_E is the large-margin classification loss, L_2 is the L2 regularization loss, and λ_1, λ_2 are the weights of the corresponding loss terms.
6. The intelligent combustion control method for the hot blast stove based on reinforcement learning and attention mechanism according to claim 5, characterized in that the expression of the temporal-difference error is:
L_TD = Y_t − Q(s_t, a_t; θ)
where Y_t represents the cumulative expected return estimate at time t, and Q(s_t, a_t; θ) represents the cumulative expected return estimate output by the Q network for the input state s_t; and
Y_t = r_{t+1} + γ Q(s_{t+1}, a*; θ^−)
where r_{t+1} is the reward at time t+1; γ is the discount coefficient; Q(s_{t+1}, a*; θ^−) is the cumulative expected return estimate output by the target network with parameters θ^− for the input s_{t+1} and the action a*; a* = argmax_{a∈A} Q(s_{t+1}, a; θ) is the action with the maximum cumulative expected return estimate output by the Q network for the input state s_{t+1}.
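The claim-6 target Y_t selects the next action with the online Q network and evaluates it with the target network (the double-DQN decoupling). A minimal sketch, with illustrative names:

```python
import numpy as np

def double_dqn_target(r_next, gamma, q_online_next, q_target_next):
    # a* = argmax_a Q(s_{t+1}, a; theta), chosen by the online Q network
    a_star = int(np.argmax(q_online_next))
    # ... but evaluated by the target network with parameters theta^-
    return r_next + gamma * q_target_next[a_star]
```

With online values [0.2, 0.8, 0.1] and target values [0.5, 0.3, 0.9], the online network picks action 1 but the target network supplies its value 0.3, not its own maximum 0.9, which is what damps overestimation.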
7. The intelligent combustion control method for the hot blast stove based on reinforcement learning and attention mechanism according to claim 5, characterized in that the expression of the large-margin classification loss is:
L_E = max_{a∈A} [Q(s_t, a; θ) + l(a_t^E, a)] − Q(s_t, a_t^E; θ)
where A represents the action space; s_t represents the combustion state of the hot blast stove at time t; a is any action in the action space A; Q(s_t, a; θ) represents the output of the Q network with parameters θ for the input state s_t and the action a; a_t^E represents the expert action at time t, i.e. the valve position adjustment action corresponding to the state s_t in the state transition record; l(a_t^E, a) is the penalty function applied when the action output by the model is inconsistent with the expert action reflected in the record: l(a_t^E, a) = 0 if a = a_t^E, and l(a_t^E, a) = b otherwise, where b is a hyper-parameter greater than zero.
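The large-margin term of claim 7 can be sketched as below (illustrative names); it is zero only when the expert action's Q value exceeds every other action's Q value by at least the margin b:

```python
import numpy as np

def large_margin_loss(q_values, expert_action, b):
    # l(a_E, a) = 0 if a == a_E, else b  (the margin penalty of claim 7)
    margins = np.full(len(q_values), b)
    margins[expert_action] = 0.0
    # L_E = max_a [Q(s,a) + l(a_E,a)] - Q(s, a_E)
    return float(np.max(np.asarray(q_values) + margins) - q_values[expert_action])
```

For q_values = [1.0, 2.0, 0.5] and b = 0.8 the loss is 0 when the expert took action 1 (already dominant by more than b), but positive when the expert took action 0, pushing Q(s, a_E) upward during training.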
8. The intelligent combustion control method for the hot blast stove based on reinforcement learning and attention mechanism according to claim 5, characterized in that the expression of the L2 regularization loss is:
L_2 = ‖θ‖_2^2
where θ denotes the parameters of the Q network.
9. The intelligent combustion control method for the hot blast stove based on reinforcement learning and attention mechanism according to claim 4, characterized in that the Q network synchronizes the updated parameters to the target network as follows: when the number of sampling iterations reaches the preset target network parameter update frequency, the parameters of the Q network are synchronized to the target network.
10. The intelligent combustion control method for the hot blast stove based on reinforcement learning and attention mechanism according to claim 5, characterized in that the gradient descent method comprises: calculating the importance sampling weight of each state transition record according to its sampling probability, updating the gradient values according to the average importance sampling weight of the B sampled state transition records, and updating the sampling priorities of the state transition records according to the temporal-difference error.
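The claim-10 corrections can be sketched as below. The exponent and the epsilon floor are common prioritized-replay conventions assumed here for illustration; the patent does not spell them out:

```python
import numpy as np

def is_weights(probs, n, alpha):
    # importance sampling weight w_i = (1 / (n * P(tau_i)))**alpha,
    # normalised by the maximum weight so gradients are only scaled down
    w = (1.0 / (n * np.asarray(probs, dtype=float))) ** alpha
    return w / w.max()

def updated_priorities(td_errors, eps=1e-6):
    # claim 10: the new priority grows with the TD-error magnitude;
    # eps keeps zero-error records sampleable (assumed convention)
    return np.abs(td_errors) + eps
```

With uniform sampling probabilities the weights are all 1, i.e. the correction vanishes exactly when no sampling bias was introduced.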
CN202311375874.0A 2023-10-23 2023-10-23 Intelligent combustion control method of hot blast stove based on reinforcement learning and attention mechanism Active CN117369263B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202311375874.0A CN117369263B (en) 2023-10-23 2023-10-23 Intelligent combustion control method of hot blast stove based on reinforcement learning and attention mechanism


Publications (2)

Publication Number Publication Date
CN117369263A true CN117369263A (en) 2024-01-09
CN117369263B CN117369263B (en) 2024-07-09

Family

ID=89396113

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202311375874.0A Active CN117369263B (en) 2023-10-23 2023-10-23 Intelligent combustion control method of hot blast stove based on reinforcement learning and attention mechanism

Country Status (1)

Country Link
CN (1) CN117369263B (en)

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115017818A (en) * 2022-06-20 2022-09-06 南京工业职业技术大学 Power plant flue gas oxygen content intelligent prediction method based on attention mechanism and multilayer LSTM
CN116668995A (en) * 2023-07-27 2023-08-29 苏州大学 Deep reinforcement learning-based vehicle networking dynamic beacon broadcasting method and system


Non-Patent Citations (6)

* Cited by examiner, † Cited by third party
Title
JIGAO FU 等: "Control Strategy for Denitrification Efficiency of Coal-Fired Power Plant Based on Deep Reinforcement Learning", IEEE ACCESS, 2 April 2020 (2020-04-02) *
LIANGSHU HE 等: "Machine-learning-driven on-demand design of phononic beams", SCIENCE CHINA PHYSICS, MECHANICS & ASTRONOMY, 25 November 2021 (2021-11-25) *
PICTUREYAWEI XUE 等: "An Optimal Model of Power Source Investment Considering Carbon Emission Constraint Based on Deep Deterministic Policy Gradient Algorithm", PROCEEDINGS OF THE 2022 4TH INTERNATIONAL CONFERENCE ON ROBOTICS, INTELLIGENT CONTROL AND ARTIFICIAL, 19 April 2023 (2023-04-19) *
WEIJUN TAN 等: "A Fast Partial Video Copy Detection Using KNN and Global Feature Database", 2022 IEEE/CVF WINTER CONFERENCE ON APPLICATIONS OF COMPUTER VISION, 15 February 2022 (2022-02-15) *
敖韬: "基于深度强化学习的热工过程控制方法研究与应用", 中国优秀硕士学位论文全文数据库 工程科技Ⅱ辑, 15 March 2022 (2022-03-15) *
黄志刚 等: "深度分层强化学习研究与发展", 软件学报, 15 February 2023 (2023-02-15) *

Also Published As

Publication number Publication date
CN117369263B (en) 2024-07-09

Similar Documents

Publication Publication Date Title
CN110187727A (en) A kind of Glass Furnace Temperature control method based on deep learning and intensified learning
CN107368125B (en) A kind of blast furnace temperature control system and method based on CBR Yu the parallel mixed inference of RBR
CN111915080B (en) Raw fuel cost optimal proportioning method based on molten iron quality constraint
CN109583585A (en) A kind of station boiler wall temperature prediction neural network model
CN107526927A (en) A kind of online robust flexible measurement method of blast-melted quality
CN110097929A (en) A kind of blast furnace molten iron silicon content on-line prediction method
CN104778361B (en) The method of modified EMD Elman neural network prediction molten iron silicon contents
CN106709197A (en) Molten iron silicon content predicting method based on slide window T-S fuzzy neural network model
CN110427715B (en) Method for predicting furnace hearth thermal state trend based on time sequence and multiple dimensions of blast furnace
CN104615856A (en) Gas consumption prediction model establishing method and device based on hot blast stove group
CN117369263B (en) Intelligent combustion control method of hot blast stove based on reinforcement learning and attention mechanism
Koyfman et al. Using of Intelligence Analysis of Technological Parameters Database for Implementation of Control Subsystem of Hot Blast Stoves Block ACS.
Tian et al. A new incremental learning modeling method based on multiple models for temperature prediction of molten steel in LF
JPH02170904A (en) Method for predicting furnace heat in blast furnace
JPH0673414A (en) Method for controlling quality of molten iron in blast furnace
Jiang et al. Reinforcement learning for blast furnace ironmaking operation with safety and partial observation considerations
JP6933196B2 (en) Blast furnace unloading speed prediction model learning method, blast furnace unloading speed prediction method, blast furnace operation guidance method, blast furnace unloading speed control method, hot metal manufacturing method, blast furnace operation method, and blast furnace unloading speed prediction Model learning device
JP4110780B2 (en) Estimation method of hot metal temperature and disturbance in blast furnace
Wu et al. Intelligent optimization and control for reheating furnaces
JP7384311B1 (en) Driving support device, driving support method and program
CN113637819B (en) Blast furnace material distribution method and system based on deep reinforcement learning
Lakshmanan et al. A hybrid modelling approach based on deep learning for the prediction of the silicon content in the blast furnace
CN114943173B (en) Ladle baking system and optimization method based on deep reinforcement learning and combustion simulation coupling
Zhao et al. Study on prediction method of hot metal temperature in blast furnace
CN116464985A (en) Combustion control method of hot blast stove

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant