CN114943173A

CN114943173A - Ladle baking system based on deep reinforcement learning and combustion simulation coupling and optimization method

Info

Publication number: CN114943173A
Application number: CN202210388962.3A
Authority: CN
Inventors: 张琦; 卢厚杨
Original assignee: Northeastern University China
Current assignee: Northeastern University China
Priority date: 2022-04-13
Filing date: 2022-04-13
Publication date: 2022-08-26
Anticipated expiration: 2042-04-13
Also published as: CN114943173B

Abstract

The invention discloses a ladle baking system and an optimization method based on the coupling of deep reinforcement learning and combustion simulation. The method comprises the following steps: S1, establishing an actual baking geometric model of the ladle; S2, establishing and calculating a simulation model; S3, preprocessing data; S4, establishing a prediction model; S5, coupling the two models; and S6, intelligent optimization control. The method replaces manual control optimization with software modeling and control optimization, which shortens the optimization period, improves optimization efficiency, reduces labor cost, and allows the ladle baking temperature to be raised quickly. The software modeling optimization couples Python with combustion simulation software, so the optimization can be completed without a parameterized model; this avoids the difficulty of obtaining the internal parameters of the ladle under actual working conditions and greatly reduces the modeling cost.

Description

Ladle baking system based on deep reinforcement learning and combustion simulation coupling and optimization method

Technical Field

The invention relates to the technical field of optimization of the steel industry, in particular to a ladle baking system and an optimization method based on deep reinforcement learning and combustion simulation technology coupling.

Background

The steel industry is a pillar industry of the national economy. It is resource- and energy-intensive and is therefore a key target for energy system optimization. At present, the energy structure of the steel industry relies mainly on low-grade ore and coal, the environmental pressure is high, and green, low-carbon development is imperative. The digitization and intelligence level of the steel industry urgently needs to be improved through advanced information technologies such as big data, cloud computing and artificial intelligence, so that intelligent manufacturing drives the transformation, upgrading and high-level development of the industry. Owing to production process requirements and the need to save gas, a ladle roaster often has to heat the ladle to 1100 ℃ to 1200 ℃ in a short time while ensuring that the flame is not extinguished by the low calorific value of the gas during operation. In the existing baking process, a conventionally controlled ladle baking device adjusts the gas flow only roughly by manual operation until the temperature reaches the target value; the air-fuel ratio is not considered, which wastes resources and damages equipment. In addition, because existing baking control suffers from weak control capability, an improper air-fuel ratio, incomplete combustion and low baking efficiency, the baking time is often prolonged to reach the target temperature, which wastes a large amount of gas and slows the production rhythm. Because the flow field and temperature field inside the ladle are complex under actual on-site working conditions, internal data are difficult to obtain, so fine regulation is difficult to achieve. A traditional machine learning model can hardly obtain enough relevant data for training and therefore cannot be used for modeling and optimal control according to the requirements of the continuous casting production process.

Therefore, a new ladle baking modeling and optimization control method that achieves efficient and accurate ladle baking is urgently needed and is of great significance.

Disclosure of Invention

In view of the defects of the existing baking technology, the invention aims to provide a ladle baking modeling and optimization method based on the coupling of deep reinforcement learning and combustion simulation, so as to solve the problems in the prior art that the process requirement of quickly raising the ladle baking temperature to 1100 ℃ cannot be met and that pollutant emissions during ladle baking cannot be kept low. Specifically, manual control optimization is replaced with software modeling and control optimization, which shortens the optimization period, improves optimization efficiency, reduces labor cost, and allows the ladle baking temperature to be raised quickly. The software modeling optimization couples Python with combustion simulation software, so the optimization can be completed without a parameterized model; this avoids the difficulty of obtaining the internal parameters of the ladle under actual working conditions and greatly reduces the modeling cost.

In order to achieve the above object, the technical solution provided by the present invention is to provide a ladle baking system based on deep reinforcement learning and combustion simulation technology and an optimization method thereof, comprising the following steps:

S1, establishing an actual ladle baking geometric model based on the acquired ladle parameters.

The geometric model comprises a gas pipeline, a combustion-supporting gas pipeline, a ladle cover, an internal combustion area, a ladle lining and a ladle shell.

S2, establishing and calculating a simulation model: importing the constructed actual ladle baking geometric model into combustion simulation software, calculating and analyzing it to obtain multiple groups of parameters related to the ladle lining temperature distribution and the pollutant discharge amount in the unsteady combustion process, and storing and outputting them with a script.

Specifically, in step S2, the process of calculating and analyzing the actual baking condition of the ladle by using the combustion simulation software includes the following steps:

S21, mesh generation: modeling and meshing the ladle cover, the ladle body and the heat-insulating layer separately, refining the mesh in the central combustion area of the ladle body, and setting the inlet, the outlet and the geometric boundaries of the model;

S22, calculation and solution: setting the required boundary conditions and fluid-solid coupled heat exchange conditions, selecting an unsteady solver, initializing the field, selecting monitoring points, and then starting the calculation according to a set baking schedule (that is, the gas flow and combustion-supporting gas flow over a set time);

S23, obtaining the ladle lining temperature distribution, the flame length and the pollutant discharge during the ladle baking process by means of the script, and storing them.

S3, data preprocessing: dividing the multiple groups of parameters into training data and test data to form a data set.

Specifically, the preprocessing in this step comprises the following sub-steps:

S31, checking the data and removing bad values: inspecting the data through visualization (for example distribution plots and violin plots), deleting outliers, and correcting the time data according to the actual baking time;

S32, dividing the data set into a training set and a test set in a certain proportion, following the baking time sequence. The earlier part is the training set, used for training the parameters after the ladle model is established; the later part is the test set, used for testing and evaluating the accuracy of the model;

S33, standardizing the data set to eliminate the influence of different dimensions among the indexes and make the data indexes comparable.
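Sub-steps S32 and S33 can be sketched in a few lines of Python. This is a minimal illustration only: the 80/20 split ratio, the function names and the sample values are assumptions, and the scaling statistics are fitted on the training part alone so that no information from the later test part leaks into training.

```python
from statistics import mean, pstdev

def chronological_split(rows, train_fraction=0.8):
    """Split time-ordered rows: earlier part for training, later part for testing."""
    if not 0.0 < train_fraction < 1.0:
        raise ValueError("train_fraction must be between 0 and 1")
    cut = int(len(rows) * train_fraction)
    return rows[:cut], rows[cut:]

def fit_standardizer(train_rows):
    """Per-column mean and standard deviation, computed on the training set only."""
    columns = list(zip(*train_rows))
    means = [mean(col) for col in columns]
    # A constant column has zero deviation; fall back to 1.0 to avoid dividing by zero.
    stds = [pstdev(col) or 1.0 for col in columns]
    return means, stds

def standardize(rows, means, stds):
    """Apply z-score scaling so that every feature becomes dimensionless."""
    return [[(v - m) / s for v, m, s in zip(row, means, stds)] for row in rows]

# Hypothetical, time-ordered samples:
# (lining temperature, flame length, pollutant discharge), arbitrary units.
samples = [
    [300.0, 1.0, 40.0],
    [500.0, 1.2, 55.0],
    [700.0, 1.4, 70.0],
    [900.0, 1.6, 85.0],
    [1100.0, 1.8, 100.0],
]
train, test = chronological_split(samples, train_fraction=0.8)
means, stds = fit_standardizer(train)
train_scaled = standardize(train, means, stds)
test_scaled = standardize(test, means, stds)
```

Splitting by position rather than at random keeps the baking time order intact, which matters for a sequence model trained on the result.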

S4, establishing a prediction model: importing the data set into a deep learning framework, obtaining and verifying the ladle lining temperature distribution, flame length and pollutant discharge amount predicted by the deep learning model, and finally saving the model.

S5, double-model coupling: connecting the combustion simulation software with Python through a CORBA interface, creating a combustion simulation server session, and sending TUI commands and scripts to the server in real time, so that the combustion simulation software automatically executes the ladle baking heating schedule and returns the operation result.

S6, intelligent optimization control: establishing a baking control agent, importing the state from the deep learning prediction model established in step S4, interacting with the combustion simulation model, which serves as the environment (in reinforcement learning, whatever the agent interacts with), by outputting decisions and obtaining environment feedback, and performing optimization control of the baking process in real time (the openings of the combustion-supporting gas valve and the gas valve change as baking proceeds). The optimized control result is used to regulate the actual baking process.

In detail, the intelligent optimization control in this step includes the following steps:

S61, state input: using the deep learning prediction model established in step S4 as the input layer to import the state; the ladle lining temperature distribution, flame length and pollutant discharge amount obtained from the combustion-supporting gas flow and gas flow parameters of the reference environment are imported as state quantities into the agent network established in the next step.

S62, establishing the agent network: an Actor policy network and a Critic evaluation network are each built on a deep learning network; after the state quantities are input, a decision to change toward a better air-fuel ratio (the ratio of combustion-supporting gas flow to gas flow) is output, and after the environment feedback reward is obtained, the policy network weights and the evaluation network loss are adjusted.

S63, environment interaction: the decisions output in step S62, such as the changes in combustion-supporting gas flow and gas flow, are passed through CORBA into the combustion simulation environment; the environment reward is obtained and fed back to the Critic evaluation network, and environment states such as the gas flow, combustion-supporting gas flow and flame length are input into the deep learning prediction model established in step S4. Steps S61 to S63 are repeated iteratively until the optimal combustion-supporting gas flow and optimal gas flow (the optimal baking schedule) are obtained.

S64, actual baking control: feeding the optimal baking schedule back to the actual baking process for baking control.
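The hand-off between Python and the simulation server described in steps S5 and S63 can be pictured with a small offline sketch. Everything in it is hypothetical: `FakeSimulationSession` merely records commands instead of forwarding them over a CORBA link, and the command strings are placeholders, not the TUI syntax of any real solver.

```python
from dataclasses import dataclass, field

@dataclass
class FakeSimulationSession:
    """Stand-in for a combustion simulation server session (hypothetical).

    A real session would forward each command to the solver; this stub only
    records the commands so that the control flow can be run offline.
    """
    sent: list = field(default_factory=list)

    def send_command(self, command: str) -> str:
        self.sent.append(command)
        return "ok"

def run_baking_schedule(session, schedule, step_seconds=60):
    """Send one pair of flow set-points per stage, then advance the solver."""
    replies = []
    for gas_flow, air_flow in schedule:
        # Placeholder command strings, not real TUI commands.
        replies.append(session.send_command(f"set-gas-flow {gas_flow}"))
        replies.append(session.send_command(f"set-air-flow {air_flow}"))
        replies.append(session.send_command(f"advance {step_seconds}"))
    return replies

session = FakeSimulationSession()
# Hypothetical (gas flow, combustion-supporting gas flow) pairs, arbitrary units.
schedule = [(0.10, 0.25), (0.15, 0.40)]
replies = run_baking_schedule(session, schedule)
```

Keeping the session behind a single `send_command` method means the same loop could later be pointed at a real server session without changing the agent-side code.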

Furthermore, because a relatively fine simulation-based agent is used for intelligent optimization control of ladle baking, the method is not limited to ladle baking control; it can be extended to the combustion control of various furnaces, and can realize oxygen-enriched combustion control and MILD (Moderate or Intense Low-oxygen Dilution) combustion control, which places high demands on the combustion atmosphere under low-oxygen conditions.

The actual baking control has two control methods: automatic control of the system and manual control of an operator.

Furthermore, in automatic control, the control system regulates the gas and combustion-supporting gas consumption according to the intelligent optimization control result, so that the optimal air-fuel ratio is achieved, combustion efficiency is improved, gas is saved, and the pollutant discharge amount is controlled.

Furthermore, in manual control, an operator adjusts the valves to set the gas and combustion-supporting gas consumption.

Compared with the prior art, the invention has the following optimization effects:

First, through the combustion simulation technology and the deep reinforcement learning agent, the method solves the problems that the temperature distribution inside the ladle is complex under actual working conditions, that the actual baking condition is unknown and difficult to schedule, and that a traditional machine learning model can hardly obtain enough relevant data for training;

Second, replacing manual operation with software-based agent optimization control greatly shortens the optimization control period, improves control efficiency and reduces labor cost; at the same time, it effectively ensures that the optimization scheme obtained is the optimal result, so that ladle baking is carried out efficiently and accurately.

Drawings

FIG. 1 is a model of the ladle roaster established when operating the optimization control method provided by the present invention:

Ladle cover: 1, gas pipeline; 2, combustion-supporting gas pipeline; 3, heat-insulating layer; 5, gas inlet; 6, combustion-supporting gas inlet. Ladle body: 4, combustion zone. Ladle lining: 7, ladle shell; 8, permanent layer; 9, working layer.

FIG. 2 is a flow chart of the modeling optimization method provided by the present invention;

FIG. 3 is a schematic diagram of an LSTM cell according to an embodiment of the present invention;

FIG. 4 is a schematic diagram of the baking agent according to an embodiment of the present invention;

FIG. 5 is a schematic diagram of the baking optimization control structure based on deep reinforcement learning and combustion simulation according to an embodiment of the present invention.

Detailed Description

In order to make the objects, techniques and methods of carrying out the present invention more apparent, the present invention will be described in detail with reference to the accompanying drawings. The specific embodiments described herein are merely illustrative of the invention and are not intended to be limiting.

Example:

As shown in FIG. 1, the ladle roaster adopted in this embodiment has an injection-type burner structure. A cone valve is arranged below the gas outlet, and gas is sprayed out from the annular gap between the gas pipeline 1 and the cone valve; the internal combustion area preheats the incoming gas. A combustion-supporting gas pipeline 2 is arranged outside the gas pipeline, and heat-insulating cotton is added below the ladle cover to store heat and preheat the combustion-supporting gas above the ladle cover. Below the ladle cover is the combustion zone inside the ladle; outside the combustion zone are, in order, the working layer 9, the permanent layer 8 and the ladle shell 7. The working layer, the permanent layer and the heat-insulating layer 3 are all made of heat-insulating material, the ladle shell is made of steel, and the gas pipeline and combustion-supporting gas pipeline are made of high-temperature-resistant stainless steel. After the roaster starts to work, gas enters the burner from the gas inlet 5 of the gas pipeline, and its flow velocity increases as the cross-sectional area suddenly decreases at the cone valve. Combustion-supporting gas is fed into the burner by a fan through the combustion-supporting gas inlet 6 of the combustion-supporting gas pipeline. After passing the cone valve, the gas mixes with the combustion-supporting gas in the burner and starts to burn in the combustion zone 4. The combustion-supporting gas and the blast furnace gas are preheated with the waste heat of the flue gas that an induced draft fan extracts during baking, which effectively improves combustion efficiency and energy utilization. Because internal combustion and oxygen-enriched combustion are used, the flame is ejected from the burner under the combined action of the blast furnace gas pressure, the acceleration at the cone valve and the thermal expansion of the gas, and the flame temperature is high.

During simulated baking, the gas is sprayed out of the annular gap at the upper part of the cone valve and entrains the combustion-supporting gas, so that the two are fully mixed and burned. The reaction force of the gas, which expands violently during combustion, ejects the high-temperature flue gas rapidly from the nozzle. Meanwhile, as the gas flows around the cone valve, flow separation occurs and a stable combustion recirculation zone forms behind the bluff-body structure of the cone valve. The high-temperature flue gas in the recirculation zone entrains the fuel gas and combustion-supporting gas, which keeps the combustion stable.

Combustion simulation of the ladle baking process: because ladle baking takes place in a closed high-temperature environment, industry usually measures the outside with an infrared thermometer and then estimates the conditions inside, so the temperature distribution inside the ladle is difficult to measure accurately in real time. Simulation modeling with combustion simulation software solves the data source problem, simulates the real-time optimization effect and provides a basis for the ladle baking heating schedule.

A geometric model is established in SolidWorks according to the actual roaster, and the mesh is generated in ICEM with refinement in the central combustion area. The mesh is imported into Fluent, boundary conditions (inlet conditions, outlet conditions and heat-insulating layer parameters) are set according to the actual working conditions, the unsteady combustion process of ladle baking is simulated, and the simulation results are processed into a data set for the subsequent neural network baking prediction model.

Baking prediction using a neural network model: a neural network has autonomous learning ability and strong prediction ability. The intelligent ladle baking prediction method can use the neural network model to train on, predict and identify the operating state of the system, reduces the installation and maintenance cost of hardware equipment, and can predict the ladle baking process in real time. As an alternative embodiment, the neural network model may be a Back Propagation (BP) neural network model, but the prediction performance of a BP network is not ideal over longer time periods. Considering that the ladle baking process is a dynamic time series, as a preferred embodiment the invention instead adopts a Long Short-Term Memory (LSTM) network, which has memory capability, and optimizes the intelligent prediction model for continuous casting ladle baking so as to realize automatic identification and effective prediction of the air-fuel ratio during baking.

LSTM is a variant of the Recurrent Neural Network (RNN). A plain RNN generally has only one simple node in the hidden layer, such as a σ function or a tanh function, whereas in an LSTM network the simple node is replaced by a unit (cell). This variant retains most of the properties of the RNN model while alleviating the exploding and vanishing gradient problems that arise during back propagation. LSTM is therefore suitable for practical tasks with strong temporal dependence.

FIG. 3 shows an LSTM hidden layer unit. In the figure, each line carries a vector from the output of one node to the input of another. Circles indicate point-wise operations, such as vector addition and point-wise multiplication, and boxes represent learned neural network layers. Merging lines indicate concatenation, while a forking line indicates that its content is copied and the copies go to different locations. The top line inside the hidden layer unit represents the cell state c, which retains the characteristic parameters of earlier time steps. The arrows from x_t to the σ and tanh layers allow three gates to observe the cell state when training on the input characteristic parameters. The memory cell comprises three parts: a forget gate f (the first σ layer), an input gate i (in two parts, a σ layer and a tanh layer), and an output gate o (a σ layer whose result is multiplied with the output of a tanh layer before being output). Each gate is a standard neuron of the recurrent neural network: f_t, i_t and o_t denote the activations of the forget, input and output gates at time step t, and [ ] denotes a weighted sum of differentiable functions. In the figure, the outputs of the three control gates (input gate, forget gate and output gate) are each connected to a multiplication node, so as to control the inflow and outflow of information and the state of the cell. x_t is the input node and corresponds to a characteristic parameter, here the ladle lining temperature distribution, flame length and pollutant discharge amount output by the combustion simulation calculation. h is the output node and corresponds to the output state of an LSTM cell.

The operation of the LSTM unit can be represented by the following equation:

$$f_t = \sigma(W_f \cdot [h_{t-1}, x_t] + b_f) \tag{1.1}$$

$$i_t = \sigma(W_i \cdot [h_{t-1}, x_t] + b_i) \tag{1.2}$$
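Equations (1.1) and (1.2) can be exercised with a dependency-free Python sketch of one LSTM time step. The forget and input gates follow the two equations above; the candidate state, the output gate, the cell update and the hidden state use the standard textbook LSTM form, which is an assumption here, as are the helper names, the constant weights and the input values.

```python
import math

def sigmoid(z):
    """Logistic function used by the three gates."""
    return 1.0 / (1.0 + math.exp(-z))

def affine(W, v, b):
    """Compute W . v + b, with W given as a list of rows."""
    return [sum(w * x for w, x in zip(row, v)) + bias for row, bias in zip(W, b)]

def lstm_step(x_t, h_prev, c_prev, params):
    """One LSTM time step; params maps a gate name to its (W, b) pair."""
    z = h_prev + x_t  # list concatenation, i.e. [h_{t-1}, x_t]

    def gate(name, activation):
        W, b = params[name]
        return [activation(a) for a in affine(W, z, b)]

    f_t = gate("f", sigmoid)    # forget gate, Eq. (1.1)
    i_t = gate("i", sigmoid)    # input gate, Eq. (1.2)
    g_t = gate("g", math.tanh)  # candidate state (assumed standard form)
    o_t = gate("o", sigmoid)    # output gate (assumed standard form)
    c_t = [f * c + i * g for f, c, i, g in zip(f_t, c_prev, i_t, g_t)]
    h_t = [o * math.tanh(c) for o, c in zip(o_t, c_t)]
    return h_t, c_t

def constant_params(hidden, inputs, weight):
    """Hypothetical parameters: every weight equal to `weight`, zero biases."""
    W = [[weight] * (hidden + inputs) for _ in range(hidden)]
    return {name: (W, [0.0] * hidden) for name in ("f", "i", "g", "o")}

# x_t: standardized lining temperature, flame length and pollutant discharge
# (made-up values); hidden size 2, zero initial hidden and cell states.
x_t = [0.5, -0.2, 0.1]
h_t, c_t = lstm_step(x_t, [0.0, 0.0], [0.0, 0.0], constant_params(2, 3, 0.1))
```

Feeding `h_t` and `c_t` back in as `h_prev` and `c_prev` for the next sample is what lets the cell state carry information across the baking time series.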

Baking optimization control using a deep reinforcement learning model: unlike traditional supervised machine learning fitting methods, reinforcement learning learns quickly, because new feedback is acquired rapidly (an action is taken and a reward is obtained from the environment) and immediately influences subsequent decisions; it can also adapt quickly to environmental changes. However, the weight calculation in plain reinforcement learning lacks predictive capability and may fail to generalize, so it cannot handle problems with a continuous state-action space. The weights can instead be fitted nonlinearly by a deep learning model, which gives a deep reinforcement learning model. The deep reinforcement learning model for ladle baking can use an agent to train on and control the operating state of the system, which overcomes the difficulty of observing the inside of the baking process, greatly reduces the cost of manual operation, and allows the ladle baking process to be predicted in real time. Further, considering that the ladle baking process has a dynamic, continuous action space, the invention adopts an improved Proximal Policy Optimization (PPO) network, which is suited to continuous control problems, to optimize the intelligent baking process of the continuous casting ladle, so as to realize intelligent optimization control of the baking process and reach the optimal air-fuel ratio.

FIG. 4 shows the deep reinforcement learning PPO agent for ladle baking optimization control. During training, the environment supplies the ladle lining temperature distribution, flame length and pollutant discharge amount obtained through combustion simulation. After preprocessing, these data are collected as the input of the Actor network, and the Actor network outputs a set of changes in gas flow and combustion-supporting gas flow (the action) according to the input data (the state); this mapping is the policy. The Critic network judges the quality of the policy output. The feedback (reward) and the next environment state are obtained from the data collected in the next period, and the Critic network uses them to assign a value to the action the Actor network took in the previous period, thereby guiding the optimization of the Actor network.

The optimization control process of the PPO agent may be expressed as:

$$V_\pi(s) = E_\pi[G_t \mid s_t = s, a_t = a] \tag{1.6}$$

$$\delta_t = r_t + \gamma V(s_{t+1}) - V(s_t) \tag{1.8}$$

In the formulas, G_t is the expected long-term return, γ is the long-term return discount factor, and r_t is the reward at time t. V_π(s) is the value of state s, a_t is the action taken during the policy update, and s_t is the state at that time.

A_t is the advantage value of the policy at time t and is used to compare the advantage of the new policy over the old policy. λ is a weighted-average factor, δ_t is the temporal-difference error (TD error), L_V is the loss function of the Critic network (value loss), L_P is the loss function of the Actor network (policy loss), r_t(θ) is the policy ratio at time t, and ε is a hyperparameter. π_θ(a_t|s_t) is the probability under the latest policy at time t, and π_θ,old(a_t|s_t) is the corresponding probability under the old policy.
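For reference, the symbols defined above can be related to one another through the textbook PPO formulation with generalized advantage estimation. The relations below are that standard form, written with the same symbols; they are an assumption about, not a reproduction of, the formulas of this embodiment, and the minus sign in L_P reflects the convention of minimizing a loss.

$$G_t = \sum_{k=0}^{\infty} \gamma^{k} r_{t+k}$$

$$A_t = \sum_{l=0}^{\infty} (\gamma\lambda)^{l} \delta_{t+l}$$

$$r_t(\theta) = \frac{\pi_\theta(a_t \mid s_t)}{\pi_{\theta,old}(a_t \mid s_t)}$$

$$L_P = -E_t\left[\min\left(r_t(\theta) A_t,\ \mathrm{clip}\left(r_t(\theta), 1-\epsilon, 1+\epsilon\right) A_t\right)\right]$$

$$L_V = E_t\left[\left(V(s_t) - G_t\right)^2\right]$$

With ε = 0.2, for example, a policy ratio of 1.5 on a sample with positive advantage is clipped to 1.2, so a single update cannot move the new policy arbitrarily far from the old one.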

The detailed steps of the optimization control of the PPO agent are as follows:

1. The environment information s (ladle lining temperature distribution, flame length and pollutant discharge amount) is input into the actor-new network to obtain the parameters μ and σ that describe the flow distribution. The action a (combustion-supporting gas flow and gas flow) is sampled from the Gaussian distribution they define, and the actor-new network is copied to establish the actor-old network. The action is input into the combustion simulation environment to obtain the reward r (a weighted sum of the ladle lining temperature change, the flame length change and the pollutant discharge) and the state s' at the next moment; a list is built to store [(s, a, r), …]; s' is then input into the actor-new network, and this step is repeated until the amount of [(s, a, r), …] data meets the optimization control requirement.

2. The s' obtained in the last iteration of step 1 is input into the critic network to obtain the value v' of that state (reflecting the heating effect, the energy saving and the negative of the pollutant discharge amount in that state).

3. The states stored in step 1 are input into the critic network to obtain the v' values of all states, and the advantage value A_t of the policy at time t is calculated.

4. L_V is calculated, and the critic network is then updated by back propagation (BP).

5. The stored list of states s is input into the actor-old and actor-new networks to obtain the Gaussian distributions normal1 and normal2, respectively; the stored actions are then evaluated under normal1 and normal2 to obtain prob1 and prob2 for each action (that is, π_θ,old(a_t|s_t) and π_θ(a_t|s_t)), whose ratio gives the importance weight r_t(θ).

6. L_P is calculated, and the actor-new network is updated by back propagation.

7. Steps 5 and 6 are repeated until the accuracy requirement is met; the loop then ends, and the actor-old network is updated using the latest actor-new network and the temporal-difference error δ_t.

8. Steps 1 to 7 are repeated, and the optimal policy is output.
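The arithmetic behind steps 3, 5 and 6 can be sketched in plain Python. The sketch assumes the standard PPO forms (generalized advantage estimation and the clipped surrogate loss); the function names and all numbers are hypothetical, and the network forward passes are replaced by ready-made lists of values and probabilities.

```python
def td_errors(rewards, values, last_value, gamma):
    """delta_t = r_t + gamma * V(s_{t+1}) - V(s_t), as in Eq. (1.8)."""
    next_values = values[1:] + [last_value]
    return [r + gamma * nv - v for r, nv, v in zip(rewards, next_values, values)]

def gae_advantages(deltas, gamma, lam):
    """Discounted backward sum of TD errors (generalized advantage estimation)."""
    advantages, running = [], 0.0
    for delta in reversed(deltas):
        running = delta + gamma * lam * running
        advantages.append(running)
    return advantages[::-1]

def clipped_policy_loss(prob_new, prob_old, advantages, epsilon):
    """Negative clipped surrogate objective, averaged over the batch."""
    total = 0.0
    for p_new, p_old, adv in zip(prob_new, prob_old, advantages):
        ratio = p_new / p_old  # importance weight r_t(theta) from step 5
        clipped = max(1.0 - epsilon, min(ratio, 1.0 + epsilon))
        total += min(ratio * adv, clipped * adv)
    return -total / len(advantages)

# Step 3: advantages from stored rewards and critic values (made-up numbers).
deltas = td_errors(rewards=[1.0, 1.0], values=[0.5, 0.5], last_value=0.0, gamma=0.9)
advantages = gae_advantages(deltas, gamma=0.9, lam=0.95)

# Steps 5 and 6: prob1 (actor-old) and prob2 (actor-new) for the stored actions.
loss = clipped_policy_loss(
    prob_new=[0.6, 0.2], prob_old=[0.4, 0.4], advantages=[1.0, -1.0], epsilon=0.2
)
```

In a full implementation the actor-new parameters would be updated by back propagation through this loss; the clipping is what keeps each update close to the actor-old policy.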

FIG. 5 shows the baking optimization control structure based on deep reinforcement learning and combustion simulation. The combustion simulation software establishes the ladle baking model, which provides the environment that the agent interacts with, and the optimal gas flow and combustion-supporting gas flow are output in real time to the actual production environment to guide the baking process.

While there have been shown and described fundamental principles and structural features of the invention and advantages thereof, it will be understood by those skilled in the art that the invention is not limited to the details of the foregoing preferred embodiments, but is capable of other embodiments and of being practiced in other specific forms without departing from the spirit or essential characteristics thereof; the present embodiments are therefore to be considered as illustrative and not restrictive, the scope of the invention being indicated by the appended claims rather than by the foregoing description, and all changes which come within the meaning and range of equivalency of the claims are therefore intended to be embraced therein, and any reference signs in the claims are not intended to be construed as limiting the claim concerned.

Although embodiments of the present invention have been shown and described, it will be appreciated by those skilled in the art that changes, modifications, substitutions and alterations can be made in these embodiments without departing from the principles and spirit of the invention, the scope of which is defined in the appended claims and their equivalents.

Claims

1. The ladle baking optimization method based on deep reinforcement learning and combustion simulation technology coupling is characterized by comprising the following steps of:

S1, establishing an actual ladle baking geometric model based on the acquired ladle parameters;

S2, establishing and calculating a simulation model: importing the constructed actual ladle baking geometric model into combustion simulation software, calculating and analyzing it, simulating the unsteady combustion process of ladle baking, and obtaining multiple groups of parameters related to the ladle lining temperature distribution and the pollutant discharge amount in the unsteady combustion process;

S3, data preprocessing: dividing the acquired multiple groups of parameters into training data and test data to form a data set;

S4, establishing a prediction model: importing the data set into a deep learning framework to obtain the ladle lining temperature distribution, flame length and pollutant discharge amount predicted by the deep learning model, and saving the final prediction model after verification;

S5, double-model coupling: connecting the combustion simulation software with Python, creating a combustion simulation server session, and having the combustion simulation software automatically execute the ladle baking heating schedule and return the operation result;

S6, intelligent optimization control: establishing a baking control agent, importing the state from the deep learning prediction model established in step S4, interacting with the combustion simulation model as the environment, performing optimization control of the baking process in real time, and finally feeding the optimal baking schedule back to the actual baking process.

2. The ladle baking optimization method based on deep reinforcement learning and combustion simulation technology coupling as claimed in claim 1, wherein the method comprises the following steps: the actual ladle baking geometric model in the step S1 comprises a gas pipeline, a combustion-supporting gas pipeline, a ladle cover, an internal combustion area, a ladle lining and a ladle shell.

3. The ladle baking optimization method based on deep reinforcement learning and combustion simulation technology coupling as claimed in claim 1, wherein step S2 specifically comprises the following steps:

S22, calculation and solution: setting the required boundary conditions and fluid-solid coupled heat transfer conditions, selecting an unsteady solver, initializing the flow field, selecting monitoring points, and then starting the calculation according to a set baking schedule, wherein the set baking schedule comprises the gas flow and the combustion-supporting gas flow within a set time;

and S23, obtaining the ladle lining temperature distribution, flame length and pollutant discharge data for the ladle baking process as calculated by the simulation model.
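The set baking schedule in step S22 (gas flow and combustion-supporting gas flow within a set time) can be pictured as a list of timed segments. The sketch below is a minimal illustration under assumed names (`ScheduleSegment`, `flows_at`) and placeholder flow values; it is not the data format used by any particular combustion simulation software.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class ScheduleSegment:
    duration_s: float  # how long this setting is held, in seconds
    gas_flow: float    # gas flow (unit assumed, e.g. m^3/h)
    air_flow: float    # combustion-supporting gas flow (same unit)

    @property
    def air_fuel_ratio(self) -> float:
        return self.air_flow / self.gas_flow


def flows_at(schedule: List[ScheduleSegment], t: float) -> ScheduleSegment:
    """Return the segment active at time t (seconds from the start of baking)."""
    if not schedule:
        raise ValueError("schedule must contain at least one segment")
    if t < 0:
        raise ValueError("t must be non-negative")
    elapsed = 0.0
    for segment in schedule:
        elapsed += segment.duration_s
        if t < elapsed:
            return segment
    return schedule[-1]  # past the end: hold the final setting


# Placeholder values for illustration only.
schedule = [
    ScheduleSegment(duration_s=600.0, gas_flow=200.0, air_flow=500.0),
    ScheduleSegment(duration_s=1200.0, gas_flow=400.0, air_flow=900.0),
]
```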

4. The ladle baking optimization method based on deep reinforcement learning and combustion simulation technology coupling as claimed in claim 3, wherein, after the ladle internal combustion data obtained by combustion simulation are preprocessed, a neural network model is trained for prediction to obtain updated lining temperature distribution, flame length and pollutant discharge data.

5. The ladle baking optimization method based on deep reinforcement learning and combustion simulation technology coupling as claimed in claim 1 or 4, wherein the neural network model comprises a long short-term memory (LSTM) network with memory capability, and each input node corresponds to a feature parameter, the feature parameters comprising the ladle lining temperature distribution, flame length and pollutant discharge amount calculated and output by the combustion simulation.
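A recurrent network such as an LSTM is usually trained on fixed-length windows of past time steps. The following sketch shows one way to build such windows from the three feature parameters named above. It assumes, for simplicity, that the lining temperature distribution is reduced to a single value per time step; the name `make_windows`, the window length and the numbers are illustrative assumptions.

```python
from typing import List, Sequence, Tuple

# (lining temperature, flame length, pollutant discharge) at one time step
Features = Tuple[float, float, float]


def make_windows(series: Sequence[Features],
                 window: int) -> Tuple[List[List[Features]], List[Features]]:
    """Pair each run of `window` consecutive steps with the step that follows it."""
    if window < 1 or window >= len(series):
        raise ValueError("window must be at least 1 and shorter than the series")
    inputs: List[List[Features]] = []
    targets: List[Features] = []
    for start in range(len(series) - window):
        inputs.append(list(series[start:start + window]))
        targets.append(series[start + window])
    return inputs, targets


# Placeholder series, not simulation output.
series = [(300.0 + 10.0 * i, 1.0 + 0.1 * i, 50.0 - i) for i in range(6)]
inputs, targets = make_windows(series, window=3)
```

Each element of `inputs` would be one input sequence for the network, and the matching element of `targets` the value it is trained to predict.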

6. The ladle baking optimization method based on deep reinforcement learning and combustion simulation technology coupling as claimed in claim 1, wherein in step S6, an improved proximal policy optimization (PPO) network is adopted to optimize the intelligent baking process of continuous casting ladle baking.
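For orientation, the standard (unimproved) proximal policy optimization clipped surrogate objective can be written as $L = \mathbb{E}[\min(r A, \mathrm{clip}(r, 1-\epsilon, 1+\epsilon) A)]$, where $r$ is the ratio of new to old action probabilities and $A$ is the advantage. The sketch below computes this quantity in plain Python. The claim does not state what the improvement consists of, so the code shows only the standard form; the function name and the default $\epsilon = 0.2$ are assumptions.

```python
import math
from typing import Sequence


def ppo_clipped_objective(new_log_probs: Sequence[float],
                          old_log_probs: Sequence[float],
                          advantages: Sequence[float],
                          clip_eps: float = 0.2) -> float:
    """Mean of min(r * A, clip(r, 1 - eps, 1 + eps) * A) over a batch."""
    n = len(advantages)
    if n == 0 or len(new_log_probs) != n or len(old_log_probs) != n:
        raise ValueError("inputs must be non-empty and of equal length")
    total = 0.0
    for new_lp, old_lp, adv in zip(new_log_probs, old_log_probs, advantages):
        ratio = math.exp(new_lp - old_lp)  # probability ratio r
        clipped = max(1.0 - clip_eps, min(ratio, 1.0 + clip_eps))
        total += min(ratio * adv, clipped * adv)
    return total / n
```

The clipping limits how far a single update can move the policy, which matters when each environment step is an expensive simulation run.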

7. The ladle baking optimization method based on deep reinforcement learning and combustion simulation technology coupling as claimed in claim 1 or 6, wherein step S6 specifically comprises the following steps:

S61, state input: using the deep learning prediction model established in step S4 as the input layer to supply the state, wherein, based on the combustion-supporting gas flow and gas flow parameters of the reference environment, the ladle lining temperature distribution, flame length and pollutant discharge amount are fed as state quantities into the agent network established in the subsequent step;

S62, establishing an agent network: establishing an Actor policy network and a Critic evaluation network, each based on a deep learning network, outputting an improved air-fuel ratio change decision after receiving the state quantity input, and adjusting the policy network weights and the evaluation network loss after receiving the reward fed back by the environment;

S63, environment interaction: feeding the decision on the changes in combustion-supporting gas flow and gas flow output by S62 into the combustion simulation environment, obtaining the environment reward and feeding it back to the Critic evaluation network, inputting the environment states of gas flow, combustion-supporting gas flow and flame length into the deep learning prediction model established in step S4, repeating steps S61 to S63 iteratively, and finally obtaining the optimal combustion-supporting gas flow and the optimal gas flow as the optimal baking schedule.
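The propose, evaluate and update cycle of steps S61 to S63 can be sketched with a toy example. In the code below, `toy_environment` is a hypothetical stand-in for the combustion simulation, and a simple greedy search stands in for the Actor and Critic networks; neither is the claimed method. The target air-fuel ratio of 2.5, the step size and the starting flows are arbitrary assumptions chosen so that the loop can be run.

```python
from typing import Tuple

Flows = Tuple[float, float]  # (gas_flow, combustion_supporting_gas_flow)


def toy_environment(flows: Flows) -> float:
    """Hypothetical reward: highest at an assumed air-fuel ratio of 2.5.

    This is NOT a combustion model; it only gives the loop something to optimize.
    """
    gas, air = flows
    return -((air / gas) - 2.5) ** 2


def optimize_flows(start: Flows, step: float = 10.0,
                   iterations: int = 50) -> Flows:
    """Greedy stand-in for the agent: keep a flow change only if the reward rises."""
    flows, best_reward = start, toy_environment(start)
    for _ in range(iterations):
        improved = False
        # Candidate decisions: raise or lower one of the two flows by `step`.
        for d_gas, d_air in ((0.0, step), (0.0, -step), (step, 0.0), (-step, 0.0)):
            candidate = (flows[0] + d_gas, flows[1] + d_air)
            if candidate[0] <= 0 or candidate[1] <= 0:
                continue  # flows must stay positive
            reward = toy_environment(candidate)  # environment feedback
            if reward > best_reward:
                flows, best_reward = candidate, reward
                improved = True
        if not improved:
            break  # no single change helps any more
    return flows


best = optimize_flows((200.0, 400.0))
```

In the claimed method the reward would come from the combustion simulation and the decision from the trained Actor network, so each pass through the loop is far more expensive than in this toy version.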

8. The ladle baking optimization method based on deep reinforcement learning and combustion simulation technology coupling as claimed in claim 7, wherein the baking optimization control of the baking process comprises controlling the opening of the combustion-supporting gas valve and of the gas valve as the baking process proceeds.

9. A ladle baking system based on deep reinforcement learning and combustion simulation technology is characterized by comprising:

a ladle geometric model established based on the acquired ladle parameters;

a ladle combustion simulation model, used for obtaining, through the established ladle geometric model, multiple groups of parameters related to the ladle lining temperature distribution and the pollutant discharge amount during the unsteady combustion process;

a ladle baking deep learning prediction model, used for deep learning processing of the acquired groups of parameters;

a ladle baking deep reinforcement learning agent optimization control model, used for optimizing the intelligent baking process of continuous casting ladle baking and realizing intelligent optimization control of the baking process;

wherein the ladle baking model established by the combustion simulation software provides an environment for the agent and interacts with the agent, and the optimal gas flow and combustion-supporting gas flow are output in real time to the actual production environment to guide the baking process.