CN117408052B

CN117408052B - Coating control optimization method, device and equipment for evaporator and storage medium

Info

Publication number: CN117408052B
Application number: CN202311358274.3A
Authority: CN
Inventors: 毛震; 罗凡明
Original assignee: Nanqi Xiance Nanjing High Tech Co ltd
Current assignee: Nanqi Xiance Nanjing High Tech Co ltd
Priority date: 2023-10-18
Filing date: 2023-10-18
Publication date: 2024-07-09
Anticipated expiration: 2043-10-18
Also published as: CN117408052A

Abstract

The invention discloses a coating control optimization method, device and equipment for an evaporator, and a storage medium. The optimization method comprises the following steps: establishing a coating model of the evaporator, wherein the coating model comprises a plurality of evaporation boats and each evaporation boat corresponds to a test point; inputting a coating function of the evaporator into the coating model, wherein the coating function comprises a maximum wire feeding speed function, an evaporation speed function, a steam concentration fusion function and a coating thickness conversion function; and optimizing the wire feeding parameters of each evaporation boat with the MAPPO algorithm, so that, when the coating model coats according to the optimized wire feeding parameters, the absolute value of the difference between the resulting coating thickness and the target thickness is smaller than a preset threshold. With this technical scheme, the wire feeding parameters of the evaporation boats can be optimized jointly, realizing cooperative control of the evaporation boats.

Description

Coating control optimization method, device and equipment for evaporator and storage medium

Technical Field

The invention relates to the technical field of coating control for vapor deposition machines, and in particular to a method, a device, equipment and a storage medium for optimizing the coating control of a vapor deposition machine.

Background

Vacuum evaporation, abbreviated as evaporation, is a process in which a coating material (or film material) is heated and vaporized under vacuum, and the resulting particles fly to the surface of a substrate and condense to form a film. Vacuum evaporation can be used to coat devices such as display screens, and also to deposit anti-counterfeiting mark coatings, textile coatings and the like, so it is widely applied.

In existing evaporators, the coating thickness is controlled by controlling the wire feeding angles and wire feeding speeds of a plurality of evaporation boats. The wire feeding angle and wire feeding speed of each evaporation boat are controlled by a proportional-integral-derivative (PID) regulator, with the goal of bringing the film thickness closer to the target thickness. The basic principle of this control is as follows: the PID regulator adjusts the wire feeding angle and wire feeding speed it outputs for the evaporation boat according to the difference between the film thickness measured at the measuring point and the target film thickness, so that the film thickness approaches the target thickness.
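For reference, a discrete PID regulator of the kind described above can be sketched as follows. The gains, the unit time step, and the interpretation of the output as a wire feeding speed correction are illustrative assumptions, not values from the patent; the sketch only shows that each regulator maps one error signal (its own test point) to one output.

```python
class PID:
    """Minimal discrete PID regulator: one error in, one correction out (SISO)."""

    def __init__(self, kp: float, ki: float, kd: float, dt: float = 1.0):
        self.kp, self.ki, self.kd, self.dt = kp, ki, kd, dt
        self.integral = 0.0
        self.prev_error = None

    def step(self, target: float, measured: float) -> float:
        error = target - measured                      # thickness error at one test point
        self.integral += error * self.dt
        derivative = 0.0 if self.prev_error is None else (error - self.prev_error) / self.dt
        self.prev_error = error
        return self.kp * error + self.ki * self.integral + self.kd * derivative


# Hypothetical usage: one regulator per evaporation boat, each seeing only its own test point.
pid = PID(kp=0.5, ki=0.1, kd=0.0)
correction = pid.step(target=100.0, measured=90.0)  # positive: feed wire faster
```

Because each regulator sees only its own error, nothing in this structure accounts for the vapor that neighbouring boats contribute to the same test point, which is the limitation discussed next.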

However, a PID regulator is a single-input single-output control strategy and is difficult to apply to the cooperative control of a plurality of evaporation boats, so it is difficult to make the coating reach the target thickness.

Disclosure of Invention

The invention provides a coating control optimization method, device and equipment for an evaporator, and a storage medium, to solve the problem that cooperative control of a plurality of evaporation boats is difficult and the coating is hard to bring to the target thickness.

According to an aspect of the present invention, there is provided a coating control optimization method for an evaporation machine, the method comprising:

Establishing a coating model of an evaporator, wherein the coating model comprises a plurality of evaporation boats, and each evaporation boat corresponds to a test point;

Inputting a coating function of the evaporator to the coating model; wherein the coating function comprises a maximum wire feeding speed function, an evaporation speed function, a steam concentration fusion function and a coating thickness conversion function;

and optimizing the wire feeding parameters of each evaporation boat with the MAPPO algorithm, so that, when the coating model coats according to the optimized wire feeding parameters, the absolute value of the difference between the resulting coating thickness and the target thickness is smaller than a preset threshold.

Optionally, the inputting the coating function of the evaporator to the coating model includes:

inputting a maximum wire feeding speed function, an evaporation speed function, a steam concentration fusion function and a coating thickness conversion function into the coating model;

and inputting a disturbance function into the evaporation speed function.

Optionally, optimizing the wire feeding parameters of each evaporation boat with the MAPPO algorithm includes:

updating and optimizing the parameters of the strategy network according to the tuple data corresponding to all the test points until a preset number of updates is reached, wherein the tuple data comprises a current observed quantity, a current wire feeding parameter, a current global state, an environmental reward value, a next observed quantity and a next wire feeding parameter; the current observed quantity comprises the current thickness and the target thickness of a test point before coating with the current wire feeding parameter; the next observed quantity comprises the current thickness and the target thickness of the test point after coating with the current wire feeding parameter; and the next wire feeding parameter is the wire feeding parameter updated according to the next observed quantity;

and determining, through the updated strategy network, the optimized wire feeding parameters of each test point according to the current thickness and the target thickness of each test point.

Optionally, before updating and optimizing the parameters of the policy network according to the tuple data corresponding to all the test points, the method further includes:

Inputting the current thickness of the test point corresponding to each evaporation boat and the target thickness corresponding to the test point into the strategy network so that the strategy network outputs wire feeding parameters;

Inputting the wire feeding parameters output by the strategy network into the film coating model so that the film coating model performs film coating according to the wire feeding parameters to update the thickness of the test point;

inputting the updated thickness and the target thickness of each test point into an evaluation network to generate a global state; the global state comprises a difference value between the current thickness of each test point and the target thickness;

Obtaining an environmental reward value according to the global state and the target state;

and storing the current observed quantity, the current wire feeding parameter, the current global state, the environmental reward value, the next observed quantity and the next wire feeding parameter corresponding to each test point as the tuple data.

Optionally, updating and optimizing the parameters of the policy network according to the tuple data corresponding to all the test points includes:

updating the parameters of the strategy network according to the tuple data corresponding to all the test points by a policy gradient method.

Optionally, updating the parameters of the policy network by the policy gradient method according to the tuple data corresponding to all the test points includes:

obtaining a state quantization value corresponding to the global state according to the global state;

obtaining an action advantage value according to the state quantization value and the environmental reward value;

inputting all state data in the global state into a first strategy network and a second strategy network according to the policy gradient to obtain a similarity value;

updating the parameters of the strategy network according to the action advantage value and the similarity value;

and updating and optimizing the evaluation network, updating the environmental reward value through the updated evaluation network, and returning to the step of obtaining the action advantage value according to the state quantization value and the environmental reward value, until the preset number of updates is reached.

Optionally, after optimizing the wire feeding parameters of each evaporation boat with the MAPPO algorithm, the method further comprises:

calibrating the coating function according to its actual input value or actual output value.
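One possible way to calibrate a coating function against measured data is an ordinary least-squares fit. The sketch below assumes, purely for illustration, that the evaporation speed function is linear in the wire feeding speed (evaporation speed = a * wire speed + b) and refits a and b from measured input/output pairs; neither the functional form nor the calibration method is specified in the text.

```python
def calibrate_linear(inputs, outputs):
    """Least-squares fit of outputs ≈ a * inputs + b; returns (a, b).

    Assumption: the coating function being calibrated is linear in its input.
    """
    n = len(inputs)
    if n < 2 or n != len(outputs):
        raise ValueError("need at least two paired measurements")
    mean_x = sum(inputs) / n
    mean_y = sum(outputs) / n
    sxx = sum((x - mean_x) ** 2 for x in inputs)
    if sxx == 0:
        raise ValueError("inputs must not all be equal")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(inputs, outputs))
    a = sxy / sxx
    b = mean_y - a * mean_x
    return a, b


# Hypothetical measurements: actual wire feeding speeds and observed evaporation speeds.
a, b = calibrate_linear([1.0, 2.0, 3.0, 4.0], [2.1, 3.9, 6.1, 7.9])
```

A nonlinear coating function would need a different fitting model; the linear case is chosen only because it keeps the sketch short and dependency-free.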

According to another aspect of the present invention, there is provided a coating control optimization device for an evaporator, comprising:

the model construction module is used for building a coating model of the evaporator, wherein the coating model comprises a plurality of evaporation boats, and each evaporation boat corresponds to a test point;

the environment construction module is used for inputting a coating function of the evaporator to the coating model; wherein the coating function comprises a maximum wire feeding speed function, an evaporation speed function, a steam concentration fusion function and a coating thickness conversion function;

and a parameter optimization module, configured to optimize the wire feeding parameters of each evaporation boat with the MAPPO algorithm, so that, when the coating model coats according to the optimized wire feeding parameters, the absolute value of the difference between the resulting coating thickness and the target thickness is smaller than a preset threshold.

According to another aspect of the present invention, there is provided an electronic apparatus including:

At least one processor; and

A memory communicatively coupled to the at least one processor; wherein,

The memory stores a computer program executable by the at least one processor, and the computer program is executed by the at least one processor, so that the at least one processor can execute the coating control optimizing method of the coating machine according to any embodiment of the present invention.

According to another aspect of the present invention, there is provided a computer readable storage medium storing computer instructions for implementing the method for optimizing coating control of a coating machine according to any one of the embodiments of the present invention when executed by a processor.

In this technical scheme, a coating model comprising a plurality of evaporation boats is established, the coating function of the vapor deposition machine is input into the coating model, and the MAPPO algorithm is used to optimize the wire feeding parameters of each evaporation boat. Cooperative control training can therefore be performed over a plurality of test points, that is, the wire feeding parameters corresponding to each test point are optimized according to the coating thicknesses of all the test points, so the wire feeding parameters of the evaporation boats are optimized jointly and cooperative control of the evaporation boats is realized. The optimization target is that, after coating according to the optimized wire feeding parameters, the absolute value of the difference between the thickness at each test point and the target thickness is smaller than a preset threshold. As a result, the thickness at each test point after coating according to the optimized wire feeding parameters is equal to or close to the target thickness, and the coating accuracy is improved.

It should be understood that the description in this section is not intended to identify key or critical features of the embodiments of the invention or to delineate the scope of the invention. Other features of the present invention will become apparent from the description that follows.

Drawings

In order to more clearly illustrate the technical solutions of the embodiments of the present invention, the drawings required for the description of the embodiments will be briefly described below, and it is apparent that the drawings in the following description are only some embodiments of the present invention, and other drawings may be obtained according to these drawings without inventive effort for a person skilled in the art.

FIG. 1 is a flowchart of a coating control optimization method for an evaporator according to an embodiment of the present invention;

FIG. 2 is a schematic diagram of a coating model of an evaporator according to an embodiment of the present invention;

FIG. 3 is a flowchart of another coating control optimization method for an evaporator according to an embodiment of the present invention;

FIG. 4 is a flowchart of yet another coating control optimization method for an evaporator according to an embodiment of the present invention;

FIG. 5 is a schematic structural diagram of a coating control optimization device for an evaporator according to an embodiment of the present invention;

FIG. 6 is a schematic structural diagram of an electronic device for implementing the coating control optimization method for an evaporator according to an embodiment of the present invention.

Detailed Description

In order that those skilled in the art may better understand the present invention, the technical solutions in the embodiments of the present invention are described clearly and completely below with reference to the accompanying drawings. Apparently, the described embodiments are only some, not all, of the embodiments of the present invention. All other embodiments obtained by those skilled in the art based on the embodiments of the present invention without inventive effort shall fall within the scope of the present invention.

It should be noted that the terms "first," "second," and the like in the description and the claims of the present invention and the above figures are used for distinguishing between similar objects and not necessarily for describing a particular sequential or chronological order. It is to be understood that the data so used may be interchanged where appropriate such that the embodiments of the invention described herein may be implemented in sequences other than those illustrated or otherwise described herein. Furthermore, the terms "comprises," "comprising," and "having," and any variations thereof, are intended to cover a non-exclusive inclusion, such that a process, method, system, article, or apparatus that comprises a list of steps or elements is not necessarily limited to those steps or elements expressly listed but may include other steps or elements not expressly listed or inherent to such process, method, article, or apparatus.

To address the problems that, when controlling the coating of an evaporator, cooperative control of a plurality of evaporation boats is difficult and the coating is hard to bring to the target thickness, this embodiment provides a coating control optimization method for an evaporator. FIG. 1 is a flowchart of a coating control optimization method for an evaporator according to an embodiment of the present invention. The method may be performed by a coating control optimization device for an evaporator, which may be implemented in hardware and/or software and may be configured in an upper computer, a computer or other control devices. As shown in FIG. 1, the method includes:

S110, a coating model of the evaporator is built, wherein the coating model comprises a plurality of evaporation boats, and each evaporation boat corresponds to a test point.

The evaporation boat is the vessel that holds the evaporation material: the material is melted and evaporated on the boat at high temperature, so that the required film is deposited on an object and coating is achieved. The test points are the thin film deposition positions corresponding to the evaporation boats.

FIG. 2 is a schematic structural diagram of a coating model of an evaporator according to an embodiment of the present invention. As shown in FIG. 2, a plurality of evaporation boats 101 are arranged in the coating model; each evaporation boat 101 corresponds to a test point 102 and has a wire feeding angle 103 and a wire feeding speed 104. The wire feeding angle 103 is the angle at which the vapor deposition material is fed to the evaporation boat 101, and the wire feeding speed 104 is the speed at which it is fed. Arranging the evaporation boats 101 in one coating model allows them to be controlled cooperatively, so that the coating parameters (such as the wire feeding angle and wire feeding speed) of all the evaporation boats, and thus their coating control, can be optimized simultaneously.

S120, inputting a coating function of an evaporator to a coating model; wherein the coating function comprises a maximum wire feed speed function, an evaporation speed function, a steam concentration fusion function and a coating thickness conversion function.

The maximum wire feeding speed is limited by the wire feeding angle: different wire feeding angles correspond to regions of the evaporation boat with different temperatures, and the temperature of that region limits the maximum wire feeding speed. The evaporation speed function maps different wire feeding speeds to different evaporation speeds; for example, the greater the wire feeding speed, the greater the evaporation speed. The steam concentration fusion function fuses the vapor concentrations evaporated by several adjacent boats to obtain the vapor concentration near each test point. The coating thickness conversion function converts the vapor concentration at each test point into a coating thickness; for example, the greater the vapor concentration, the greater the coating thickness. Therefore, given an input wire feeding speed and wire feeding angle, the wire feeding angle limits the maximum wire feeding speed, the evaporation speed function computes the evaporation speed corresponding to the wire feeding speed, the steam concentration fusion function computes the vapor concentration at each test point, and the coating thickness conversion function determines the film thickness at each test point from that vapor concentration. In this way the vapor deposition environment is reproduced, and a corresponding film thickness is output each time a group of wire feeding angles and wire feeding speeds is input.
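The chain of four coating functions can be pictured as a small forward simulator. In the sketch below every functional form and constant is a hypothetical placeholder, since none are disclosed here: the maximum wire feeding speed falls linearly with the wire feeding angle, the evaporation speed is proportional to the (clipped) wire feeding speed, each test point mixes vapor from its own boat and its immediate neighbours with fixed weights, and thickness grows linearly with the local vapor concentration.

```python
from typing import List


def max_wire_speed(angle_deg: float) -> float:
    # Hypothetical: larger feed angles reach cooler boat regions, lowering the limit.
    return max(0.0, 10.0 - 0.1 * angle_deg)


def evaporation_speed(wire_speed: float) -> float:
    # Hypothetical: evaporation speed grows linearly with wire feeding speed.
    return 0.8 * wire_speed


def fuse_concentrations(evap: List[float], self_w: float = 0.6,
                        neighbour_w: float = 0.2) -> List[float]:
    # Hypothetical fusion: each test point receives vapor from its own boat and adjacent boats.
    n = len(evap)
    fused = []
    for i in range(n):
        c = self_w * evap[i]
        if i > 0:
            c += neighbour_w * evap[i - 1]
        if i < n - 1:
            c += neighbour_w * evap[i + 1]
        fused.append(c)
    return fused


def thickness_increment(concentration: float, dt: float = 1.0) -> float:
    # Hypothetical conversion: deposited thickness proportional to local vapor concentration.
    return 0.5 * concentration * dt


def coat(angles: List[float], speeds: List[float], thickness: List[float]) -> List[float]:
    """One coating step: wire feeding angles/speeds in, updated test-point thicknesses out."""
    clipped = [min(v, max_wire_speed(a)) for a, v in zip(angles, speeds)]
    evap = [evaporation_speed(v) for v in clipped]
    conc = fuse_concentrations(evap)
    return [t + thickness_increment(c) for t, c in zip(thickness, conc)]


new_thickness = coat([30.0, 30.0, 30.0], [5.0, 5.0, 5.0], [0.0, 0.0, 0.0])
```

With identical settings on all three boats, the middle test point ends up thicker than the edge points because it receives vapor from two neighbours. This coupling is what a per-boat controller cannot see and what the cooperative optimization is meant to handle.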

Specifically, the maximum wire feeding speed function, the evaporation speed function, the steam concentration fusion function and the coating thickness conversion function of the evaporator are input into the coating model, so that the evaporation environment and the evaporation process of the evaporation boats are reproduced. This makes it convenient to optimize the evaporation parameters according to the coating thickness at the test point corresponding to each evaporation boat. In addition, once the coating function has been input, the coating model can be used to test whether optimized evaporation parameters meet the requirements, that is, whether the coating thickness at each test point is equal to or close to the target thickness after evaporation with the optimized parameters.

S130, optimizing the wire feeding parameters of each evaporation boat with the MAPPO algorithm, so that, when the coating model coats according to the optimized wire feeding parameters, the absolute value of the difference between the resulting coating thickness and the target thickness is smaller than a preset threshold.

The wire feeding parameters include, for example, the wire feeding angle and the wire feeding speed. MAPPO (Multi-Agent Proximal Policy Optimization) is a multi-agent deep reinforcement learning algorithm that can perform cooperative control training over a plurality of test points, that is, optimize the wire feeding parameters corresponding to each test point according to the film thicknesses of all the test points. The wire feeding parameters of the evaporation boats can thus be optimized jointly, realizing cooperative control of the evaporation boats.

Specifically, when the wire feeding parameters of each evaporation boat are optimized with the MAPPO algorithm, each evaporation boat is treated as an agent, for example. The input of an agent is the coating thickness and the target thickness at its test point, and its output is the wire feeding angle and wire feeding speed of its evaporation boat. The optimization target is that, after coating according to the optimized wire feeding parameters, the absolute value of the difference between the coating thickness and the target thickness at each test point is smaller than a preset threshold. Cooperative control of a plurality of evaporation boats is thereby realized, the thickness at each test point after coating according to the optimized wire feeding parameters is equal to or close to the target thickness, and the coating accuracy is improved.
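The per-agent interface and the stopping criterion described above can be written down concretely. This is a sketch; the `Observation` and `Action` names and the threshold value used below are illustrative assumptions.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Observation:
    """What one agent (one evaporation boat) sees: its test point's thickness and target."""
    thickness: float
    target: float


@dataclass
class Action:
    """What one agent outputs: the wire feeding parameters of its boat."""
    angle_deg: float
    speed: float


def all_within_threshold(observations: List[Observation], threshold: float) -> bool:
    """Optimization goal: |thickness - target| < threshold at every test point."""
    return all(abs(o.thickness - o.target) < threshold for o in observations)


# Hypothetical state of three test points and one example action.
obs = [Observation(99.6, 100.0), Observation(100.3, 100.0), Observation(98.0, 100.0)]
act = Action(angle_deg=30.0, speed=5.0)
```

The criterion is deliberately an "all points" check: a single test point outside the threshold means the cooperative goal is not yet met.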

On the basis of the above technical solution, FIG. 3 is a flowchart of another coating control optimization method for a vapor deposition machine according to an embodiment of the present invention. Optionally, referring to FIG. 3, the method includes:

S210, a coating model of the evaporator is built, wherein the coating model comprises a plurality of evaporation boats, and each evaporation boat corresponds to a test point.

S220, inputting a maximum wire feeding speed function, an evaporation speed function, a steam concentration fusion function and a coating thickness conversion function into the coating model.

S230, inputting a disturbance function into the evaporation speed function.

Specifically, a real vapor deposition process is subject to power disturbance and transfer function disturbance. Power disturbance means that the motor fluctuates to some extent when controlling the wire feeding angle and speed. Inputting a disturbance function into the evaporation speed function perturbs the input value of that function (the wire feeding speed) to some extent, which simulates the effect of the power disturbance. As a result, when the coating model coats with the optimized wire feeding parameters, the thickness at the test points is closer to the real situation, which improves the accuracy of the wire feeding parameter optimization.
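A disturbance function of this kind can be as simple as multiplicative noise on the wire feeding speed before it reaches the evaporation speed function. The Gaussian noise model, its 2% relative standard deviation, and the linear evaporation speed function in the example are assumptions made for illustration only.

```python
import random
from typing import Callable, Optional


def with_disturbance(evap_fn: Callable[[float], float], rel_std: float = 0.02,
                     rng: Optional[random.Random] = None) -> Callable[[float], float]:
    """Wrap an evaporation speed function so its input (wire feeding speed) is perturbed.

    Assumption: motor fluctuation is modelled as zero-mean Gaussian relative noise.
    """
    rng = rng or random.Random()

    def disturbed(wire_speed: float) -> float:
        noisy_speed = wire_speed * (1.0 + rng.gauss(0.0, rel_std))  # simulated motor fluctuation
        return evap_fn(max(0.0, noisy_speed))  # speeds cannot go negative

    return disturbed


# Hypothetical linear evaporation speed function, seeded for reproducibility.
evap = with_disturbance(lambda v: 0.8 * v, rel_std=0.02, rng=random.Random(0))
samples = [evap(5.0) for _ in range(1000)]
```

Passing an explicit `random.Random` instance keeps training runs reproducible while still exposing the policy to varied disturbances.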

S240, updating and optimizing the parameters of the strategy network according to the tuple data corresponding to all the test points until a preset number of updates is reached; the tuple data comprises a current observed quantity, a current wire feeding parameter, a current global state, an environmental reward value, a next observed quantity and a next wire feeding parameter; the current observed quantity comprises the current thickness and the target thickness of a test point before coating with the current wire feeding parameter; the next observed quantity comprises the current thickness and the target thickness of the test point after coating with the current wire feeding parameter; and the next wire feeding parameter is the wire feeding parameter updated according to the next observed quantity.

The target thickness of each test point can be the same or different.

Specifically, the MAPPO algorithm includes a policy network (strategy network) and an evaluation network. The current observed quantity of each test point (the current thickness and target thickness of its film) is input into the strategy network, which outputs the corresponding action (the current wire feeding parameters). The current wire feeding parameters of each test point are input into the corresponding evaporation boat in the coating model, which performs coating according to them and updates the coating thickness at the test point, yielding the next observed quantity (the current thickness and target thickness of the test point after coating with the current wire feeding parameters). The next observed quantity is input into the evaluation network, which calculates the difference between the updated film thickness at each test point and the corresponding target thickness; the differences of all test points form a difference list, namely the current global state. The environmental reward value is obtained from the current global state and the target state. The next observed quantity is also input into the strategy network, which outputs the updated action (the updated wire feeding parameters). In this way the tuple data of each test point are obtained, and the strategy network can optimize its parameters using the tuple data of all the test points. This realizes centralized training and cooperative training of the evaporation boats, which facilitates optimizing their cooperative control.
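The data-collection step that produces one tuple per test point can be sketched as below. The policy, the coating model and the reward are passed in as plain callables; the stand-ins used in the example (a proportional policy that outputs only a wire feeding speed, a linear deposition model, a negative-absolute-error reward) are hypothetical and only make the sketch runnable. For brevity the global state is computed directly as thickness minus target rather than by a trained evaluation network.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple


@dataclass
class Transition:
    obs: Tuple[float, float]        # (current thickness, target) before coating
    action: float                   # current wire feeding parameter (speed only, for brevity)
    global_state: List[float]       # thickness - target at every test point after coating
    reward: float                   # shared environmental reward value
    next_obs: Tuple[float, float]   # (thickness, target) after coating
    next_action: float              # wire feeding parameter the policy proposes for next_obs


def collect_step(thickness: List[float], targets: List[float],
                 policy: Callable[[Tuple[float, float]], float],
                 coat: Callable[[List[float], List[float]], List[float]],
                 reward_fn: Callable[[List[float]], float]) -> Tuple[List[Transition], List[float]]:
    obs = list(zip(thickness, targets))
    actions = [policy(o) for o in obs]                 # one agent per evaporation boat
    new_thickness = coat(actions, thickness)           # coating model updates every test point
    global_state = [t - g for t, g in zip(new_thickness, targets)]
    reward = reward_fn(global_state)                   # one team reward shared by all agents
    next_obs = list(zip(new_thickness, targets))
    next_actions = [policy(o) for o in next_obs]
    batch = [Transition(o, a, global_state, reward, no, na)
             for o, a, no, na in zip(obs, actions, next_obs, next_actions)]
    return batch, new_thickness


# Hypothetical stand-ins for the strategy network, coating model and reward.
policy = lambda o: max(0.0, 0.1 * (o[1] - o[0]))
coat = lambda speeds, th: [t + 2.0 * s for t, s in zip(th, speeds)]
reward_fn = lambda s: -sum(abs(d) for d in s)
batch, th = collect_step([0.0, 50.0], [100.0, 100.0], policy, coat, reward_fn)
```

Every tuple carries the same global state and reward, which is what lets the training step use information from all test points even though each agent acts on its own observation.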

S250, determining, through the updated strategy network, the optimized wire feeding parameters of each test point according to the current thickness and the target thickness of each test point, so that, when the coating model coats according to the optimized wire feeding parameters, the absolute value of the difference between the resulting coating thickness and the target thickness is smaller than a preset threshold.

Specifically, after the strategy network has been updated and optimized, the current thickness and target thickness of the coating at each test point are input into it, and it outputs the optimized wire feeding parameters. The wire feeding parameters of the plurality of evaporation boats are thus optimized together, realizing cooperative control of the evaporation boats.

On the basis of the above technical solution, optionally, before updating and optimizing parameters of the policy network according to the tuple data corresponding to all the test points, the method further includes:

Step a1, inputting the current thickness of the test point corresponding to each evaporation boat and the target thickness of that test point into the strategy network, so that the strategy network outputs wire feeding parameters.

Specifically, by inputting the current observed quantity (the current thickness and the target thickness of the test point film layer) of each test point into the policy network, the policy network can output the corresponding action (the current wire feeding parameters). Thus, the wire feeding parameters corresponding to each test point can be obtained.

Step a2, inputting the wire feeding parameters output by the strategy network into the coating model, so that the coating model performs coating according to the wire feeding parameters to update the thickness of the test point.

Specifically, the current wire feeding parameters of each test point are input into a corresponding evaporation boat in a coating model, the coating model carries out coating according to the current wire feeding parameters output by a strategy network, the coating thickness of the test point is updated, and the next observed quantity (the current thickness and the target thickness of the test point after coating through the current wire feeding parameters) is obtained.

Step a3, inputting the updated thickness and the target thickness of each test point into an evaluation network to generate a global state; the global state comprises a difference value between the current thickness of each test point and the target thickness.

Specifically, the next observed quantity is input into the evaluation network, which calculates the difference between the updated film thickness at each test point and the corresponding target thickness; the differences of all test points form a difference list, namely the current global state. This makes it easy to determine how far the film thickness after coating with the wire feeding parameters output by the strategy network deviates from the target thickness, and hence whether those wire feeding parameters meet the requirements.

Step a4, obtaining an environmental reward value according to the global state and the target state.

Specifically, denoting the global state by s and the target state by s_target, the environmental reward value is given by a formula in terms of s and s_target, so it can be derived from the current global state and the target state. The strategy network can then be updated and optimized according to the environmental reward value: a high reward value indicates that the action output by the strategy network is good, and the wire feeding parameters continue to be optimized with the current network parameters; a low reward value indicates that the action is poor, and the strategy network updates its parameters to better optimize the wire feeding parameters.
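As an assumed placeholder for the reward formula, the sketch below uses the negative Euclidean distance between the global state s (thickness minus target at each test point) and the target state s_target, taken here to be the all-zero difference list. Under this assumption a reward closer to zero means the wire feeding parameters brought the test points closer to target; the actual formula may differ.

```python
import math
from typing import List, Optional


def global_state(thickness: List[float], targets: List[float]) -> List[float]:
    """Global state: difference between current thickness and target at each test point."""
    return [t - g for t, g in zip(thickness, targets)]


def reward(s: List[float], s_target: Optional[List[float]] = None) -> float:
    """Assumed placeholder reward: negative Euclidean distance between s and s_target."""
    if s_target is None:
        s_target = [0.0] * len(s)  # assumed target state: zero difference everywhere
    return -math.sqrt(sum((a - b) ** 2 for a, b in zip(s, s_target)))


s = global_state([97.0, 104.0], [100.0, 100.0])
```

A single scalar computed from the whole difference list rewards all agents for the overall coating result, rather than each boat for its own test point in isolation.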

Step a5, storing the current observed quantity, the current wire feeding parameter, the current global state, the environmental reward value, the next observed quantity and the next wire feeding parameter corresponding to each test point as tuple data.

Specifically, by storing the tuple data, the policy network can be conveniently optimized according to the tuple data, and the parameters of the policy network can be conveniently updated and optimized, so that the optimization of the wire feeding parameters is facilitated.

Optionally, updating and optimizing the parameters of the policy network according to the tuple data corresponding to all the test points includes: updating the parameters of the policy network according to the tuple data corresponding to all the test points by a policy gradient method.

Specifically, updating the parameters of the policy network by the policy gradient method means that the parameters converge toward an optimum through gradient-based learning. The policy gradient determines the update direction of the policy network parameters, so that each update moves them in a direction that improves the expected reward, which improves the efficiency of the optimization.

Optionally, updating the parameters of the policy network by a policy gradient according to the tuple data of all test points comprises:

Step b1: obtain the state value (state quantization value) corresponding to the global state.

Specifically, the policy network outputs wire feeding parameters from the observed quantity (the coating thickness and the target thickness of the test point); the coating model updates the coating thickness of the test point according to these parameters; and the policy network then outputs new wire feeding parameters from the updated coating thickness and the target thickness. Repeating this process yields multiple tuples, all of which are stored in a data buffer pool. Consecutive tuples are extracted from the buffer pool, and all global states s in them are input into the critic network (evaluation network) to obtain the state value V.

Step b2: obtain the action advantage value from the state value and the environmental rewards.

Specifically, the action advantage value measures the advantage of taking the current action. It is $A_t = R_t - V(s_t)$, where $V(s_t)$ is the current state value and $R_t$ is the discounted return (action score) accumulated over the time steps from $t$ to $T$:

$R_t = r_t + \gamma r_{t+1} + \gamma^2 r_{t+2} + \dots + \gamma^{T-t-1} r_{T-1} + \gamma^{T-t} V(s_T)$,

where $t$ is the first time step, $T$ is the last time step, and $\gamma$ is the reward discount rate, typically 0.95. The advantage value shows whether the current action performs better than expected, and therefore whether optimization should continue in the current action direction.
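The return and advantage of step b2 can be computed in a single backward pass over a rollout, as in this sketch. It assumes that the rewards $r_t, \dots, r_{T-1}$ are given as a list and that the critic supplies both the per-step values $V(s_t)$ and the bootstrap value $V(s_T)$; the function names are illustrative.

```python
import numpy as np


def discounted_returns(rewards, bootstrap_value, gamma=0.95):
    """R_t = r_t + g*r_{t+1} + ... + g^(T-t-1)*r_{T-1} + g^(T-t)*V(s_T)."""
    returns = np.zeros(len(rewards))
    running = float(bootstrap_value)  # V(s_T), the critic's estimate after the last step
    for t in reversed(range(len(rewards))):
        running = rewards[t] + gamma * running
        returns[t] = running
    return returns


def advantages(rewards, values, bootstrap_value, gamma=0.95):
    """A_t = R_t - V(s_t) for every step of the rollout."""
    return discounted_returns(rewards, bootstrap_value, gamma) - np.asarray(values, dtype=float)


print(discounted_returns([1.0, 1.0], bootstrap_value=0.0, gamma=0.5))
print(advantages([1.0, 1.0], values=[1.0, 1.0], bootstrap_value=0.0, gamma=0.5))
```

For example, `discounted_returns([1.0, 1.0], 0.0, gamma=0.5)` evaluates to `[1.5, 1.0]`: the first step collects its own reward plus half of the second step's reward.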

Step b3: according to the policy gradient, input all state data in the global state into the first policy network and the second policy network to obtain a similarity value. The similarity value represents how similar the action distributions of the first policy network and the second policy network are for the actions taken.

Specifically, the first policy network is the actor-old network, which is used to sample data, and the second policy network is the actor-new network, whose parameters are updated. To accelerate the policy gradient algorithm, learning is performed off-policy, implemented with the actor-old and actor-new networks: off-policy means that the actor-old network that samples the data and the actor-new network whose parameters are updated are not the same network. The actor-old network is parameterized by θ' and the actor-new network by θ; θ and θ' may, for example, be matrices. For samples generated with θ', the gradient that maximizes the cumulative reward is therefore expressed in terms of samples drawn with θ' while θ is the parameter being optimized. Here τ denotes one control round of the evaporator over a certain period (for example, 300 s), and R denotes the action advantage value.

Assuming that the distributions under θ and θ' do not differ greatly, the policy gradient is expressed in terms of p_θ(τ) and p_θ'(τ), where p_θ(τ) is the probability distribution of a control round obtained when the actor-new network controls the evaporation boats, and p_θ'(τ) is the probability distribution of a control round obtained when the actor-old network controls the evaporation boats.
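For reference, the textbook importance-sampling form of the off-policy policy gradient used in PPO derivations is shown below. It is offered as an assumption about what the gradient expressions above denote, written with the symbols θ, θ', τ, p_θ(τ), p_θ'(τ) and the advantage A_t defined in this section; the second line is the per-step approximation that relies on θ and θ' being close.

```latex
\nabla \bar{R}_\theta
  = \mathbb{E}_{\tau \sim p_{\theta'}(\tau)}\!\left[
      \frac{p_\theta(\tau)}{p_{\theta'}(\tau)}\, R(\tau)\, \nabla \log p_\theta(\tau)
    \right]
\approx
  \mathbb{E}_{(s_t, a_t) \sim \pi_{\theta'}}\!\left[
      \frac{p_\theta(a_t \mid s_t)}{p_{\theta'}(a_t \mid s_t)}\, A_t\, \nabla \log p_\theta(a_t \mid s_t)
    \right]
```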

The policy gradient is clipped to keep policy updates stable. In the clipped gradient, p_θ(a_t|s_t), denoted prob1, is the probability distribution of the actor-new network outputting action a_t in state s_t, and p_θ'(a_t|s_t), denoted prob2, is the probability distribution of the actor-old network outputting action a_t in state s_t.

All state data in the stored global states s are input into the actor-old network and the actor-new network, each of which outputs the mean and variance of a normal distribution. This gives the normal distributions prob1 and prob2, from which actions action1 and action2 are obtained by sampling. The log probability of each action under its normal distribution is then computed, giving log_prob1 (the log probability of action1) and log_prob2 (the log probability of action2). The difference between log_prob1 and log_prob2 measures the similarity of the action distributions and is expressed as the ratio ratio = exp(log_prob1 - log_prob2), that is, the probability ratio prob1/prob2.
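A minimal NumPy sketch of the similarity value, assuming that each network outputs the mean and variance of a diagonal normal distribution over the two-dimensional wire feeding action. Unlike the sampling described above, this sketch follows common PPO practice and evaluates the same stored action under both networks, so that exp(log_prob1 - log_prob2) is a meaningful probability ratio. The function names are illustrative.

```python
import numpy as np


def normal_log_prob(action, mean, var):
    """Log-density of a diagonal normal distribution, summed over action dimensions."""
    action, mean, var = (np.asarray(x, dtype=float) for x in (action, mean, var))
    log_density = -((action - mean) ** 2) / (2.0 * var) - 0.5 * np.log(2.0 * np.pi * var)
    return float(np.sum(log_density))


def similarity_ratio(action, new_mean, new_var, old_mean, old_var):
    """ratio = exp(log_prob1 - log_prob2) = prob1 / prob2 for one stored action."""
    log_prob1 = normal_log_prob(action, new_mean, new_var)  # actor-new network
    log_prob2 = normal_log_prob(action, old_mean, old_var)  # actor-old network
    return float(np.exp(log_prob1 - log_prob2))


# Two-dimensional action: (angle, evaporation speed), normalized.
action = [0.30, 0.60]
print(similarity_ratio(action, [0.32, 0.58], [0.01, 0.01], [0.30, 0.60], [0.01, 0.01]))
```

Working with log probabilities and exponentiating only their difference avoids underflow when the densities themselves are very small.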

Step b4: update the parameters of the policy network according to the action advantage value and the similarity value.

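Specifically, the clipped objective a_loss = mean(min(ratio·A_t, clip(ratio, 1-ζ, 1+ζ)·A_t)) is computed and back-propagated to update the actor-new network (the objective is maximized, i.e. its negative is minimized by gradient descent), where ζ is a constant between 0 and 0.1 that limits how far the similarity value may move away from 1.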
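The clipped objective of step b4 can be sketched as follows, assuming that `ratio` and `advantage` are arrays over the sampled time steps and that `zeta` is the clipping constant ζ. Negating the objective for a minimizing optimizer is an assumption about how the back-propagation is set up.

```python
import numpy as np


def clipped_objective(ratio, advantage, zeta=0.1):
    """mean(min(ratio * A_t, clip(ratio, 1 - zeta, 1 + zeta) * A_t)); larger is better."""
    ratio = np.asarray(ratio, dtype=float)
    advantage = np.asarray(advantage, dtype=float)
    unclipped = ratio * advantage
    clipped = np.clip(ratio, 1.0 - zeta, 1.0 + zeta) * advantage
    return float(np.mean(np.minimum(unclipped, clipped)))


def actor_loss(ratio, advantage, zeta=0.1):
    """Negated objective, so that a minimizing optimizer performs gradient ascent."""
    return -clipped_objective(ratio, advantage, zeta)


print(actor_loss(ratio=[1.5, 0.5, 0.5], advantage=[1.0, 1.0, -1.0], zeta=0.1))
```

With ζ = 0.1, a ratio of 1.5 with a positive advantage contributes only 1.1·A_t, so the update gains nothing from pushing actor-new further away from actor-old; with a negative advantage the minimum keeps the more pessimistic, clipped term.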

The actor-new parameter θ is updated by the policy gradient, and after several updates the parameter θ' of the actor-old network is set to θ. The optimized actor-new network then interacts with the coating model: it takes the coating thickness and the target thickness of the test point corresponding to each evaporation boat, together with the global state s, and outputs the value of the action (the wire feeding parameter). The wire feeding parameter is a two-dimensional array that controls the angle and the evaporation speed of the evaporation boat.

Step b5: update and optimize the evaluation network, update the environmental rewards with the updated evaluation network, and return to the step of obtaining the action advantage value from the state value and the environmental rewards, until the preset number of updates is reached.

Specifically, the critic network (evaluation network) is updated with a mean squared error loss; the loss for its parameter θ_C is $L(\theta_C) = \frac{1}{N}\sum_{j=1}^{N}\left(y_j - Q(s_j, a_j \mid \theta_C)\right)^2$, where j is a positive integer, N is the total number of rollout steps, Q(s_j, a_j|θ_C) is the output of the critic network at the j-th rollout step, and y_j is the future total reward at the j-th rollout step, computed from the environmental reward values r discounted by the reward discount rate γ over the subsequent steps k (k being a positive integer). Once the loss function is defined, the back-propagation algorithm updates the evaluation network parameters. The environmental reward values are then updated through the updated evaluation network, and the policy network is updated according to the updated reward values, until the preset number of updates is reached. Through multiple rounds of training, the MAPPO algorithm learns to control the evaporation boats cooperatively and to bring the coating thickness closer to the target thickness.
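A NumPy sketch of the critic target and loss. It assumes that y_j is the discounted sum of the environmental rewards from step j to the end of the rollout (the exact form of y_j is an assumption here) and that `critic_outputs` holds Q(s_j, a_j | θ_C) for each rollout step; in practice the loss would be minimized with an automatic-differentiation framework rather than evaluated in plain NumPy.

```python
import numpy as np


def future_total_rewards(rewards, gamma=0.95):
    """y_j = r_j + gamma*r_{j+1} + gamma^2*r_{j+2} + ... to the end of the rollout (assumed form)."""
    y = np.zeros(len(rewards))
    running = 0.0
    for j in reversed(range(len(rewards))):
        running = rewards[j] + gamma * running
        y[j] = running
    return y


def critic_mse_loss(critic_outputs, rewards, gamma=0.95):
    """L(theta_C) = (1/N) * sum_j (y_j - Q(s_j, a_j | theta_C))^2 over N rollout steps."""
    q = np.asarray(critic_outputs, dtype=float)
    y = future_total_rewards(rewards, gamma)
    return float(np.mean((y - q) ** 2))


print(critic_mse_loss(critic_outputs=[0.5, 1.0], rewards=[1.0, 1.0], gamma=0.5))
```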

On the basis of the above technical solution, fig. 4 is a flowchart of another coating control optimization method for an evaporator according to an embodiment of the present invention. Optionally, referring to fig. 4, the method includes:

S310, building a coating model of the evaporator, wherein the coating model comprises a plurality of evaporation boats and each evaporation boat corresponds to a test point.

S320, inputting a maximum wire feeding speed function, an evaporation speed function, a steam concentration fusion function and a coating thickness conversion function into the coating model.

S330, inputting a disturbance function into the evaporation speed function.

S340, updating and optimizing the parameters of the policy network according to the tuple data of all test points until the preset number of updates is reached; the tuple data comprise the current observed quantity, the current wire feeding parameter, the current global state, the environmental reward value, the next observed quantity and the next wire feeding parameter; the current observed quantity comprises the current thickness and the target thickness of a test point before coating with the current wire feeding parameter; the next observed quantity comprises the current thickness and the target thickness of the test point after coating with the current wire feeding parameter; and the next wire feeding parameter is updated according to the next observed quantity.

S350, determining, through the updated policy network, the optimized wire feeding parameters of each test point from its current thickness and target thickness, so that the absolute difference between the coating thickness produced by the coating model with the optimized wire feeding parameters and the target thickness is smaller than a preset threshold.

S360, calibrating the coating function according to the actual input value or the actual output value of the coating function.

Specifically, the actual input or output values of the coating functions may be determined from the performance indexes of the evaporator. In the coating model, the inputs and outputs of the maximum wire feeding speed function, the evaporation speed function, the steam concentration fusion function and the coating thickness conversion function are normalized to the range 0 to 1. Calibration can be performed, for example, by multiplying a coating function by a coefficient equal to the real maximum input value or real maximum output value of that function.

The coating functions (the maximum wire feeding speed function, the evaporation speed function, the steam concentration fusion function and the coating thickness conversion function) are calibrated according to their actual input or output values. After calibration, the maximum wire feeding speed function takes inputs from 0 to the real maximum wire feeding angle and produces outputs from 0 to the real maximum wire feeding speed; the evaporation speed function takes inputs from 0 to the real maximum wire feeding speed and produces outputs from 0 to the real maximum evaporation speed; and the coating thickness conversion function takes inputs from 0 to the real maximum evaporation speed and produces outputs from 0 to the real maximum coating thickness. The calibrated coating functions therefore reflect real operating conditions: when the optimized wire feeding parameters are fed into them, the resulting coating thickness reflects actual use, which makes it possible to judge whether the optimized parameters meet the requirements and improves optimization accuracy.
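The calibration can be sketched as wrapping each normalized [0, 1] function so that real-unit inputs are divided by the real maximum input and outputs are multiplied by the real maximum output. The normalized curves, the helper name `calibrate` and all maximum values below are hypothetical placeholders in arbitrary units, and the steam concentration fusion function, which combines contributions from several evaporation boats, is omitted for brevity.

```python
import math
from typing import Callable

NormalizedFn = Callable[[float], float]


def calibrate(fn: NormalizedFn, real_max_input: float, real_max_output: float) -> Callable[[float], float]:
    """Map a normalized [0, 1] -> [0, 1] coating function onto real input/output ranges."""
    def calibrated(x_real: float) -> float:
        x_norm = min(max(x_real / real_max_input, 0.0), 1.0)  # clamp into the normalized domain
        return real_max_output * fn(x_norm)
    return calibrated


# Hypothetical normalized curves and real maxima (arbitrary units), not the embodiment's functions.
max_wire_feed_speed = calibrate(lambda a: a, real_max_input=30.0, real_max_output=5.0)          # angle -> wire feed speed
evaporation_speed = calibrate(lambda v: math.sqrt(v), real_max_input=5.0, real_max_output=2.0)  # wire feed speed -> evaporation speed
coating_thickness = calibrate(lambda e: e, real_max_input=2.0, real_max_output=100.0)           # evaporation speed -> thickness

speed = max_wire_feed_speed(15.0)
rate = evaporation_speed(speed)
print(speed, rate, coating_thickness(rate))
```

Chaining the calibrated functions reproduces the input and output ranges listed above: each function's real maximum output is the next function's real maximum input.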

The embodiment also provides a coating control optimization device for an evaporator, and fig. 5 is a schematic structural diagram of this device. As shown in fig. 5, the device includes a model construction module 410, an environment construction module 420 and a parameter optimization module 430. The model construction module 410 is configured to build a coating model of the evaporator, wherein the coating model comprises a plurality of evaporation boats and each evaporation boat corresponds to a test point; the environment construction module 420 is configured to input the coating functions of the evaporator into the coating model, wherein the coating functions comprise a maximum wire feeding speed function, an evaporation speed function, a steam concentration fusion function and a coating thickness conversion function; and the parameter optimization module 430 is configured to optimize the wire feeding parameters of each evaporation boat with the MAPPO algorithm, so that the absolute difference between the coating thickness produced by the coating model with the optimized wire feeding parameters and the target thickness is smaller than a preset threshold.

The coating control optimization device for an evaporator provided by the embodiment of the invention can execute the coating control optimization method for an evaporator provided by any embodiment of the invention, and has the functional modules and beneficial effects corresponding to that method.

Fig. 6 is a schematic structural diagram of an electronic device 10 that may be used to implement the coating control optimization method for an evaporator of the embodiments of the present invention. Electronic devices are intended to represent various forms of digital computers, such as laptops, desktops, workstations, personal digital assistants, servers, blade servers, mainframes, and other appropriate computers. Electronic devices may also represent various forms of mobile devices, such as personal digital processing devices, cellular telephones, smartphones, wearable devices (e.g., helmets, glasses, watches, etc.), and other similar computing devices. The components shown herein, their connections and relationships, and their functions, are meant to be exemplary only, and are not meant to limit implementations of the inventions described and/or claimed herein.

As shown in fig. 6, the electronic device 10 includes at least one processor 11, and a memory, such as a Read Only Memory (ROM) 12, a Random Access Memory (RAM) 13, etc., communicatively connected to the at least one processor 11, in which the memory stores a computer program executable by the at least one processor, and the processor 11 may perform various appropriate actions and processes according to the computer program stored in the Read Only Memory (ROM) 12 or the computer program loaded from the storage unit 18 into the Random Access Memory (RAM) 13. In the RAM 13, various programs and data required for the operation of the electronic device 10 may also be stored. The processor 11, the ROM 12 and the RAM 13 are connected to each other via a bus 14. An input/output (I/O) interface 15 is also connected to bus 14.

Various components in the electronic device 10 are connected to the I/O interface 15, including: an input unit 16 such as a keyboard, a mouse, etc.; an output unit 17 such as various types of displays, speakers, and the like; a storage unit 18 such as a magnetic disk, an optical disk, or the like; and a communication unit 19 such as a network card, modem, wireless communication transceiver, etc. The communication unit 19 allows the electronic device 10 to exchange information/data with other devices via a computer network, such as the internet, and/or various telecommunication networks.

The processor 11 may be any of various general-purpose and/or special-purpose processing components having processing and computing capabilities. Some examples of the processor 11 include, but are not limited to, a Central Processing Unit (CPU), a Graphics Processing Unit (GPU), various specialized Artificial Intelligence (AI) computing chips, various processors running machine learning model algorithms, Digital Signal Processors (DSPs), and any suitable processor, controller, microcontroller, etc. The processor 11 performs the various methods and processes described above, such as the evaporator coating control optimization method.

In some embodiments, the evaporator coating control optimization method can be implemented as a computer program tangibly embodied on a computer-readable storage medium, such as the storage unit 18. In some embodiments, part or all of the computer program may be loaded and/or installed onto the electronic device 10 via the ROM 12 and/or the communication unit 19. When the computer program is loaded into RAM 13 and executed by processor 11, one or more steps of the above-described method of optimizing the coating control of the evaporator may be performed. Alternatively, in other embodiments, the processor 11 may be configured to perform the evaporator coating control optimization method in any other suitable manner (e.g., by means of firmware).

Various implementations of the systems and techniques described above may be realized in digital electronic circuitry, integrated circuit systems, Field Programmable Gate Arrays (FPGAs), Application Specific Integrated Circuits (ASICs), Application Specific Standard Products (ASSPs), Systems On Chip (SOCs), Complex Programmable Logic Devices (CPLDs), computer hardware, firmware, software, and/or combinations thereof. These various embodiments may include implementation in one or more computer programs that are executable and/or interpretable on a programmable system including at least one programmable processor, which may be a special-purpose or general-purpose programmable processor and which may receive data and instructions from, and transmit data and instructions to, a storage system, at least one input device, and at least one output device.

A computer program for carrying out methods of the present invention may be written in any combination of one or more programming languages. These computer programs may be provided to a processor of a general purpose computer, special purpose computer, or other programmable data processing apparatus, such that the computer programs, when executed by the processor, cause the functions/acts specified in the flowchart and/or block diagram block or blocks to be implemented. The computer program may execute entirely on the machine, partly on the machine, as a stand-alone software package, partly on the machine and partly on a remote machine or entirely on the remote machine or server.

In the context of the present invention, a computer-readable storage medium may be a tangible medium that can contain, or store a computer program for use by or in connection with an instruction execution system, apparatus, or device. The computer readable storage medium may include, but is not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any suitable combination of the foregoing. Alternatively, the computer readable storage medium may be a machine readable signal medium. More specific examples of a machine-readable storage medium would include an electrical connection based on one or more wires, a portable computer diskette, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing.

To provide for interaction with a user, the systems and techniques described here can be implemented on an electronic device having: a display device (e.g., a CRT (cathode ray tube) or LCD (liquid crystal display) monitor) for displaying information to a user; and a keyboard and a pointing device (e.g., a mouse or a trackball) through which a user can provide input to the electronic device. Other kinds of devices may also be used to provide for interaction with a user; for example, feedback provided to the user may be any form of sensory feedback (e.g., visual feedback, auditory feedback, or tactile feedback); and input from the user may be received in any form, including acoustic input, speech input, or tactile input.

The systems and techniques described here can be implemented in a computing system that includes a back-end component (e.g., as a data server), or that includes a middleware component (e.g., an application server), or that includes a front-end component (e.g., a user computer having a graphical user interface or a web browser through which a user can interact with an implementation of the systems and techniques described here), or any combination of such back-end, middleware, or front-end components. The components of the system can be interconnected by any form or medium of digital data communication (e.g., a communication network). Examples of communication networks include: Local Area Networks (LANs), Wide Area Networks (WANs), blockchain networks, and the internet.

The computing system may include clients and servers. A client and a server are typically remote from each other and typically interact through a communication network. The relationship of client and server arises by virtue of computer programs running on the respective computers and having a client-server relationship to each other. The server can be a cloud server, also called a cloud computing server or a cloud host, which is a host product in a cloud computing service system that overcomes the high management difficulty and weak service scalability of traditional physical hosts and VPS services.

It should be understood that steps may be reordered, added, or deleted using the various forms of flows shown above. For example, the steps described in the present invention may be performed in parallel, sequentially, or in a different order, so long as the desired results of the technical solution of the present invention are achieved, and the present invention is not limited herein.

The above embodiments do not limit the scope of the present invention. It will be apparent to those skilled in the art that various modifications, combinations, sub-combinations and alternatives are possible, depending on design requirements and other factors. Any modifications, equivalent substitutions and improvements made within the spirit and principles of the present invention should be included in the scope of the present invention.

Claims

1. A coating control optimization method for an evaporator, characterized by comprising the following steps:

optimizing the wire feeding parameters of each evaporation boat by the MAPPO algorithm, so that the absolute difference between the coating thickness produced by the coating model with the optimized wire feeding parameters and the target thickness is smaller than a preset threshold;

wherein optimizing the wire feeding parameters of each evaporation boat by the MAPPO algorithm comprises the following steps:

2. The method of claim 1, wherein inputting the coating function of the evaporator to the coating model comprises:

And inputting a disturbance function into the evaporation speed function.

3. The method of claim 1, further comprising, prior to updating and optimizing parameters of the policy network based on the tuple data corresponding to all test points:

4. The method according to claim 1, wherein updating and optimizing parameters of the policy network according to the tuple data corresponding to all the test points comprises:

5. The method of claim 4, wherein updating the parameters of the policy network by a policy gradient according to the tuple data of all test points comprises:

6. The method of any one of claims 1-5, further comprising, after optimizing the wire feeding parameters of each evaporation boat by the MAPPO algorithm:

7. A coating control optimization device for an evaporator, characterized by comprising:

a parameter optimization module, configured to optimize the wire feeding parameters of each evaporation boat by the MAPPO algorithm, so that the absolute difference between the coating thickness produced by the coating model with the optimized wire feeding parameters and the target thickness is smaller than a preset threshold; the parameter optimization module is specifically configured to update and optimize the parameters of the policy network according to the tuple data of all test points until a preset number of updates is reached, and to determine, through the updated policy network, the optimized wire feeding parameters of each test point from its current thickness and target thickness;

wherein the tuple data comprise a current observed quantity, a current wire feeding parameter, a current global state, an environmental reward value, a next observed quantity and a next wire feeding parameter; the current observed quantity comprises the current thickness and the target thickness of a test point before coating with the current wire feeding parameter; the next observed quantity comprises the current thickness and the target thickness of the test point after coating with the current wire feeding parameter; and the next wire feeding parameter is the wire feeding parameter updated according to the next observed quantity.

8. An electronic device, the electronic device comprising:

At least one processor; and

A memory communicatively coupled to the at least one processor; wherein,

The memory stores a computer program executable by the at least one processor to enable the at least one processor to perform the coating control optimization method of the coating machine of any one of claims 1-6.

9. A computer readable storage medium storing computer instructions for causing a processor to execute the method for optimizing coating control of a coating machine according to any one of claims 1 to 6.