CN112488315B - Batch scheduling optimization method based on deep reinforcement learning and genetic algorithm - Google Patents


Info

Publication number
CN112488315B
CN112488315B CN202011373229.1A CN202011373229A
Authority
CN
China
Prior art keywords
network
batch
genetic algorithm
workpiece
algorithm
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202011373229.1A
Other languages
Chinese (zh)
Other versions
CN112488315A (en)
Inventor
谭琦
贾铖钰
余荣坤
孙晨皓
唐昊
夏田林
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Hefei University of Technology
Original Assignee
Hefei University of Technology
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Hefei University of Technology filed Critical Hefei University of Technology
Priority to CN202011373229.1A priority Critical patent/CN112488315B/en
Publication of CN112488315A publication Critical patent/CN112488315A/en
Application granted granted Critical
Publication of CN112488315B publication Critical patent/CN112488315B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical


Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 Computing arrangements based on biological models
    • G06N 3/02 Neural networks
    • G06N 3/08 Learning methods
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 18/00 Pattern recognition
    • G06F 18/20 Analysing
    • G06F 18/21 Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F 18/214 Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 Computing arrangements based on biological models
    • G06N 3/12 Computing arrangements based on biological models using genetic models
    • G06N 3/126 Evolutionary algorithms, e.g. genetic algorithms or genetic programming
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q 10/00 Administration; Management
    • G06Q 10/04 Forecasting or optimisation specially adapted for administrative or management purposes, e.g. linear programming or "cutting stock problem"

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Biophysics (AREA)
  • Health & Medical Sciences (AREA)
  • General Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Evolutionary Biology (AREA)
  • Evolutionary Computation (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • General Engineering & Computer Science (AREA)
  • Artificial Intelligence (AREA)
  • Business, Economics & Management (AREA)
  • General Health & Medical Sciences (AREA)
  • Computing Systems (AREA)
  • Human Resources & Organizations (AREA)
  • Strategic Management (AREA)
  • Software Systems (AREA)
  • Mathematical Physics (AREA)
  • Molecular Biology (AREA)
  • Computational Linguistics (AREA)
  • Biomedical Technology (AREA)
  • Economics (AREA)
  • General Business, Economics & Management (AREA)
  • Tourism & Hospitality (AREA)
  • Development Economics (AREA)
  • Quality & Reliability (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Operations Research (AREA)
  • Game Theory and Decision Science (AREA)
  • Marketing (AREA)
  • Entrepreneurship & Innovation (AREA)
  • Physiology (AREA)
  • Genetics & Genomics (AREA)
  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)

Abstract

The invention belongs to the field of production and manufacturing scheduling and discloses a batch scheduling optimization method based on deep reinforcement learning and a genetic algorithm, comprising the following steps: establishing a mathematical model of the batch scheduling problem with non-identical workpieces; establishing a policy model of the problem using a pointer network; training the pointer network model with an actor-critic algorithm; defining and initializing the parameters of a genetic algorithm; optimizing the initial population of the genetic algorithm with the trained pointer network; further optimizing the scheduling scheme with the genetic algorithm; and using the optimal scheme obtained by the genetic algorithm as the production scheme by which a batch processor processes the workpieces. Compared with traditional heuristic algorithms, the pointer network in the invention obtains better solutions; in addition, a novel crossover mode is proposed for the crossover operation of the genetic algorithm, so that the optimization capability of the genetic algorithm can further improve the scheduling scheme obtained by the pointer network.

Description

Batch scheduling optimization method based on deep reinforcement learning and genetic algorithm
Technical Field
The invention belongs to the field of production and manufacturing scheduling, and particularly relates to a batch scheduling optimization method based on deep reinforcement learning and a genetic algorithm.
Background
The batch scheduling problem stems from the burn-in operation used for final testing in the semiconductor manufacturing industry. In this operation, integrated circuits are placed in a high-temperature oven in batches and tested over a long period of time to detect failures that may occur early in their life. Burn-in is often a bottleneck in semiconductor manufacturing because, within final testing, its processing time is typically much longer than that of other operations. Therefore, scheduling the ovens (or machines) efficiently so as to greatly increase their utilization is important. The batch scheduling problem now exists not only in the semiconductor industry but also widely in most manufacturing industries, such as foundries, furniture manufacturing, metal processing, aviation, pharmaceuticals, and logistics and freight. For most manufacturers, a well-designed scheduling strategy is an effective way to improve production efficiency and reduce production cost. Research on the batch scheduling problem therefore has important practical significance for improving the level of production management and obtaining higher economic benefit.
In recent years, deep neural networks that learn from data have been able to discover the characteristics of a problem and can therefore be used to solve it, providing a new direction for combinatorial optimization. However, little research has applied deep neural networks to production and manufacturing scheduling, and they have not yet been applied to the batch scheduling problem with non-identical workpieces.
Disclosure of Invention
The technical problem to be solved by the present invention is to provide a batch scheduling optimization method that minimizes the total manufacturing span when the workpieces to be processed differ in size and processing time.
The invention adopts the following technical scheme for solving the technical problems:
the invention relates to a batch scheduling optimization method based on deep reinforcement learning and a genetic algorithm, which comprises the following steps:
step I, establishing a mathematical model of the batch scheduling problem with non-identical workpieces;
The batch scheduling problem with non-identical workpieces is defined as follows: a workpiece set J = {1, 2, …, n}, where the processing time of workpiece j is p_j and its size is s_j; the capacity of the batch processor is C, and the batch processor can process several workpieces simultaneously provided the capacity constraint is satisfied; the set of batches to be processed is K, where the processing time P_k of batch k equals the maximum processing time of the workpieces in batch k. X_jk is a decision variable: X_jk = 1 if workpiece j is assigned to batch k, otherwise X_jk = 0. Y_k is a decision variable: Y_k = 1 if batch k is created, otherwise Y_k = 0.
According to the above definition, the batch scheduling problem for workpieces with different sizes on a single machine can be established as the following mathematical model:
an objective function:
\min \; C_{\max} = \sum_{k \in K} P_k Y_k \qquad (1)
constraint conditions are as follows:
\sum_{k \in K} X_{jk} = 1, \quad \forall j \in J \qquad (2)
\sum_{j \in J} s_j X_{jk} \le C\,Y_k, \quad \forall k \in K \qquad (3)
P_k \ge p_j X_{jk}, \quad \forall j \in J,\ \forall k \in K \qquad (4)
X_{jk} \in \{0,1\}, \quad \forall j \in J,\ \forall k \in K \qquad (5)
Y_k \in \{0,1\}, \quad \forall k \in K \qquad (6)
P_k \ge 0, \quad \forall k \in K \qquad (7)
step II, establishing a strategy model of the problem by adopting a pointer network;
step III, training the pointer network model with an actor-critic algorithm;
step IV, defining and initializing the parameters of the genetic algorithm: population size PopNum, maximum number of iterations T_GA, and the number n of workpieces; the number of completed iterations t_GA = 0;
V, optimizing an initial population of the genetic algorithm by using the pointer network trained in the step III;
step VI, solving the problem by adopting a genetic algorithm;
and step VII, using the optimal scheme obtained by the genetic algorithm as a production scheme for processing the workpiece by a batch processor.
Preferably, the step II of establishing the policy model of the problem by using the pointer network mainly includes the following steps:
the pointer network model is defined as follows, n represents the length of the encoder and decoder, and X = { X = { (X) } 1 ,x 2 ,…,x n Denotes a coded input workpiece information sequence, where x is input arbitrarily j All have x j =[s j ,p j ] T ,s j And p j Respectively showing the size and the processing time of the jth workpiece. e = { e = 1 ,e 2 ,…,e n Denotes the encoder's hidden layer state sequence, d = { d = } 1 ,d 2 ,…,d n Denotes the implicit layer state sequence of the decoder, y = { y = } 1 ,y 2 ,…,y n Denotes the final output sequence of the pointer network.
Step i, constructing the encoding-layer network of the pointer network, which consists of a fully connected network layer and an RNN (Recurrent Neural Network) with an LSTM (Long Short-Term Memory) module;
step ii, constructing a decoding layer network of the pointer network, wherein the decoder network is composed of an RNN with an LSTM module;
step iii, introducing an attention mechanism by which the pointer network selects and orders the workpieces in the output sequence; when the t-th workpiece is to be added to the output sequence, the selection probability of the remaining workpieces is calculated as follows:
u_t^j = v^{T} \tanh\left( W_1 e_j + W_2 d_t \right), \quad j = 1, 2, \ldots, n \qquad (8)
A(e, d_t; W_1, W_2, v) = \mathrm{softmax}(u_t) \qquad (9)
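As a concrete illustration of equations (8) and (9), the following is a minimal sketch of the pointer attention step in PyTorch; the tensor shapes, the masking of already-scheduled workpieces, and the parameter names (W1, W2, v) are assumptions chosen to mirror the notation above, not the inventors' implementation.

```python
# Sketch of the pointer-network attention of equations (8)-(9) in PyTorch.
# Shapes, module names, and the masking step are illustrative assumptions.
import torch
import torch.nn as nn

class PointerAttention(nn.Module):
    def __init__(self, hidden_dim):
        super().__init__()
        self.W1 = nn.Linear(hidden_dim, hidden_dim, bias=False)  # applied to encoder states e_j
        self.W2 = nn.Linear(hidden_dim, hidden_dim, bias=False)  # applied to decoder state d_t
        self.v = nn.Linear(hidden_dim, 1, bias=False)

    def forward(self, enc_states, dec_state, mask):
        # enc_states: (batch, n, hidden), dec_state: (batch, hidden)
        # mask: (batch, n) boolean, True for workpieces already placed in the output sequence
        u = self.v(torch.tanh(self.W1(enc_states) + self.W2(dec_state).unsqueeze(1))).squeeze(-1)  # eq (8)
        u = u.masked_fill(mask, float("-inf"))   # forbid re-selecting scheduled workpieces (assumption)
        return torch.softmax(u, dim=-1)          # eq (9): selection probabilities

if __name__ == "__main__":
    att = PointerAttention(hidden_dim=128)
    e = torch.randn(2, 10, 128)                  # encoder states for n = 10 workpieces
    d = torch.randn(2, 128)                      # current decoder state d_t
    mask = torch.zeros(2, 10, dtype=torch.bool)
    print(att(e, d, mask).shape)                 # torch.Size([2, 10])
```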
Preferably, the training of the pointer network model with the actor-critic algorithm in step III mainly includes the following steps:
step i, an actor network model adopted by the problem;
The model of the actor network is the pointer network model established in step II.
Step ii, establishing a critic network model of the problem;
the structure of the criticic network consists of an encoder (RNN with LSTM module), an LSTM processing module and a decoder of a 2-layer ReLU fully-connected neural network.
Step iii, defining and initializing the number of samples B in one training batch, the total number D of training-set samples, the number of iterations per epoch T_PTR = D/B, the actor network parameters θ_a, the critic network parameters θ_c, and the number of training epochs E; the initial number of completed epochs epoch = 0, the initial iteration count t_PTR = 0, and the number of trained batch samples i = 0;
step iv, passing the workpiece information sequence x_i of the current batch of samples through the actor network to obtain the output sequence y_i;
step v, passing the workpiece information sequence x_i of the current batch of samples through the critic network to obtain the corresponding baseline value b_i;
Step vi, making i = i +1, judging whether i < B is true, if so, skipping to execute the step iv, otherwise, skipping to the step vii;
step vii, letting i = 0 and solving the loss value of the actor network with a Monte Carlo sampling approximation of the reinforcement learning objective, calculated as follows:
L_{actor}(\theta_a) = \frac{1}{B} \sum_{i=1}^{B} \left( R(y_i) - b_i \right) \log p_{\theta_a}(y_i \mid x_i) \qquad (10)
step viii, using the mean square error as the loss value of the critic network, calculated as follows:
L_{critic}(\theta_c) = \frac{1}{B} \sum_{i=1}^{B} \left( R(y_i) - b_i \right)^2 \qquad (11)
step ix, optimizing the parameters of the actor network and the critic network with the Adam algorithm;
step x, letting t_PTR = t_PTR + 1 and judging whether t_PTR < T_PTR holds; if so, jump to step iv; otherwise, execute step xi;
step xi, letting t_PTR = 0 and epoch = epoch + 1, and judging whether epoch < E holds; if so, execute step iv; otherwise, jump to step xii;
step xii, obtaining the trained actor network model;
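A compressed sketch of one training iteration covering steps iv to ix is given below, assuming PyTorch and treating the actor and critic as black boxes that return, respectively, the log-probability of the sampled schedule and a baseline estimate; names such as actor, critic, and reward are placeholders, and the loss expressions follow equations (10) and (11).

```python
# Sketch of one actor-critic update (steps iv-ix), with placeholder networks.
# `reward(y, x)` is assumed to return the manufacturing span of schedule y for instance x.
import torch

def train_step(actor, critic, actor_opt, critic_opt, batch_x, reward):
    # Step iv: decode schedules and their log-probabilities with the actor (pointer network).
    y, log_prob = actor(batch_x)                  # log_prob: (B,)
    # Step v: baseline values from the critic.
    b = critic(batch_x).squeeze(-1)               # (B,)
    with torch.no_grad():
        R = reward(y, batch_x)                    # (B,) manufacturing spans
    advantage = R - b.detach()
    # Step vii, eq (10): Monte Carlo policy-gradient loss for the actor.
    actor_loss = (advantage * log_prob).mean()
    # Step viii, eq (11): mean-squared-error loss for the critic.
    critic_loss = torch.nn.functional.mse_loss(b, R)
    # Step ix: Adam updates for both networks.
    actor_opt.zero_grad(); actor_loss.backward(); actor_opt.step()
    critic_opt.zero_grad(); critic_loss.backward(); critic_opt.step()
    return actor_loss.item(), critic_loss.item()
```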
preferably, the step V of optimizing the initial population of the genetic algorithm by using the trained pointer network mainly includes the following steps:
step i, generating one individual in the population using real-number encoding and the LPT (Longest Processing Time) heuristic rule;
step ii, generating PopNum-1 individuals in the population by adopting a triangular fuzzy number mode;
step iii, obtaining a new population from the individuals in the population through a pointer network;
and iv, sequencing all individuals in the two populations in an ascending order according to the fitness value, and taking the front PopNum individuals as an initial population of the genetic algorithm.
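A hedged sketch of the initial-population construction in steps i to iv above follows; the LPT individual, the triangular-fuzzy perturbation, the first-fit decoding used as the fitness, and the way the trained pointer network re-orders individuals are simplified assumptions for illustration, not the exact procedure of the invention.

```python
# Sketch of steps i-iv: build the GA's initial population from an LPT individual,
# triangular-fuzzy variants, and pointer-network-improved copies.  The first-fit
# decoding and the fuzzy perturbation are illustrative assumptions.
import random

def first_fit_makespan(order, sizes, times, capacity):
    batches = []          # each batch: [residual capacity, batch processing time]
    for j in order:
        for b in batches:
            if sizes[j] <= b[0]:
                b[0] -= sizes[j]
                b[1] = max(b[1], times[j])
                break
        else:
            batches.append([capacity - sizes[j], times[j]])
    return sum(b[1] for b in batches)

def initial_population(sizes, times, capacity, pop_num, pointer_net):
    n = len(sizes)
    lpt = sorted(range(n), key=lambda j: -times[j])                      # step i: LPT individual
    pop = [lpt]
    for _ in range(pop_num - 1):                                         # step ii: fuzzy variants
        keys = [random.triangular(0.8, 1.2) * times[j] for j in range(n)]
        pop.append(sorted(range(n), key=lambda j: -keys[j]))
    improved = [pointer_net(ind) for ind in pop]                         # step iii: pointer network
    everyone = pop + improved                                            # step iv: keep best PopNum
    everyone.sort(key=lambda ind: first_fit_makespan(ind, sizes, times, capacity))
    return everyone[:pop_num]
```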
Preferably, the step VI of further solving the problem by using a genetic algorithm mainly comprises the following steps:
step i, selecting PopNum parent individuals according to a roulette mode;
step ii, combining all the parent individuals pairwise and generating child individuals with the improved multi-point crossover;
step iii, performing mutation on all the child individuals with single-point mutation;
step iv, letting t_GA = t_GA + 1 and calculating the fitness values of all individuals;
step v, judging whether t_GA < T_GA holds; if so, jump to step i; otherwise, execute step vi;
and step vi, finishing the algorithm, and outputting the optimal scheduling scheme.
Preferably, the step ii of generating children with the improved multi-point crossover mainly comprises the following steps:
step a, initializing the child to be empty, selecting Parent1 and Parent2 to be crossed, letting the currently inherited parent be parent = Parent1 and the gene-copying start position Index = 0, and randomly generating the number num of genes to copy, where num ranges from 1 to n;
step b, starting from the position with subscript Index in parent, searching to the left and to the right for num genes that are not yet in the child; genes that already exist in the child are skipped; if the search subscript reaches a boundary, the search stops in that direction; if all genes have already been copied, go directly to step d;
step c, copying the gene segments found in step b into the child in the order in which they appear in parent;
step d, judging whether the child has copied all the genes of the parents; if so, jump to step f; otherwise, execute step e;
step e, letting parent be the other parent and Index be the value of the last gene of the current child, regenerating the number num of genes to copy, and jumping to step b;
step f, obtaining the crossed child individual.
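The following is a sketch of the improved multi-point crossover under steps a to f above; the interpretation of "Index = value of the last gene of the current child", the alternating left/right search order, and the clamping of Index into range are assumptions of this sketch, and genes are taken as workpiece indices 0..n-1.

```python
# Sketch of the improved multi-point crossover (steps a-f).  Interpretation details
# (search order, Index handling) are assumptions; parents are permutations of 0..n-1.
import random

def improved_crossover(parent1, parent2):
    n = len(parent1)
    child, in_child = [], set()
    parent, index = parent1, 0                           # step a
    num = random.randint(1, n)
    while len(child) < n:                                # repeat until all genes are copied
        # Step b: search left and right from `index` for `num` genes not yet in the child.
        picked = []
        left, right = index, index + 1
        while len(picked) < num and (left >= 0 or right < n):
            if left >= 0:
                if parent[left] not in in_child:
                    picked.append((left, parent[left]))
                left -= 1
            if len(picked) < num and right < n:
                if parent[right] not in in_child:
                    picked.append((right, parent[right]))
                right += 1
        # Step c: copy the found genes in the order they appear in the parent.
        for _, gene in sorted(picked):
            child.append(gene)
            in_child.add(gene)
        if len(child) >= n:                              # step d: child complete
            break
        # Step e: switch to the other parent; restart from the last copied gene's value.
        parent = parent2 if parent is parent1 else parent1
        index = child[-1] % n
        num = random.randint(1, n)
    return child                                         # step f

if __name__ == "__main__":
    random.seed(0)
    print(improved_crossover([0, 1, 2, 3, 4, 5], [5, 3, 1, 0, 2, 4]))
```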
Intelligent manufacturing is of great significance for improving the overall competitiveness of China's manufacturing industry and for realizing its strategic transformation from large to strong. Promoting the development and application of industrial big data and giving full play to its role in manufacturing is one of the main directions of intelligent manufacturing. The invention takes the batch production scheduling problem with non-identical workpieces as its research object and designs a new optimization method using deep reinforcement learning and related technologies. Compared with the prior art, the invention has the following beneficial effects:
1. For the batch scheduling problem with non-identical workpieces, the invention takes the maximum completion time as the objective and trains the pointer network model with an actor-critic algorithm; the trained pointer network model can obtain a high-quality scheduling policy in a short time.
2. The pointer network model obtained by training has good generalization capability: a pointer network model trained on small-scale problems can also effectively solve large-scale problems. Therefore, in actual production, when the available preparation time is limited, the time spent on training can be reduced by training the pointer network model on small-scale problems; when the preparation time is sufficient, the optimization performance of the model can be further improved by training it fully.
3. Compared with existing heuristic rules, the pointer network achieves comparable solution time but better solution quality. Therefore, for production scenarios that require highly real-time scheduling policies, the pointer network trained in step III can be used directly to obtain a scheduling scheme.
4. The invention provides a novel crossover mode for the crossover operation of the genetic algorithm, so that, starting from the scheduling scheme obtained by the pointer network, the optimization capability of the genetic algorithm can further improve the performance of the scheme.
Drawings
FIG. 1 is a flow chart of pointer network model training.
FIG. 2 is a flow chart of the PTR-GA optimization algorithm.
Fig. 3 is a schematic diagram of an improved crossover mode of operation.
FIG. 4 is a graph comparing solving performance of a pointer network and a BFLPT heuristic algorithm.
FIG. 5 is a pointer network generalization capability diagram.
FIG. 6 is a comparison graph of solving performance of PTR-GA and GA.
Detailed Description
To more clearly illustrate the objects, aspects and advantages of the present invention, the present invention will be described in further detail below with reference to the accompanying drawings, in which only some specific embodiments are shown.
For workpiece sequences with non-identical sizes and processing times, the application provides a batch scheduling optimization method based on deep reinforcement learning and a genetic algorithm with the objective of minimizing the total manufacturing span; the simplified problem is described as follows:
(1) Workpiece set J = {1, 2, …, n}, where the machining time of workpiece j is p_j and its size is s_j.
(2) The batch processor has a machine capacity of C; the workpieces of set J are processed in batches, and the total size of the workpieces in each batch to be processed must not exceed C.
(3) The set of batches to be processed is K, where the processing time of batch k is P_k, the maximum processing time over all workpieces in batch k.
(4) The manufacturing span is the sum of the processing times of all processing batches.
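As an illustration of points (1) to (4), the following is a minimal sketch that checks the capacity constraint of a given batching and computes its manufacturing span; the instance data and function names are assumptions chosen for the example, not part of the patented method.

```python
# Minimal sketch: evaluate a candidate batching under the problem definition above.
# Instance values and helper names are illustrative assumptions.

def makespan(batches, sizes, times, capacity):
    """Return the total manufacturing span of a batching, or raise if infeasible.

    batches  -- list of batches, each batch a list of workpiece indices
    sizes    -- sizes s_j of the workpieces
    times    -- processing times p_j of the workpieces
    capacity -- machine capacity C of the batch processor
    """
    span = 0
    for batch in batches:
        if sum(sizes[j] for j in batch) > capacity:
            raise ValueError(f"batch {batch} violates the capacity constraint")
        # A batch takes as long as its longest workpiece (point (3)).
        span += max(times[j] for j in batch)
    return span

if __name__ == "__main__":
    sizes = [4, 5, 6, 8, 4]      # s_j
    times = [3, 7, 2, 10, 5]     # p_j
    C = 10
    batches = [[0, 1], [2, 4], [3]]
    print(makespan(batches, sizes, times, C))  # 7 + 5 + 10 = 22
```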
As shown in fig. 1,2 and 3, based on the introduction of the above problem, the batch scheduling optimization method based on deep reinforcement learning and genetic algorithm provided by this embodiment includes the following steps:
step I, establishing a mathematical model of the batch scheduling problem with non-identical workpieces;
The batch scheduling problem with non-identical workpieces is defined as follows: a workpiece set J = {1, 2, …, n}, where the processing time of workpiece j is p_j and its size is s_j; the capacity of the batch processor is C, and the batch processor can process several workpieces simultaneously provided the capacity constraint is satisfied; the set of batches to be processed is K, where the processing time P_k of batch k equals the maximum processing time of the workpieces in batch k. X_jk is a decision variable: X_jk = 1 if workpiece j is assigned to batch k, otherwise X_jk = 0. Y_k is a decision variable: Y_k = 1 if batch k is created, otherwise Y_k = 0.
According to the above definition, the batch scheduling problem for workpieces with different sizes on a single machine can be established as the following mathematical model:
an objective function:
\min \; C_{\max} = \sum_{k \in K} P_k Y_k \qquad (1)
constraint conditions are as follows:
\sum_{k \in K} X_{jk} = 1, \quad \forall j \in J \qquad (2)
\sum_{j \in J} s_j X_{jk} \le C\,Y_k, \quad \forall k \in K \qquad (3)
P_k \ge p_j X_{jk}, \quad \forall j \in J,\ \forall k \in K \qquad (4)
X_{jk} \in \{0,1\}, \quad \forall j \in J,\ \forall k \in K \qquad (5)
Y_k \in \{0,1\}, \quad \forall k \in K \qquad (6)
P_k \ge 0, \quad \forall k \in K \qquad (7)
Equation (1) states that the optimization objective of the model is to minimize the manufacturing span; equation (2) states that each workpiece is assigned to exactly one batch; equation (3) states that the total size of the workpieces processed in a batch cannot exceed the machine capacity of the batch processor; equation (4) states that the processing time of a batch is not less than the processing time of any workpiece processed in that batch; equations (5) to (7) are the basic constraints of the problem.
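For readers who wish to verify the formulation on small instances, the following is a hedged sketch of the model in PuLP, an off-the-shelf MILP library; its use here, the instance data, and the variable names are illustration assumptions, not part of the invention.

```python
# Sketch of the mathematical model (1)-(7) for a small instance using PuLP.
# Assumes `pip install pulp`; instance data are illustrative.
import pulp

p = [3, 7, 2, 10, 5]          # processing times p_j
s = [4, 5, 6, 8, 4]           # sizes s_j
C = 10                        # machine capacity
J = range(len(p))
K = range(len(p))             # at most n batches are ever needed

m = pulp.LpProblem("single_machine_batch_scheduling", pulp.LpMinimize)
X = pulp.LpVariable.dicts("X", (J, K), cat="Binary")           # X_jk
Y = pulp.LpVariable.dicts("Y", K, cat="Binary")                # Y_k
P = pulp.LpVariable.dicts("P", K, lowBound=0)                  # P_k

# Objective (1): since P_k is minimized and forced to 0 for unused batches,
# sum(P_k) equals sum(P_k * Y_k) at the optimum (a standard linearization).
m += pulp.lpSum(P[k] for k in K)
for j in J:
    m += pulp.lpSum(X[j][k] for k in K) == 1                   # constraint (2)
for k in K:
    m += pulp.lpSum(s[j] * X[j][k] for j in J) <= C * Y[k]     # constraint (3)
    for j in J:
        m += P[k] >= p[j] * X[j][k]                            # constraint (4)

m.solve(pulp.PULP_CBC_CMD(msg=False))
print("makespan:", pulp.value(m.objective))
```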
And step II, establishing a strategy model of the problem by adopting a pointer network, wherein the specific construction mode of the pointer network is as follows:
the pointer network model is defined as follows, n represents the length of the encoder and decoder, and X = { X = { (X) } 1 ,x 2 ,…,x n Denotes a coded input workpiece information sequence in which x is input arbitrarily j All have x j =[s j ,p j ] T ,s j And p j Respectively showing the size and the processing time of the jth workpiece. e = { e = 1 ,e 2 ,…,e n Denotes the encoder's hidden layer state sequence, d = { d = } 1 ,d 2 ,…,d n Denotes the implicit layer state sequence of the decoder, y = { y = } 1 ,y 2 ,…,y n Denotes the final output sequence of the pointer network.
I, constructing an encoding layer network of the pointer network, wherein the encoder network consists of a full-connection network layer and an RNN with an LSTM module;
step ii, constructing a decoding layer network of the pointer network, wherein the decoder network is composed of an RNN with an LSTM module;
and step iii, introducing an attention mechanism for selecting and sequencing the workpieces in the output sequence by the pointer network, wherein when the t-th workpiece is added to the output sequence, the selected probability of the remaining workpieces is calculated as follows:
u_t^j = v^{T} \tanh\left( W_1 e_j + W_2 d_t \right), \quad j = 1, 2, \ldots, n \qquad (8)
A(e, d_t; W_1, W_2, v) = \mathrm{softmax}(u_t) \qquad (9)
Step III, training the pointer network model with an actor-critic algorithm; the specific training steps of the algorithm are as follows:
step i, establishing the actor network model adopted for the problem;
The model of the actor network is the pointer network model established in step II.
Step ii, establishing a critic network model of the problem;
the structure of the criticic network consists of an encoder (RNN with LSTM module), an LSTM processing module and a decoder of a 2-layer ReLU fully-connected neural network.
Step iii, defining and initializing sample number B in one training, and trainingTotal number of samples D, number of iterations T PTR = D/B, actor network parameter θ a Critic network parameter θ c And training times E, wherein the initial training times epoch =0 and the initial iteration times t PTR =0, number of trained batch samples i =0;
step iv, passing the workpiece information sequence x_i of the current batch of samples through the actor network to obtain the output sequence y_i;
step v, passing the workpiece information sequence x_i of the current batch of samples through the critic network to obtain the corresponding baseline value b_i;
Step vi, letting i = i +1, judging whether i < B is true, if so, skipping to execute step iv, otherwise, skipping to step vii;
step vii, letting i = 0 and solving the loss value of the actor network with a Monte Carlo sampling approximation of the reinforcement learning objective, calculated as follows:
L_{actor}(\theta_a) = \frac{1}{B} \sum_{i=1}^{B} \left( R(y_i) - b_i \right) \log p_{\theta_a}(y_i \mid x_i) \qquad (10)
step viii, using the mean square error as the loss value of the critic network, calculated as follows:
L_{critic}(\theta_c) = \frac{1}{B} \sum_{i=1}^{B} \left( R(y_i) - b_i \right)^2 \qquad (11)
step ix, optimizing the parameters of the actor network and the critic network with the Adam algorithm;
step x, letting t_PTR = t_PTR + 1 and judging whether t_PTR < T_PTR holds; if so, jump to step iv; otherwise, execute step xi;
step xi, letting t_PTR = 0 and epoch = epoch + 1, and judging whether epoch < E holds; if so, execute step iv; otherwise, jump to step xii;
step xii, obtaining the trained actor network model;
Step IV, defining and initializing the parameters of the genetic algorithm: population size PopNum, maximum number of iterations T_GA, and the number n of workpieces; the number of completed iterations t_GA = 0;
And V, optimizing an initial population of the genetic algorithm by using the pointer network trained in the step III, wherein the generation mode of the initial population comprises the following steps:
step i, generating an individual in the population by adopting a real number coding mode and an LPT heuristic rule;
step ii, generating PopNum-1 individuals in the population by adopting a triangular fuzzy number mode;
step iii, obtaining a new population from the individuals in the population through a pointer network;
and iv, sequencing all the individuals in the two populations in an ascending order according to the fitness value, and taking the first PopNum individuals as an initial population of the genetic algorithm.
Step VI, solving the problem by adopting a genetic algorithm, wherein the specific solving steps are as follows:
step i, selecting PopNum parent individuals according to a roulette mode;
step ii, combining all parent individuals pairwise and generating child individuals with the improved multi-point crossover, whose specific content is as follows:
step a, initializing the child to be empty, selecting Parent1 and Parent2 to be crossed, letting the currently inherited parent be parent = Parent1 and the gene-copying start position Index = 0, and randomly generating the number num of genes to copy, where num ranges from 1 to n;
step b, starting from the position with subscript Index in parent, searching to the left and to the right for num genes that are not yet in the child; genes that already exist in the child are skipped; if the search subscript reaches a boundary, the search stops in that direction; if all genes have already been copied, go directly to step d;
step c, copying the gene segments found in step b into the child in the order in which they appear in parent;
step d, judging whether the child has copied all the genes of the parents; if so, jump to step f; otherwise, execute step e;
step e, letting parent be the other parent and Index be the value of the last gene of the current child, regenerating the number num of genes to copy, and jumping to step b;
step f, obtaining the crossed child individual.
step iii, performing mutation on all child individuals with single-point mutation;
step iv, letting t_GA = t_GA + 1 and calculating the fitness values of the individuals;
step v, judging whether t_GA < T_GA holds; if so, jump to step i; otherwise, execute step vi;
and step vi, finishing the algorithm, and outputting the optimal scheduling scheme.
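Below is a hedged sketch of the genetic-algorithm loop of steps i to vi (roulette-wheel selection, the improved crossover, single-point mutation); because the fitness here is a manufacturing span to be minimized, the selection weights are taken inversely proportional to fitness, and the neighbor pairing and swap mutation are conventions assumed for this sketch.

```python
# Sketch of the GA loop (steps i-vi): roulette selection, improved crossover,
# single-point mutation.  `fitness` returns the manufacturing span (lower is better);
# inverse-fitness roulette weights and swap mutation are illustrative assumptions.
import random

def roulette(pop, fitness):
    weights = [1.0 / (1e-9 + fitness(ind)) for ind in pop]
    return random.choices(pop, weights=weights, k=len(pop))     # step i

def single_point_mutation(ind):
    a, b = random.sample(range(len(ind)), 2)                    # step iii: swap two genes
    ind = ind[:]
    ind[a], ind[b] = ind[b], ind[a]
    return ind

def run_ga(init_pop, fitness, crossover, max_iter):
    pop = init_pop
    for _ in range(max_iter):                                   # steps iv-v: iterate t_GA
        parents = roulette(pop, fitness)
        children = [crossover(parents[i], parents[(i + 1) % len(parents)])
                    for i in range(len(parents))]               # step ii: pairwise crossover
        pop = [single_point_mutation(c) for c in children]
    return min(pop, key=fitness)                                # step vi: best schedule found
```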
And step VII, using the optimal scheme obtained by the genetic algorithm as a production scheme for processing the workpiece by a batch processor.
Performance verification
To verify the performance of the batch scheduling optimization method based on deep reinforcement learning and genetic algorithm proposed in this embodiment, the algorithm is compared with a GA algorithm (Damodaran P, Manjeshwar P K, Srihari K. Minimizing makespan on a batch-processing machine with non-identical job sizes using genetic algorithms [J]. International Journal of Production Economics, 2006, 103(2): 882-891) and the BFLPT heuristic algorithm (Dupont L, Ghazvini F J. Minimizing makespan on a single batch processing machine with non-identical job sizes [J], 1998, 32(4): 431-440).
To verify the performance of the algorithm, a series of problem instances with different numbers of workpieces n, workpiece sizes s_j, and workpiece processing times p_j were randomly generated for testing. In this embodiment, four workpiece numbers n = {10, 20, 50, 100} are used; the workpiece size s_j is a uniformly distributed random integer in [4, 8]; the workpiece processing time p_j is uniformly distributed in [1, 20]. For the accuracy of the test data, 1000 instances were randomly generated for each workpiece number to compare the pointer network with BFLPT, and, considering the time consumed by solving, 100 instances were randomly generated for each workpiece number to compare the genetic algorithm whose initial population is optimized by the pointer network (PTR-GA) with the original genetic algorithm (GA); the machine capacity is set to 10.
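Instances following these distributions can be reproduced, up to the random seed, with a short generator such as the sketch below; the exact generator used for the experiments is not given in the text, and treating p_j as an integer is an assumption.

```python
# Sketch of the instance generator used in the experiments: s_j is a uniform integer
# in [4, 8], p_j is uniform in [1, 20] (taken here as integers, an assumption),
# and the machine capacity is 10.
import random

def generate_instances(n, count, seed=None):
    rng = random.Random(seed)
    instances = []
    for _ in range(count):
        sizes = [rng.randint(4, 8) for _ in range(n)]
        times = [rng.randint(1, 20) for _ in range(n)]
        instances.append((sizes, times, 10))
    return instances

if __name__ == "__main__":
    for n in (10, 20, 50, 100):
        print(n, len(generate_instances(n, 1000, seed=n)))
```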
For objectivity of the results, the parameters of PTR-GA and the parameter settings of the GA used for comparison are kept consistent with the original literature, and the learning parameters used to train the pointer network are as follows: batch size B = 128; total number of training samples D = 128000; the learning rate η of the actor and critic network parameters is 0.001 when the number of workpieces is 10 or 20 and 0.0001 when the number of workpieces is 50 or 100; the number of training epochs E is 40.
Under the above instance parameter settings, FIG. 4 compares the solution performance of the pointer network with that of the BFLPT heuristic algorithm, FIG. 5 shows the generalization capability of the pointer network on other workpiece scales, and FIG. 6 compares the solution performance of PTR-GA and GA. FIG. 4 mainly shows, over the 1000 instances for each number of workpieces, how often the completion time obtained by the pointer network is better than the result obtained by the BFLPT heuristic algorithm; it can be seen that as the number of workpieces increases, the number of instances on which the pointer network obtains the better completion time gradually exceeds that of the BFLPT heuristic algorithm. At small scales, the pointer network wins on a smaller proportion of instances because the BFLPT heuristic algorithm can already find most of the optimal solutions. Therefore, the pointer network performs better than the BFLPT heuristic algorithm at large scales, while at small scales its solution quality is slightly better and its solution time slightly higher than those of the BFLPT heuristic algorithm. FIG. 5 mainly shows the generalization capability of the pointer network model trained on small-scale instances to other workpiece scales, where the ordinate is the difference between the average over 1000 instances solved by the BFLPT heuristic algorithm and the average over the same 1000 instances solved by the pointer network model, and the abscissa is the number of workpieces. It can be seen that even for a pointer network trained on small-scale instances, the tendency of its completion time to be better than that of the BFLPT heuristic algorithm becomes more pronounced as the number of workpieces increases; therefore, the pointer network in the invention has good generalization capability. FIG. 6 mainly shows the comparison of the solution performance of PTR-GA and GA when the number of workpieces is 100, where the ordinate is the average completion time of each algorithm over the 100 instances and the abscissa is the number of iterations of the algorithm.

Claims (3)

1. A batch scheduling optimization method based on deep reinforcement learning and a genetic algorithm, characterized by comprising the following steps:
step I, establishing a mathematical model of the batch scheduling problem with non-identical workpieces;
The batch scheduling problem with non-identical workpieces is defined as follows: a workpiece set J = {1, 2, …, n}, where the processing time of workpiece j is p_j and its size is s_j; the capacity of the batch processor is C, and the batch processor can process several workpieces simultaneously provided the capacity constraint is satisfied; the set of batches to be processed is K, where the processing time P_k of batch k equals the maximum processing time of the workpieces in batch k; X_jk is a decision variable: X_jk = 1 if workpiece j is assigned to batch k, otherwise X_jk = 0; Y_k is a decision variable: Y_k = 1 if batch k is created, otherwise Y_k = 0;
According to the above definition, the batch scheduling problem for workpieces with different sizes on a single machine can be established as the following mathematical model:
an objective function:
\min \; C_{\max} = \sum_{k \in K} P_k Y_k \qquad (1)
constraint conditions are as follows:
\sum_{k \in K} X_{jk} = 1, \quad \forall j \in J \qquad (2)
\sum_{j \in J} s_j X_{jk} \le C\,Y_k, \quad \forall k \in K \qquad (3)
P_k \ge p_j X_{jk}, \quad \forall j \in J,\ \forall k \in K \qquad (4)
X_{jk} \in \{0,1\}, \quad \forall j \in J,\ \forall k \in K \qquad (5)
Y_k \in \{0,1\}, \quad \forall k \in K \qquad (6)
P_k \ge 0, \quad \forall k \in K \qquad (7)
step II, establishing a strategy model of the problem by adopting a pointer network;
step III, training the pointer network model with an actor-critic algorithm;
step IV, defining and initializing the parameters of the genetic algorithm: population size PopNum, maximum number of iterations T_GA, and the number n of workpieces; the number of completed iterations t_GA = 0;
V, optimizing an initial population of the genetic algorithm by using the pointer network trained in the step III;
step VI, solving the problem by adopting a genetic algorithm;
step VII, using the optimal scheme obtained by the genetic algorithm as a production scheme for processing the workpiece by a batch processor;
Optimizing the initial population of the genetic algorithm with the trained pointer network as described in step V mainly comprises the following steps:
step i, generating an individual in the population by adopting a real number coding mode and an LPT heuristic rule;
step ii, generating PopNum-1 individuals in the population by adopting a triangular fuzzy number mode;
step iii, obtaining a new population from the individuals in the population through a pointer network;
iv, sequencing all individuals in the two populations in an ascending order according to the fitness value, and taking the front PopNum individuals as an initial population of the genetic algorithm;
step VI, further solving the problem by adopting a genetic algorithm mainly comprises the following steps:
step i, selecting PopNum parent individuals according to a roulette mode;
step ii, combining all the parent individuals pairwise and generating child individuals with the improved multi-point crossover;
step iii, performing mutation on all the child individuals with single-point mutation;
step iv, letting t_GA = t_GA + 1 and calculating the fitness values of all the child individuals;
step v, judging whether t_GA < T_GA holds; if so, jump to step i; otherwise, execute step vi;
step vi, finishing the algorithm, and outputting an optimal scheduling scheme;
In step ii of step VI, generating offspring with the improved multi-point crossover mainly includes the following steps:
step a, initializing the child to be empty, selecting Parent1 and Parent2 to be crossed, letting the currently inherited parent be parent = Parent1 and the gene-copying start position Index = 0, and randomly generating the number num of genes to copy, where num ranges from 1 to n;
step b, starting from the position with subscript Index in parent, searching to the left and to the right for num genes that are not yet in the child; genes that already exist in the child are skipped; if the search subscript reaches a boundary, the search stops in that direction; if all genes have already been copied, go directly to step d;
step c, copying the gene segments found in step b into the child in the order in which they appear in parent;
step d, judging whether the child has copied all the genes of the parents; if so, jump to step f; otherwise, execute step e;
step e, letting parent be the other parent and Index be the value of the last gene of the current child, regenerating the number num of genes to copy, and jumping to step b;
step f, obtaining the crossed child individual.
2. The batch scheduling optimization method based on deep reinforcement learning and genetic algorithm of claim 1, wherein: step II, establishing the strategy model of the problem by adopting the pointer network mainly comprises the following steps:
the pointer network model is defined as follows, n represents the length of the encoder and decoder, and X = { X = { (X) } 1 ,x 2 ,…,x n Denotes a coded input workpiece information sequence, where x is input arbitrarily j All have x j =[s j ,p j ] T ,s j And p j Respectively representing the size and the processing time of the jth workpiece; e = { e = 1 ,e 2 ,…,e n Denotes the encoder's hidden layer state sequence, d = { d = } 1 ,d 2 ,…,d n Denotes the implicit layer state sequence of the decoder, y = { y = } 1 ,y 2 ,…,y n Denotes the final output sequence of the pointer network;
i, constructing an encoding layer network of the pointer network, wherein the encoder network consists of a full-connection network layer and an RNN with an LSTM module;
step ii, constructing a decoding layer network of the pointer network, wherein the decoder network is formed by an RNN with an LSTM module;
and step iii, introducing an attention mechanism for selecting and sequencing the workpieces in the output sequence by the pointer network, wherein when the t-th workpiece is added to the output sequence, the selected probability of the remaining workpieces is calculated as follows:
u_t^j = v^{T} \tanh\left( W_1 e_j + W_2 d_t \right), \quad j = 1, 2, \ldots, n \qquad (8)
A(e, d_t; W_1, W_2, v) = \mathrm{softmax}(u_t) \qquad (9).
3. The batch scheduling optimization method based on deep reinforcement learning and genetic algorithm of claim 1, wherein training the pointer network model with the actor-critic algorithm in step III mainly comprises the following steps:
step i, an actor network model adopted by the problem;
the Actor network adopts the pointer network model established in the step II;
step ii, establishing a critic network model of the problem;
the structure of the Critic network consists of an encoder, an LSTM processing module and a decoder of a 2-layer ReLU fully-connected neural network, wherein the encoder is an RNN with the LSTM module;
step iii, defining and initializing the number of samples B in one training batch, the total number D of training-set samples, the number of iterations per epoch T_PTR = D/B, the actor network parameters θ_a, the critic network parameters θ_c, and the number of training epochs E; the initial number of completed epochs epoch = 0, the initial iteration count t_PTR = 0, and the number of trained batch samples i = 0;
step iv, passing the workpiece information sequence x_i of the current batch of samples through the actor network to obtain the output sequence y_i;
step v, passing the workpiece information sequence x_i of the current batch of samples through the critic network to obtain the corresponding baseline value b_i;
Step vi, letting i = i +1, judging whether i < B is true, if so, skipping to execute step iv, otherwise, skipping to step vii;
step vii, letting i = 0 and solving the loss value of the actor network with a Monte Carlo sampling approximation of the reinforcement learning objective, calculated as follows:
L_{actor}(\theta_a) = \frac{1}{B} \sum_{i=1}^{B} \left( R(y_i) - b_i \right) \log p_{\theta_a}(y_i \mid x_i) \qquad (10)
step viii, using the mean square error as the loss value of the critic network, calculated as follows:
L_{critic}(\theta_c) = \frac{1}{B} \sum_{i=1}^{B} \left( R(y_i) - b_i \right)^2 \qquad (11)
step ix, optimizing the parameters of the actor network and the critic network with the Adam algorithm;
step x, letting t_PTR = t_PTR + 1 and judging whether t_PTR < T_PTR holds; if so, jump to step iv; otherwise, execute step xi;
step xi, letting t_PTR = 0 and epoch = epoch + 1, and judging whether epoch < E holds; if so, execute step iv; otherwise, jump to step xii;
and step xii, obtaining the trained actor network model.
CN202011373229.1A 2020-11-30 2020-11-30 Batch scheduling optimization method based on deep reinforcement learning and genetic algorithm Active CN112488315B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202011373229.1A CN112488315B (en) 2020-11-30 2020-11-30 Batch scheduling optimization method based on deep reinforcement learning and genetic algorithm

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202011373229.1A CN112488315B (en) 2020-11-30 2020-11-30 Batch scheduling optimization method based on deep reinforcement learning and genetic algorithm

Publications (2)

Publication Number Publication Date
CN112488315A CN112488315A (en) 2021-03-12
CN112488315B true CN112488315B (en) 2022-11-04

Family

ID=74937276

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202011373229.1A Active CN112488315B (en) 2020-11-30 2020-11-30 Batch scheduling optimization method based on deep reinforcement learning and genetic algorithm

Country Status (1)

Country Link
CN (1) CN112488315B (en)

Families Citing this family (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113112171B (en) * 2021-04-22 2022-10-11 合肥工业大学 Batch scheduling method based on roulette and genetic algorithm
CN113191548A (en) * 2021-04-29 2021-07-30 南京航空航天大学 Production scheduling method
CN113448687B (en) * 2021-06-24 2022-07-26 山东大学 Hyper-heuristic task scheduling method and system based on reinforcement learning in cloud environment
CN113515097B (en) * 2021-07-23 2022-08-19 合肥工业大学 Two-target single machine batch scheduling method based on deep reinforcement learning
CN113743784A (en) * 2021-09-06 2021-12-03 山东大学 Production time sequence table intelligent generation method based on deep reinforcement learning
CN114186749B (en) * 2021-12-16 2022-06-28 暨南大学 Flexible workshop scheduling method and model based on reinforcement learning and genetic algorithm
CN115471142B (en) * 2022-11-02 2023-04-07 武汉理工大学 Intelligent port tug operation scheduling method based on man-machine cooperation
CN117709683A (en) * 2024-02-02 2024-03-15 合肥喆塔科技有限公司 Semiconductor wafer dynamic scheduling method and equipment based on real-time manufacturing data

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103390195A (en) * 2013-05-28 2013-11-13 重庆大学 Machine workshop task scheduling energy-saving optimization system based on reinforcement learning
CN103870647A (en) * 2014-03-14 2014-06-18 西安工业大学 Operation workshop scheduling modeling method based on genetic algorithm
CN109270904A (en) * 2018-10-22 2019-01-25 中车青岛四方机车车辆股份有限公司 A kind of flexible job shop batch dynamic dispatching optimization method

Family Cites Families (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20040044633A1 (en) * 2002-08-29 2004-03-04 Chen Thomas W. System and method for solving an optimization problem using a neural-network-based genetic algorithm technique
CN107301473B (en) * 2017-06-12 2018-06-15 合肥工业大学 Similar parallel machine based on improved adaptive GA-IAGA batch dispatching method and system

Patent Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103390195A (en) * 2013-05-28 2013-11-13 重庆大学 Machine workshop task scheduling energy-saving optimization system based on reinforcement learning
CN103870647A (en) * 2014-03-14 2014-06-18 西安工业大学 Operation workshop scheduling modeling method based on genetic algorithm
CN109270904A (en) * 2018-10-22 2019-01-25 中车青岛四方机车车辆股份有限公司 A kind of flexible job shop batch dynamic dispatching optimization method

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
"Multi-Objective Workflow Scheduling With Deep-Q-Network-Based Multi-Agent Reinforcement Learning"; YUANDOU WANG et al.; 《IEEE Access》; 2019-03-29; pp. 1-9 *
"Research on genetic algorithms based on intelligent reinforcement learning" (基于智能强化学习的遗传算法研究); 叶婉秋; 《电脑学习》; 2010-04-30; pp. 1-3 *

Also Published As

Publication number Publication date
CN112488315A (en) 2021-03-12

Similar Documents

Publication Publication Date Title
CN112488315B (en) Batch scheduling optimization method based on deep reinforcement learning and genetic algorithm
CN108694502B (en) Self-adaptive scheduling method for robot manufacturing unit based on XGboost algorithm
CN114186749B (en) Flexible workshop scheduling method and model based on reinforcement learning and genetic algorithm
CN107045569B (en) Gear reducer optimization design method based on clustering multi-target distribution estimation algorithm
CN113034026A (en) Q-learning and GA based multi-target flexible job shop scheduling self-learning method
CN107506865A (en) A kind of load forecasting method and system based on LSSVM optimizations
CN110598929B (en) Wind power nonparametric probability interval ultrashort term prediction method
CN112947300A (en) Virtual measuring method, system, medium and equipment for processing quality
CN116560313A (en) Genetic algorithm optimization scheduling method for multi-objective flexible job shop problem
Su et al. Many‐objective optimization by using an immune algorithm
Rad et al. GP-RVM: Genetic programing-based symbolic regression using relevance vector machine
CN113971517A (en) GA-LM-BP neural network-based water quality evaluation method
CN104732067A (en) Industrial process modeling forecasting method oriented at flow object
Fu et al. An improved adaptive genetic algorithm for solving 3-SAT problems based on effective restart and greedy strategy
CN114021934A (en) Method for solving workshop energy-saving scheduling problem based on improved SPEA2
Chai et al. Symmetric uncertainty based decomposition multi-objective immune algorithm for feature selection
CN113762591A (en) Short-term electric quantity prediction method and system based on GRU and multi-core SVM counterstudy
CN113705098A (en) Air duct heater modeling method based on PCA and GA-BP network
Zhu et al. An Efficient Hybrid Feature Selection Method Using the Artificial Immune Algorithm for High‐Dimensional Data
CN116151303B (en) Method for optimizing combustion chamber design by accelerating multi-objective optimization algorithm
CN117291069A (en) LSTM sewage water quality prediction method based on improved DE and attention mechanism
CN117151277A (en) Two-dimensional irregular layout method based on network migration and hybrid positioning and application
CN116613740A (en) Intelligent load prediction method based on transform and TCN combined model
CN114217580B (en) Functional fiber production scheduling method based on improved differential evolution algorithm
CN110705844A (en) Robust optimization method of job shop scheduling scheme based on non-forced idle time

Legal Events

Date Code Title Description
PB01 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
SE01 Entry into force of request for substantive examination
GR01 Patent grant
GR01 Patent grant