CN114912826A - Flexible job shop scheduling method based on multilayer deep reinforcement learning - Google Patents
- Publication number
- CN114912826A (application number CN202210603831.2A)
- Authority
- CN
- China
- Prior art keywords
- graph
- model
- reinforcement learning
- decision
- deep reinforcement
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06Q—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
- G06Q10/00—Administration; Management
- G06Q10/06—Resources, workflows, human or project management; Enterprise or organisation planning; Enterprise or organisation modelling
- G06Q10/063—Operations research, analysis or management
- G06Q10/0631—Resource planning, allocation, distributing or scheduling for enterprises or organisations
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/08—Learning methods
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06Q—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
- G06Q50/00—Information and communication technology [ICT] specially adapted for implementation of business processes of specific business sectors, e.g. utilities or tourism
- G06Q50/04—Manufacturing
-
- Y—GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
- Y02—TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
- Y02P—CLIMATE CHANGE MITIGATION TECHNOLOGIES IN THE PRODUCTION OR PROCESSING OF GOODS
- Y02P90/00—Enabling technologies with a potential contribution to greenhouse gas [GHG] emissions mitigation
- Y02P90/30—Computing systems specially adapted for manufacturing
Abstract
The invention discloses a flexible job shop scheduling method based on multilayer deep reinforcement learning, which comprises the following parts. P1, deep reinforcement learning model part: the deep learning component adopts a graph neural network that takes the disjunctive graph of the problem as input and produces the graph's features, effectively obtaining a feature representation of the problem. The reinforcement learning component is based on a Markov decision model; the flexible shop scheduling problem obtains a decision scheme through repeated decisions of the model, and the objective is optimized by maximizing the reward value. P2, training algorithm part: the model is trained with an asynchronous advantage actor-critic algorithm. The sample-collection task is distributed over several sub-threads, each of which makes decisions and generates samples independently; each sub-thread decides several problems simultaneously and produces several decision trajectories, so that uncorrelated, high-quality samples are generated quickly to optimize the model and the final model is obtained rapidly.
Description
Technical Field
The invention relates to the field of combinatorial optimization, and in particular to a flexible job shop scheduling method based on multilayer deep reinforcement learning.
Background
The flexible job shop scheduling problem is an important extension of the job shop scheduling problem: the same workpiece may have multiple processing routes, and the same operation may be executable on several machines. This greatly increases the complexity of the problem, which is NP-hard. Finding a near-optimal solution to the flexible job shop scheduling problem in the shortest possible time is therefore of great significance in combinatorial optimization. At present, the main methods for solving the flexible job shop scheduling problem are scheduling rules and meta-heuristic algorithms. Scheduling rules prioritize operations and machines and can therefore produce solutions quickly.
However, the scheduling results obtained with scheduling rules are not ideal, and the rules do not transfer well across diverse processing environments. Compared with scheduling rules, meta-heuristic algorithms search for the optimal solution over many rounds of iteration and can obtain good results, but their computation time is long and they do not generalize: when the problem changes, they must be re-initialized and iterated again. Machine learning has been applied in many fields as a new method and has achieved good results, so applying machine learning to the flexible shop scheduling problem is a new research direction. Deep reinforcement learning is a branch of machine learning; after extensive training, its model can be used directly for decision making, and the flexible shop scheduling problem can likewise be expressed as a decision problem. The design of the deep reinforcement learning model is an important part of such a method.
Disclosure of Invention
The technical problem to be solved by the invention is to provide, against the defects of the prior art, a flexible job shop scheduling method based on multilayer deep reinforcement learning. The flexible shop scheduling problem is represented by a disjunctive graph and a graph neural network is used to extract features; states, actions and rewards corresponding to the problem are designed to establish a Markov model; a hierarchical decision model divides the flexible shop scheduling problem into two sub-problems, operation sequencing and machine selection; and the asynchronous advantage actor-critic algorithm trains the model quickly and effectively.
The technical scheme adopted by the invention for solving the technical problem is as follows:
the invention provides a flexible job shop scheduling method based on multilayer deep reinforcement learning, characterized in that a deep reinforcement learning model is established for the flexible shop scheduling problem and trained; the flexible shop scheduling problem is then solved by the trained model and an optimal scheduling scheme is output. The method comprises the following two parts:
P1, deep reinforcement learning model part: the deep reinforcement learning model decides the flexible shop scheduling problem. The problem is represented as a disjunctive graph, and solving it amounts to orienting the disjunctive arcs. The deep learning component adopts a graph neural network that takes the disjunctive graph as input and produces the graph's features, effectively obtaining a feature representation of the problem. The reinforcement learning component is based on a Markov decision model: states, actions and rewards corresponding to the problem are designed, and the hierarchical decision model takes actions according to the state features. The decision scheme is obtained through repeated decisions of the model, and the objective is optimized by maximizing the reward value;
P2, training algorithm part: the deep reinforcement learning model is trained with a multi-thread, multi-trajectory asynchronous advantage actor-critic algorithm. The sample-collection task is distributed over several sub-threads, each of which makes decisions and generates samples independently; each sub-thread decides several problems simultaneously and produces several decision trajectories, so that uncorrelated, high-quality samples are generated quickly to optimize the model and the final model is obtained rapidly. The trained model supports fast solving of the flexible shop scheduling problem and generalizes to problems of different scales. The optimal scheduling scheme of the flexible shop is output by the trained deep reinforcement learning model and handed to the flexible shop for execution.
Further, the specific method for obtaining the features of the disjunctive graph in the P1 deep reinforcement learning model part of the present invention is as follows:
Step 1.1: obtain the disjunctive graph representation Graph of the flexible shop scheduling problem;
Step 1.2: determine the node information according to the disjunctive arcs in the disjunctive graph;
Step 1.3: take the disjunctive graph as the input of the graph neural network to obtain its feature Feature.
Further, the disjunctive graph in step 1.1 of the present invention is defined as follows:
The disjunctive graph of the flexible shop scheduling problem is described as a graph G = (O, C, D), where O is the set of all operation nodes o together with two virtual nodes S and E that represent the start and end of the schedule, respectively. C = {<v, w> | v, w ∈ O} is the set of conjunctive arcs, where the two operations represented by v and w belong to the same workpiece; for <v, w> ∈ C there is a one-way arc from node v to node w that enforces the precedence constraint between operations of the same workpiece, i.e. s_v < s_w, where s_v is the machining start time of the operation represented by node v. D = {<v, w> | v, w ∈ O} is the set of disjunctive arcs; a disjunctive arc is a bidirectional arc indicating that the operations of the connected nodes v and w can be processed on the same machine. The final goal is to determine the direction of every disjunctive arc while making the maximum completion time as short as possible. The workpieces of a flexible shop scheduling problem may have different numbers of operations; when the disjunctive graph is constructed, a workpiece with fewer than the maximum number of operations is padded with dummy "0" operation nodes at its tail to keep the graph structure uniform. A "0" operation takes no processing time and can be processed on any machine.
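The graph definition above can be sketched as a plain data structure. The following is an illustrative Python sketch, not the patent's implementation; it assumes `jobs[j][i]` is a dict mapping each eligible machine to the processing time of operation i of workpiece j, and, as a simplification, only adds disjunctive arcs between operations of different workpieces:

```python
def build_disjunctive_graph(jobs, num_machines):
    """Build G = (O, C, D): operation nodes, conjunctive arcs, disjunctive arcs."""
    max_ops = max(len(job) for job in jobs)
    # Pad short jobs with dummy "0" operations: zero time, runs on any machine.
    padded = [job + [{m: 0 for m in range(num_machines)}] * (max_ops - len(job))
              for job in jobs]
    nodes = [(j, i) for j in range(len(padded)) for i in range(max_ops)]
    O = nodes + ["S", "E"]  # operation nodes plus virtual start/end nodes
    # Conjunctive (one-way) arcs: S -> first op, op -> next op, last op -> E.
    C = [("S", (j, 0)) for j in range(len(padded))]
    C += [((j, i), (j, i + 1)) for j in range(len(padded)) for i in range(max_ops - 1)]
    C += [((j, max_ops - 1), "E") for j in range(len(padded))]
    # Disjunctive (bidirectional) arcs: operations of different jobs that share
    # at least one candidate machine.
    D = [(nodes[a], nodes[b])
         for a in range(len(nodes)) for b in range(a + 1, len(nodes))
         if nodes[a][0] != nodes[b][0]
         and set(padded[nodes[a][0]][nodes[a][1]]) & set(padded[nodes[b][0]][nodes[b][1]])]
    return O, C, D, padded
```

Solving the problem then corresponds to choosing a direction for each pair in D such that the resulting directed graph is acyclic and the makespan is minimized.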
Further, the method for calculating the node information in step 1.2 of the present invention is specifically:
Step 1.2.1: for each operation, randomly select one of its eligible machines and take the corresponding processing time as the estimated execution time of that operation;
Step 1.2.2: ignoring the still-undirected disjunctive arc constraints, process the operations in order according to the conjunctive-arc constraints and the already-oriented disjunctive arcs, and compute the completion time of each operation as its node information.
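Steps 1.2.1 and 1.2.2 amount to a forward pass over the operations. Below is a minimal sketch, not the patent's implementation, using the same assumed `jobs[j][i]` machine-to-time dict convention; for simplicity it assumes any oriented disjunctive arc points from an operation that is visited earlier in the pass:

```python
import random

def node_completion_times(jobs, oriented=None, seed=0):
    """oriented maps an operation (j, i) to its machine predecessor, i.e. an
    already-oriented disjunctive arc; undirected disjunctive arcs are ignored."""
    rng = random.Random(seed)
    oriented = oriented or {}
    # Step 1.2.1: estimated execution time = time on a randomly chosen machine.
    est = {}
    for j, job in enumerate(jobs):
        for i, times in enumerate(job):
            est[(j, i)] = rng.choice(list(times.values()))
    # Step 1.2.2: completion time from conjunctive arcs (job order) and any
    # oriented disjunctive arcs (machine order).
    finish = {}
    for j, job in enumerate(jobs):
        for i in range(len(job)):
            job_ready = finish.get((j, i - 1), 0)     # same-workpiece predecessor
            mach_ready = finish.get(oriented.get((j, i)), 0)  # machine predecessor
            finish[(j, i)] = max(job_ready, mach_ready) + est[(j, i)]
    return finish
```

The resulting completion times serve as the node features fed to the graph neural network.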
Further, the specific method for calculating the graph neural network features in step 1.3 of the present invention is as follows:
Step 1.3.1: input the node information and arc relations into the k-th graph-network layer to compute the node representations, starting with k = 1. A graph isomorphism network (GIN) structure is adopted: K update iterations are executed to compute a p-dimensional embedding of every node v ∈ V, and the k-th layer update is
h_v^(k) = MLP^(k)((1 + ε) · h_v^(k-1) + Σ_{u ∈ N(v)} h_u^(k-1)),
where h_v^(k) is the feature representation of node v at layer k, MLP is a multilayer perceptron, ε is a scalar weighting the node's own feature, and N(v) is the set of all nodes connected to node v;
Step 1.3.2: pool the node representations to obtain a graph representation, using average pooling; then set k = k + 1;
Step 1.3.3: execute steps 1.3.1 and 1.3.2 in a loop K times;
Step 1.3.4: apply a linear transformation at the output layer to the final graph representation to obtain the output feature Feature.
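As a hedged illustration of steps 1.3.1 to 1.3.4, the NumPy sketch below implements one GIN update and average pooling. The single linear-plus-ReLU stand-in for the MLP, the fixed `eps`, and the concatenation of per-layer readouts are simplifying assumptions, not the patent's exact network:

```python
import numpy as np

def gin_layer(H, adj, W, eps=0.0):
    """One GIN update: h_v <- MLP((1 + eps) * h_v + sum of neighbor features).
    The MLP is approximated by one linear map W followed by ReLU."""
    agg = (1 + eps) * H + adj @ H       # self term plus neighborhood sum
    return np.maximum(agg @ W, 0.0)     # linear map + ReLU

def graph_feature(H, adj, weights, eps=0.0):
    """Stacked GIN layers with average pooling over nodes after each layer;
    the pooled readouts are concatenated into the graph feature."""
    readouts = []
    for W in weights:
        H = gin_layer(H, adj, W, eps)
        readouts.append(H.mean(axis=0))  # average pooling (step 1.3.2)
    return np.concatenate(readouts)
```

In the method itself, this graph feature is what the output-layer linear transformation of step 1.3.4 consumes.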
Further, the decision process in the P1 deep reinforcement learning model part of the present invention is as follows:
Step 2.1: take the obtained feature Feature as the input of the decision model and compute the selection probability of each operation;
Step 2.2: greedily select the operation o with the highest probability;
Step 2.3: select the machine m best suited to the selected operation according to the scheduling rule;
Step 2.4: take the selected combination of operation o and machine m as the action (o, m) in the current state, execute the state transition to obtain a new state, update the disjunctive graph, and save the old state, the new state and the reward value as a sample;
Step 2.5: repeat steps 1.2 to 2.4 until all operations have been selected;
Step 2.6: the final decision scheme is obtained through repeated decisions of the model. The reinforcement learning is based on the following Markov decision model:
State: a graph structure is obtained from the input test or training set. The machining operations of the workpieces are the nodes of the graph and the machining precedence relations are the arcs; the node information includes the completion time of each operation. The arcs in the graph are directed, and the arc information encodes the machining order of the operations on a machine: of two operation nodes connected by an arc, the operation the arc points to is executed after the first operation has completed in the decision scheme. The state also includes the basic problem information, namely whether each workpiece can be processed on each machine and the corresponding processing times;
Action: one action is defined as determining an operation o of some workpiece and allocating a machine m to it, denoted (o, m). The state is taken as the input of the deep reinforcement learning model; the graph neural network extracts the features, which are fed into the decision model to obtain the operation-selection probability distribution; the operation is selected greedily from this distribution, and a suitable machine is selected for it by the scheduling rule;
State transition: the state is updated according to the selected action. The arc relations and node information of the graph are updated according to the operation and machine of the action, i.e. arcs of the directed graph are added or modified and the operation completion times are updated to form the new state;
Reward: the difference between the maximum completion times (computed from the estimated processing times) of the schemes corresponding to the disjunctive graphs before and after one state transition is the instant reward of the decision; the instant rewards of all decisions are summed to give the cumulative reward.
Further, the specific calculation of the scheduling rule in step 2.3 of the present invention is as follows:
Step 2.3.1: determine the set S_m of machines on which the selected operation o can be executed;
Step 2.3.2: normalize, over the set, the time each machine needs to process the selected operation, giving the value f1 as index 1;
Step 2.3.3: normalize, over the set, the number of operations already processed by each machine, giving the value f2 as index 2;
Step 2.3.4: add index 1 and index 2 to obtain the final index (f1 + f2);
Step 2.3.5: select a machine from the set according to the final index; the selected machine is one with a short processing time and few already-processed operations.
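Steps 2.3.1 to 2.3.5 can be sketched as follows; this is an illustrative reading of the rule in which min-max normalization is assumed (the patent does not specify the normalization) and the machine minimizing f1 + f2 is chosen:

```python
def select_machine(candidates, done_counts):
    """candidates: machine -> processing time of the selected operation (S_m);
    done_counts: machine -> number of operations already processed."""
    def norm(d):  # min-max normalization over the eligible set
        lo, hi = min(d.values()), max(d.values())
        span = (hi - lo) or 1  # avoid division by zero when all values are equal
        return {m: (v - lo) / span for m, v in d.items()}
    f1 = norm(candidates)                                      # index 1: time
    f2 = norm({m: done_counts.get(m, 0) for m in candidates})  # index 2: load
    return min(candidates, key=lambda m: f1[m] + f2[m])        # final index
```

A lower final index corresponds to a machine that is both fast for this operation and lightly loaded.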
Further, the state transition process in the Markov decision model of the present invention is specifically:
Step 3.1: judge, from the state and the action, the feasibility of processing the selected operation on the selected machine;
Step 3.2: determine the sequence of operations already processed on the selected machine from the disjunctive graph;
Step 3.3: determine the processing time of the selected operation on the selected machine, and judge whether the selected operation can be inserted into an existing idle period of the selected machine; if yes, execute step 3.4, otherwise execute step 3.7;
Step 3.4: compute the earliest possible start time of the selected operation within the idle period of the selected machine, and determine the insertion position of the selected operation in the processed-operation sequence;
Step 3.5: modify the arc relations of the disjunctive graph according to the insertion position and delete the other disjunctive arcs connected to the selected operation node;
Step 3.6: update the node information and finish the state transition;
Step 3.7: determine the start time and append the operation to the end of the processed-operation sequence;
Step 3.8: determine the direction of the disjunctive arcs of the selected operation on the selected machine and delete the other disjunctive arcs connected to the selected operation node;
Step 3.9: update the node information and finish the state transition.
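The insertion test in steps 3.3 to 3.4 can be sketched as a scan over the machine's idle periods; this is a minimal illustrative helper (names and the `(start, end)` slot representation are assumptions, not the patent's data structures):

```python
def find_insertion(idle_slots, ready_time, duration):
    """idle_slots: list of (start, end) idle periods on the selected machine;
    ready_time: earliest time the selected operation may start (job precedence);
    duration: its processing time on this machine.
    Returns (start_time, slot_index) for the first slot that fits (step 3.4),
    or None if the operation must be appended at the end instead (step 3.7)."""
    for idx, (slot_start, slot_end) in enumerate(idle_slots):
        start = max(slot_start, ready_time)  # cannot start before the slot opens
        if start + duration <= slot_end:     # operation fits inside the slot
            return start, idx
    return None
```

If `None` is returned, the state transition falls through to the append branch of steps 3.7 to 3.9.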
Further, the calculation of the instant reward and the cumulative reward in the Markov decision model of the present invention is as follows:
Step 4.1: compute the maximum completion time T_s of the old state;
Step 4.2: compute the maximum completion time T_{s+1} of the new state;
Step 4.3: compute the instant reward value
r_t = T_{s_t} - T_{s_{t+1}}.
The cumulative reward is
R = Σ_t r_t = T_{s_1} - T_{s_end},
where R is the cumulative reward value, T_{s_1} is the maximum completion time corresponding to the initial state and T_{s_end} is the maximum completion time of the final scheme. Since T_{s_1} is a fixed value determined by the problem information and does not change with the decisions, maximizing the reward value is equivalent to minimizing the maximum completion time of the final scheme.
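The telescoping argument above is easy to check numerically; the following sketch assumes only a list of successive estimated makespans:

```python
def instant_reward(makespan_old, makespan_new):
    """r_t = T_{s_t} - T_{s_{t+1}}: the drop in estimated makespan."""
    return makespan_old - makespan_new

def cumulative_reward(makespans):
    """Summing the instant rewards telescopes to T_{s_1} - T_{s_end}, so
    maximizing R minimizes the final makespan (T_{s_1} is fixed)."""
    return sum(instant_reward(a, b) for a, b in zip(makespans, makespans[1:]))
```

For any trajectory of makespan estimates, the cumulative reward equals first minus last, regardless of the intermediate values.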
Further, the training process in the P2 training algorithm part of the present invention is specifically:
Step 5.1: create a main thread and T sub-threads and initialize the number of training rounds Count = 0;
Step 5.2: copy the parameters of the main-thread model to the sub-thread models;
Step 5.3: start each sub-thread and initialize its number of training rounds;
Step 5.4: let each sub-thread generate U problems;
Step 5.5: each sub-thread solves its flexible shop scheduling problems with the deep reinforcement learning model and generates samples;
Step 5.6: after sample collection is complete, optimize the main-thread model parameters with a gradient-descent step and set Count = Count + 1. The optimization follows the actor-critic update
θ ← θ + α ∇_θ log π_θ(a_t | s_t) · A(s_t, a_t),
where π is the actor network, i.e. the graph neural network and the decision network, θ is its parameter set, and v is the critic network, used together with the reward values to estimate the advantage function A(s_t, a_t), with φ as its parameter set;
Step 5.7: judge whether the maximum number of training rounds T_c has been reached; if not, execute steps 5.4 to 5.6 again; if so, finish training and save the main-thread model parameters.
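Steps 5.1 to 5.7 can be sketched as the following thread skeleton. It is a structural illustration only: `make_env`, `decide` and `grad_step` are placeholder callables standing in for problem generation, the model rollout, and the gradient update, and the shared round counter replaces per-thread bookkeeping:

```python
import threading

def train(global_params, make_env, decide, grad_step, T_c, num_threads, U):
    """Multi-thread, multi-trajectory training loop skeleton (steps 5.1-5.7)."""
    lock, count = threading.Lock(), [0]  # step 5.1: shared round counter

    def worker():
        while True:
            with lock:
                if count[0] >= T_c:          # step 5.7: round limit reached
                    return
                count[0] += 1
                local = dict(global_params)  # step 5.2: copy main-thread params
            problems = [make_env() for _ in range(U)]  # step 5.4: U problems
            # Step 5.5: roll out each problem, producing one trajectory of samples.
            samples = [s for p in problems for s in decide(local, p)]
            with lock:
                grad_step(global_params, samples)  # step 5.6: optimize main model

    threads = [threading.Thread(target=worker) for _ in range(num_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return count[0]
```

Because each sub-thread rolls out different problems at the same moment, the samples handed to `grad_step` come from uncorrelated trajectories, which is the point of the multi-thread, multi-trajectory scheme.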
The invention has the following beneficial effects:
1. Deep learning extracts the problem features, can take the internal structure of the problem into account, and adapts to changes in the manufacturing environment. The trained model obtains good results in a short time and can solve flexible shop scheduling problems of different scales without retraining, so the method generalizes well.
2. The flexible shop scheduling problem is decomposed into two sub-problems, operation sequencing and machine selection, which are solved with a hierarchical structure. This reduces the complexity of the problem and the computation time, and the cooperation of a neural network model with a scheduling rule also reduces the structural complexity of the overall model.
3. The asynchronous advantage actor-critic algorithm is used for training. The multi-thread, multi-trajectory training method greatly shortens the training time, and the training samples come from decision processes of the model on different flexible shop scheduling problems at the same moment, so the samples are uncorrelated and effective.
Drawings
The invention will be further described with reference to the accompanying drawings and examples, in which:
FIG. 1 is a schematic diagram of a flexible job shop scheduling method based on multi-layer deep reinforcement learning according to the present invention;
FIG. 2 is a disjunctive graph model of a 3 × 3 flexible shop scheduling problem;
FIG. 3 is the flexible shop scheduling benchmark instance MK01.
Detailed Description
In order to make the objects, technical solutions and advantages of the present invention more apparent, the present invention is described in further detail below with reference to the accompanying drawings and embodiments. It should be understood that the specific embodiments described herein are merely illustrative of the invention and are not intended to limit the invention.
The framework of the invention is a training framework for a hierarchical deep reinforcement learning model based on multiple threads and multiple trajectories, as shown in FIG. 1. Taking the training process as an example, the specific steps are as follows.
As shown in FIG. 1, the training method is based on the asynchronous advantage actor-critic algorithm and trains the model with a multi-thread, multi-trajectory scheme. The main-thread model is the model to be optimized; the model parameters of the sub-threads are copied from the main-thread model; each sub-thread decides several problems simultaneously and thereby generates several decision trajectories, each of which is obtained from the deep reinforcement learning model.
The feature-extraction part of the deep reinforcement learning model of P1 is implemented as follows:
When the processing order and the processing machine of the operation corresponding to a node have not yet been determined, one machine is selected at random from the node's eligible machines, and the completion time of the operation on the selected machine is used as the estimated processing time; after the order and machine have been determined for the operation, the actual processing time is used as the node information.
The state features of the graph neural network of P1 are computed as follows: the node representations are computed with the node-representation formula of step 1.3.1, and the final graph representation is passed through a multilayer linear transformation to obtain the output state feature Feature.
The state features are passed through the hierarchical decision network; an action comprises an operation and a machine. The operation decision uses a linear neural network, and the machine is selected by the scheduling rule. The hierarchical decision network of P1 outputs actions with the scheduling rule computed as in steps 2.3.1 to 2.3.5 above.
The sample collection of P2 follows the decision process of steps 2.1 to 2.5; repeated decisions yield the final scheme.
In the above technical solution, the state transition during sample collection is executed as follows:
Step 7: determine the idle time T_m of the machine and the earliest start time T_o of the selected operation, take max(T_o, T_m) as the start time, and append the operation to the end of the processed-operation sequence M_sec;
Step 8: determine the direction of the disjunctive arcs of the selected operation on the selected machine and delete the other disjunctive arcs connected to the selected operation node;
Step 9: update the node information and finish the state transition.
In the above technical solution, the instant reward of each collected sample is computed as in steps 4.1 to 4.3 above.
The overall training flow of P2 is as follows: repeat steps 2 to 6 until the training count Count reaches the maximum number of training rounds T_c, whereupon the sub-thread ends; here T_c = 10000.
In the above technical solution, the gradient-descent optimization uses the actor-critic update θ ← θ + α ∇_θ log π_θ(a_t | s_t) · A(s_t, a_t), where π is the actor network, i.e. the graph neural network and the decision network, θ is its parameter set, and v is the critic network, used together with the reward values to estimate the advantage function, with φ as its parameter set.
The technical solution above describes the overall framework of the invention with the training process; the proposed flexible shop scheduling method based on hierarchical reinforcement learning is now illustrated with a solving process, namely solving MK01 after model training is complete. The specific steps are as follows:
Step 6: judge whether the decision process has finished; if t < 60, return to step 3; if t = 60, the decision is finished and the solution scheme is output.
The maximum completion time of the result obtained by the above solving process is 52, and the selected actions are as follows:
(0,5),(30,0),(1,1),(18,1),(24,5),(36,2),(48,0),(42,0),(54,5),(6,2),(12,5),(7,2),(8,0),(13,4),(25,5),(9,5),(26,5),(2,4),(3,0),(10,1),(11,0),(4,3),(27,2),(28,4),(5,2),(14,2),(15,2),(16,1),(17,4),(19,3),(20,1),(21,2),(22,4),(23,0),(49,3),(50,5),(51,2),(52,4),(53,5),(31,4),(32,3),(33,0),(34,2),(35,3),(37,1),(38,3),(39,3),(40,3),(41,0),(29,0),(43,5),(44,3),(45,0),(46,5),(47,5),(55,5),(56,5),(57,0),(58,2),(59,3)。
The first value of each action is the selected process: the first operation of the first workpiece is process 0, the second operation of the first workpiece is process 1, and so on until the last operation of the last workpiece, process 59. The second value is the machine selected for the process.
This implementation case shows that the training algorithm of the technical scheme trains the model quickly and efficiently and yields a model suitable for solving the flexible shop scheduling problem; that the hierarchical deep reinforcement learning model solves the flexible shop scheduling problem quickly with good optimization results; and that the trained model can be used directly on flexible shop scheduling problems of different scales, with good generalization performance.
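Assuming MK01 is padded to a uniform 6 operations per workpiece (10 workpieces × 6 operations gives processes 0 to 59, consistent with the padding rule and the indexing described above), the action tuples decode as follows; the constant and helper are illustrative, not part of the patent:

```python
OPS_PER_JOB = 6  # assumed: MK01 padded to 6 operations per workpiece

def decode_action(action):
    """Map an (o, m) pair from the listing above to (workpiece, operation, machine).
    Process indices run workpiece-major: process 0 is the first operation of the
    first workpiece, process 59 the last operation of the last workpiece."""
    o, m = action
    return o // OPS_PER_JOB, o % OPS_PER_JOB, m
```

For example, (30, 0) decodes to the first operation of the sixth workpiece on machine 0.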
It will be understood that modifications and variations can be made by persons skilled in the art in light of the above teachings and all such modifications and variations are intended to be included within the scope of the invention as defined in the appended claims.
Claims (10)
1. A flexible job shop scheduling method based on multilayer deep reinforcement learning is characterized in that for a flexible shop scheduling problem, a deep reinforcement learning model is established and trained, the flexible shop scheduling problem is solved through the trained deep reinforcement learning model, and an optimal scheduling scheme is output; the method comprises the following two parts:
P1 deep reinforcement learning model part: the deep reinforcement learning model is used to make decisions for the flexible workshop scheduling problem; the problem is represented as a disjunctive graph, and solving it is treated as the process of orienting the disjunctive arcs; the deep learning part adopts a graph neural network, which takes the disjunctive graph as input to obtain the features of the graph, thereby effectively obtaining a feature representation of the problem; the reinforcement learning part is based on a Markov decision model, for which the state, action, and reward corresponding to the problem are designed, and the hierarchical decision model takes corresponding actions according to the state features; the flexible workshop scheduling problem obtains a decision scheme through the model's repeated decision process, and the objective is optimized by maximizing the reward value;
P2 training algorithm part: a multi-thread, multi-trajectory asynchronous advantage actor-critic algorithm is adopted to train the deep reinforcement learning model; the sample-collection tasks are distributed to several sub-threads, each of which performs decision making and sample generation independently, and each sub-thread decides several problems simultaneously to generate several decision trajectories, so that uncorrelated high-quality samples are rapidly generated to optimize the model and the final model is obtained quickly; the trained model supports rapid solving of flexible workshop scheduling problems and generalizes to problems of different scales; the optimal scheduling scheme of the flexible workshop is output by the trained deep reinforcement learning model and handed to the flexible workshop for execution.
2. The flexible job shop scheduling method based on multilayer deep reinforcement learning according to claim 1, wherein the specific method for obtaining the disjunctive graph features in the P1 deep reinforcement learning model part is as follows:
Step 1.1, obtaining a disjunctive graph representation Graph according to the flexible workshop scheduling problem;
Step 1.2, determining node information according to the disjunctive arcs in the disjunctive graph;
Step 1.3, obtaining the Feature of the graph by taking the disjunctive graph as the input of a graph neural network.
3. The flexible job shop scheduling method based on multilayer deep reinforcement learning according to claim 2, wherein the disjunctive graph in step 1.1 is defined as follows:
the disjunctive graph of the flexible workshop scheduling problem is described as a graph G = (O, C, D), where O is the set of all process nodes o plus two virtual process nodes S and E, which represent the start and end of the schedule, respectively; C is the set of conjunctive arcs, C = {<v, w> | v, w ∈ O}, where the two processes represented by v and w belong to the same workpiece; for <v, w> ∈ C, there is a one-way conjunctive arc from node v to node w, which enforces the precedence constraint among the processes of the same workpiece, i.e., s_tv < s_tw, where s_tv is the machining start time of the process represented by node v; D is the set of disjunctive arcs, D = {<v, w> | v, w ∈ O}, and each bidirectional disjunctive arc indicates that the connected nodes v and w can be processed on the same machine; the final goal is to determine the directions of all disjunctive arcs while making the maximum completion time shortest; the numbers of processes of the workpieces in a flexible workshop scheduling problem may differ, and when converting to the disjunctive graph, if a workpiece has fewer processes than the maximum number of processes, '0' process nodes are appended at the tail of that workpiece to keep the graph structure uniform; a '0' process has zero running time and can be processed on all machines.
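The disjunctive graph construction can be sketched as below. This is a minimal sketch under assumed data layouts (the function name, `jobs`, and `machine_options` are illustrative, not the patent's API); conjunctive arcs chain each workpiece's processes, and an undirected disjunctive arc links every pair of processes that share an eligible machine.

```python
from itertools import combinations

def build_disjunctive_graph(jobs, machine_options):
    """jobs: list of lists of process ids per workpiece, in precedence order.
    machine_options: dict mapping process id -> set of eligible machines."""
    # O: all process nodes plus the virtual start/end nodes S and E
    O = {p for job in jobs for p in job} | {"S", "E"}
    C = set()
    for job in jobs:
        C.add(("S", job[0]))          # start node -> first process
        C.update(zip(job, job[1:]))   # precedence arcs within the workpiece
        C.add((job[-1], "E"))         # last process -> end node
    # D: undirected disjunctive arcs between processes sharing a machine
    procs = [p for job in jobs for p in job]
    D = {frozenset((v, w)) for v, w in combinations(procs, 2)
         if machine_options[v] & machine_options[w]}
    return O, C, D
```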
4. The flexible job shop scheduling method based on multi-layer deep reinforcement learning according to claim 2, wherein the calculation method of the node information in the step 1.2 is specifically as follows:
step 1.2.1, randomly selecting the execution time of each procedure on an executable machine as the estimated execution time of each procedure;
Step 1.2.2, without considering the constraints of the unoriented disjunctive arcs, processing each procedure in sequence according to the conjunctive arc constraints and the oriented disjunctive arc relations, and calculating the completion time of each procedure as the node information.
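Step 1.2 amounts to a forward pass over the already-oriented arcs. A minimal sketch, assuming the oriented arcs are given as a predecessor map and the processes are supplied in a topological order (`completion_times`, `preds`, and `est_time` are assumed names):

```python
def completion_times(order, preds, est_time):
    """order: process ids in topological order of the oriented arcs;
    preds: process id -> predecessors along conjunctive/oriented arcs;
    est_time: process id -> estimated execution time (step 1.2.1)."""
    finish = {}
    for p in order:
        # a process starts once all its oriented predecessors have finished
        start = max((finish[q] for q in preds.get(p, ())), default=0.0)
        finish[p] = start + est_time[p]
    return finish
```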
5. The flexible job shop scheduling method based on multilayer deep reinforcement learning according to claim 2, wherein the specific method for calculating the graph neural network features in step 1.3 is as follows:
Step 1.3.1, inputting the node information and the arc relations into the k-th level graph neural network to calculate the node representations, with k = 1; a graph isomorphism network (GIN) structure is adopted, and K update iterations are executed to compute a p-dimensional embedding of each node v ∈ O; the update at the k-th layer is expressed as:
h_v^(k) = MLP^(k)( (1 + ε^(k)) · h_v^(k-1) + Σ_{u ∈ N(v)} h_u^(k-1) )
where h_v^(k) is the feature representation of node v at layer k, MLP is a multi-layer perceptron, ε^(k) is a learnable scalar, and N(v) is the set of all nodes connected to node v;
Step 1.3.2, pooling the node representations to obtain the representation of the graph, using average pooling, and setting k = k + 1;
Step 1.3.3, executing step 1.3.1 and step 1.3.2 cyclically K times;
Step 1.3.4, applying a linear transformation at the output layer to the representation of the final graph to obtain the output Feature.
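The K-round update with average pooling can be sketched numerically. This is a minimal sketch under stated simplifications: each layer's MLP is reduced to a single ReLU layer, ε is a shared scalar, and all names (`gin_forward`, `Ws`) are assumptions.

```python
import numpy as np

def gin_forward(H, A, Ws, eps=0.0):
    """H: (n, p) node features; A: (n, n) 0/1 adjacency matrix;
    Ws: one (p, p) weight matrix per layer (standing in for each MLP)."""
    for W in Ws:
        # GIN aggregation: scaled self-feature plus neighbor sum, then MLP
        H = np.maximum(0.0, ((1 + eps) * H + A @ H) @ W)
    return H.mean(axis=0)  # average pooling -> graph-level representation
```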
6. The flexible job shop scheduling method based on multi-layer deep reinforcement learning according to claim 2, wherein the decision making process in the P1 deep reinforcement learning model part is as follows:
Step 2.1, calculating the probabilities of selecting the processes, using the obtained Feature as the input of the decision model;
Step 2.2, greedily selecting the process o with the maximum probability according to the obtained probabilities;
Step 2.3, selecting the most suitable machine m for the selected process according to the scheduling rule;
Step 2.4, using the selected combination of process o and machine m as the action (o, m) in the current state, executing the state transition to obtain a new state, updating the disjunctive graph, and saving the old and new states and the reward value as a sample;
step 2.5, repeatedly executing the step 1.2 to the step 2.4 until all the process selections are finished;
Step 2.6, obtaining the final decision scheme through the model's repeated decisions; the reinforcement learning is based on a Markov decision model, which is as follows:
State: a corresponding graph structure is obtained from the input test set or training set; the machining processes of the workpieces are the nodes of the graph, and the machining precedence relations between processes are the arcs; the node information includes the completion time of the process; the arcs in the graph are directed arcs, and the arc information encodes the machining order of the processes on a machine, i.e., for two process nodes connected by an arc, the second process pointed to by the arc is executed after the first process is completed in the decision scheme; the state also includes the basic problem information, namely whether each workpiece can be processed on the different machines and the corresponding processing times;
Action: one action is defined as determining a process o of a certain workpiece and allocating a machine m to it, denoted (o, m); the state is used as the input of the deep reinforcement learning model, the features of the processes are extracted by the graph neural network and then input into the decision model to obtain the process selection probability distribution; the process is selected greedily according to the obtained probabilities, which determines the workpiece, and a suitable machine is selected for it by the relevant scheduling rule;
State transition: the state is updated according to the selected action; the arc relations and node information of the graph are updated according to the process and machine corresponding to the action, i.e., arcs in the directed graph are added or modified, and the completion times of the processes are updated to form the new state;
Reward: the difference between the maximum completion times (computed from the estimated processing times) of the schemes corresponding to the disjunctive graphs before and after one state transition is taken as the instant reward of the decision, and the sum of the instant rewards of all decisions is taken as the cumulative reward.
7. The flexible job shop scheduling method based on multi-layer deep reinforcement learning according to claim 6, wherein the specific calculation method of the scheduling rule in the step 2.3 is as follows:
Step 2.3.1, determining the set of executable machines S_m of the selected process o;
Step 2.3.2, normalizing the time for each machine in the set to process the selected procedure to obtain a value f1 as index 1;
Step 2.3.3, normalizing the number of procedures already processed by each machine in the set to obtain a value f2 as index 2;
Step 2.3.4, adding index 1 and index 2 to obtain the final index (f1 + f2);
Step 2.3.5, determining a machine from the set according to the final index; the selected machine has a short processing time and a small number of already-processed procedures.
8. The flexible job shop scheduling method based on multi-layer deep reinforcement learning according to claim 6, wherein the state transition process in the Markov decision model is specifically:
step 3.1, judging the processing feasibility of the selection process on the selection machine according to the state and the action;
Step 3.2, determining the machined process sequence of the selected machine according to the disjunctive graph;
Step 3.3, determining the processing time of the selected process on the selected machine;
judging whether the selected process can be inserted into an existing idle time period of the selected machine: if yes, executing step 3.4; if not, executing step 3.7;
Step 3.4, calculating the earliest machinable time of the selected process and the idle time periods of the selected machine, and determining the insertion position of the selected process in the machined process sequence;
Step 3.5, modifying the arc relations of the disjunctive graph according to the insertion position, and deleting the other disjunctive arcs connected to the selected process node;
Step 3.6, updating the node information and completing the state transition;
Step 3.7, determining the start time, and adding the process to the end of the machined process sequence;
Step 3.8, determining the direction of the disjunctive arc of the selected process on the selected machine, and deleting the other disjunctive arcs connected to the selected process node;
Step 3.9, updating the node information and completing the state transition.
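The idle-period test in the steps above can be sketched as follows. This is a minimal sketch under assumed data shapes (`find_insertion_slot` and `gaps` are illustrative names): a process fits a gap if the gap, clipped by the process's earliest feasible start, is long enough for its processing time.

```python
def find_insertion_slot(gaps, earliest, duration):
    """gaps: idle (start, end) intervals of the selected machine, in time
    order; earliest: earliest machinable time of the selected process;
    duration: its processing time. Returns the chosen gap or None."""
    for start, end in gaps:
        begin = max(start, earliest)  # cannot start before job predecessors
        if begin + duration <= end:
            return (start, end)       # insert here (steps 3.4-3.6)
    return None  # no gap fits: append at the sequence end (steps 3.7-3.9)
```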
9. The flexible job shop scheduling method based on multi-layer deep reinforcement learning according to claim 6, wherein the computation process of the instantaneous reward and the cumulative reward in the Markov decision model is as follows:
Step 4.1, calculating the maximum completion time T_s of the old state s;
Step 4.2, calculating the maximum completion time T_(s+1) of the new state;
Step 4.3, calculating the instant reward value:
r_t = T_st − T_s(t+1)
The cumulative reward is calculated as:
R = Σ_t r_t = T_s1 − T_send
where R is the cumulative reward value, T_s1 is the maximum completion time corresponding to the initial state, and T_send is the maximum completion time of the final scheme; since T_s1 is a fixed value determined by the problem information and does not vary with the decisions, maximizing the cumulative reward value is equivalent to minimizing the maximum completion time of the final scheme.
10. The flexible job shop scheduling method based on multi-layer deep reinforcement learning according to claim 1, wherein the training process in the P2 training algorithm part is specifically as follows:
step 5.1, generating a main thread and T sub-threads and initializing a training round number Count to be 0;
step 5.2, copying parameters of the main thread model to the sub thread model;
step 5.3, starting each sub thread, and initializing the number of training rounds;
step 5.4, generating U problems by each sub thread;
5.5, solving a flexible workshop scheduling problem through a deep reinforcement learning model by the sub-thread and generating a sample;
Step 5.6, after completing sample collection, optimizing the main-thread model parameters with a gradient-descent strategy, and setting Count = Count + 1; the optimization formula is as follows:
where π is the actor network, namely the graph neural network and the decision network, θ is the parameter of the actor network, and v is the critic network, used together with the reward value to estimate the advantage function and having its own parameter;
Step 5.7, judging whether the maximum number of training rounds T_c has been reached: if not, executing steps 5.4 to 5.6; if reached, the training is finished and the main-thread model parameters are saved.
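The thread-and-round structure of steps 5.1 to 5.7 can be sketched as below. This is a minimal structural sketch only: the environment rollout and the gradient update are stubbed placeholders, and `rollout`, `apply_grads`, and the parameter names are assumptions rather than the patent's API.

```python
import threading

def train(global_params, T=4, U=2, T_c=3, rollout=None, apply_grads=None):
    count = 0                                # step 5.1: round counter
    while count < T_c:                       # step 5.7: up to T_c rounds
        samples, threads = [], []
        for _ in range(T):                   # step 5.1: T sub-threads
            local = dict(global_params)      # step 5.2: copy main params
            t = threading.Thread(
                target=lambda p=local: samples.extend(
                    rollout(p) for _ in range(U)))  # steps 5.4-5.5: U problems
            threads.append(t)
            t.start()                        # step 5.3: start sub-threads
        for t in threads:
            t.join()                         # wait for sample collection
        apply_grads(global_params, samples)  # step 5.6: gradient update
        count += 1
    return global_params                     # step 5.7: final main params
```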
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202210603831.2A CN114912826B (en) | 2022-05-30 | 2022-05-30 | Flexible job shop scheduling method based on multilayer deep reinforcement learning |
Publications (2)
Publication Number | Publication Date |
---|---|
CN114912826A true CN114912826A (en) | 2022-08-16 |
CN114912826B CN114912826B (en) | 2024-07-02 |
Family
ID=82771105
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202210603831.2A Active CN114912826B (en) | 2022-05-30 | 2022-05-30 | Flexible job shop scheduling method based on multilayer deep reinforcement learning |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN114912826B (en) |
Cited By (6)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN115293623A (en) * | 2022-08-17 | 2022-11-04 | 海尔数字科技(青岛)有限公司 | Training method and device for production scheduling model, electronic equipment and medium |
CN116414093A (en) * | 2023-04-13 | 2023-07-11 | 暨南大学 | Workshop production method based on Internet of things system and reinforcement learning |
CN116993028A (en) * | 2023-09-27 | 2023-11-03 | 美云智数科技有限公司 | Workshop scheduling method and device, storage medium and electronic equipment |
CN117057569A (en) * | 2023-08-21 | 2023-11-14 | 重庆大学 | Non-replacement flow shop scheduling method and device based on neural network |
CN117892969A (en) * | 2024-01-18 | 2024-04-16 | 河南科技大学 | Flexible workshop operation dynamic scheduling method based on deep reinforcement learning |
CN117973635A (en) * | 2024-03-28 | 2024-05-03 | 中科先进(深圳)集成技术有限公司 | Decision prediction method, electronic device, and computer-readable storage medium |
Citations (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20200026264A1 (en) * | 2018-02-07 | 2020-01-23 | Jiangnan University | Flexible job-shop scheduling method based on limited stable matching strategy |
US20210081787A1 (en) * | 2019-09-12 | 2021-03-18 | Beijing University Of Posts And Telecommunications | Method and apparatus for task scheduling based on deep reinforcement learning, and device |
CN112631214A (en) * | 2020-11-27 | 2021-04-09 | 西南交通大学 | Flexible job shop batch scheduling method based on improved invasive weed optimization algorithm |
CN113792924A (en) * | 2021-09-16 | 2021-12-14 | 郑州轻工业大学 | Single-piece job shop scheduling method based on Deep reinforcement learning of Deep Q-network |
Non-Patent Citations (1)
Title |
---|
MENG Binbin; WU Yan: "Research on a Distributed Machine Learning Task Scheduling Algorithm for Cloud Computing", Journal of Xi'an University (Natural Science Edition), no. 01, 15 January 2020 (2020-01-15) * |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant |