CN109902954A - Flexible job shop dynamic scheduling method based on industrial big data - Google Patents

Flexible job shop dynamic scheduling method based on industrial big data

Info

Publication number
CN109902954A
CN109902954A (application CN201910144370.5A)
Authority
CN
China
Prior art keywords
scheduling
data
rule
machine
workpiece
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN201910144370.5A
Other languages
Chinese (zh)
Other versions
CN109902954B (en)
Inventor
汤洪涛
费永辉
闫伟杰
陈程
梁佳炯
程晓雅
王丹南
李晋青
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Zhejiang University of Technology ZJUT
Original Assignee
Zhejiang University of Technology ZJUT
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Application filed by Zhejiang University of Technology (ZJUT)
Priority to CN201910144370.5A
Publication of CN109902954A
Application granted
Publication of CN109902954B
Legal status: Active
Anticipated expiration

Landscapes

  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)

Abstract

A flexible job shop dynamic scheduling method based on industrial big data, comprising the following steps. Step 1: using the data collection tools Sqoop and Flume, scheduling data is obtained from databases or file systems and stored in the HDFS file system. Step 2: using the data warehouse tool Hive, the scheduling data is partitioned with the scheduling scheme as the unit. Step 3: using the Spark computing framework, the scheduling data is converted into training examples and stored in HBase, one scheduling scheme per unit. Step 4: the data is screened on multiple indices to obtain the set of scheduling data generated by schemes that performed well in execution. Step 5: the scheduling-related historical data is clustered on disturbance attributes. Step 6: random forest scheduling rules are mined with an improved random forest algorithm. Step 7: the mined scheduling rules are used to guide flexible job shop dynamic scheduling. The method is practically operable, computationally efficient, and can respond to shop floor disturbances in real time.

Description

Flexible job shop dynamic scheduling method based on industrial big data
Technical field
The present invention relates to a flexible job shop dynamic scheduling method based on industrial big data.
Background technique
Scheduling plays an important role in a manufacturing system, and scheduling quality affects the competitiveness of the manufacturing enterprise itself. Formulating a scientific and reasonable scheduling scheme for the shop floor can improve production efficiency, reduce processing cost and shorten the product life cycle, while guaranteeing on-time delivery with assured quality. Because of its flexible process routes and fast adaptability to market demand, the flexible job shop can satisfy multi-variety, small-batch production requirements very well, and it has therefore become a widely used production mode. Flexible job shop dynamic scheduling considers the disturbances of the actual production environment on top of static scheduling; it better matches the real production environment and is therefore more worthy of research.
As product demand keeps shifting toward personalization, manufacturing processes become more diverse and real scheduling problems become more complex; manufacturing enterprises have raised their requirements on job shop scheduling methods in terms of practical operability, computational efficiency and real-time response to shop floor disturbances. Priority dispatching rules are simple heuristic rules: they are computationally efficient, easy to operate in practice and usable for real-time scheduling, which makes them suitable for complex dynamic scheduling environments. However, the performance of a priority dispatching rule is affected by changes in the actual environment, and no single dispatching rule achieves good scheduling performance under all disturbance conditions. To meet the needs of actual job shop scheduling, one feasible idea is to mine scheduling knowledge about dispatching rules from scheduling-related historical data to guide real scheduling activities. Research on solving scheduling problems by data mining falls broadly into two classes: methods that combine existing priority dispatching rules, and methods that mine new dispatching rules from scheduling-related historical data.
On combining existing priority dispatching rules, WANG Shuang-Xi et al. (A hybrid knowledge discovery model using decision tree and neural network for selecting dispatching rules of a semiconductor final testing factory, 2005) proposed, for the semiconductor industry, a method combining decision trees and neural networks to mine a priority-dispatching-rule selection mechanism from scheduling-related historical data; the mechanism yields the most suitable priority dispatching rule for the current environment. SHIUE Y.R. et al. (Data-mining-based dynamic dispatching rule selection mechanism for shop floor control systems using a support vector machine approach, 2009) proposed a method that uses a support vector machine (SVM) to mine a priority-dispatching-rule selection mechanism from scheduling-related historical data and makes real-time operational decisions with it. Mouelhi et al. (Training a neural network to select dispatching rules in real time, 2009) proposed a dispatching rule selection method based on neural networks, which mines a real-time rule selection policy from scheduling-related historical data generated by simulation.
A priority dispatching rule makes scheduling decisions from only a small amount of information, which may lead to unsatisfactory scheduling results; extracting new dispatching rules from scheduling-related historical data is therefore another line of thought. LI X et al. (Discovering dispatching rules using data mining, 2005) proposed a method that uses decision trees to obtain completely new dispatching rules from scheduling-related historical data, and experiments confirmed that the extracted rules fit the original schedules well. SIGURDUR OLAFSSON et al. (Learning effective new single machine dispatching rules from optimal scheduling data, 2010) proposed a two-stage scheduling knowledge learning method: first learn how to obtain training examples suitable for mining from scheduling process data, then mine dispatching rules from those training examples. Wang Chenglong et al. (Research on mining methods for job shop scheduling rules, 2015) proposed a dispatching rule mining method combining Petri net modeling, a branch-and-bound algorithm and decision tree algorithms; the extracted dispatching rules can guide static job shop scheduling.
In conclusion being primarily directed to vehicle about the method for excavating scheduling rule from scheduling relevant historical data at present Between on scheduling problem, applied to the less of flexible job shop dynamic scheduling problem.In addition, the above method is used Scheduling relevant historical data be partial to gross data, a large amount of uses however as Intellisense equipment in shop layer, workshop Start to intelligent development, Job-Shop relevant historical data shows the industry such as scale is big, value is low, continuous sampling, higher-dimension The characteristics of big data.
Summary of the invention
To overcome the problems that existing flexible job shop dynamic scheduling methods are not practically operable, not computationally efficient, and unable to respond to shop floor disturbances in real time, the present invention provides a flexible job shop dynamic scheduling method that is practically operable, computationally efficient and able to respond to shop floor disturbances in real time.
The technical solution adopted by the present invention to solve the technical problems is:
A flexible job shop dynamic scheduling method based on industrial big data, the method comprising the following steps:
Step 1, data acquisition: use the data acquisition tools of the Hadoop ecosystem to collect scheduling-related historical data from existing information systems, and store it in the HDFS file system.
Step 2, data integration: using the data warehouse tool Hive, partition the scheduling data set Dh in the HDFS file system with SQL statements, taking the scheduling scheme as the unit, i.e. the scheduling-related historical data generated during the execution of one scheduling scheme is grouped together.
Step 3, data conversion: use Spark to convert the integrated data into the form of training examples, so that a data mining algorithm can mine dispatching rules from them.
Step 4, data screening: evaluate historical scheduling schemes on the three indices maximum makespan, total tardiness and total machine load, and keep the set of scheduling data generated during execution by the schemes that performed well. Specifically:
Step 4.1: on the maximum makespan index, use as the screening criterion the maximum makespan of the schedule generated by the SPT rule alone under the same conditions.
Step 4.2: on the total tardiness index, use as the screening criterion the total tardiness of the scheduling task completed under the same conditions by the EDD rule combined with the SPT rule.
Step 4.3: on the total machine load index, use as the screening criterion the total machine load of the scheduling task completed under the same conditions by the LMWT rule combined with the SPT rule. The scheduling data generated in execution by the schemes that satisfy all three indices simultaneously will be the input of the dispatching rule mining algorithm.
Step 5, clustering based on disturbance attributes: use the DBSCAN clustering method to cluster the screened scheduling-related historical data on its disturbance attributes, with the scheduling scheme as the unit (i.e. the data generated by one scheduling scheme is one object). Specifically:
Step 5.1: standardize the disturbance data recorded during scheme execution. If the values of a disturbance attribute over the schemes are X1, X2, X3, ..., Xn, they are transformed by formula (1):
Yi = (Xi - Xmean) / S    (1)
In formula (1), Xmean denotes the mean of the attribute, S its standard deviation, and Y1, Y2, Y3, ..., Yn are the standardized data.
Step 5.2: determine the DBSCAN parameters: the neighborhood radius Eps and the minimum number of objects MinPts that a core object must contain within that radius.
Step 5.3: pick a core object p at random and create a new cluster with p as its core object; then add the objects directly density-reachable from p to the cluster.
Step 5.4: repeat step 5.3; when no new point can be added to any cluster, the process ends.
Step 6, random forest dispatching rule mining: use an improved random forest algorithm to mine, from each cluster, random forest dispatching rule 1, which solves the problem of a workpiece selecting a machine, and random forest dispatching rule 2, which solves the problem of an idle machine selecting a workpiece to process. Specifically:
Step 6.1: for each cluster, draw training examples from the cluster with replacement to form k new training example sets for building decision trees.
Step 6.2: randomly select m characteristic attributes and compute the best split, training k decision trees.
Step 6.3: test the classification performance of the decision trees on the training examples in the cluster that were not selected.
Step 6.4: check whether similar decision trees exist; if so, keep the decision tree that performs well in testing, forming the random forest.
Step 6.5: compute the weights w and h of every decision tree according to the Bayes voting mechanism, obtaining random forest dispatching rule 1 and random forest dispatching rule 2.
Step 7, use of the dispatching rules: guide flexible job shop dynamic scheduling with the mined random forest dispatching rules. Specifically:
Step 7.1: according to the problem to be solved, a workpiece selecting a machine or an idle machine selecting a workpiece to process, find random forest dispatching rule 1 or random forest dispatching rule 2 corresponding to the cluster to which the current disturbance condition of the flexible job shop belongs.
Step 7.2: with the selected random forest dispatching rule, use pairwise comparison to select the most suitable machine or workpiece from the candidate machine set M or the candidate workpiece set J.
The technical concept of the invention is as follows. Job-shop-related historical data exhibits the characteristics of industrial big data, namely large scale, low value density, continuous sampling and high dimensionality, so the preprocessing of scheduling-related historical data is first completed with big data technology; Fig. 2 gives the data preprocessing model combined with big data technology. The flexible job shop dynamic scheduling problem addressed here is to solve the machine selection problem of workpieces and the workpiece selection problem of idle machines in a disturbed environment, so the acquired data set Dh is divided into three parts: d1 is the disturbance-related system information at the time the scheduling scheme was formulated; d2 is, when a certain operation of a workpiece selects a processing machine, the status information of every machine in the set of machines that can currently process this operation; d3 is, when an idle machine needs to select a workpiece to process from its waiting queue, the status information of every workpiece in the current queue. The data in the scheduling data set Dh is disordered and cannot be used directly for the subsequent data screening, clustering and dispatching rule mining; it must first be arranged through data integration and conversion. Dh implicitly contains a large amount of effective information reflecting the characteristics of the actual scheduling environment, as well as scheduling knowledge, but it is also accompanied by many useless or wrong rules or patterns. Therefore, the multi-index data screening system of Fig. 3 is used: historical scheduling schemes are evaluated from the three angles of maximum makespan, total tardiness and total machine load, and the data generated during the execution of the schemes that satisfy all three indices is retained.
The random forest algorithm is used as the mining algorithm for dispatching rules, so the finally obtained dispatching rule is a random forest, essentially a set of trained C4.5 decision trees. The scheduling performance of the rule depends on the classification performance of the decision trees, and the computational efficiency and complexity of the rule depend on the number of branches of the trees. DBSCAN clustering reasonably partitions the better scheduling data Db, distinguishing the data produced by scheduling decisions made under different disturbance conditions; dispatching rules for the different disturbance conditions are then obtained from each partition. This enhances the classification performance of the decision trees in the resulting random forest dispatching rules and reduces the number of branches, so that the rules have lower complexity, higher computational efficiency and better scheduling performance.
Learning a dispatching rule f from scheduling-related historical data by the random forest algorithm in fact yields an estimate of the true dispatching rule y, so there is some error between f and y. The error consists of three parts: noise, variance and bias. The noise is unavoidable, but the error of the algorithm can be reduced by reducing the variance or the bias, thereby improving the performance of the random forest algorithm. Reducing the correlation ρ between decision trees also reduces the variance; therefore, if the similarity between two decision trees is too large, the tree that performs well in testing is retained, which reduces ρ. The traditional random forest algorithm uses a majority voting mechanism in which every decision tree has the same weight regardless of its classification performance, so trees with poor classification performance influence the final result as much as trees with good performance. A Bayes voting mechanism is therefore used here: each decision tree is assigned a weight based on its classification performance in testing, and votes are counted according to these weights.
The beneficial effects of the invention are mainly as follows. Taking as the main framework the approach of mining dispatching rules from scheduling-related historical data with industrial big data characteristics to guide scheduling, it establishes a data preprocessing model combined with big data technology, improving the speed and accuracy of data preprocessing; it establishes clustering based on disturbance attributes, reducing the complexity of the dispatching rules and improving their computational efficiency and scheduling performance; and it establishes a dispatching rule mining model based on an improved random forest algorithm, improving the generalization ability and scheduling performance of the dispatching rules.
Detailed description of the invention
Fig. 1 is the overall architecture of the dispatching rule mining of the invention.
Fig. 2 is the scheduling data preprocessing model of the invention combined with big data technology.
Fig. 3 is the multi-index data screening system of the invention.
Fig. 4 is the flow chart of the improved random forest algorithm of the invention for mining dispatching rules.
Fig. 5 is the scheduling scheme obtained with the flexible job shop dynamic scheduling method of the invention based on industrial big data.
Specific embodiment
Referring to Figs. 1-5, a flexible job shop dynamic scheduling method based on industrial big data; the overall framework is shown in Fig. 1 and is divided into three parts: the first part, the scheduling data preprocessing model combined with big data technology (see Fig. 2), comprising data acquisition, data integration, data conversion and data screening; the second part, the clustering strategy based on disturbance attributes; the third part, the dispatching rule mining model based on the improved random forest algorithm. The technical steps are as follows. Step 1, data acquisition: using the data acquisition tools Sqoop and Flume of the Hadoop ecosystem, collect scheduling-related historical data from existing information systems such as MES, ERP and SCADA, and store it in the HDFS file system. The acquired data comprises three parts, Dh = {d1, d2, d3}: d1 is the disturbance-related system information at the time the scheduling scheme was formulated; d2 is, when a certain operation of a workpiece selects a processing machine, the status information of every machine in the set of machines that can currently process this operation; d3 is, when an idle machine needs to select a workpiece to process from its waiting queue, the status information of every workpiece in the current queue.
Step 2, data integration: using the data warehouse tool Hive, partition the scheduling data set Dh in the HDFS file system with SQL statements, taking the scheduling scheme as the unit, i.e. the scheduling-related historical data generated during the execution of one scheduling scheme is grouped together.
Step 3, data conversion: use Spark to convert the d2 and d3 parts of the integrated data into the form of training examples, so that a data mining algorithm can mine dispatching rules from them. Specifically:
Step 3.1: for the d2 part of the acquired scheduling data set Dh, treat the machine m1 actually selected in a historical scheduling scheme as the most suitable machine, and compare it one by one with the other machines {m2, m3, ...} in the alternative machine set of this operation to form training examples.
Step 3.2: for the d3 part of the acquired scheduling data set Dh, treat the workpiece j1 actually selected in a historical scheduling scheme as the most suitable workpiece, and compare it one by one with the other workpieces {j2, j3, ...} in the waiting workpiece set to form training examples.
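The pairwise construction of steps 3.1-3.2 can be sketched as follows. This is a minimal illustration, not the patented implementation: the machine attribute names (load, queue_len, speed) are hypothetical, and encoding each pair as a feature-difference vector with labels 1 ("the former is suitable") and 2 ("the latter is suitable") is an assumption consistent with the class labels used later in step 7.

```python
# Sketch of the pairwise training-example construction of steps 3.1-3.2.
# Field names (load, queue_len, speed) are illustrative, not from the patent.

def make_training_examples(chosen, alternatives, features):
    """Pair the actually chosen machine/workpiece with each alternative.

    Each example is (feature-difference vector, label); label 1 means
    "the former (the historically chosen one) is suitable", label 2 the latter.
    """
    examples = []
    for alt in alternatives:
        diff = [chosen[f] - alt[f] for f in features]
        examples.append((diff, 1))                 # chosen vs. alternative
        examples.append(([-d for d in diff], 2))   # symmetric example
    return examples

# d2-style records: machine status at the moment an operation selected m1
m1 = {"load": 3.0, "queue_len": 2, "speed": 1.2}   # actually selected
m2 = {"load": 5.0, "queue_len": 4, "speed": 1.0}
m3 = {"load": 2.0, "queue_len": 5, "speed": 0.8}

examples = make_training_examples(m1, [m2, m3], ["load", "queue_len", "speed"])
print(len(examples))  # → 4 (two examples per alternative machine)
```

The same helper would serve the d3 part, with workpiece status records in place of machine status records.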
Step 4, data screening: evaluate historical scheduling schemes on the three indices maximum makespan, total tardiness and total machine load, and keep the scheduling-related historical data set Db generated during execution by the schemes that performed well. Specifically:
Step 4.1: on the maximum makespan index, use as the screening criterion the maximum makespan of the schedule generated by the SPT rule alone under the same conditions. Using only the SPT rule means that a workpiece selects the fastest machine and an idle machine selects the workpiece with the shortest processing time. Schemes whose maximum makespan meets this criterion proceed to step 4.2; the rest are eliminated.
Step 4.2: on the total tardiness index, use as the screening criterion the total tardiness of the scheduling task completed under the same conditions by the EDD rule combined with the SPT rule. The SPT+EDD rule means that a workpiece selects the fastest machine and an idle machine selects the workpiece with the earliest due date. Schemes whose total tardiness meets this criterion proceed to step 4.3; the rest are eliminated.
Step 4.3: on the total machine load index, use as the screening criterion the total machine load of the scheduling task completed under the same conditions by the LMWT rule combined with the SPT rule. The LMWT+SPT rule means that a workpiece selects the machine with the longest idle time and an idle machine selects the workpiece with the shortest processing time. The scheduling data generated in execution by the schemes that satisfy all three indices simultaneously will be the input of the dispatching rule mining algorithm.
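The three-stage screen of steps 4.1-4.3 can be sketched as below. The baseline values stand in for the SPT, SPT+EDD and LMWT+SPT reference schedules computed under the same conditions; treating "no worse than the baseline" as the pass condition is an assumption, since the patent states the criteria but not the comparison direction explicitly.

```python
# Sketch of the three-index screen of steps 4.1-4.3. The baseline dict stands
# in for the SPT / SPT+EDD / LMWT+SPT reference values; "no worse than the
# baseline" as the pass condition is an assumption.

def passes_screen(scheme, baseline):
    """scheme/baseline: dicts with makespan, tardiness, machine_load."""
    if scheme["makespan"] > baseline["makespan"]:          # step 4.1: SPT
        return False
    if scheme["tardiness"] > baseline["tardiness"]:        # step 4.2: SPT+EDD
        return False
    if scheme["machine_load"] > baseline["machine_load"]:  # step 4.3: LMWT+SPT
        return False
    return True

baseline = {"makespan": 25.0, "tardiness": 8.0, "machine_load": 100.0}
schemes = [
    {"makespan": 21.8, "tardiness": 5.3, "machine_load": 96.4},  # kept
    {"makespan": 26.0, "tardiness": 4.0, "machine_load": 90.0},  # fails 4.1
]
Db = [s for s in schemes if passes_screen(s, baseline)]
print(len(Db))  # → 1
```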
Step 5, clustering based on disturbance attributes: use DBSCAN on Db with the scheduling scheme as the unit (i.e. the data generated by one scheduling scheme is one object), clustering on the system disturbance attributes (the d1 part) recorded in Db at the time the schemes were formulated. Specifically:
Step 5.1: standardize the d1 part of the data. If the values of a disturbance attribute over the schemes are X1, X2, X3, ..., Xn, they are transformed by formula (1):
Yi = (Xi - Xmean) / S    (1)
In formula (1), Xmean denotes the mean of the attribute, S its standard deviation, and Y1, Y2, Y3, ..., Yn are the standardized data.
Step 5.2: determine the DBSCAN parameters: the neighborhood radius Eps and the minimum number of objects MinPts that a core object must contain within that radius.
Step 5.3: pick at random a core object p that has not yet been processed (not assigned to a cluster or labeled as noise), i.e. an object whose neighborhood of radius Eps contains at least MinPts objects; create a new cluster C, and add all objects within the Eps-neighborhood of p to a candidate set N.
Step 5.4: pick at random an object q in the candidate set N that has not yet been processed. If q is a core object, add to N the objects in its Eps-neighborhood that have not been processed and are not yet in N. If q does not belong to any cluster, add q to C.
Step 5.5: repeat step 5.4 until N is empty.
Step 5.6: repeat steps 5.3-5.5; when no new object can be added to any cluster, the process ends.
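Steps 5.1-5.6 can be sketched in plain Python: z-score standardization per formula (1), then a minimal DBSCAN over one disturbance vector per scheme. The two disturbance attributes and the Eps/MinPts values are illustrative choices, not from the patent.

```python
import math

# Minimal sketch of steps 5.1-5.6: z-score standardization (formula (1))
# followed by plain DBSCAN over the d1 disturbance vectors, one vector per
# scheduling scheme. Attribute meanings and Eps/MinPts are illustrative.

def standardize(column):
    mean = sum(column) / len(column)
    std = math.sqrt(sum((x - mean) ** 2 for x in column) / len(column))
    return [(x - mean) / std for x in column] if std else [0.0] * len(column)

def dbscan(points, eps, min_pts):
    labels = [None] * len(points)          # None = unprocessed, -1 = noise
    cluster = 0
    for i in range(len(points)):
        if labels[i] is not None:
            continue
        neigh = [j for j in range(len(points)) if math.dist(points[i], points[j]) <= eps]
        if len(neigh) < min_pts:           # not a core object
            labels[i] = -1
            continue
        labels[i] = cluster                # step 5.3: new cluster from core p
        candidates = [j for j in neigh if j != i]
        while candidates:                  # steps 5.4-5.5: grow via candidate set N
            q = candidates.pop()
            if labels[q] == -1:
                labels[q] = cluster        # former noise becomes a border point
            if labels[q] is not None:
                continue
            labels[q] = cluster
            q_neigh = [j for j in range(len(points)) if math.dist(points[q], points[j]) <= eps]
            if len(q_neigh) >= min_pts:    # q is itself a core object
                candidates.extend(j for j in q_neigh if labels[j] is None)
        cluster += 1
    return labels

# Two disturbance attributes per scheme (e.g. failure count, urgent-order count)
raw = [[1, 10], [2, 11], [1, 12], [9, 40], [10, 41], [50, 5]]
std_cols = [standardize(list(c)) for c in zip(*raw)]
points = list(zip(*std_cols))
print(dbscan(points, eps=0.6, min_pts=2))  # → [0, 0, 0, 1, 1, -1]
```

Two dense groups of schemes form clusters 0 and 1; the outlier scheme is labeled noise (-1), which matches the termination condition of step 5.6.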
Step 6, random forest dispatching rule mining: use an improved random forest algorithm to mine, from each cluster, random forest dispatching rule 1, which solves the problem of a workpiece selecting a machine, and random forest dispatching rule 2, which solves the problem of an idle machine selecting a workpiece to process. Specifically:
Step 6.1: for each cluster, draw training examples with replacement from the d2 part of the cluster (to mine random forest dispatching rule 1) and from the d3 part (to mine random forest dispatching rule 2), forming k new training example sets P1 and P2 respectively, for building decision trees.
Step 6.2: from P1 and P2, randomly select m characteristic attributes of d2 and d3 respectively, compute the best split, and train k decision trees T1 and T2 respectively.
The decision tree building process is as follows:
Step 6.2.1: create a root node N.
Step 6.2.2: check whether the training example set still has remaining training examples; if not, return node N; otherwise continue.
Step 6.2.3: check whether the scheduling decisions of the remaining training examples all belong to one class C; if so, return node N labeled as class C; otherwise continue.
Step 6.2.4: check whether the attribute list is empty; if so, label the node with the majority class in the sample; otherwise continue.
Step 6.2.5: check whether an attribute in the attribute list is continuous; for a continuous attribute, obtain by dichotomy the split with the maximum information gain G(D, A). (The dichotomy divides all values of the attribute into two parts; there are N-1 such divisions, and each candidate split threshold is the average of two adjacent values. Information gain is computed by formulas (2), (3) and (4).)
G(D, A) = H(D) - H(D | A)    (2)
H(D) = -Σk (|Ck| / |D|) log2(|Ck| / |D|)    (3)
H(D | A) = Σi (|Di| / |D|) H(Di)    (4)
In formula (2), G(D, A) denotes the information gain of attribute A; in formula (3), H(D) is the class information entropy; in formula (4), H(D | A) denotes the conditional entropy. D denotes the training example data set and |D| the number of training examples in D; D has K classes Ck, k = 1, 2, ..., K, and |Ck| denotes the number of training examples in class Ck. Attribute A divides D into n subsets D1, D2, ..., Dn, with |Di| the number of training examples in Di. The set of training examples in Di belonging to class Ck is Dik, with |Dik| its number of training examples.
Step 6.2.6: select the attribute with the maximum information gain ratio to label node N; the information gain ratio is computed by formulas (5) and (6); return to step 6.2.2.
GR(D, A) = G(D, A) / H(A)    (5)
H(A) = -Σi (|Di| / |D|) log2(|Di| / |D|)    (6)
In formula (5), GR(D, A) denotes the information gain ratio; in formula (6), H(A) denotes the split information of attribute A. The other symbols are as above.
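Formulas (2)-(6) can be checked with a short sketch for a discrete attribute; the toy labels and attribute values are made up for illustration.

```python
import math

# Sketch of formulas (2)-(6): entropy, conditional entropy, information
# gain and gain ratio for a discrete split, as used in step 6.2.

def entropy(labels):                       # formula (3)
    n = len(labels)
    counts = {c: labels.count(c) for c in set(labels)}
    return -sum((v / n) * math.log2(v / n) for v in counts.values())

def gain_and_ratio(labels, attr_values):
    n = len(labels)
    subsets = {}
    for a, c in zip(attr_values, labels):  # attribute A partitions D into D1..Dn
        subsets.setdefault(a, []).append(c)
    cond = sum(len(s) / n * entropy(s) for s in subsets.values())        # (4)
    gain = entropy(labels) - cond                                        # (2)
    split = -sum((len(s) / n) * math.log2(len(s) / n) for s in subsets.values())  # (6)
    return gain, gain / split if split else 0.0                          # (5)

# Toy set: class 1 = "the former is suitable", class 2 = "the latter is suitable"
labels = [1, 1, 1, 2, 2, 2]
attr   = ["a", "a", "a", "b", "b", "b"]    # perfectly separating attribute
g, gr = gain_and_ratio(labels, attr)
print(round(g, 3), round(gr, 3))  # → 1.0 1.0
```

A perfectly separating attribute yields gain 1 bit and, with split information also 1 bit, a gain ratio of 1 — the maximum step 6.2.6 would select.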
Step 6.3: test the classification performance of the decision trees in T1 and T2 on the training examples in d2 and d3, respectively, that were not selected.
Step 6.4: compute the similarity S between decision trees in T1 or T2 by formula (7); if the similarity between two decision trees is greater than 60%, compare their test performance from step 6.3 and keep the decision tree that performs well, forming the random forest.
S = (Σn I(r1n.c, r2n.c)) / Nt    (7)
In formula (7), DT1 and DT2 denote the two decision trees under similarity calculation; k denotes the number of test cases on which DT1 and DT2 give identical classification results; r1n and r2n denote the characteristic attributes used by DT1 and DT2 when the n-th classification results are identical, and c denotes the classification result. When r1n = r2n, i.e. when DT1 and DT2 obtain the same classification result with the same characteristic attribute, I(r1n.c, r2n.c) = 1, otherwise 0; Nt is the number of test cases.
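The similarity check of step 6.4 can be sketched as follows; the record format (one attribute-class pair per test case per tree) is an assumption made for illustration.

```python
# Sketch of the tree-similarity check of step 6.4 (formula (7)): two trees
# agree on a test case only when they give the same class using the same
# characteristic attribute. The per-case record format is an assumption.

def similarity(results1, results2):
    """results*: list of (attribute_used, class) per test case."""
    nt = len(results1)
    same = sum(1 for (a1, c1), (a2, c2) in zip(results1, results2)
               if a1 == a2 and c1 == c2)   # I(r1n.c, r2n.c)
    return same / nt

t1 = [("load", 1), ("speed", 2), ("load", 1), ("queue", 2), ("load", 1)]
t2 = [("load", 1), ("speed", 2), ("load", 1), ("load", 2), ("speed", 1)]
s = similarity(t1, t2)
print(s)  # → 0.6
if s > 0.6:
    print("keep only the better-performing tree")
```

Here S is exactly at the 60% threshold, so both trees would be kept; a higher S would trigger the pruning of step 6.4.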
Step 6.5: using the Bayes voting mechanism, compute the weights w and h of every decision tree in T1 and T2 by formulas (8) and (9), obtaining random forest dispatching rule 1 and random forest dispatching rule 2.
w = v / (v + m)    (8)
h = m / (v + m)    (9)
In formulas (8) and (9), v denotes the number of test cases this decision tree classifies correctly, and m the number of test cases it classifies incorrectly.
Step 7, use of the dispatching rules: guide flexible job shop dynamic scheduling with the mined random forest dispatching rules. Specifically:
Step 7.1: according to the problem to be solved, a workpiece selecting a machine or an idle machine selecting a workpiece to process, find random forest dispatching rule 1 or random forest dispatching rule 2 corresponding to the cluster to which the current disturbance condition of the flexible job shop belongs.
Step 7.2: with the selected random forest dispatching rule, use pairwise comparison to select the most suitable machine or workpiece from the candidate machine set M or the candidate workpiece set J.
Step 7.2.1: for the machine selection problem of a workpiece, let m1 and m2 be two machines in M; with random forest dispatching rule 1 selected in step 7.1, compute the selection result of every decision tree in the rule. These results contain selection 1 and selection 2 (selection 1 means the former, m1, is suitable; selection 2 means the latter, m2, is suitable). For the workpiece selection problem of an idle machine, let j1 and j2 be two workpieces in J; with random forest dispatching rule 2 selected in step 7.1, compute the selection result of every decision tree in the rule. These results contain decision 1 and decision 2 (decision 1 means the former, j1, is suitable; decision 2 means the latter, j2, is suitable).
Step 7.2.2: obtain the weighted selection result WR of every decision tree through the Bayes voting mechanism by formula (10), and compute the average AWR of the weighted results. If AWR is less than 1.5, the former (m1 or j1) is suitable; if AWR is greater than 1.5, the latter (m2 or j2) is suitable.
WR = wC + hR    (10)
In formula (10), C denotes the classification result given by this decision tree and R the mean of the classification results given by all decision trees; the formulas for w and h are formulas (8) and (9).
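The weighted pairwise vote of step 7.2 can be sketched as below. It assumes w = v/(v+m) and h = m/(v+m), so that w + h = 1, WR stays between the class labels 1 and 2, and the 1.5 threshold of step 7.2.2 splits the range in half; the per-tree vote counts are made up for illustration.

```python
# Sketch of step 7.2 (formulas (8)-(10)), assuming w = v/(v+m) and
# h = m/(v+m); tree vote counts are illustrative.

def pairwise_vote(tree_stats):
    """tree_stats: list of (C, v, m) — each tree's class vote (1 or 2),
    its correct test-case count v and incorrect count m."""
    votes = [c for c, _, _ in tree_stats]
    R = sum(votes) / len(votes)            # mean of all trees' results
    wrs = []
    for c, v, m in tree_stats:
        w = v / (v + m)                    # assumed form of formula (8)
        h = m / (v + m)                    # assumed form of formula (9)
        wrs.append(w * c + h * R)          # formula (10)
    awr = sum(wrs) / len(wrs)
    return ("former" if awr < 1.5 else "latter"), awr

# Three trees comparing machine m1 (class 1) against machine m2 (class 2)
choice, awr = pairwise_vote([(1, 9, 1), (1, 8, 2), (2, 5, 5)])
print(choice)  # → former
```

An accurate tree leans on its own vote C, while an inaccurate one falls back toward the forest mean R, which is the intent of the Bayes voting mechanism described above.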
Example: In a scheduling task, 100 pieces each of workpieces JT1, JT2, ..., JT8 must be processed, i.e. 10 batches. Their due dates are 20.0, 22.0, 14.0, 21.0, 19.0, 22.0, 18.0, and 23.0 processing unit times, and the processing time of each operation of each workpiece on each machine is given in Table 1. A machine failure occurs at time 4; a material shortage for the second operation of JT1 is discovered after its first operation completes; and at time 10 the processing times of all workpieces increase by 10%.
Table 1: Processing time of each operation of each workpiece on each machine
Figure 5 shows the scheduling scheme obtained by the flexible job shop dynamic scheduling method based on industrial big data. The abscissa represents time and the ordinate represents machines; in the Gantt chart, the tens digit denotes the workpiece type and the units digit denotes the operation number. The final scheme has a makespan of 21.8 processing unit times, a total tardiness of 5.3 processing unit times, and a machine total load of 96.4 processing unit times.
The patented method solves the flexible job shop dynamic scheduling problem smoothly. The scheduling rules it mines are of strong practical value for guiding flexible job shop scheduling: they are highly operable, computationally efficient, require no explicit modeling of the scheduling problem, and can respond to shop floor disturbances in real time.

Claims (1)

1. A flexible job shop dynamic scheduling method based on industrial big data, comprising the following steps:
Step 1, data acquisition: using the data acquisition tools Sqoop and Flume from the Hadoop ecosystem, collect scheduling-related historical data from existing information systems such as MES, ERP, and SCADA, and store it in the HDFS file system; the collected data comprises three parts, Dh = {d1, d2, d3}: d1 is the disturbance-related system state information at the time the scheduling scheme was formulated; d2 is the state information of every machine in the set of machines currently able to process an operation, recorded when that operation of a workpiece selects a processing machine; d3 is the state information of every workpiece in the waiting queue, recorded when an idle machine needs to select the next workpiece to process;
Step 2, data integration: using the data warehouse tool Hive, partition the scheduling data set Dh in the HDFS file system with SQL statements, taking a scheduling scheme as the unit, i.e. the scheduling-related historical data generated during the execution of one scheduling scheme are grouped together;
Step 3, data conversion: using Spark, convert the d2 and d3 parts of the integrated data into training examples so that data mining algorithms can mine scheduling rules; this specifically includes:
Step 3.1: for the d2 part of the collected scheduling data set Dh, treat the machine m1 actually selected in a historical scheduling scheme as the most suitable machine, and compare it one by one with the other machines in the alternative machine set {m2, m3, ...} capable of processing this operation, forming training examples;
Step 3.2: for the d3 part of the collected scheduling data set Dh, treat the workpiece j1 actually selected in a historical scheduling scheme as the most suitable workpiece, and compare it one by one with the other workpieces in the waiting workpiece set {j2, j3, ...}, forming training examples;
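The pairwise construction of steps 3.1 and 3.2 can be sketched as follows. This is an illustrative sketch outside the claim text; the dictionary-based example representation is an assumption, not the patent's actual data schema.

```python
def build_training_examples(chosen, alternatives):
    """Pair the actually chosen machine/workpiece with every alternative,
    yielding labelled training examples (steps 3.1/3.2).
    Label 1 means the former candidate is suitable, label 2 the latter,
    matching the selection/decision convention used later in step 7."""
    examples = []
    for alt in alternatives:
        # the chosen object beats each alternative in both orderings
        examples.append({"former": chosen, "latter": alt, "label": 1})
        examples.append({"former": alt, "latter": chosen, "label": 2})
    return examples
```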
Step 4, data screening: evaluate historical scheduling schemes on three indices, makespan, total tardiness, and machine total load, and screen out the scheduling-related historical data set Db generated during the execution of well-performing scheduling schemes; this specifically includes:
Step 4.1: for the makespan index, take as the screening criterion the makespan of a scheduling scheme generated under identical conditions using only the SPT rule; using only the SPT rule means that each workpiece selects the machine that processes it fastest and each idle machine selects the workpiece with the shortest processing time; schemes whose makespan satisfies this criterion proceed to step 4.2, and the rest are eliminated;
Step 4.2: for the total tardiness index, take as the screening criterion the total tardiness of the same scheduling task completed under identical conditions using the EDD rule combined with the SPT rule; the SPT+EDD rule means that each workpiece selects the machine that processes it fastest and each idle machine selects the workpiece with the earliest due date; schemes whose total tardiness satisfies this criterion proceed to step 4.3, and the rest are eliminated;
Step 4.3: for the machine total load index, take as the screening criterion the machine total load of the scheduling task completed under identical conditions using the LMWT rule combined with the SPT rule; the LMWT+SPT rule means that each workpiece selects the machine with the longest idle time and each idle machine selects the workpiece with the shortest processing time; the scheduling data sets generated during the execution of schemes that satisfy all three indices become the input of the scheduling rule mining algorithm;
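The three-index screen of step 4 reduces to a simple filter once the rule-based baselines are known. The sketch below is illustrative only; it assumes the SPT, SPT+EDD, and LMWT+SPT baseline values have been pre-computed by simulating those dispatching rules under the same conditions, which the claim describes but does not spell out in code.

```python
def screen_schemes(schemes, spt_makespan, edd_spt_tardiness, lmwt_spt_load):
    """Step 4 sketch: keep only historical schemes that do at least as well as
    the rule-based baselines on all three indices (makespan, total tardiness,
    machine total load)."""
    return [s for s in schemes
            if s["makespan"] <= spt_makespan        # step 4.1: SPT baseline
            and s["tardiness"] <= edd_spt_tardiness  # step 4.2: SPT+EDD baseline
            and s["load"] <= lmwt_spt_load]          # step 4.3: LMWT+SPT baseline
```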
Step 5, clustering based on disturbance attributes: apply DBSCAN to Db with a scheduling scheme as the unit, i.e. the data generated by one scheduling scheme form one object, and cluster according to the system disturbance attributes at the time the scheme was formulated, i.e. the d1 part; this specifically includes:
Step 5.1: standardize the d1 partial data; if the data of a disturbance attribute across the schemes are X1, X2, X3, ..., Xn, they are transformed by formula (1):
Yi = (Xi − X̄) / S (1)
In formula (1), X̄ is the mean of the attribute, S is its standard deviation, and Y1, Y2, Y3, ..., Yn are the standardized data;
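The standardization of step 5.1 is a z-score transform; a minimal sketch (illustrative, outside the claim text):

```python
def standardize(xs):
    """Formula (1): z-score standardization of one disturbance attribute,
    Yi = (Xi - mean) / standard deviation."""
    n = len(xs)
    mean = sum(xs) / n
    # population standard deviation; the sample form (n - 1) could be used instead
    std = (sum((x - mean) ** 2 for x in xs) / n) ** 0.5
    return [(x - mean) / std for x in xs]
```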
Step 5.2: determine the parameters of the DBSCAN algorithm: the neighbourhood radius Eps, and the minimum number of objects MinPts that a core object's neighbourhood must contain;
Step 5.3: randomly find an unprocessed core object p (one not yet assigned to a cluster or labelled as noise); a core object is one whose neighbourhood of radius Eps contains at least MinPts objects; create a new cluster C and add all objects within the Eps neighbourhood of p to the candidate set N;
Step 5.4: randomly take a not-yet-processed object q from the candidate set N; if q is a core object, add to N every object within its Eps neighbourhood that is unprocessed and not already in N; if q has not been assigned to any cluster, add q to C;
Step 5.5: repeat step 5.4 until N is empty;
Step 5.6: repeat steps 5.3, 5.4, and 5.5; when no new object can be added to any cluster, the process ends;
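Steps 5.2 through 5.6 describe standard DBSCAN; a compact, self-contained sketch (illustrative only, with a Euclidean distance assumed) is:

```python
def dbscan(points, eps, min_pts):
    """Minimal DBSCAN per steps 5.2-5.6: grow a cluster from each unprocessed
    core object, expanding through core neighbours; objects reachable from no
    core object stay noise (label -1)."""
    dist = lambda a, b: sum((x - y) ** 2 for x, y in zip(a, b)) ** 0.5
    n = len(points)
    labels = [None] * n                      # None = unprocessed
    neighbours = lambda i: [j for j in range(n) if dist(points[i], points[j]) <= eps]
    cluster = 0
    for i in range(n):
        if labels[i] is not None:
            continue
        nbrs = neighbours(i)
        if len(nbrs) < min_pts:              # not a core object
            labels[i] = -1                   # tentatively noise
            continue
        labels[i] = cluster                  # step 5.3: start new cluster C
        candidates = [j for j in nbrs if j != i]
        while candidates:                    # steps 5.4-5.5: expand until N empty
            q = candidates.pop()
            if labels[q] == -1:
                labels[q] = cluster          # border point, reclaimed from noise
            if labels[q] is not None:
                continue
            labels[q] = cluster
            q_nbrs = neighbours(q)
            if len(q_nbrs) >= min_pts:       # q is core: push its neighbourhood
                candidates.extend(j for j in q_nbrs
                                  if labels[j] is None or labels[j] == -1)
        cluster += 1                         # step 5.6: continue with next core
    return labels
```

In the patent's setting, `points` would be the standardized d1 disturbance vectors, one per scheduling scheme.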
Step 6, random forest scheduling rule mining: use an improved random forest algorithm to mine from each cluster, respectively, random forest scheduling rule 1 for solving the workpiece-selects-machine problem and random forest scheduling rule 2 for solving the idle-machine-selects-workpiece problem; this specifically includes:
Step 6.1: for each cluster, draw training examples with replacement from the d2 part of the cluster (to mine random forest scheduling rule 1) and from the d3 part (to mine random forest scheduling rule 2), forming k new training example sets P1 and P2 respectively, used to construct decision trees;
Step 6.2: for P1 and P2, randomly select m characteristic attributes from d2 and d3 respectively, compute the best split, and train k decision trees T1 and T2;
The decision tree construction process is as follows:
Step 6.2.1: create a root node N;
Step 6.2.2: check whether the training example set still contains remaining training examples; if not, return node N; otherwise continue to the next step;
Step 6.2.3: check whether the scheduling decisions of the remaining training examples all belong to one class C; if so, return node N labelled with class C; otherwise continue to the next step;
Step 6.2.4: check whether the attribute list is empty; if it is, label the node with the majority class among the samples; otherwise continue to the next step;
Step 6.2.5: check whether an attribute in the attribute list is continuous; a continuous attribute is split by dichotomy at the point that maximizes the information gain G(D, A); the dichotomy divides the attribute's values into two parts, giving N−1 possible splits for N values, with each split threshold taken as the average of two adjacent values; the information gain is computed by formulas (2), (3), and (4):
G(D, A) = H(D) − H(D|A) (2)
H(D) = −Σk=1..K (|Ck|/|D|) log2(|Ck|/|D|) (3)
H(D|A) = Σi=1..n (|Di|/|D|) H(Di) (4)
In formula (2), G(D, A) is the information gain of attribute A; H(D) in formula (3) is the classification information entropy; H(D|A) in formula (4) is the conditional entropy. D denotes the training example data set and |D| the number of training examples in D; D has K classes Ck, k = 1, 2, and |Ck| is the number of training examples in class Ck; attribute A divides D into n subsets D1, D2, ..., Dn, with |Di| the number of training examples in Di; the set of training examples in Di belonging to class Ck is Dik, and |Dik| is its size;
Step 6.2.6: select the attribute with the maximum information gain ratio to label node N; the information gain ratio is computed by formulas (5) and (6); then return to step 6.2.2;
GR(D, A) = G(D, A) / H(A) (5)
H(A) = −Σi=1..n (|Di|/|D|) log2(|Di|/|D|) (6)
In formula (5), GR(D, A) is the information gain ratio and H(A) is the split information; the other symbols are as above;
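The entropy, information gain, and gain ratio of formulas (2) through (6) can be sketched for a discrete attribute as follows (illustrative only; continuous attributes would first be dichotomized as step 6.2.5 describes):

```python
from math import log2
from collections import Counter

def entropy(labels):
    """H(D), formula (3): classification information entropy."""
    n = len(labels)
    return -sum((c / n) * log2(c / n) for c in Counter(labels).values())

def gain_ratio(values, labels):
    """Formulas (2)-(6): information gain G(D,A) = H(D) - H(D|A) and
    gain ratio GR(D,A) = G(D,A) / H(A), for a discrete attribute A given as
    the list `values`, aligned element-wise with the class `labels`."""
    n = len(labels)
    groups = {}
    for v, y in zip(values, labels):
        groups.setdefault(v, []).append(y)
    h_cond = sum(len(g) / n * entropy(g) for g in groups.values())  # H(D|A), formula (4)
    gain = entropy(labels) - h_cond                                 # G(D,A), formula (2)
    split_info = -sum((len(g) / n) * log2(len(g) / n)               # H(A), formula (6)
                      for g in groups.values())
    return gain / split_info if split_info > 0 else 0.0
```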
Step 6.3: use the training examples in d2 and d3 that were not selected during sampling to test the classification performance of the decision trees in T1 and T2;
Step 6.4: compute the similarity S between decision trees within T1 or T2 by formula (7); if the similarity between two decision trees exceeds 60%, compare their test performance from step 6.3 and retain the better-performing tree; the retained trees form the random forest;
S = Σn=1..K I(r1n.c, r2n.c) / Nt (7)
In formula (7), DT1 and DT2 are the two decision trees whose similarity is computed; K is the number of test cases on which DT1 and DT2 give identical classification results; r1n and r2n are the characteristic attributes used by DT1 and DT2 when their n-th classification results are identical, and c denotes the classification result; when r1n = r2n, i.e. DT1 and DT2 obtain the identical classification result using the identical characteristic attribute, I(r1n.c, r2n.c) = 1, otherwise it is 0; Nt is the number of test cases;
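Reading formula (7) as "fraction of test cases on which both trees agree in both result and attribute used" (an interpretation, since the original formula image is not reproduced here), the similarity check can be sketched as:

```python
def tree_similarity(results1, results2):
    """Formula (7) sketch: similarity between two decision trees over Nt test
    cases. Each result is a (features_used, classification) pair; a case counts
    toward similarity only when both trees give the identical classification
    using the identical characteristic attribute."""
    nt = len(results1)
    same = sum(1 for (f1, c1), (f2, c2) in zip(results1, results2)
               if c1 == c2 and f1 == f2)
    return same / nt
```

Per step 6.4, pairs scoring above 0.6 would be pruned down to the better-performing tree.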
Step 6.5: through the Bayesian voting mechanism, compute the weights w and h of every decision tree in T1 and T2 by formulas (8) and (9), obtaining random forest scheduling rule 1 and random forest scheduling rule 2;
In formulas (8) and (9), v is the number of test cases this decision tree classifies correctly, and m is the number of test cases it classifies incorrectly;
Step 7, scheduling rule application: guide flexible job shop dynamic scheduling with the mined random forest scheduling rules; this specifically includes:
Step 7.1: depending on whether the problem to be solved is machine selection for a workpiece or workpiece selection for an idle machine, retrieve the random forest scheduling rule 1 or random forest scheduling rule 2 that corresponds to the cluster to which the current disturbance state of the flexible job shop belongs;
Step 7.2: with the selected random forest scheduling rule, use pairwise comparison to pick the most suitable machine from the candidate machine set M or the most suitable workpiece from the candidate workpiece set J;
Step 7.2.1: for the machine selection problem, let m1 and m2 be two machines in M; using random forest scheduling rule 1 selected in step 7.1, compute the selection result of every decision tree in the forest; each result is either selection 1 (the former, m1, is suitable) or selection 2 (the latter, m2, is suitable); for the workpiece selection problem, let j1 and j2 be two workpieces in J; using random forest scheduling rule 2 selected in step 7.1, compute the selection result of every decision tree; each result is either decision 1 (the former, j1, is suitable) or decision 2 (the latter, j2, is suitable);
Step 7.2.2: through the Bayesian voting mechanism, obtain the weighted selection result WR of every decision tree by formula (10), and take the average AWR of the weighted results; if AWR is less than 1.5, the former (m1 or j1) is suitable; if AWR is greater than 1.5, the latter (m2 or j2) is suitable;
WR = wC + hR (10)
In formula (10), C is the classification result given by this decision tree and R is the mean of the classification results given by all decision trees; the weights w and h are computed by formulas (8) and (9).
CN201910144370.5A 2019-02-27 2019-02-27 Flexible job shop dynamic scheduling method based on industrial big data Active CN109902954B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910144370.5A CN109902954B (en) 2019-02-27 2019-02-27 Flexible job shop dynamic scheduling method based on industrial big data


Publications (2)

Publication Number Publication Date
CN109902954A true CN109902954A (en) 2019-06-18
CN109902954B CN109902954B (en) 2020-11-13

Family

ID=66945563

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201910144370.5A Active CN109902954B (en) 2019-02-27 2019-02-27 Flexible job shop dynamic scheduling method based on industrial big data

Country Status (1)

Country Link
CN (1) CN109902954B (en)


Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN104914835A (en) * 2015-05-22 2015-09-16 齐鲁工业大学 Flexible job-shop scheduling multi-objective method
CN106094757A (en) * 2016-07-15 2016-11-09 郑州航空工业管理学院 A kind of dynamic flexible solving job shop scheduling problem control method based on data-driven
CN106611232A (en) * 2016-02-04 2017-05-03 四川用联信息技术有限公司 Layered optimization algorithm for solving multi-technical-route workshop scheduling
CN107862411A (en) * 2017-11-09 2018-03-30 西南交通大学 A kind of extensive flexible job shop scheduling optimization method
CN108733003A (en) * 2017-04-20 2018-11-02 南京理工大学 Slewing parts process working hour prediction technique based on kmeans clustering algorithms and system


Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
THOMAS BUSCHMANN ET AL: "Flexible and Robust Walking (Workshop on Dynamic Locomotion and Balancing of Humanoids: State of the Art and Challenges)", 2015 IEEE INTERNATIONAL CONFERENCE ON ROBOTICS AND AUTOMATION *
WANG Shuangxi et al.: "Flexible Job Shop Dynamic Scheduling under Different Rescheduling Periods", Wanfang Data *

Cited By (16)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111047215B (en) * 2019-12-09 2023-06-23 中国兵器科学研究院 Method for determining classification of field replaceable units based on random forest
CN111047215A (en) * 2019-12-09 2020-04-21 中国兵器科学研究院 Random forest based field replaceable unit classification and classification determination method
CN111401427A (en) * 2020-03-12 2020-07-10 华中科技大学 Product cost evaluation method and system based on industrial big data
CN111401427B (en) * 2020-03-12 2022-11-08 华中科技大学 Product cost evaluation method and system based on industrial big data
WO2021189620A1 (en) * 2020-03-25 2021-09-30 重庆邮电大学 Digital workshop electric energy management research method based on context awareness
CN111766839A (en) * 2020-05-09 2020-10-13 同济大学 Computer implementation system for self-adaptive updating of intelligent workshop scheduling knowledge
CN111766839B (en) * 2020-05-09 2023-08-29 同济大学 Computer-implemented system for self-adaptive update of intelligent workshop scheduling knowledge
CN112712289A (en) * 2021-01-18 2021-04-27 上海交通大学 Adaptive method, system, and medium based on temporal information entropy
CN112712289B (en) * 2021-01-18 2022-11-22 上海交通大学 Adaptive method, system, and medium based on temporal information entropy
CN112904818A (en) * 2021-01-19 2021-06-04 东华大学 Prediction-reaction type scheduling method for complex structural member processing workshop
CN112904818B (en) * 2021-01-19 2022-07-15 东华大学 Prediction-reaction type scheduling method for complex structural member processing workshop
CN112883640A (en) * 2021-02-04 2021-06-01 西南交通大学 Digital twin station system, job scheduling method based on system and application
CN115357570A (en) * 2022-08-24 2022-11-18 安徽维德工业自动化有限公司 Workshop optimization scheduling management method based on random forest algorithm
CN116402173A (en) * 2022-09-06 2023-07-07 大连理工大学 Intelligent algorithm for distributing container areas and container positions of ship unloading container based on machine learning
CN117010671A (en) * 2023-10-07 2023-11-07 中国信息通信研究院 Distributed flexible workshop scheduling method and device based on block chain
CN117010671B (en) * 2023-10-07 2023-12-05 中国信息通信研究院 Distributed flexible workshop scheduling method and device based on block chain

Also Published As

Publication number Publication date
CN109902954B (en) 2020-11-13

Similar Documents

Publication Publication Date Title
CN109902954A (en) A kind of flexible job shop dynamic dispatching method based on industrial big data
KR102184182B1 (en) Project/Task Intelligent Goal Management Method and Platform based on Super Tree
Shen et al. Mathematical modeling and multi-objective evolutionary algorithms applied to dynamic flexible job shop scheduling problems
Guo et al. Optimisation of integrated process planning and scheduling using a particle swarm optimisation approach
US20210073695A1 (en) Production scheduling system and method
CN113792924A (en) Single-piece job shop scheduling method based on Deep reinforcement learning of Deep Q-network
CN103310285A (en) Performance prediction method applicable to dynamic scheduling for semiconductor production line
Li et al. Data-based scheduling framework and adaptive dispatching rule of complex manufacturing systems
He et al. Integrated scheduling of production and distribution operations in a global MTO supply chain
Pani et al. A data mining approach to forecast late arrivals in a transhipment container terminal
Zhao et al. An improved Q-learning based rescheduling method for flexible job-shops with machine failures
CN102402716A (en) Intelligent production decision support system
CN104572297A (en) Hadoop job scheduling method based on genetic algorithm
CN109039727A (en) Message queue monitoring method and device based on deep learning
CN106327053B (en) Construction method of weaving process recommendation model based on multi-mode set
Ajorlou et al. Optimization of a multiproduct conwip-based manufacturing system using artificial bee colony approach
CN111626497A (en) People flow prediction method, device, equipment and storage medium
Ishankhodjayev et al. Development of information support for decision-making in intelligent energy systems
CN107463151B (en) A kind of complex surface machining multidimensional knowledge cloud cooperating service method
Chan et al. The applications of flexible manufacturing technologies in business process reengineering
CN115689201A (en) Multi-criterion intelligent decision optimization method and system for enterprise resource supply and demand allocation
Shen et al. Blocking flow shop scheduling based on hybrid ant colony optimization
CN105260948B (en) A kind of water-supply systems daily planning scheduling decision method
Madureira et al. Using genetic algorithms for dynamic scheduling
Aksvonov et al. Development of a hybrid decision-making method based on a simulation-genetic algorithm in a web-oriented metallurgical enterprise information system

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant