CN106650993B - Dynamic resource optimization method based on Markov decision process

Info

Publication number: CN106650993B
Application number: CN201610887855.XA
Authority: CN
Inventors: 杨建新; 秦强; 吉军; 刘文军; 杨一铭
Original assignee: Information Central Of China North Industries Group Corp
Current assignee: Information Central Of China North Industries Group Corp
Priority date: 2016-10-11
Filing date: 2016-10-11
Publication date: 2020-07-03
Anticipated expiration: 2036-10-11
Also published as: CN106650993A

Abstract

The invention belongs to the technical field of dynamic resource optimization, and in particular relates to a dynamic resource optimization method based on a Markov decision process. Departing from traditional manufacturing resource selection methods, the method abstracts the problem of precisely regulating cloud manufacturing resources across multiple development tasks in a cloud manufacturing environment into a Markov decision selection process, thereby mathematically modeling the effect of development-process uncertainty on resource selection. With the expected development cost as the objective function, a cross-entropy method is used for the calculation: the combinatorial optimization problem is converted into an associated stochastic optimization problem, and the optimal selection probabilities of the cloud manufacturing resources are obtained. This achieves reasonable scheduling and efficient utilization of manufacturing resources in the collaborative development of complex products, and effectively reduces product development risk and manufacturing cost.

Description

Dynamic resource optimization method based on Markov decision process

Technical Field

The invention belongs to the technical field of dynamic resource optimization, and particularly relates to a dynamic resource optimization method based on a Markov decision process.

Background

Under the current global manufacturing environment, the development of complex products is often completed by scheduling enterprise resources with different regions, different types and different characteristics. Cloud manufacturing is supported by an information network, geographically dispersed enterprise resources with complementary capabilities are connected by integrating social manufacturing resources and capabilities, sharing, integration and cooperative work of the dispersed manufacturing resources are realized, design, manufacture and assembly of complex products and the whole life cycle of sales and service are completed cooperatively, and the maximum benefit is obtained while market demands are better responded.

Manufacturing resources in a cloud manufacturing environment comprise the various physical elements of all production activities of an enterprise over the whole life cycle of a product, and are characterized by great variety, heterogeneous forms and geographical dispersion. Precisely regulating cloud manufacturing resources and manufacturing capabilities, and constructing a cloud manufacturing resource combination with the best overall service quality and the highest cluster cooperation capability, has therefore become the key to carrying out cloud manufacturing smoothly.

The quality of the cloud manufacturing resource combination optimization model and of its solving mechanism directly affects product development quality and whether the development process can proceed safely and smoothly. The characteristics of cloud manufacturing mean that manufacturing resource selection during product development is full of uncertain factors. Most current research on the optimal configuration of manufacturing resources considers only time, cost, quality, resource evaluation and similar issues, and does not fully consider the influence of development-process uncertainty on cloud manufacturing resource selection: product development may fail, which not only makes the development process risky but also strongly affects the development cost and the development cycle.

Therefore, how to realize dynamic optimization selection of cloud manufacturing resources under the influence of uncertainty in the product development process remains a technical problem which needs to be solved urgently.

The Markov decision process is an optimal decision process for stochastic dynamic systems based on Markov process theory. Its defining property is that, given the current state, the future evolution of the process is independent of its past evolution; a decision maker periodically or continuously observes a stochastic dynamic system with the Markov property and makes decisions sequentially. At present, the Markov decision process is widely applied in many fields of natural science and engineering technology, and has seen particularly extensive practice and adoption in prediction technology.

Disclosure of Invention

(I) technical problem to be solved

The technical problem to be solved by the invention is: how to provide a dynamic resource optimization method based on a Markov decision process that treats the dynamic resource optimization configuration problem in a cloud manufacturing environment as a Markov decision process and uses a cross-entropy algorithm to solve for the optimal allocation of dynamic resources.

(II) technical scheme

In order to solve the above technical problem, the present invention provides a dynamic resource optimization method based on a Markov decision process, which is implemented based on a dynamic resource optimization system, the dynamic resource optimization system comprising: a complex product task decomposition module, a development capability network construction module, a dynamic resource selection decision module and a cross-entropy solving module;

step S1: decomposing the total task F through the complex product task decomposition module according to the performance requirement, the structure requirement and the precision requirement of the complex product to form n development subtasks, namely F = {f_i | i = 1,2,…,n}, where f_i represents the i-th subtask in the development process;

step S2: according to the requirements of each development subtask on cloud manufacturing resources, a dynamic complex product development capability network consisting of the development capability resources of cross-region enterprises is established through the development capability network construction module; the development capability network construction module comprises: an enterprise development capability decomposer, a capability resource pool builder and a capability network builder;

the step S2 includes the following sub-steps:

step S201: the cross-region enterprise manufacturing resources in the cloud manufacturing environment are virtualized through the enterprise development capability decomposer, and enterprise development capability is uniformly expressed as an enterprise development capability unit c_ij = {lov(c_ij), f_i, j}, where lov(c_ij) is the development capability level of enterprise j for completing subtask f_i, and the magnitude of the level reflects the expected degree of completion of the development task;

step S202: on the basis of step S201, for a given subtask f_i, step S201 is repeated once for each enterprise in the cross-region enterprise manufacturing resources, and the capability resource pool builder thereby establishes a virtual enterprise development capability resource pool CP(i) = {c_i1, c_i2, …, c_ij}, i = 1~n;

Step S203: according to the sequential relationship between the adjacent subtasks, the capability network builder establishes the sequential relationship between the enterprise development capability resource pools and the association relationship between each enterprise development capability unit in the two adjacent enterprise development capability resource pools, so as to form a dynamic complex product development capability network;
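Steps S201 to S203 can be pictured as building a small data structure. The following Python sketch is only an illustration under assumptions: the names `CapabilityUnit`, `build_pools` and `build_network`, as well as the sample capability levels, are hypothetical and do not come from the patent. It assumes that every unit in one pool is linked to every unit in the next pool.

```python
from dataclasses import dataclass
from itertools import product


@dataclass(frozen=True)
class CapabilityUnit:
    """Enterprise development capability unit c_ij = {lov(c_ij), f_i, j}."""
    subtask: int      # index i of subtask f_i (1-based, as in the text)
    enterprise: int   # index j of the enterprise
    lov: float        # development capability level lov(c_ij)


def build_pools(lov_table):
    """Step S202: one resource pool CP(i) per subtask, one unit per enterprise."""
    return [
        [CapabilityUnit(i, j, lov) for j, lov in enumerate(row, start=1)]
        for i, row in enumerate(lov_table, start=1)
    ]


def build_network(pools):
    """Step S203: link every unit in CP(i) to every unit in CP(i+1)."""
    edges = []
    for current_pool, next_pool in zip(pools, pools[1:]):
        edges.extend(product(current_pool, next_pool))
    return edges


# Hypothetical levels for 3 subtasks and 2 enterprises (not from the patent).
pools = build_pools([[7, 9], [6, 8], [5, 9]])
edges = build_network(pools)
```

Keeping the pools as an ordered list mirrors the sequential relationship between subtasks, while the edge list records the associations between adjacent pools.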

step S3: based on a complex product development capability network, a dynamic resource selection decision module obtains a dynamic resource allocation strategy according to a Markov decision method; the step S3 includes the steps of:

step S301: in the complex product development capability network, during the development of each subtask, only one enterprise development capability unit c_ij can be allocated at any time t (t = 0,1,2,…) to meet the corresponding development requirement;

step S302: taking the subtask development stage in progress at time t as the task state at time t, the task state space of the total task F can be expressed as S = {S_t, t ≥ 0} = {f₁,f₂,…,f_n};

Step S303: for a time t at which the corresponding subtask is f_i, the task state is S_t = f_i, and the behavior of the enterprise development capability unit c_ij allocated at time t is defined as follows:

with probability θ_ij, the unit successfully completes the development subtask f_i; the process then enters the enterprise development capability unit allocation stage of the next subtask f_{i+1} at time t+1, and the task state at time t+1 is S_{t+1} = f_{i+1};

with probability 1 - θ_ij, the unit does not complete the development subtask f_i, i.e. the task development fails, and the task state at time t+1 is the same as at time t, i.e. S_{t+1} = S_t = f_i; the probability θ_ij is related to the development capability level lov(c_ij) by θ_ij = lov(c_ij)/10;

Step S304: for a subtask f_i, allocating the enterprise development capability unit c_ij at time t incurs a development cost of C_s(c_ij);

step S305: setting S_t＝f_nIs the target state, i.e. the final state;

step S306: from the task start time 0 to the task completion time t, the enterprise development capability unit allocation process can be described by a history H_t of the Markov process, i.e. the sequence of task states and allocated enterprise development capability units up to time t;

step S307: according to the history H_t of step S306, the set of probabilities π of allocating each enterprise development capability unit conditional on the history H_t is obtained; this set is the dynamic resource allocation strategy;

step S308: let γ be the number of development attempts made by the allocated enterprise development capability units when the final state S_t = f_n is first reached from the demand state S₀ = f₁; the enterprise development capability units allocated from time 0 to time γ define a development capability sequence X;

Step S309: the dynamic resource scheduling optimization configuration problem in the cloud manufacturing environment can be described as seeking an optimal selection strategy that minimizes the expected development cost Z(X), where Z(X) is the expectation E_π of the development cost accumulated over the sequence X, and E_π denotes the expectation with respect to the probability density π;
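The decision model of steps S301 to S309 can be made concrete with a short simulation. The Python sketch below is an illustration, not the patented implementation, and rests on three assumptions that the text does not state explicitly: every attempt, successful or not, is charged the cost of the allocated unit; the process ends once f_n has been completed; and the strategy depends only on the current subtask rather than on the whole history H_t. The function names are hypothetical.

```python
import random


def simulate_episode(policy, lov, cost, rng):
    """Run one development process from f_1 to f_n.

    policy[i][j] is the probability of allocating unit j while in state f_i.
    Returns the development capability sequence and its total cost.
    Assumption: every attempt, successful or not, is charged cost[i][j].
    """
    n = len(policy)
    state, total, sequence = 0, 0.0, []
    while state < n:
        j = rng.choices(range(len(policy[state])), weights=policy[state])[0]
        sequence.append((state, j))
        total += cost[state][j]
        theta = lov[state][j] / 10      # theta_ij = lov(c_ij) / 10
        if rng.random() < theta:        # success: move on to the next subtask
            state += 1
        # failure: S_{t+1} = S_t, so the same subtask is attempted again
    return sequence, total


def estimate_expected_cost(policy, lov, cost, episodes=2000, seed=0):
    """Monte Carlo estimate of the expected development cost under a policy."""
    rng = random.Random(seed)
    costs = (simulate_episode(policy, lov, cost, rng)[1] for _ in range(episodes))
    return sum(costs) / episodes
```

Averaging over many episodes approximates the expectation E_π; a single episode can be far from it because each attempt succeeds only with probability θ_ij.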

step S4: optimizing and outputting the dynamic resource allocation strategy by adopting a cross entropy solving module; the step S4 includes the steps of:

step S401: the dynamic resource allocation strategy π is initialized so that every enterprise development capability unit in π has the same probability of being allocated; that is, after initialization, the allocation of enterprise development capability units is a purely random process, and π is therefore characterized as an initial transition matrix P whose elements are all equal and whose rows each sum to 1;

step S402: in the complex product development capability network corresponding to the total task F, an enterprise development capability unit is randomly selected as the starting point and, since there are n subtasks, a path X₁,X₂,…,X_n is generated through n random transitions between different states based on the initial transition matrix P; because the state transition process is random, N paths can be obtained, and the development cost Z(X_i) of each path X_i is calculated;

Step S403: the development costs Z(X_i) are sorted in ascending order:

Z(X_i)₍₁₎ ≤ Z(X_i)₍₂₎ ≤ … ≤ Z(X_i)_(N), and their ρ·100% quantile value is denoted γ;

Step S404: from the obtained quantile value γ, a first probability transition matrix P' is calculated by the Lagrange multiplier method, the element p_ij' of P' being expressed as the ratio

p_ij' = (number of times the allocated capability development unit is j when developing subtask i) / (number of times a capability development unit is allocated when developing subtask i),

where p_ij' represents the probability that the allocated capability development unit is j when subtask i is developed, and both counts are taken over those of the N paths whose development cost is not higher than γ;

step S405: the first probability transition matrix P' is corrected against the initial transition matrix P by a smoothing technique to obtain a second probability transition matrix P'' = α·P' + (1 - α)·P, where α is a smoothing parameter;

step S406: the initial transition matrix P is reassigned by assigning the second probability transition matrix P'' to it;

step S407: steps S402 to S406 are repeated until, over a given number of iterations d, the successive initial transition matrices P all yield the same quantile value γ of the development cost Z(X_i) incurred in first reaching the final state S_t = f_n from the demand state S₀ = f₁;

step S408: when the condition of step S407 is met, an optimal selection strategy is considered to have been found, and the current initial transition matrix P is output as the optimal selection strategy, which gives the optimal combination of dynamic resources for the complex product in the cloud manufacturing environment.
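Steps S401 to S408 follow the usual shape of a cross-entropy optimization loop, which can be sketched in Python as below. This is a hedged illustration rather than the patented implementation. It assumes that the quantile γ is the ⌈ρN⌉-th smallest sampled cost, that the stopping rule means "γ unchanged over d consecutive iterations", that each attempt is charged its cost, and that a `max_iter` safeguard (not in the source) is acceptable. All function names, parameter defaults and sample values are hypothetical.

```python
import math
import random


def sample_path(P, theta, cost, rng):
    """Sample one development process; return its (subtask, unit) visits and cost."""
    visits, total, i = [], 0.0, 0
    while i < len(P):
        j = rng.choices(range(len(P[i])), weights=P[i])[0]
        visits.append((i, j))
        total += cost[i][j]                 # assumption: each attempt is charged
        if rng.random() < theta[i][j]:
            i += 1                          # success: next subtask
    return visits, total


def cross_entropy(theta, cost, N=500, rho=0.1, alpha=0.7, d=5, max_iter=200, seed=0):
    rng = random.Random(seed)
    n, m = len(cost), len(cost[0])
    P = [[1.0 / m] * m for _ in range(n)]                # S401: uniform start
    recent = []
    for _ in range(max_iter):
        paths = [sample_path(P, theta, cost, rng) for _ in range(N)]   # S402
        costs = sorted(c for _, c in paths)                            # S403
        gamma = costs[max(0, math.ceil(rho * N) - 1)]
        counts = [[0] * m for _ in range(n)]                           # S404
        for visits, c in paths:
            if c <= gamma:                  # keep only the low-cost paths
                for i, j in visits:
                    counts[i][j] += 1
        for i in range(n):
            row_total = sum(counts[i])
            for j in range(m):
                p_new = counts[i][j] / row_total
                P[i][j] = alpha * p_new + (1 - alpha) * P[i][j]        # S405, S406
        recent = (recent + [gamma])[-d:]                               # S407
        if len(recent) == d and len(set(recent)) == 1:
            break
    return P, gamma                                                    # S408


# Hypothetical 2-subtask, 2-enterprise problem (values are illustrative only).
theta = [[0.9, 0.5], [0.5, 0.9]]
cost = [[10, 30], [30, 10]]
P, gamma = cross_entropy(theta, cost)
```

The smoothing step keeps every probability strictly positive for a while, so a unit that happens to be absent from the low-cost paths in one iteration can still be sampled in the next.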

(III) advantageous effects

Compared with the prior art, the invention provides a dynamic resource optimization method based on a Markov decision process. Departing from traditional manufacturing resource selection methods, it abstracts the problem of precisely regulating cloud manufacturing resources across multiple development tasks in a cloud manufacturing environment into a Markov decision selection process, thereby mathematically modeling the effect of development-process uncertainty on resource selection. With the expected development cost as the objective function, a cross-entropy method is used for the calculation: the combinatorial optimization problem is converted into an associated stochastic optimization problem, and the optimal selection probabilities of the cloud manufacturing resources are obtained. This achieves reasonable scheduling and efficient utilization of manufacturing resources in the collaborative development of complex products, and effectively reduces product development risk and manufacturing cost.

Drawings

FIG. 1 is a complex product development capability network.

FIG. 2 is a schematic diagram of the dynamic resource selection decision based on the Markov decision process.

FIG. 3 is a schematic diagram of a comparison between development costs of an optimal selection strategy and a random selection strategy.

Detailed Description

In order to make the objects, contents, and advantages of the present invention clearer, the following detailed description of the embodiments of the present invention will be made in conjunction with the accompanying drawings and examples.

To solve the above technical problem, the present invention provides a dynamic resource optimization method based on a Markov decision process, as shown in FIG. 1 to FIG. 3, which is implemented based on a dynamic resource optimization system, the dynamic resource optimization system comprising: a complex product task decomposition module, a development capability network construction module, a dynamic resource selection decision module and a cross-entropy solving module;

the step S2 includes the following sub-steps:

step S201: the cross-region enterprise manufacturing resources in the cloud manufacturing environment are virtualized through the enterprise development capability decomposer, and enterprise development capability is uniformly expressed as an enterprise development capability unit c_ij = {lov(c_ij), f_i, j}, where lov(c_ij) is the development capability level of enterprise j for completing subtask f_i, and the magnitude of the level reflects the expected degree of completion of the development task;

step S202: on the basis of step S201, for a given subtask f_i, step S201 is repeated once for each enterprise in the cross-region enterprise manufacturing resources, and the capability resource pool builder thereby establishes a virtual enterprise development capability resource pool CP(i) = {c_i1, c_i2, …, c_ij}, i = 1~n;

Step S203: according to the sequential relationship between adjacent subtasks, the capability network builder establishes the sequential relationship between the enterprise development capability resource pools and the association relationship between the enterprise development capability units in each two adjacent enterprise development capability resource pools (namely, each enterprise development capability unit in one pool is associated with every enterprise development capability unit in the adjacent pool), thereby forming a dynamic complex product development capability network;

Examples

In this embodiment, the task F is decomposed according to the complex product task decomposition model to form the task set F = {f_i | i = 1,2,…,8}.

The cross-region enterprise manufacturing resources in the cloud manufacturing environment are virtualized and turned into services, and each enterprise development capability unit is expressed as c_ij = {lov(c_ij), f_i, j}, where j = 1,2,…,5.

The virtual enterprise development capability resource pools are established: CP(1) = {c₁₁,c₁₂,c₁₃,c₁₄,c₁₅}, CP(2) = {c₂₁,c₂₂,c₂₃,c₂₄,c₂₅}, CP(3) = {c₃₁,c₃₂,c₃₃,c₃₄,c₃₅}, CP(4) = {c₄₁,c₄₂,c₄₃,c₄₄,c₄₅}, CP(5) = {c₅₁,c₅₂,c₅₃,c₅₄,c₅₅}, CP(6) = {c₆₁,c₆₂,c₆₃,c₆₄,c₆₅}, CP(7) = {c₇₁,c₇₂,c₇₃,c₇₄,c₇₅}, CP(8) = {c₈₁,c₈₂,c₈₃,c₈₄,c₈₅}.

For convenience of description, the development capability unit levels lov(c_ij) and the development costs corresponding to all enterprise development capability units c_ij in the cloud manufacturing environment are expressed as sets: Table 1 gives the development capability unit level set LOV = {lov(c_ij) | i = 8, j = 5}, and Table 2 gives the development cost set C_S = {C_s(c_ij) | i = 8, j = 5}, where lov(c_ij) and C_s(c_ij) are preset values.

Table 1. Development capability unit level set LOV:

Table 2. Development cost set C_S:

and allocating a candidate development capacity resource pool for each subtask to form a complex product development capacity network, as shown in fig. 1.

A schematic diagram of the dynamic resource selection decision based on the Markov decision process is shown in FIG. 2, where θ_ij represents the probability that development capability unit c_ij successfully completes development task f_i, and 1 - θ_ij corresponds to the development task f_i not being completed, i.e. product development fails at the current stage. The probability θ_ij is related to the development capability level lov(c_ij) by θ_ij = lov(c_ij)/10. Table 3 gives the development success probabilities θ_ij for tasks f₁~f₈, i.e. θ = {θ_ij | i = 8, j = 5}.

Table 3. Set of development task success probabilities θ_ij:

In this embodiment, a cross-entropy algorithm is used to optimize the dynamic resource selection decision model.

An initial transition matrix P is generated such that every development capability unit has an equal probability of being selected, i.e. every element of P equals 1/5.

With N = 5000, ρ = 0.16, d = 10 and smoothing parameter α, the dynamic resource selection strategy based on the Markov decision process is optimized by the cross-entropy method. After 46 iterations, the minimum development cost of 2132 is obtained together with the optimal selection probability matrix.

Therefore, in this example, the Markov decision process-based dynamic resource optimization selection strategy for development subtasks f₁~f₈ is c₁₄,c₂₂,c₃₃,c₄₄,c₅₃,c₆₄,c₇₁,c₈₅.
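A selection strategy of this form can be read off a converged probability matrix by taking the most probable unit in each row. The Python sketch below illustrates that step only; the matrix values are made up for a 3 × 3 case and are not the result reported in this embodiment, and the function name `greedy_strategy` is hypothetical.

```python
def greedy_strategy(P):
    """For each subtask i, return the 1-based index j of the most probable unit c_ij."""
    return [max(range(len(row)), key=row.__getitem__) + 1 for row in P]


# Hypothetical converged matrix for 3 subtasks and 3 enterprises (illustrative values).
P = [
    [0.02, 0.95, 0.03],
    [0.90, 0.05, 0.05],
    [0.01, 0.01, 0.98],
]
labels = [f"c{i}{j}" for i, j in enumerate(greedy_strategy(P), start=1)]
```

Each label names the unit c_ij chosen for subtask f_i, in the same notation as the strategy listed above.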

The optimized selection strategy and the random selection strategy were each simulated; the development costs of the two strategies over 10 runs are shown in FIG. 3. Over the 10 simulations, the average development cost of the optimized selection strategy is 3113 and that of the random selection strategy is 3610.9, so the optimized strategy has the lower average development cost. It should be noted, however, that the optimized strategy is not guaranteed to cost less than the random strategy in every individual run, because whether a development capability unit completes its task on a given attempt is random and this strongly affects the development cost; comparing average development costs reduces the influence of this randomness.
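The comparison of average costs can also be reasoned about in closed form, under assumptions the patent does not spell out: each attempt is charged its cost whether or not it succeeds, and the strategy picks unit j for subtask i with a fixed probability p_ij. The expected cost of leaving subtask i then satisfies V_i = Σ_j p_ij·(C_ij + (1 - θ_ij)·V_i), which gives V_i = Σ_j p_ij·C_ij / Σ_j p_ij·θ_ij. The Python sketch below evaluates this expression; the function name and the numbers are hypothetical.

```python
def expected_cost(P, theta, cost):
    """Expected total development cost of a fixed randomized strategy P.

    Assumption: every attempt on subtask i with unit j is charged cost[i][j]
    and succeeds with probability theta[i][j]; failures retry the same subtask.
    """
    total = 0.0
    for p_row, t_row, c_row in zip(P, theta, cost):
        spend_per_attempt = sum(p * c for p, c in zip(p_row, c_row))
        success_per_attempt = sum(p * t for p, t in zip(p_row, t_row))
        total += spend_per_attempt / success_per_attempt
    return total


# Hypothetical single subtask with a cheap unreliable unit and a costly reliable one.
theta = [[0.5, 1.0]]
cost = [[10, 30]]
cheap_only = expected_cost([[1.0, 0.0]], theta, cost)
reliable_only = expected_cost([[0.0, 1.0]], theta, cost)
uniform = expected_cost([[0.5, 0.5]], theta, cost)
```

In this made-up case the cheap unit is the better choice on average even though it fails half the time, which is the kind of trade-off between cost and success probability that the optimization has to resolve.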

The above description is only a preferred embodiment of the present invention, and it should be noted that, for those skilled in the art, several modifications and variations can be made without departing from the technical principle of the present invention, and these modifications and variations should also be regarded as the protection scope of the present invention.

Claims

1. A dynamic resource optimization method based on a Markov decision process, characterized in that the method is implemented based on a dynamic resource optimization system, the dynamic resource optimization system comprising: a complex product task decomposition module, a development capability network construction module, a dynamic resource selection decision module and a cross-entropy solving module;

step S1: decomposing the total task F through the complex product task decomposition module according to the performance requirement, the structure requirement and the precision requirement of the complex product to form n development subtasks, namely F = {f_i | i = 1,2,…,n}, where f_i represents the i-th subtask in the development process;

the step S2 includes the following sub-steps:

step S202: on the basis of step S201, for a given subtask f_i, repeating step S201 once for each enterprise in the cross-region enterprise manufacturing resources, the capability resource pool builder thereby establishing a virtual enterprise development capability resource pool CP(i) = {c_i1, c_i2, …, c_ij}, i = 1~n;

step S304: for a subtask f_i, allocating the enterprise development capability unit c_ij at time t incurs a development cost of C_s(c_ij);

step S305: setting S_t＝f_nIs the target state, i.e. the final state;

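step S307: according to the history H_t of step S306, obtaining the probability density of allocating each enterprise development capability unit conditional on the history H_t, namely the dynamic resource allocation strategy π;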

step S308: letting γ be the number of development attempts made, conditional on the history H_t, by the allocated enterprise development capability units when the final state S_t = f_n is first reached from the demand state S₀ = f₁, all enterprise development capability units allocated from time 0 to the time γ' at which the number of development attempts reaches γ defining a development capability sequence X;

Step S309: the dynamic resource scheduling optimization configuration problem in the cloud manufacturing environment can be described as seeking a dynamic resource allocation strategy π that minimizes the expected development cost Z(X), where Z(X) is the expectation E_π of the development cost accumulated over the sequence X, and E_π denotes the expectation with respect to the dynamic resource allocation strategy π;

step S401: initializing the dynamic resource allocation strategy π so that every enterprise development capability unit has the same probability of being allocated a task, that is, characterizing the dynamic resource allocation strategy π as an initial transition matrix P whose elements are all equal and whose rows each sum to 1;

Step S403: sorting the development costs Z(X_i) in ascending order:

Z(X_i)₍₁₎ ≤ Z(X_i)₍₂₎ ≤ … ≤ Z(X_i)_(N), the ρ·100% quantile value then being denoted γ;

in the expression for p_ij', the denominator represents, among the N paths, the number of times a capability development unit is allocated when developing subtask i on the paths whose development cost is not higher than γ;

and the numerator represents, among the N paths, the number of times the allocated capability development unit is j when developing subtask i on the paths whose development cost is not higher than γ;

step S405: correcting the first probability transition matrix P' against the initial transition matrix P by a smoothing technique to obtain a second probability transition matrix P'' = α·P' + (1 - α)·P, where α is a smoothing parameter;

step S406: reassigning the initial transition matrix P, namely assigning the second probability transition matrix P'' to the initial transition matrix P;