CN113077188A - MTO enterprise order accepting method based on average reward reinforcement learning - Google Patents

MTO enterprise order accepting method based on average reward reinforcement learning

Info

Publication number
CN113077188A
CN113077188A CN202110468897.0A CN202110468897A
Authority
CN
China
Prior art keywords
order
enterprise
mto
reinforcement learning
cost
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202110468897.0A
Other languages
Chinese (zh)
Other versions
CN113077188B (en)
Inventor
吴克宇
钱静
陈超
刘忠
黄金才
程光权
胡星辰
杜航
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
National University of Defense Technology
Original Assignee
National University of Defense Technology
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by National University of Defense Technology
Priority to CN202110468897.0A
Publication of CN113077188A
Application granted
Publication of CN113077188B
Legal status: Active
Anticipated expiration

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06QINFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q10/00Administration; Management
    • G06Q10/06Resources, workflows, human or project management; Enterprise or organisation planning; Enterprise or organisation modelling
    • G06Q10/063Operations research, analysis or management
    • G06Q10/0631Resource planning, allocation, distributing or scheduling for enterprises or organisations
    • G06Q10/06316Sequencing of tasks or work
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N5/00Computing arrangements using knowledge-based models
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06QINFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q10/00Administration; Management
    • G06Q10/06Resources, workflows, human or project management; Enterprise or organisation planning; Enterprise or organisation modelling
    • G06Q10/067Enterprise or organisation modelling
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06QINFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q30/00Commerce
    • G06Q30/02Marketing; Price estimation or determination; Fundraising
    • G06Q30/0201Market modelling; Market analysis; Collecting market data
    • G06Q30/0206Price or cost determination based on market factors

Landscapes

  • Business, Economics & Management (AREA)
  • Engineering & Computer Science (AREA)
  • Human Resources & Organizations (AREA)
  • Strategic Management (AREA)
  • Entrepreneurship & Innovation (AREA)
  • Theoretical Computer Science (AREA)
  • Economics (AREA)
  • Development Economics (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • General Business, Economics & Management (AREA)
  • Finance (AREA)
  • Marketing (AREA)
  • Accounting & Taxation (AREA)
  • Game Theory and Decision Science (AREA)
  • Educational Administration (AREA)
  • Quality & Reliability (AREA)
  • Tourism & Hospitality (AREA)
  • Operations Research (AREA)
  • Data Mining & Analysis (AREA)
  • Mathematical Physics (AREA)
  • Computational Linguistics (AREA)
  • Artificial Intelligence (AREA)
  • Software Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • Computing Systems (AREA)
  • Evolutionary Computation (AREA)
  • General Factory Administration (AREA)
  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)

Abstract

The invention discloses an MTO enterprise order accepting method based on average reward reinforcement learning, comprising the following steps: making assumptions on order information, determining the system state set, determining the system action set, determining the immediate return function, constructing the order acceptance model and solving the order acceptance model. On top of the factors considered in the traditional MTO enterprise order acceptance problem, the invention adds order inventory cost and multiple customer priority factors, constructs the order acceptance model as a semi-Markov decision process, solves it with the SMART algorithm, and on that basis uses a greedy algorithm to sequence accepted orders for production so as to maximize the enterprise's long-term average profit.

Description

MTO enterprise order accepting method based on average reward reinforcement learning
Technical Field
The invention relates to the technical field of enterprise order acceptance and selection, and in particular to an MTO enterprise order acceptance method based on average reward reinforcement learning.
Background
An MTO (make-to-order) enterprise produces according to customer orders, and different customers have different requirements on order types; the enterprise organizes production according to the order requirements raised by its customers. Normally the enterprise's capacity is limited, and various cost factors prevent it from accepting every customer's order, so the MTO enterprise must formulate a corresponding order acceptance method. The success of an MTO enterprise depends to a great extent on how selectively it accepts orders, and a good order acceptance method contributes greatly to the enterprise's long-term profit.
Existing research on order acceptance decision methods has produced some results. However, with the rapid development of electronic commerce, consumers' personalized requirements have become increasingly prominent; traditional production enterprises usually have no direct contact with end customers during production, so diversified customer requirements are difficult to satisfy. Moreover, existing order acceptance methods do not consider a comprehensive set of factors in the modeling process, and therefore cannot effectively determine an order acceptance strategy from the enterprise's production capacity and order states.
Disclosure of Invention
In view of these problems, the invention aims to provide an MTO enterprise order acceptance method based on average reward reinforcement learning that adds order inventory cost and multiple customer priority factors to the factors considered in the traditional MTO enterprise order acceptance problem, constructs an order acceptance model as a semi-Markov decision process, solves it with the SMART algorithm, and on that basis uses a greedy algorithm to sequence accepted orders for production so as to maximize the enterprise's long-term average profit.
In order to achieve the purpose of the invention, the invention is realized by the following technical scheme: an MTO enterprise order accepting method based on average reward reinforcement learning comprises the following steps:
Step one: assumption of order information
Suppose that an MTO enterprise produces on a single production line and that n types of customer orders exist in the market; the order information comprises customer priority μ, price p, quantity Q, unit production cost c, lead time LT and latest delivery time DT;
step two: determining a set of system states
According to step one, if there are n order types in the system, the system state can be represented by the vector S = (μ, p, Q, LT, DT, T), where T denotes the production time still required by orders accepted before the current decision stage;
step three: determining a set of system actions
According to step one, when a customer order arrives, a decision must be made to accept or reject it; the action set of the model can be represented by the vector A = (a_1, a_2), where a_1 denotes accepting the order and a_2 denotes rejecting it;
step four: determining an immediate reward function
After the MTO enterprise makes a decision whether to accept an order, the obtained immediate return function is as follows:
r(s,a) =
I − C − μ·Y − N, if Q(s,a_1) > Q(s,a_2) and the order can be inserted into the current production plan;
−(I − C − μ·Y − N), if Q(s,a_1) > Q(s,a_2) and the order cannot be inserted into the current production plan;
−μ·J, if Q(s,a_1) < Q(s,a_2)
where I = p × Q denotes the revenue obtained from the order, C = c × Q the production cost consumed, Y the enterprise's delay penalty cost, N the inventory (holding) cost incurred, and J the rejection cost of the order;
step five: building order acceptance model
An order acceptance model in the semi-Markov decision process is constructed from the system state set, the system action set and the immediate return function, and the real MTO enterprise order acceptance problem is simulated based on the average reward reinforcement learning idea; according to the Bellman optimality principle, the corresponding optimal policy in the semi-Markov decision process problem satisfies:
Q*(s,a) = r̄(s,a) − ρ*·t̄(s,a) + Σ_{s′∈S} p(s′|s,a)·max_{a′} Q*(s′,a′)
wherein
r̄(s,a) = Σ_{s′∈S} p(s′|s,a)·r(s,a,s′) denotes the expected immediate return of taking action a in state s,
t̄(s,a) = Σ_{s′∈S} p(s′|s,a)·t(s,a,s′) denotes the expected transition time out of state s under action a, and
ρ* = lim_{M→∞} (Σ_{m=1}^{M} r_m) / (Σ_{m=1}^{M} t_m)
denotes the average reward achieved over the decision periods; t_m denotes the time for decision period m to transition from state s to state s′;
step six: order acceptance model solution
Taking the reinforcement learning average reward as the evaluation target, the order acceptance model in the semi-Markov decision process is solved by the average reward reinforcement learning SMART algorithm, and within the SMART algorithm a greedy algorithm is used to sequence the accepted orders, yielding the optimal order acceptance decision; the update formula of the average reward reinforcement learning SMART algorithm is:
Q_{m+1}(s,a) ← (1 − α_m)·Q_m(s,a) + α_m·[r_m(s,a,s′) − ρ_m·t_m(s,a,s′) + max_{a′} Q_m(s′,a′)]
where α_m denotes the learning rate, m the current iteration index, r_m(s,a,s′) the immediate return obtained after taking action a in state s, t_m(s,a,s′) the transition time from state s to s′, R_m the cumulative return of the m-th decision period, ρ_m the average return of the m-th decision period, and t_m the cumulative time of the m-th decision period.
A further improvement is that in step one, customer orders arrive according to a Poisson distribution with parameter λ, and order prices and required quantities are uniformly distributed.
A further improvement is that in step two, for a capacity-constrained MTO enterprise, T has a maximum upper bound; with n order types, the system state set S contains n × T states in total.
A further improvement is that in step four, the three cases of r(s,a), from top to bottom, are: when Q(s,a_1) > Q(s,a_2) and the order can be inserted into the current production plan in the current state, the immediate return equals the net profit obtained by accepting the order; when Q(s,a_1) > Q(s,a_2) but the order cannot be inserted into the current production plan in the current state, the immediate return equals the negative of that net profit; and when Q(s,a_1) < Q(s,a_2), the immediate return equals the rejection cost.
A further improvement is that in step four, the delay penalty cost Y = μ·u·{(T + Q/b) − LT}, where u denotes the delay penalty cost per unit time and b denotes the unit production capacity of the enterprise.
A further improvement is that in step four, because the customer does not take delivery of products completed before the lead time, products temporarily stored in the MTO enterprise's warehouse incur an inventory cost N = Q·h·{LT − (T + Q/b)}, where h denotes the storage cost per unit product per unit time.
A further improvement is that in step six, an exploration probability e that decreases as the number of simulation iterations grows is used to guarantee convergence of the average reward reinforcement learning SMART algorithm, with α and e decaying according to the DCM scheme:
α_m = α_0 / (1 + m²/(χ + m))
e_m = e_0 / (1 + m²/(χ + m))
where χ represents an arbitrarily large real number.
The invention has the following beneficial effects: on top of the factors considered in the traditional MTO enterprise order acceptance problem, the invention adds order inventory cost and multiple customer priority factors, constructs the order acceptance model as a semi-Markov decision process, solves it with the SMART algorithm, and on that basis uses a greedy algorithm to sequence accepted orders for production so as to maximize the enterprise's long-term average profit.
Drawings
In order to more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, the drawings used in the description of the embodiments or the prior art will be briefly described below, and it is obvious that the drawings in the following description are only some embodiments of the present invention, and for those skilled in the art, other drawings can be obtained according to these drawings without creative efforts.
FIG. 1 is a flow chart of an order acceptance method of the present invention;
FIG. 2 is a diagram of a reinforcement learning order decision interaction of the present invention.
Detailed Description
The technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are only a part of the embodiments of the present invention, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
In the description of the present invention, it should be noted that the terms "center", "upper", "lower", "left", "right", "vertical", "horizontal", "inner", "outer", etc., indicate orientations or positional relationships based on the orientations or positional relationships shown in the drawings, and are only for convenience of description and simplicity of description, but do not indicate or imply that the device or element being referred to must have a particular orientation, be constructed and operated in a particular orientation, and thus, should not be construed as limiting the present invention. Furthermore, the terms "first," "second," "third," "fourth," and the like are used for descriptive purposes only and are not to be construed as indicating or implying relative importance.
In the description of the present invention, it should be noted that, unless otherwise explicitly specified or limited, the terms "mounted," "connected," and "connected" are to be construed broadly, e.g., as meaning either a fixed connection, a removable connection, or an integral connection; can be mechanically or electrically connected; they may be connected directly or indirectly through intervening media, or they may be interconnected between two elements. The specific meanings of the above terms in the present invention can be understood in specific cases to those skilled in the art.
Referring to fig. 1 and 2, the embodiment provides an MTO enterprise order acceptance method based on average reward reinforcement learning, including the following steps:
Step one: assumption of order information
Suppose that an MTO enterprise produces on a single production line and that n types of customer orders exist in the market. The order information comprises customer priority μ, price p, quantity Q, unit production cost c, lead time LT and latest delivery time DT; customer orders arrive according to a Poisson distribution with parameter λ, and order prices and required quantities are uniformly distributed;
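For illustration, the following Python sketch samples one simulated order under the step-one assumptions; the arrival rate λ = 0.5 and all price, quantity and lead-time ranges are hypothetical placeholders, not values fixed by the invention:

import random

def sample_order(lam=0.5, rng=random):
    # One simulated customer order: exponential inter-arrival gap (Poisson
    # arrivals) and uniformly distributed price and quantity, per step one.
    lt = rng.uniform(5.0, 15.0)                # lead time LT (range assumed)
    return {
        "gap": rng.expovariate(lam),           # time since the previous arrival
        "mu": rng.choice([1, 2, 3]),           # customer priority class (assumed set)
        "p": rng.uniform(8.0, 12.0),           # unit price, uniform (range assumed)
        "Q": rng.randint(5, 20),               # quantity, uniform (range assumed)
        "LT": lt,
        "DT": lt + rng.uniform(5.0, 10.0),     # latest delivery time, after LT
    }

print(sample_order())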
step two: determining a set of system states
According to step one, if there are n order types in the system, the system state can be represented by the vector S = (μ, p, Q, LT, DT, T), where T denotes the production time still required by orders accepted before the current decision stage; for a capacity-constrained MTO enterprise, T has a maximum upper bound, so with n order types the system state set S contains n × T states in total;
step three: determining a set of system actions
According to step one, when a customer order arrives, a decision must be made to accept or reject it; the action set of the model can be represented by the vector A = (a_1, a_2), where a_1 denotes accepting the order and a_2 denotes rejecting it;
step four: determining an immediate reward function
After the MTO enterprise makes a decision whether to accept an order, the obtained immediate return function is as follows:
r(s,a) =
I − C − μ·Y − N, if Q(s,a_1) > Q(s,a_2) and the order can be inserted into the current production plan;
−(I − C − μ·Y − N), if Q(s,a_1) > Q(s,a_2) and the order cannot be inserted into the current production plan;
−μ·J, if Q(s,a_1) < Q(s,a_2)
where I = p × Q denotes the revenue obtained from the order, C = c × Q the production cost consumed, Y the enterprise's delay penalty cost, N the inventory (holding) cost incurred, and J the rejection cost. The three cases of r(s,a), from top to bottom, are: when Q(s,a_1) > Q(s,a_2) and the order can be inserted into the current production plan in the current state, the immediate return equals the net profit obtained by accepting the order; when Q(s,a_1) > Q(s,a_2) but the order cannot be inserted into the current production plan in the current state, the immediate return equals the negative of that net profit; and when Q(s,a_1) < Q(s,a_2), the immediate return equals the rejection cost. The enterprise's delay penalty cost is Y = μ·u·{(T + Q/b) − LT}, where u denotes the delay penalty cost per unit time and b the unit production capacity of the enterprise. Because the customer does not take delivery of products completed before the lead time, products temporarily stored in the MTO enterprise's warehouse incur an inventory cost N = Q·h·{LT − (T + Q/b)}, where h denotes the storage cost per unit product per unit time;
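To make the cost terms concrete, here is a small Python sketch of the net profit of accepting a single order, following the formulas above; the numeric inputs in the example call are illustrative assumptions, and the priority weight μ is applied once (inside Y) rather than both inside Y and in front of it:

def net_profit(p, Q, c, mu, LT, T, b, u, h):
    # Net profit I - C - Y - N of accepting one order, per step four.
    # At most one of Y (late finish) and N (early finish) is nonzero.
    I = p * Q                               # revenue from the order
    C = c * Q                               # production cost consumed
    finish = T + Q / b                      # completion time of this order
    Y = mu * u * max(finish - LT, 0.0)      # priority-weighted delay penalty
    N = Q * h * max(LT - finish, 0.0)       # holding cost for early completion
    return I - C - Y - N

# Example: 10 units at price 12 and unit cost 8, finishing at T + Q/b = 7,
# one period after the lead time of 6, so the return is 120 - 80 - 3 = 37.
print(net_profit(p=12, Q=10, c=8, mu=2, LT=6.0, T=2.0, b=2.0, u=1.5, h=0.2))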
step five: building order acceptance model
An order acceptance model in the semi-Markov decision process is constructed from the system state set, the system action set and the immediate return function, and the real MTO enterprise order acceptance problem is simulated based on the average reward reinforcement learning idea; according to the Bellman optimality principle, the corresponding optimal policy in the semi-Markov decision process problem satisfies:
Q*(s,a) = r̄(s,a) − ρ*·t̄(s,a) + Σ_{s′∈S} p(s′|s,a)·max_{a′} Q*(s′,a′)
wherein
r̄(s,a) = Σ_{s′∈S} p(s′|s,a)·r(s,a,s′) denotes the expected immediate return of taking action a in state s,
t̄(s,a) = Σ_{s′∈S} p(s′|s,a)·t(s,a,s′) denotes the expected transition time out of state s under action a, and
ρ* = lim_{M→∞} (Σ_{m=1}^{M} r_m) / (Σ_{m=1}^{M} t_m)
denotes the average reward achieved over the decision periods; t_m denotes the time for decision period m to transition from state s to state s′;
step six: order acceptance model solution
Taking the reinforcement learning average reward as the evaluation target, the order acceptance model in the semi-Markov decision process is solved by the average reward reinforcement learning SMART algorithm, and within the SMART algorithm a greedy algorithm is used to sequence the accepted orders, yielding the optimal order acceptance decision; the update formula of the average reward reinforcement learning SMART algorithm is:
Q_{m+1}(s,a) ← (1 − α_m)·Q_m(s,a) + α_m·[r_m(s,a,s′) − ρ_m·t_m(s,a,s′) + max_{a′} Q_m(s′,a′)]
where α_m denotes the learning rate, m the current iteration index, r_m(s,a,s′) the immediate return obtained after taking action a in state s, t_m(s,a,s′) the transition time from state s to s′, R_m the cumulative return of the m-th decision period, ρ_m the average return of the m-th decision period, and t_m the cumulative time of the m-th decision period. Convergence of the average reward reinforcement learning SMART algorithm is guaranteed by an exploration probability e that decreases as the number of simulation iterations grows, with α and e decaying according to the DCM scheme:
α_m = α_0 / (1 + m²/(χ + m))
e_m = e_0 / (1 + m²/(χ + m))
where χ represents an arbitrarily large real number.
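As a sketch of the decay behavior, the schedule below uses the Darken-Chang-Moody search-then-converge form base/(1 + m²/(χ + m)); this functional form is an assumption made here for illustration, since the patent's equation images are not reproduced in this text:

def dcm(base, m, chi=1e6):
    # Stays near `base` while m*m is small relative to chi, then decays
    # roughly like 1/m, preserving early exploration and later convergence.
    return base / (1.0 + m * m / (chi + m))

for m in (1, 1_000, 100_000, 1_000_000):
    print(m, round(dcm(0.1, m), 7), round(dcm(0.2, m), 7))   # alpha_m, e_m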
The SMART algorithm flow is as follows:
1. Initialize m, Q_m(s,a), t_m, R_m and ρ_m to 0; set e = 0.2 and α = 0.1; order_list = []
2. While m < Maxsteps do
3. Compute e_m and α_m according to the DCM mechanism
4. Randomly generate a number e_random; if e_m < e_random, select the action a with the largest state-action value function; if e_m > e_random, randomly select an action a from the action set
5. If a = a_1, Q(s,a_1) > Q(s,a_2) and the order can be inserted into the current production plan in the current state, then r = I − C − μ·Y − N and the order is added to the to-be-produced list order_list; if a = a_1, Q(s,a_1) > Q(s,a_2) and the order cannot be inserted into the current production plan in the current state, then r = −(I − C − μ·Y − N); if a = a_2 and Q(s,a_1) < Q(s,a_2), then r = −μ·J
6. Execute action a to obtain the next state s′, r_m(s,a,s′) and t_m(s,a,s′)
7. Update the state-action value function:
Q_{m+1}(s,a) ← (1 − α_m)·Q_m(s,a) + α_m·[r_m(s,a,s′) − ρ_m·t_m(s,a,s′) + max_{a′} Q_m(s′,a′)]
8. If a non-exploratory action was taken, update t_{m+1} ← t_m + t_m(s,a,s′), R_{m+1} ← R_m + r_m(s,a,s′), ρ_{m+1} ← R_{m+1}/t_{m+1}; otherwise t_{m+1} ← t_m, R_{m+1} ← R_m, ρ_{m+1} ← ρ_m
9. When production is scheduled, select the order to be produced next from order_list using the greedy algorithm, and delete the selected order from the to-be-produced queue order_list
10. Update the decision stage: m ← m + 1
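Pulling steps 1 to 10 together, the following self-contained Python sketch runs the SMART loop on a toy simulator; every numeric parameter, the coarsened state encoding, and the earliest-due-date greedy rule are illustrative assumptions rather than the patented implementation:

import random
from collections import defaultdict

random.seed(0)
B, C_UNIT = 2.0, 8.0                     # unit capacity and unit production cost (assumed)
U, H, J = 1.5, 0.2, 10.0                 # delay, holding and rejection cost rates (assumed)
CHI, ALPHA0, E0, LAM = 1e6, 0.1, 0.2, 0.5

def dcm(base, m):                        # DCM decay (assumed form)
    return base / (1.0 + m * m / (CHI + m))

def sample_order():                      # step one: Poisson arrivals, uniform price/quantity
    mu, p, q = random.choice([1, 2, 3]), random.uniform(8, 12), random.randint(5, 20)
    lt = random.uniform(5, 15)
    return mu, p, q, lt, lt + random.uniform(5, 10)   # (mu, p, Q, LT, DT)

def state(order, load):                  # coarse (priority, size class, load class) state
    mu, _, q, _, _ = order
    return (mu, q // 5, min(int(load), 20))

Q = defaultdict(float)                   # tabular Q[(state, action)]; action 1 = accept
rho = R_tot = T_tot = t_busy = 0.0
backlog, order = [], sample_order()

for m in range(1, 200_001):
    s = state(order, t_busy)
    explore = random.random() < dcm(E0, m)              # step 4: epsilon-greedy choice
    a = random.randint(0, 1) if explore else int(Q[(s, 1)] >= Q[(s, 0)])

    mu, p, q, lt, dt = order
    if a == 1:                                          # step 5: acceptance returns
        finish = t_busy + q / B
        Y = mu * U * max(finish - lt, 0.0)              # delay penalty
        N = q * H * max(lt - finish, 0.0)               # holding cost
        r = p * q - C_UNIT * q - Y - N                  # net profit of acceptance
        if finish > dt:                                 # misses the latest delivery time:
            r = -r                                      # the "cannot be inserted" case
        else:
            backlog.append((dt, q))
            t_busy = finish
    else:
        r = -mu * J                                     # rejection cost, priority weighted

    gap = random.expovariate(LAM)                       # step 6: time to the next arrival
    t_busy = max(t_busy - gap, 0.0)                     # the line works down its backlog
    order = sample_order()
    s2 = state(order, t_busy)

    target = r - rho * gap + max(Q[(s2, 0)], Q[(s2, 1)])        # step 7: SMART update
    Q[(s, a)] += dcm(ALPHA0, m) * (target - Q[(s, a)])

    if not explore:                                     # step 8: rho on greedy steps only
        R_tot += r
        T_tot += gap
        rho = R_tot / max(T_tot, 1e-9)

    if backlog:                                         # step 9: greedy earliest-due-date rule
        backlog.sort()
        backlog.pop(0)

print(f"estimated long-run average reward rho = {rho:.2f}")

In this sketch the backlog is drained one order per decision epoch purely to exercise the greedy rule; a faithful simulator would track actual production completion times.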
The MTO enterprise order acceptance method based on average reward reinforcement learning adds order inventory cost and multiple customer priority factors to the factors considered in the traditional MTO enterprise order acceptance problem, constructs the order acceptance model as a semi-Markov decision process, solves it with the SMART algorithm, and on that basis uses a greedy algorithm to sequence accepted orders for production so as to maximize the enterprise's long-term average profit. The method therefore has a strong order selection capability and good adaptability to environmental change; it can balance order profits against the various costs to bring higher returns to the MTO enterprise, while also meeting customers' personalized demands and maintaining close contact with customers.
The above description is only for the purpose of illustrating the preferred embodiments of the present invention and is not to be construed as limiting the invention, and any modifications, equivalents, improvements and the like that fall within the spirit and principle of the present invention are intended to be included therein.

Claims (7)

1. An MTO enterprise order receiving method based on average reward reinforcement learning is characterized in that: the method comprises the following steps:
Step one: assumption of order information
Suppose that an MTO enterprise produces on a single production line and that n types of customer orders exist in the market; the order information comprises customer priority μ, price p, quantity Q, unit production cost c, lead time LT and latest delivery time DT;
step two: determining a set of system states
According to step one, if there are n order types in the system, the system state can be represented by the vector S = (μ, p, Q, LT, DT, T), where T denotes the production time still required by orders accepted before the current decision stage;
step three: determining a set of system actions
According to step one, when a customer order arrives, a decision must be made to accept or reject it; the action set of the model can be represented by the vector A = (a_1, a_2), where a_1 denotes accepting the order and a_2 denotes rejecting it;
step four: determining an immediate reward function
After the MTO enterprise makes a decision whether to accept an order, the obtained immediate return function is as follows:
r(s,a) =
I − C − μ·Y − N, if Q(s,a_1) > Q(s,a_2) and the order can be inserted into the current production plan;
−(I − C − μ·Y − N), if Q(s,a_1) > Q(s,a_2) and the order cannot be inserted into the current production plan;
−μ·J, if Q(s,a_1) < Q(s,a_2)
where I = p × Q denotes the revenue obtained from the order, C = c × Q the production cost consumed, Y the enterprise's delay penalty cost, N the inventory (holding) cost incurred, and J the rejection cost of the order;
step five: building order acceptance model
An order acceptance model in the semi-Markov decision process is constructed from the system state set, the system action set and the immediate return function, and the real MTO enterprise order acceptance problem is simulated based on the average reward reinforcement learning idea; according to the Bellman optimality principle, the corresponding optimal policy in the semi-Markov decision process problem satisfies:
Q*(s,a) = r̄(s,a) − ρ*·t̄(s,a) + Σ_{s′∈S} p(s′|s,a)·max_{a′} Q*(s′,a′)
wherein
r̄(s,a) = Σ_{s′∈S} p(s′|s,a)·r(s,a,s′) denotes the expected immediate return of taking action a in state s,
t̄(s,a) = Σ_{s′∈S} p(s′|s,a)·t(s,a,s′) denotes the expected transition time out of state s under action a, and
ρ* = lim_{M→∞} (Σ_{m=1}^{M} r_m) / (Σ_{m=1}^{M} t_m)
denotes the average reward achieved over the decision periods; t_m denotes the time for decision period m to transition from state s to state s′;
step six: order acceptance model solution
Taking the reinforcement learning average reward as the evaluation target, the order acceptance model in the semi-Markov decision process is solved by the average reward reinforcement learning SMART algorithm, and within the SMART algorithm a greedy algorithm is used to sequence the accepted orders, yielding the optimal order acceptance decision; the update formula of the average reward reinforcement learning SMART algorithm is:
Q_{m+1}(s,a) ← (1 − α_m)·Q_m(s,a) + α_m·[r_m(s,a,s′) − ρ_m·t_m(s,a,s′) + max_{a′} Q_m(s′,a′)]
where α_m denotes the learning rate, m the current iteration index, r_m(s,a,s′) the immediate return obtained after taking action a in state s, t_m(s,a,s′) the transition time from state s to s′, R_m the cumulative return of the m-th decision period, ρ_m the average return of the m-th decision period, and t_m the cumulative time of the m-th decision period.
2. The MTO enterprise order acceptance method based on average reward reinforcement learning according to claim 1, wherein: in step one, customer orders arrive according to a Poisson distribution with parameter λ, and order prices and required quantities are uniformly distributed.
3. The MTO enterprise order acceptance method based on average reward reinforcement learning according to claim 1, wherein: in step two, for a capacity-constrained MTO enterprise, T has a maximum upper bound; with n order types, the system state set S contains n × T states in total.
4. The MTO enterprise order acceptance method based on average reward reinforcement learning according to claim 1, wherein: in step four, the three cases of r(s,a), from top to bottom, are: when Q(s,a_1) > Q(s,a_2) and the order can be inserted into the current production plan in the current state, the immediate return equals the net profit obtained by accepting the order; when Q(s,a_1) > Q(s,a_2) but the order cannot be inserted into the current production plan in the current state, the immediate return equals the negative of that net profit; and when Q(s,a_1) < Q(s,a_2), the immediate return equals the rejection cost.
5. The MTO enterprise order acceptance method based on average reward reinforcement learning according to claim 1, wherein: in step four, the delay penalty cost Y = μ·u·{(T + Q/b) − LT}, where u denotes the delay penalty cost per unit time and b denotes the unit production capacity of the enterprise.
6. The MTO enterprise order acceptance method based on average reward reinforcement learning according to claim 1, wherein: in step four, because the customer does not take delivery of products completed before the lead time, products temporarily stored in the MTO enterprise's warehouse incur an inventory cost N = Q·h·{LT − (T + Q/b)}, where h denotes the storage cost per unit product per unit time.
7. The MTO enterprise order acceptance method based on average reward reinforcement learning according to claim 1, wherein: in step six, an exploration probability e that decreases as the number of simulation iterations grows is used to guarantee convergence of the average reward reinforcement learning SMART algorithm, with α and e decaying according to the DCM scheme:
α_m = α_0 / (1 + m²/(χ + m))
e_m = e_0 / (1 + m²/(χ + m))
where χ represents an arbitrarily large real number.
CN202110468897.0A 2021-04-28 2021-04-28 MTO enterprise order accepting method based on average reward reinforcement learning Active CN113077188B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110468897.0A CN113077188B (en) 2021-04-28 2021-04-28 MTO enterprise order accepting method based on average reward reinforcement learning

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202110468897.0A CN113077188B (en) 2021-04-28 2021-04-28 MTO enterprise order accepting method based on average reward reinforcement learning

Publications (2)

Publication Number Publication Date
CN113077188A true CN113077188A (en) 2021-07-06
CN113077188B CN113077188B (en) 2022-11-08

Family

ID=76619029

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110468897.0A Active CN113077188B (en) 2021-04-28 2021-04-28 MTO enterprise order accepting method based on average reward reinforcement learning

Country Status (1)

Country Link
CN (1) CN113077188B (en)

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103246950A (en) * 2012-10-30 2013-08-14 中国科学院沈阳自动化研究所 Method for promising order of semiconductor assembly and test enterprise
CN103927628A (en) * 2011-08-16 2014-07-16 上海交通大学 Order management system and order management method oriented to customer commitments
CN110517002A (en) * 2019-08-29 2019-11-29 烟台大学 Production control method based on intensified learning
CN111080408A (en) * 2019-12-06 2020-04-28 广东工业大学 Order information processing method based on deep reinforcement learning
CN111126905A (en) * 2019-12-16 2020-05-08 武汉理工大学 Casting enterprise raw material inventory management control method based on Markov decision theory

Patent Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103927628A (en) * 2011-08-16 2014-07-16 上海交通大学 Order management system and order management method oriented to customer commitments
CN103246950A (en) * 2012-10-30 2013-08-14 中国科学院沈阳自动化研究所 Method for promising order of semiconductor assembly and test enterprise
CN110517002A (en) * 2019-08-29 2019-11-29 烟台大学 Production control method based on intensified learning
CN111080408A (en) * 2019-12-06 2020-04-28 广东工业大学 Order information processing method based on deep reinforcement learning
CN111126905A (en) * 2019-12-16 2020-05-08 武汉理工大学 Casting enterprise raw material inventory management control method based on Markov decision theory

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
WANG Xiaohuan et al.: "Order acceptance strategy of make-to-order enterprises based on reinforcement learning", Systems Engineering - Theory & Practice *
HAO Juan et al.: "Order acceptance strategy of make-to-order enterprises based on average reinforcement learning", Journal of Computer Applications *

Also Published As

Publication number Publication date
CN113077188B (en) 2022-11-08

Similar Documents

Publication Publication Date Title
CN109636011A (en) A kind of multishift operation plan scheduling method based on improved change neighborhood genetic algorithm
US10628791B2 (en) System and method of simultaneous computation of optimal order point and optimal order quantity
CN109816315A (en) Path planning method and device, electronic equipment and readable storage medium
CN108550090A (en) A kind of processing method and system of determining source of houses pricing information
WO2018161908A1 (en) Product object processing method and device, storage medium and electronic device
CN110555578B (en) Sales prediction method and device
CN110046761A (en) A kind of ethyl alcohol inventory&#39;s Replenishment Policy based on multi-objective particle
CN110310057A (en) Kinds of goods sequence and goods yard processing method, device, equipment and its storage medium
CN109961198A (en) Related information generation method and device
CN109741083A (en) A kind of material requirement weight predicting method based on enterprise MRP
CN116207739B (en) Optimal scheduling method and device for power distribution network, computer equipment and storage medium
KR20230070779A (en) Demand response management method for discrete industrial manufacturing system based on constrained reinforcement learning
CN115334106A (en) Microgrid transaction consensus method and system based on Q method and power grid detection and evaluation
CN113077188B (en) MTO enterprise order accepting method based on average reward reinforcement learning
CN113592240A (en) Order processing method and system for MTO enterprise
CN117113608A (en) Cold-chain logistics network node layout method and equipment
CN110533485A (en) A kind of method, apparatus of object select, storage medium and electronic equipment
CN102542432B (en) Inventory management system and method
CN110047001A (en) A kind of futures data artificial intelligence analysis method and system
CN110210885A (en) Excavate method, apparatus, equipment and the readable storage medium storing program for executing of potential customers
CN110414875A (en) Capacity data processing method, device, electronic equipment and computer-readable medium
CN114677183A (en) New product sales prediction method and device, computer equipment and storage medium
CN112579721A (en) Method and system for constructing crowd distribution map, terminal device and storage medium
CN113127167A (en) Heterogeneous resource intelligent parallel scheduling method based on improved genetic algorithm
CN112950033A (en) Reservoir dispatching decision method and system based on reservoir dispatching rule synthesis

Legal Events

Date Code Title Description
PB01 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
SE01 Entry into force of request for substantive examination
GR01 Patent grant
GR01 Patent grant