CN114186712A - Container loading and unloading intelligent method and system based on reinforcement learning - Google Patents


Info

Publication number
CN114186712A
CN114186712A (application CN202111284086.1A)
Authority
CN
China
Prior art keywords
agent
reinforcement learning
intelligent
unloading
intelligent agent
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202111284086.1A
Other languages
Chinese (zh)
Inventor
孔雨昕
陈志勇
史玉良
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Shandong University
Original Assignee
Shandong University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Shandong University filed Critical Shandong University
Priority to CN202111284086.1A priority Critical patent/CN114186712A/en
Publication of CN114186712A publication Critical patent/CN114186712A/en
Pending legal-status Critical Current

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 20/00 Machine learning
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q 10/00 Administration; Management
    • G06Q 10/04 Forecasting or optimisation specially adapted for administrative or management purposes, e.g. linear programming or "cutting stock problem"
    • G06Q 10/06 Resources, workflows, human or project management; Enterprise or organisation planning; Enterprise or organisation modelling
    • G06Q 10/063 Operations research, analysis or management
    • G06Q 10/0637 Strategic management or analysis, e.g. setting a goal or target of an organisation; Planning actions based on goals; Analysis or evaluation of effectiveness of goals
    • G06Q 10/08 Logistics, e.g. warehousing, loading or distribution; Inventory or stock management
    • G06Q 10/083 Shipping


Abstract

The invention provides a reinforcement-learning-based intelligent container loading and unloading method, which comprises the following steps: acquiring container size data and a loading and unloading plan; acquiring the initial states, parameters and assigned tasks of a plurality of agents; and obtaining a decision result with an agent reinforcement learning model according to the assigned tasks. The agents comprise a yard agent, a handling-equipment agent and a berth agent. Obtaining the decision result with the agent reinforcement learning model comprises dividing the tasks and the agents, establishing the agent reinforcement learning model, and setting the state space, action space and reward value of each agent in the environment. The method establishes a multi-agent reinforcement learning model and automatically generates the container loading and unloading plan through the continuous iterative learning of the multi-agent system, replacing the old mode in which service personnel manually draw up the loading and unloading plan; it realizes an intelligent, multi-agent reinforcement-learning-based container handling workflow and lays a foundation for the construction of intelligent ports.

Description

Container loading and unloading intelligent method and system based on reinforcement learning
Technical Field
The invention relates to the technical field of intelligent container loading and unloading, and in particular to an intelligent container loading and unloading method and system based on reinforcement learning.
Background
The statements in this section merely provide background information related to the present disclosure and may not necessarily constitute prior art.
The early container handling problem was mainly solved by relying on the experience of port staff, who had to comprehensively consider berth idle states, handling-equipment specifications and idle states, yard idle states, berth specifications and similar factors, and draw up the related plans. However, with the growth of port traffic and the expansion of cargo-handling scale, the manual formulation of handling and yard plans by port staff alone can hardly meet production requirements.
Existing research methods have not achieved fully automatic planning and scheduling for yard allocation, berth allocation and mechanical-facility scheduling, nor do they closely combine quay-crane scheduling optimization, yard scheduling optimization and container-handling scheduling optimization. Meanwhile, reinforcement learning, although a popular field, has not actually been applied to production practice in the port industry. Therefore, an intelligent container loading and unloading method and system based on reinforcement learning is needed.
Disclosure of Invention
The invention provides a container loading and unloading intelligent method and system based on reinforcement learning, aiming at solving the problems.
According to some embodiments, the invention adopts the following technical scheme:
an intelligent container loading and unloading method based on reinforcement learning comprises the following steps:
acquiring container size data and a loading and unloading plan;
acquiring initial states, parameters and allocation tasks of a plurality of agents;
according to the allocation task, a decision result is obtained by using an agent reinforcement learning model;
The agents comprise a yard agent, a handling-equipment agent and a berth agent. Obtaining the decision result with the agent reinforcement learning model comprises: dividing the tasks and the agents, establishing the agent reinforcement learning model, and setting the state space, action space and reward value of each agent in the environment.
Further, the initial state of an agent includes the usage and berth specification of the berth agent, the usage, specification and type of the handling-equipment agent, and the container-presence status of the yard agent.
Further, the agent obtains decision actions according to the assigned tasks and its own state, and after K steps processes the stored decision actions and state features.
Further, after the K-th decision action the agent uploads the formed state features, decision actions and reward values to an experience pool to form experiences.
Further, after uploading to the experience pool, the parameters of the agents are aggregated, and the aggregation result is returned to the agents for parameter updating.
Further, after the parameters are updated, all experiences are sampled according to a sampling probability, and each agent iteratively trains and optimizes the agent reinforcement learning model according to the sampled experiences until all tasks are completed.
Further, aggregating the parameters of the agents comprises applying the update formula with the agents' parameters and the value function, generating the TD-error.
An intelligent container handling system based on reinforcement learning, comprising:
the data acquisition module is configured to acquire the initial state, parameters and allocation tasks of the intelligent agent;
the reinforcement learning module is configured to obtain a decision result by using the agent reinforcement learning model according to the assigned tasks;
and the agent obtains a decision action through the reinforcement learning model according to the assigned target, and temporarily stores the decision action and the corresponding state features.
A computer readable storage medium having stored therein a plurality of instructions adapted to be loaded by a processor of a terminal device and to execute a reinforcement learning based container handling intelligence method.
A terminal device comprising a processor and a computer readable storage medium, the processor being configured to implement instructions; the computer readable storage medium stores instructions adapted to be loaded by a processor and to perform the reinforcement learning based container handling intelligence method.
Compared with the prior art, the invention has the beneficial effects that:
the invention starts from the container loading and unloading operation process, deeply analyzes the linkage relation among the yard, the loading and unloading equipment and the berth, enables the intelligent realization of the container loading and unloading process to be possible, overcomes the defect that the existing container loading and unloading plan needs the manual formulation of wharf service personnel, generates the container loading and unloading plan and the yard plan as accurately as possible, and reduces the manual intervention to the maximum extent; based on the thought of multi-agent reinforcement learning, a reinforcement learning model is established, a container loading and unloading plan is automatically generated through continuous iterative learning of a multi-agent system, the old mode that business personnel manually make the loading and unloading plan is replaced, container loading and unloading flow intelligentization based on the multi-agent reinforcement learning is realized, and a foundation is laid for the construction of an intelligent port.
Drawings
The accompanying drawings, which are incorporated in and constitute a part of this application, illustrate embodiments of the application and, together with the description, serve to explain the application and are not intended to limit the application.
FIG. 1 is a schematic flow chart of an intelligent container loading and unloading method based on multi-agent reinforcement learning according to the present invention;
FIG. 2 is a schematic diagram of interaction between agent i and the environment in the multi-agent reinforcement learning model of the present invention;
FIG. 3 is a schematic diagram of an algorithm flow of an intelligent container loading and unloading method based on multi-agent reinforcement learning according to the present invention;
FIG. 4 is a diagram illustrating reward trends for each agent in the multi-agent reinforcement learning model of the present invention;
FIG. 5 is a system modeling structure diagram of an intelligent container handling method based on multi-agent reinforcement learning according to the present invention;
FIG. 6 is a schematic view showing the analysis and comparison of the loading and unloading costs in the container loading and unloading process according to the present invention;
fig. 7 is a schematic view showing the analysis and comparison of the loading and unloading time in the container loading and unloading process of the present invention.
Detailed Description
the invention is further described with reference to the following figures and examples.
It should be noted that the following detailed description is exemplary and is intended to provide further explanation of the disclosure. Unless defined otherwise, all technical and scientific terms used herein have the same meaning as commonly understood by one of ordinary skill in the art to which this application belongs.
It is noted that the terminology used herein is for the purpose of describing particular embodiments only and is not intended to be limiting of example embodiments according to the present application. As used herein, the singular forms "a", "an" and "the" are intended to include the plural forms as well, and it should be understood that when the terms "comprises" and/or "comprising" are used in this specification, they specify the presence of stated features, steps, operations, devices, components, and/or combinations thereof, unless the context clearly indicates otherwise.
Example 1.
As shown in fig. 1, an intelligent container handling method based on reinforcement learning includes:
acquiring container size data and a loading and unloading plan;
acquiring initial states, parameters and allocation tasks of a plurality of agents;
according to the allocation task, a decision result is obtained by using an agent reinforcement learning model;
The agents comprise a yard agent, a handling-equipment agent and a berth agent. Obtaining the decision result with the agent reinforcement learning model comprises: dividing the tasks and the agents, establishing the agent reinforcement learning model, and setting the state space, action space and reward value of each agent in the environment.
The method specifically comprises the following steps:
1) dividing tasks and intelligent agents, establishing a reinforcement learning model for each intelligent agent, and setting state space, action space and reward value of the intelligent agent in the environment;
First, the container handling tasks are divided: the container handling plan is decomposed into a number of handling subtasks according to container type, and the subtasks are executed in sequence.
In this example, the yard is divided into regions according to the order of the berths; each region is divided into blocks in order from the sea side to the land side; and each block is further divided into bays, corresponding to the bays (BAY) of the ship's stowage plan. Each bay is the size of one 20GP container and is numbered sequentially; for example, a 40GP container occupies two adjacent bays.
The stacking rules are as follows: containers of the same type may be stored in more than one bay, but containers of different types are stored in different bays; each bay may only store containers of a single type.
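The bay sizing and stacking rules above can be illustrated with a small allocation check. This is a hedged sketch: the data structure and helper names are illustrative assumptions, not taken from the patent.

```python
# Illustrative sketch of the bay rules described above: a 20GP container
# occupies one bay, a 40GP container occupies two adjacent bays, and each
# bay may only hold containers of a single type.

def bays_needed(container_size: str) -> int:
    """A 20GP container occupies one bay; a 40GP occupies two adjacent bays."""
    return 2 if container_size == "40GP" else 1

def can_place(yard: dict, bay: int, container_size: str, container_type: str) -> bool:
    """Check whether a container may be placed starting at `bay`.

    `yard` maps bay number -> container type stored there (None if empty).
    """
    for b in range(bay, bay + bays_needed(container_size)):
        stored = yard.get(b)
        if stored is not None and stored != container_type:
            return False  # bay already holds a different container type
    return True

yard = {1: "reefer", 2: None, 3: None}
assert can_place(yard, 2, "40GP", "dry")      # bays 2-3 are free
assert not can_place(yard, 1, "20GP", "dry")  # bay 1 holds reefers
```

A real yard agent would also track stacking height and retrieval order; the sketch only captures the one-type-per-bay and adjacency constraints stated above.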
Second, the agents are partitioned. There are 3 agents in the container loading and unloading operation, namely the yard agent, the handling-equipment agent and the berth agent, numbered 1, 2 and 3. With state space S, action space A and reward value R, a triple

(s_i^{e,t,n}, a_i^{e,t,n}, r_i^{e,t,n})

is generated, where s_i^{e,t,n}, a_i^{e,t,n} and r_i^{e,t,n} respectively denote the state, action and reward of agent i (1 ≤ i ≤ 3) after container n triggers a loading/unloading event at time t of the e-th training round.
State space: the specific features observed by each agent at the current time. In this example, the state space of the berth agent includes the position, usage and specification of the berth; the state space of the handling-equipment agent includes the specification, type, position and usage of the handling equipment; and the state space of the yard agent includes the bay layout and the container-presence status. In this example,

s_i^{e,t,n} = (p_i, p_j, u_i^{e,t}),

where p_i and p_j respectively represent basic information such as the specification and position of agent i, and u_i^{e,t} indicates the usage of agent i at time t of the e-th round.
Action space: an action space is established for each agent to store the actions that the agent may generate. In this example, a_i^{e,t,n} denotes the action decision of the agent after container n triggers a loading/unloading start event at time t of the e-th round. The actions actually required include whether to execute the loading/unloading of the current container, which specific handling equipment executes the action, and the subsequent arrangement of yard bays and berths.
Reward value: the setting of the reward value R of the multi-agent model for the intelligent container handling workflow mainly considers three aspects: first, the individual real-time reward r_1, the reward fed back in real time by the current environment to the current agent; second, the global real-time reward r_2, the reward the current environment gives to all agents; and third, the final system reward r_3, i.e. the global reward the multi-agent system gives to all agents after completing all tasks. Thus, the reward value of an agent is

r = r_1 + r_2 + r_3.
Moreover, when an agent selects an action and obtains a real-time reward, the influence of the current action on future rewards and penalties must also be considered, so the reward value of the agent at time t is expressed as

R_t = Σ_{k=0}^{∞} γ^k · r_{t+k},

where γ is a discount factor: the larger γ is, the greater the influence of future rewards on the current decision; the smaller γ is, the less influence future rewards have.
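The discounted reward value above is the standard discounted return. A minimal sketch (the finite reward list and function name are assumptions for illustration):

```python
def discounted_return(rewards, gamma):
    """Sum of rewards discounted by gamma:
    R_t = r_t + gamma * r_{t+1} + gamma^2 * r_{t+2} + ..."""
    total = 0.0
    for k, r in enumerate(rewards):
        total += (gamma ** k) * r
    return total

# With gamma near 1 future rewards weigh heavily on the current decision;
# with gamma near 0 the agent is effectively myopic.
assert discounted_return([1.0, 1.0, 1.0], 0.5) == 1.0 + 0.5 + 0.25
```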
In this example, let the current state be s^t, the next state be s^{t+1}, and the expected (target) state be s*. If s^{t+1} is closer to s* than s^t, then r_1 is positive, indicating that the agent is approaching the target; if s^{t+1} is farther from s* than s^t, then r_1 is negative, indicating that the agent is moving away from the target. In particular, r_1 and r_2 always satisfy r_1 ≥ r_2.
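The sign rule for the individual reward r_1 reads as a distance comparison to the target state. The sketch below uses Euclidean distance and sets the magnitude to the change in distance; both are assumptions, since the patent only fixes the sign of r_1:

```python
import math

def individual_reward(s_t, s_next, s_target):
    """r_1 > 0 if the new state is closer to the target than the old one,
    r_1 < 0 if it is farther. The Euclidean metric and the magnitude
    (change in distance) are illustrative assumptions."""
    def dist(a, b):
        return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))
    return dist(s_t, s_target) - dist(s_next, s_target)

assert individual_reward((0.0,), (1.0,), (2.0,)) > 0  # moved toward target
assert individual_reward((1.0,), (0.0,), (2.0,)) < 0  # moved away from target
```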
2) The initial state of each agent is acquired and used as the initial state in the reinforcement learning model.
The current state of an agent comprises the usage and berth specification of the berth agent, the usage, specification and type of the handling-equipment agent, and the container-presence status of the yard agent. In this example, the initial state of agent i is obtained and recorded as s_i^{1,t_0,n}, denoting the state of each agent i (1 ≤ i ≤ 3) at time t_0 of the 1st training round after container n triggers the loading/unloading event.
3) The agent decides the next action according to the assigned target and its own current state, interacts with the environment to obtain the next decision, and temporarily stores the generated state features and decision actions. The specific steps are S3-1 to S3-4.
Step S3-1: agent i observes the environment and obtains the current state value s_i^t.
Step S3-2: agent i executes action a_i^t, observes the environment to obtain the state s_i^{t+1}, and then sends the new state s_i^{t+1} to agent i+1.
Step S3-3: agent i obtains the real-time feedback reward value r_1 given by the external environment.
Step S3-4: the state values s = (s_1, ..., s_n), next-step state values s' = (s'_1, ..., s'_n), action values a = (a_1, ..., a_n) and real-time feedback reward values r = (r_1, r_2, ..., r_n) of all agents from steps S3-1 to S3-3 are saved, and the experience is temporarily stored.
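Steps S3-1 to S3-4 describe a sequential interaction loop in which each agent passes the observed new state to the next agent and the joint transition is buffered. A minimal sketch follows; the toy environment, the always-act-one policy, and all names are placeholders, since the patent does not spell out the port environment's interface:

```python
class ToyEnv:
    """Placeholder environment: the state is a single number and each
    action adds to it. Stands in for the real port environment."""
    def __init__(self):
        self.state = 0

    def observe(self):
        return self.state

    def step(self, action):
        self.state += action
        r1 = 1.0 if action > 0 else -1.0  # toy real-time reward (S3-3)
        return self.state, r1


class ToyAgent:
    """Placeholder policy: always acts +1."""
    def act(self, state):
        return 1


def rollout_step(env, agents):
    """One pass of steps S3-1..S3-4: each agent observes, acts, hands the
    new state to the next agent, and the joint transition is returned
    for temporary storage."""
    states, actions, next_states, rewards = [], [], [], []
    state = env.observe()                  # S3-1: current state value
    for agent in agents:
        action = agent.act(state)          # decide from the current state
        next_state, r1 = env.step(action)  # S3-2/S3-3: act, observe, get r1
        states.append(state)
        actions.append(action)
        next_states.append(next_state)
        rewards.append(r1)
        state = next_state                 # pass the new state to agent i+1
    return (states, actions, next_states, rewards)  # S3-4: buffer this tuple


experience = rollout_step(ToyEnv(), [ToyAgent(), ToyAgent(), ToyAgent()])
```

With three agents and this toy setup, the buffered tuple contains states [0, 1, 2], next states [1, 2, 3], and three positive real-time rewards, mirroring the state hand-off from agent i to agent i+1.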
4) After a period of time (K steps), the reward values of the experiences stored in the temporary buffer are calculated, forming state features, decision actions and reward values that are uploaded to the experience pool for the agents' subsequent iterative optimization training.
5) The parameters uploaded by all agents are aggregated, and the result is returned to each agent for parameter updating. In this example, each agent uploads its local parameters to the multi-agent system, which then generates a TD-error according to the value-function update formula. Experiences with a large TD-error are beneficial to the agents' learning and should be sampled more often; meanwhile, since the agents are continuously being iteratively optimized, experiences generated earlier are less useful for later iterative training, so the sampling probability of newly generated experiences should be increased.
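The TD-error-weighted, recency-biased sampling described above resembles prioritized experience replay with a recency bonus. The patent does not give the priority formula, so the exponent, the recency weighting, and all names below are assumptions:

```python
def sampling_probabilities(td_errors, ages, alpha=0.6, recency_weight=0.5):
    """Experiences with a larger |TD-error| are more useful for learning
    and get a higher probability; newer experiences (smaller age) also
    get a boost, since older experiences hinder later iterations.
    `alpha` and `recency_weight` are illustrative assumptions."""
    priorities = [
        (abs(td) ** alpha) + recency_weight / (1 + age)
        for td, age in zip(td_errors, ages)
    ]
    total = sum(priorities)
    return [p / total for p in priorities]

# Three experiences: the second has the largest TD-error and is newest,
# so it receives the highest sampling probability.
probs = sampling_probabilities(td_errors=[0.1, 2.0, 0.5], ages=[10, 0, 5])
assert abs(sum(probs) - 1.0) < 1e-9
assert probs[1] == max(probs)
```

These probabilities could then be fed to a weighted sampler (e.g. `random.choices` with `weights=probs`) to draw the training batch in step 6).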
6) All experiences are sampled according to the sampling probability, and each agent iteratively trains and optimizes its model according to the sampled experiences.
FIG. 4 is the reward trend graph obtained by each agent during iterative training; green, yellow and blue show the performance of agent 1, agent 2 and agent 3 respectively. Initially, the rewards obtained by the agents are low and unstable because the model parameters vary randomly. With continuous iterative optimization, each agent performs better and better, and the rewards obtained increase and gradually stabilize.
7) Steps 3) to 6) are repeated until all tasks are completed. The system modelling structure is shown in fig. 5.
This embodiment relies on a comparison experiment carried out at a certain port. Figs. 6 and 7 respectively compare container handling cost and handling time under the multi-agent reinforcement learning model, showing that the method not only avoids the manpower consumed by manual planning in practical application, but also reduces container handling cost to a certain extent. Because the system continuously performs iterative learning, the multi-agent reinforcement-learning-based container handling method can always find the best handling mode in actual operation: shortening container stacking time and optimizing transport paths reduces container-moving, transport and re-handling costs, while the shortened handling time correspondingly reduces storage costs.
Example 2.
An intelligent container handling system based on reinforcement learning, comprising:
the data acquisition module is configured to acquire the initial state, parameters and allocation tasks of the intelligent agent;
the reinforcement learning module is configured to obtain a decision result by using the agent reinforcement learning model according to the assigned tasks;
and the agent obtains a decision action through the reinforcement learning model according to the assigned target, and temporarily stores the decision action and the corresponding state features.
Example 3.
A computer readable storage medium having stored therein a plurality of instructions adapted to be loaded by a processor of a terminal device and to execute a reinforcement learning based container handling intelligence method.
Example 4.
A terminal device comprising a processor and a computer readable storage medium, the processor being configured to implement instructions; the computer readable storage medium stores instructions adapted to be loaded by a processor and to perform the reinforcement learning based container handling intelligence method.
As will be appreciated by one skilled in the art, embodiments of the present application may be provided as a method, system, or computer program product. Accordingly, the present application may take the form of an entirely hardware embodiment, an entirely software embodiment or an embodiment combining software and hardware aspects. Furthermore, the present application may take the form of a computer program product embodied on one or more computer-usable storage media (including, but not limited to, disk storage, CD-ROM, optical storage, and the like) having computer-usable program code embodied therein.
The present application is described with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems), and computer program products according to embodiments of the application. It will be understood that each flow and/or block of the flow diagrams and/or block diagrams, and combinations of flows and/or blocks in the flow diagrams and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general purpose computer, special purpose computer, embedded processor, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, create means for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be stored in a computer-readable memory that can direct a computer or other programmable data processing apparatus to function in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means which implement the function specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be loaded onto a computer or other programmable data processing apparatus to cause a series of operational steps to be performed on the computer or other programmable apparatus to produce a computer implemented process such that the instructions which execute on the computer or other programmable apparatus provide steps for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
The above description is only a preferred embodiment of the present application and is not intended to limit the present application, and various modifications and changes may be made by those skilled in the art. Any modification, equivalent replacement, improvement and the like made within the spirit and principle of the present application shall be included in the protection scope of the present application.
Although the embodiments of the present invention have been described with reference to the accompanying drawings, it is not intended to limit the scope of the present invention, and it should be understood by those skilled in the art that various modifications and variations can be made without inventive efforts by those skilled in the art based on the technical solution of the present invention.

Claims (10)

1. A container loading and unloading intelligent method based on reinforcement learning is characterized by comprising the following steps:
acquiring container size data and a loading and unloading plan;
acquiring initial states, parameters and allocation tasks of a plurality of agents;
according to the allocation task, a decision result is obtained by using an agent reinforcement learning model;
the intelligent agent comprises a storage yard intelligent agent, a loading and unloading equipment intelligent agent and a berthing intelligent agent, the decision result is obtained by utilizing an intelligent agent reinforcement learning model, the allocation task and the intelligent agent are divided, the intelligent agent reinforcement learning model is established, and the state space, the action space and the reward value of each intelligent agent in the environment are set.
2. The reinforcement-learning-based intelligent container handling method according to claim 1, wherein the initial state of each agent comprises the occupancy state and berth specification of the berthing agent, the occupancy state, specification, and type of the handling-equipment agent, and the container occupancy of the storage-yard agent.
3. The reinforcement-learning-based intelligent container handling method according to claim 2, wherein each agent derives decision actions from its assigned tasks and its own state and, after K computation steps, temporarily stores the decision actions together with the corresponding state features.
4. The reinforcement-learning-based intelligent container handling method according to claim 3, wherein after K decision actions each agent uploads the stored state features, decision actions, and reward values to an experience pool to form experiences.
5. The reinforcement-learning-based intelligent container handling method according to claim 4, wherein after the upload to the experience pool the parameters of the agents are aggregated, and the aggregated result is returned to each agent for a parameter update.
6. The reinforcement-learning-based intelligent container handling method according to claim 5, wherein after the parameter update all experiences are sampled according to their sampling probabilities, and each agent iteratively trains and optimizes the agent reinforcement learning model on the sampled experiences until all tasks are completed.
7. The reinforcement-learning-based intelligent container handling method according to claim 6, wherein aggregating the parameters of the agents comprises applying a parameter update formula together with a value function to generate the TD-error.
8. A reinforcement-learning-based intelligent container handling system, comprising:
a data acquisition module configured to acquire the initial states, parameters, and allocated tasks of the agents; and
a reinforcement learning module configured to obtain a decision result from the allocated tasks using an agent reinforcement learning model,
wherein each agent derives a decision action from its allocated target through the reinforcement learning model and temporarily stores the decision action together with the corresponding state features.
9. A computer-readable storage medium, characterized in that it stores instructions adapted to be loaded by a processor of a terminal device to execute the reinforcement-learning-based intelligent container handling method according to any one of claims 1-7.
10. A terminal device, characterized in that it comprises a processor and a computer-readable storage medium, the processor being configured to implement the instructions and the computer-readable storage medium storing a plurality of instructions adapted to be loaded by the processor to execute the reinforcement-learning-based intelligent container handling method according to any one of claims 1-7.
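The three agent types named in claim 1, each with its own state, action space, and per-task decision step, can be illustrated with a minimal sketch. All class names, state fields, and action labels below are hypothetical assumptions for illustration only; the patent does not publish concrete state or action definitions, and a trained policy network would replace the random action choice.

```python
from dataclasses import dataclass, field
import random

@dataclass
class Agent:
    # One agent in the multi-agent terminal environment (claim 1).
    name: str
    state: dict = field(default_factory=dict)
    actions: list = field(default_factory=list)

    def decide(self, task):
        # Derive a decision action from the assigned task and own state
        # (claim 3); a learned policy would replace this random choice.
        return random.choice(self.actions)

# Initial states loosely follow claim 2: berth occupancy and specification,
# equipment occupancy/specification/type, and yard container occupancy.
berth_agent = Agent("berth", {"in_use": False, "length_m": 300}, ["assign_berth", "hold"])
crane_agent = Agent("crane", {"in_use": False, "type": "quay_crane"}, ["load", "unload", "idle"])
yard_agent = Agent("yard", {"has_container": True}, ["stack", "retrieve"])

agents = [berth_agent, crane_agent, yard_agent]
action = crane_agent.decide(task="unload_vessel_A")
print(action in crane_agent.actions)  # True
```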
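Claims 4-7 describe a shared experience pool filled every K decision actions, parameter aggregation across agents, and probability-weighted sampling tied to a TD-error. A minimal sketch of that loop follows; the priority formula (absolute TD-error) and the plain parameter averaging are assumptions, since the patent does not publish the concrete update formula or value function.

```python
import random

class ExperiencePool:
    # Shared pool of (state, action, reward) experiences (claim 4).
    def __init__(self):
        self.experiences = []
        self.priorities = []  # assumed priority: |TD-error| per experience

    def upload(self, batch, td_errors):
        # Each agent uploads K experiences plus their TD-errors.
        self.experiences.extend(batch)
        self.priorities.extend(abs(e) for e in td_errors)

    def sample(self, n):
        # Sample experiences according to sampling probability (claim 6).
        total = sum(self.priorities)
        probs = [p / total for p in self.priorities]
        return random.choices(self.experiences, weights=probs,
                              k=min(n, len(self.experiences)))

def aggregate(param_sets):
    # Assumed aggregation rule (claim 5): average each parameter across
    # agents, then return the result for the per-agent update step.
    k = len(param_sets)
    return [sum(ps[i] for ps in param_sets) / k for i in range(len(param_sets[0]))]

pool = ExperiencePool()
K = 3
batch = [(("s%d" % i,), "a", 1.0) for i in range(K)]
pool.upload(batch, td_errors=[0.5, -1.2, 0.1])
sampled = pool.sample(2)
shared = aggregate([[1.0, 2.0], [3.0, 4.0]])
print(shared)  # [2.0, 3.0]
```

In a full implementation the averaged parameters would be written back into each agent's network before the next round of sampling and training, as claim 5 requires.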
CN202111284086.1A 2021-11-01 2021-11-01 Container loading and unloading intelligent method and system based on reinforcement learning Pending CN114186712A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202111284086.1A CN114186712A (en) 2021-11-01 2021-11-01 Container loading and unloading intelligent method and system based on reinforcement learning


Publications (1)

Publication Number Publication Date
CN114186712A true CN114186712A (en) 2022-03-15

Family

ID=80540566

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202111284086.1A Pending CN114186712A (en) 2021-11-01 2021-11-01 Container loading and unloading intelligent method and system based on reinforcement learning

Country Status (1)

Country Link
CN (1) CN114186712A (en)

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112465151A (en) * 2020-12-17 2021-03-09 电子科技大学长三角研究院(衢州) Multi-agent federal cooperation method based on deep reinforcement learning
CN112801290A (en) * 2021-02-26 2021-05-14 中国人民解放军陆军工程大学 Multi-agent deep reinforcement learning method, system and application
CN113128705A (en) * 2021-03-24 2021-07-16 北京科技大学顺德研究生院 Intelligent agent optimal strategy obtaining method and device


Non-Patent Citations (3)

* Cited by examiner, † Cited by third party
Title
Wang Gang: "Research on Integrated Optimization of the Logistics Operation System of Port Container Terminals Based on Multi-Agent", Logistics Technology (《物流技术》), vol. 26, no. 4, 30 April 2007 (2007-04-30), page 3 *
Gao Xuefeng: "Research on Dynamic Scheduling of Dual Yard Cranes at Automated Container Terminals Based on Deep Reinforcement Learning", China Masters' Theses Full-text Database, Engineering Science and Technology II (《中国优秀硕士学位论文全文数据库 工程科技Ⅱ辑》), no. 02, 15 February 2021 (2021-02-15), page 3 *

Similar Documents

Publication Publication Date Title
Xiang et al. Reactive strategy for discrete berth allocation and quay crane assignment problems under uncertainty
Hu et al. A three-stage decomposition method for the joint vehicle dispatching and storage allocation problem in automated container terminals
CA2992053C (en) Online hierarchical ensemble of learners for activity time prediction in open pit mining
Zheng et al. A two-stage stochastic programming for single yard crane scheduling with uncertain release times of retrieval tasks
CN106651049B (en) Rescheduling method for automatic container terminal loading and unloading equipment
CN108932366B (en) Simulation intelligent scheduling method and system for unloading production of coal wharf
Wang et al. Tree based searching approaches for integrated vehicle dispatching and container allocation in a transshipment hub
Caballini et al. An event-triggered receding-horizon scheme for planning rail operations in maritime terminals
Misir et al. A selection hyper-heuristic for scheduling deliveries of ready-mixed concrete
Hartmann Scheduling reefer mechanics at container terminals
WO2016118122A1 (en) Optimization of truck assignments in a mine using simulation
CN106651280A (en) Container ship logistics transportation scheduling method and system
Maione et al. A generalized stochastic Petri net approach for modeling activities of human operators in intermodal container terminals
Verma et al. A reinforcement learning framework for container selection and ship load sequencing in ports
Zhang et al. Vehicle dynamic dispatching using curriculum-driven reinforcement learning
CN114186712A (en) Container loading and unloading intelligent method and system based on reinforcement learning
Jin et al. Container port truck dispatching optimization using Real2Sim based deep reinforcement learning
US20230252395A1 (en) A quay crane operation method
Rida Modeling and optimization of decision-making process during loading and unloading operations at container port
JP4209109B2 (en) Container terminal operation optimization system
CN115293443B (en) Bridge crane and container ship loading and unloading operation time prediction method, system and medium
KR20100048004A (en) An integrated scheduling method for different types of equipment in automated container terminals
CN115545369A (en) Automated quayside container bridge resource planning decision-making method, terminal and medium
Wu et al. Integrated proactive-reactive approach and a hybrid adaptive large neighborhood search algorithm for berth and quay crane scheduling under uncertain combination.
Liu et al. Fuzzy optimization of storage space allocation in a container terminal

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination