CN114296440A - AGV real-time scheduling method integrating online learning - Google Patents

AGV real-time scheduling method integrating online learning

Info

Publication number
CN114296440A
Authority
CN
China
Prior art keywords
agv
trolley
real
scheduling method
transportation
Prior art date
Legal status
Granted
Application number
CN202111158873.1A
Other languages
Chinese (zh)
Other versions
CN114296440B (en)
Inventor
吕玉江
王延忠
陈燕燕
姚依铭
王姝濛
Current Assignee
Beijing Great Wall Aviation Measurement And Control Technology Research Institute Co ltd
Beijing Ruisai Chang Cheng Aeronautical M & C Technology Co ltd
China Aviation Industry Corp of Beijing Institute of Measurement and Control Technology
Original Assignee
Beijing Great Wall Aviation Measurement And Control Technology Research Institute Co ltd
Beijing Ruisai Chang Cheng Aeronautical M & C Technology Co ltd
China Aviation Industry Corp of Beijing Institute of Measurement and Control Technology
Priority date
Filing date
Publication date
Application filed by Beijing Great Wall Aviation Measurement And Control Technology Research Institute Co ltd, Beijing Ruisai Chang Cheng Aeronautical M & C Technology Co ltd, and China Aviation Industry Corp of Beijing Institute of Measurement and Control Technology
Priority to CN202111158873.1A
Publication of CN114296440A
Application granted
Publication of CN114296440B
Legal status: Active

Landscapes

  • Control Of Position, Course, Altitude, Or Attitude Of Moving Bodies (AREA)

Abstract

The invention belongs to the technical field of aviation intelligent manufacturing and discloses an AGV real-time scheduling method integrating online learning. The method comprises the following steps: S1, acquiring position information, environment information and path planning information of the on-site AGVs; S2, using a genetic algorithm to match the most suitable cart to the distribution task according to the information obtained in S1; S3, having the matched cart execute the transportation task, and planning the shortest path for the cart with a reinforcement learning algorithm during execution; and S4, the cart completes the transportation task along the shortest path planned in S3. The method combines a genetic algorithm with a reinforcement learning system: the genetic algorithm solves the AGV dispatching problem, while the reinforcement learning system handles AGV collision avoidance and path planning during dispatching, so that tasks are allocated to the AGVs reasonably. The method reduces algorithmic complexity, improves the working efficiency of the AGVs, and can bring greater economic benefits.

Description

AGV real-time scheduling method integrating online learning
Technical Field
The invention belongs to the technical field of aviation intelligent manufacturing, and particularly relates to an AGV real-time scheduling method integrating online learning.
Background
At present, domestic intelligent logistics has become a key link in realizing intelligent manufacturing. In a logistics system, the AGV carries parts and delivers them to the proper place in time according to the system's requirements. Improving the working efficiency of the AGV cart is therefore of great significance to enterprise development. Domestic logistics demand is larger than that of other countries, so research on real-time regulation and control is particularly important.
At present, the real-time scheduling of domestic AGVs is strongly influenced by the algorithm used, and complex spatial environments reduce scheduling efficiency. Increasing computation speed by upgrading the AGV cart's hardware would require huge cost and is difficult to realize; optimizing the current algorithm, however, can greatly improve the AGV's working efficiency and achieve optimal arrangement during distribution, making the approach suitable for popularization.
Disclosure of Invention
In view of the above problems, the invention aims to provide an AGV real-time scheduling method integrating online learning, which combines a genetic algorithm with a reinforcement learning system: the genetic algorithm solves the AGV dispatching problem, while the reinforcement learning system handles AGV collision avoidance and path planning during dispatching, so that tasks are allocated to the AGVs reasonably. The method reduces algorithmic complexity, improves the working efficiency of the AGVs, and can bring greater economic benefits.
In order to solve the technical problems, the technical scheme adopted by the invention is as follows:
an AGV real-time scheduling method integrating online learning, the method comprising:
s1, acquiring position information, environment information and path planning information of the on-site AGV;
s2, matching the most suitable trolley by using a genetic algorithm according to the information acquired in the S1 to complete the distribution task;
s3, adopting the most suitable trolley matched in the S2 to execute a transportation task; in the process of executing the transportation task, planning a shortest path for the trolley by adopting a reinforcement learning algorithm;
and S4, the trolley completes the transportation task according to the shortest path planned in the S3.
The technical scheme of the invention has the characteristics and further improvements that:
(1) and S4, in the process of completing the transportation task, the trolley uses an improved artificial potential field method to realize autonomous obstacle avoidance.
(2) The position information, the environment information and the path planning information of the on-site AGV in the S1 are specifically as follows: the position information of the AGV car refers to a random initial position set for the AGV car;
the environment information refers to the random initial positions of a plurality of obstacles in the area where the AGV trolley can reach and the random initial positions of a plurality of workpieces needing to be transported;
the path planning information refers to a three-point initial path formed by the current position of the AGV trolley, the position of the workpiece to be transported and the destination position to which the workpiece is to be transported.
(3) In S2, in the process of matching the most suitable cart by using the genetic algorithm, the objective function is designed as: f = min_{1≤k≤M}{ max_{1≤i≤N}{ L_ik } };
wherein f is the objective function, representing, for each transportation scheme, the longest time needed to complete the delivery of any single workpiece; among all transportation schemes, the one whose longest time is minimal is selected as the optimal transportation scheme;
L is the workpiece delivery time, and L_ik denotes the time at which the delivery of workpiece i on AGV cart k is completed; i is a delivery target point, i = 1, 2, …, N; N is the total number of workpieces to be distributed; k is the k-th AGV cart, k = 1, 2, …, M; M is the total number of AGV carts.
(4) In S3, in the process of executing the transportation task, a process of planning the shortest path for the car using a reinforcement learning algorithm specifically includes:
s31, building a grid map with a preset size as a two-dimensional simulation environment;
s32, setting the motion of the AGV trolley in the environment established in the two-dimensional grid map;
s33, representing the instant reward function by a nonlinear piecewise function, representing by a scalar R, and feeding back the current state and reward value when the AGV encounters different objects through the design of the reward function so as to change the behavior of the AGV;
s34, initializing an environment state and starting an exploration environment, starting from a starting point, acquiring a Q value corresponding to the current state by the trolley, finding out an action corresponding to the Q value through greedy decision, and recording the coordinate of the current state; judging whether the currently selected action is favorable or not through the advantage function, if the action obtains a positive reward value, executing the action, transferring to the next state to obtain the reward value, and storing the reward value in a sample playback cache region;
and S35, evaluating whether the current strategy is the optimal strategy or the optimal path according to the average reward value of the scene number.
(5) In S32, the setting of the motion of the AGV in the environment established in the two-dimensional grid map is specifically:
defining the action space model of the AGV cart as four discrete actions, namely A = [0,1; 0,-1; -1,0; 1,0]; the AGV cart is treated as a particle, represented by a circle, and the target points are represented by boxes.
(6) In the step S34, the first step,
when the trolley is in the exploration state, if the current state coordinate has an obstacle, the reward value is-1; if no obstacle exists on the current state coordinate, the reward value is returned to be 1, and the next state is entered; if the current state is the target point, the reward value is returned to be 2, and a final path is planned;
(7) in the S4, in the process of implementing autonomous obstacle avoidance by using the artificial potential field method, the artificial potential field method specifically includes:
virtual potential fields are introduced among the AGV trolleys and between the AGV trolleys and surrounding obstacles;
when the distance between the AGV trolleys or the distance between the AGV trolleys and the obstacle is smaller than the expected distance, the AGV trolleys are far away;
when the distance between the AGV carts is greater than the desired distance, the AGV carts are brought closer together by the attractive force.
(8) The method for realizing autonomous obstacle avoidance by using the artificial potential field method specifically comprises the following steps:
regarding each AGV as a mass point, with communication radius r, desired inter-AGV distance d_α, and desired AGV-obstacle distance d_β; the force borne by an AGV is the resultant of the forces from the other AGV carts within its communication radius and the repulsive force of obstacles, and when the resultant force is 0, the AGV reaches an equilibrium state;
when all AGV carts are in the equilibrium state, autonomous obstacle avoidance is realized.
Compared with existing algorithms, the invention has the following advantages: reinforcement learning is adopted as a method for learning a mapping from environment states to behaviors; it can be used in uncertain environments and adapts automatically to environmental change. The Q-learning algorithm in reinforcement learning updates states in real time, so the agent can optimize its model in real time according to the current state. A genetic algorithm is used for multi-AGV scheduling; it can solve optimization problems with complex structure, and its search performance is not restricted by analytic properties of the objective function. In addition, cloud-edge cooperation technology is used to realize quantitative evaluation of the AGV cart's on-site operation data and of the shortest path and time of logistics transportation.
Drawings
Fig. 1 is a schematic flowchart of an AGV real-time scheduling method incorporating online learning according to an embodiment of the present invention;
FIG. 2 is an environmental two-dimensional map of the present invention;
FIG. 3 is a diagram of the AGV traveling trajectory according to the present invention;
FIG. 4 is a block diagram of the Q-learning algorithm of the present invention;
fig. 5 is a table showing the planning time of the cart path of the workshop and the distribution sequence of the workpieces.
Detailed Description
The technical solution of the present invention is further explained with reference to the drawings and the detailed description.
The embodiment of the invention provides an AGV real-time scheduling method integrating online learning, and as shown in FIG. 1, the method comprises the following steps:
s1, acquiring position information, environment information and path planning information of the on-site AGV;
s2, matching the most suitable trolley by using a genetic algorithm according to the information acquired in the S1 to complete the distribution task;
s3, adopting the most suitable trolley matched in the S2 to execute a transportation task; in the process of executing the transportation task, planning a shortest path for the trolley by adopting a reinforcement learning algorithm;
and S4, the trolley completes the transportation task according to the shortest path planned in the S3.
And S4, in the process of completing the transportation task, the trolley uses an improved artificial potential field method to realize autonomous obstacle avoidance.
The position information, the environment information and the path planning information of the on-site AGV in the S1 are specifically as follows: the position information of the AGV car refers to a random initial position set for the AGV car;
the environment information refers to the random initial positions of a plurality of obstacles in the area where the AGV trolley can reach and the random initial positions of a plurality of workpieces needing to be transported;
the path planning information refers to a three-point initial path formed by the current position of the AGV trolley, the position of the workpiece to be transported and the destination position to which the workpiece is to be transported.
Specifically, a two-dimensional simulation environment is built using the Python language and the Tkinter library. The experimental environment for the path-planning algorithm is a 30 × 30 grid map; the obstacles are set essentially the same as those of the machining workshop under study, and the start and target points are set randomly. The constructed two-dimensional environment map is shown in figure 2, where the black cells are obstacles and the white cells are obstacle-free. The MATLAB simulation algorithm then initializes the environment state and begins exploring the environment.
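The grid environment described above can be sketched as a NumPy array; the obstacle coordinates below are illustrative placeholders, since the real workshop layout behind figure 2 is not published:

```python
import numpy as np

GRID_SIZE = 30   # the patent's experiments use a 30 x 30 grid map

def build_grid(obstacles, start, goal):
    """Build a 2-D grid world: 0 = free cell, 1 = obstacle.

    The coordinates passed in below are invented placeholders for
    illustration only.
    """
    grid = np.zeros((GRID_SIZE, GRID_SIZE), dtype=int)
    for r, c in obstacles:
        grid[r, c] = 1
    assert grid[start] == 0 and grid[goal] == 0, "start/goal must be free cells"
    return grid

grid = build_grid(obstacles=[(5, 5), (5, 6), (10, 12)],
                  start=(0, 0), goal=(29, 29))
```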
Setting the action space: this sets the motion of the AGV cart within the environment established in the two-dimensional grid map. The experiment defines the robot's real action space model as four discrete actions, namely A = [0,1; 0,-1; -1,0; 1,0]. The AGV cart is approximated as a particle, represented by a circle, with the target points represented by boxes.
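The four-action space and the map border can be sketched as follows; reading each pair of A as a (row, column) displacement is an assumption about the coordinate convention:

```python
# The patent's four discrete actions, A = [0,1; 0,-1; -1,0; 1,0],
# interpreted here as (d_row, d_col) displacements.
ACTIONS = [(0, 1), (0, -1), (-1, 0), (1, 0)]

def step(state, action_idx, grid_size=30):
    """Apply one action to a (row, col) state, clipping at the map border
    so the cart never leaves the 30 x 30 grid."""
    dr, dc = ACTIONS[action_idx]
    r = min(max(state[0] + dr, 0), grid_size - 1)
    c = min(max(state[1] + dc, 0), grid_size - 1)
    return (r, c)
```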
The reward function is feedback that evaluates how good the action performed by the AGV cart is when transitioning from the current state to another, usually expressed as a scalar R. For the test environment in this work, a nonlinear piecewise function represents the instant reward: when the cart passes through a feasible region during environment exploration, it obtains an instant reward of 1; when it encounters an obstacle, it obtains a reward of -1; and when it reaches the target position, it obtains a reward of 2. Finally, whether the strategy is optimal is judged by the total reward value.
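A minimal sketch of this piecewise reward (+1 feasible cell, -1 obstacle, +2 goal); `demo_grid` is an invented toy map, not the workshop layout:

```python
def immediate_reward(grid, state, goal):
    """Nonlinear piecewise reward as described: +2 on reaching the goal,
    -1 on an obstacle cell (marked 1), +1 on a feasible cell."""
    if state == goal:
        return 2
    if grid[state[0]][state[1]] == 1:
        return -1
    return 1

# A tiny invented map: 1 = obstacle, 0 = free.
demo_grid = [[0, 0, 1],
             [0, 1, 0],
             [0, 0, 0]]
```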
Fig. 3 shows a path simulation result of one of the paths, and after an optimal Q function is obtained through multiple operations, an optimal path can be obtained. The specific flow of the Q learning algorithm is shown in fig. 4.
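A self-contained tabular Q-learning sketch in the spirit of the algorithm of fig. 4, on an invented 5 × 5 grid with assumed learning parameters. One deliberate deviation is flagged in the comments: a small step cost replaces the patent's +1 reward per free cell, since a positive per-step reward can make endless wandering more valuable than reaching the terminal goal:

```python
import random

random.seed(0)
N = 5                                   # small invented grid (patent uses 30 x 30)
OBSTACLES = {(1, 1), (2, 3)}            # illustrative obstacle cells
GOAL = (4, 4)
ACTIONS = [(0, 1), (0, -1), (-1, 0), (1, 0)]
ALPHA, GAMMA, EPS = 0.1, 0.9, 0.2       # assumed learning-rate/discount/epsilon

def reward_of(s):
    # Patent values for goal (+2) and obstacle (-1); a -0.04 step cost
    # replaces the patent's +1 per free cell so that reaching the terminal
    # goal, not wandering, maximizes return in this sketch.
    if s == GOAL:
        return 2.0
    return -1.0 if s in OBSTACLES else -0.04

def step(s, a):
    dr, dc = ACTIONS[a]
    return (min(max(s[0] + dr, 0), N - 1), min(max(s[1] + dc, 0), N - 1))

Q = {}
def q(s, a):
    return Q.get((s, a), 0.0)

for _ in range(3000):                   # training episodes
    s = (0, 0)
    for _ in range(100):                # cap on episode length
        # epsilon-greedy choice over the current Q-table
        a = random.randrange(4) if random.random() < EPS \
            else max(range(4), key=lambda x: q(s, x))
        s2 = step(s, a)
        r = reward_of(s2)
        target = r if s2 == GOAL else r + GAMMA * max(q(s2, x) for x in range(4))
        Q[(s, a)] = q(s, a) + ALPHA * (target - q(s, a))
        s = s2
        if s == GOAL:
            break

# Extract the learned path by following greedy actions from the start.
path, s = [(0, 0)], (0, 0)
for _ in range(50):
    s = step(s, max(range(4), key=lambda x: q(s, x)))
    path.append(s)
    if s == GOAL:
        break
```

After training, `path` holds the greedy route from the start cell to the goal, mirroring the optimal path extracted from the converged Q function in the patent's fig. 3.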
For the scheduling problem of multiple AGVs, a genetic algorithm is used. To state the problem more clearly, the cart scheduling problem is expressed with the following notation: the set of workpieces to be delivered is J_i = {j_1, j_2, …, j_N}, and the set of AGV machines used for distribution is M_k = {m_1, m_2, …, m_M}. Each delivered workpiece J_i has a different number of deliveries, numbered V_i = {V_i^1, V_i^2, …, V_i^{J_i}}; each delivery V_i^{J_i} must be made at a specified target point m(V_i^{J_i}) ∈ M_k, i.e. the target point m(V_i^{J_i}) of each step V_i^{J_i} is fixed, where m(V_i^{J_i}) denotes that delivery V_i^{J_i} of workpiece J_i is carried out by cart robot m_i, and k = 1, 2, …, M. The actual workshop scheduling system is complex; for this study, the problem is simplified during modeling in order to meet the scheduling target.
The following assumptions were made:
1) in the distribution process, the conditions of damage of processing equipment and failure of the trolley are not considered.
2) Each AGV car can only complete one delivery task at a time.
3) The distribution path and the distribution time of each workpiece are planned in real time, and other auxiliary processing time is not considered.
4) All AGV cart machines are available at time t = 0.
5) All workpieces can be dispensed at time t = 0.
6) The delivery of the individual workpieces cannot be interrupted once the delivery has started, except in the case of an emergency.
7) The sequence of the delivered work pieces must satisfy the precedence constraint condition.
Because the researched trolley dispatching problem takes the shortest final delivery completion time of all workpieces as the optimal dispatching target, an objective function can be designed as follows:
f = min_{1≤k≤M}{ max_{1≤i≤N}{ L_ik } }
in the formula: f is the objective function, representing, for each transportation scheme, the longest time needed to complete the delivery of any single workpiece; among all transportation schemes, the one whose longest time is minimal is selected as the optimal transportation scheme. L is the workpiece delivery time, and L_ik denotes the time at which the delivery of workpiece i on AGV cart k is completed; i is a delivery target point, i = 1, 2, …, N; N is the total number of workpieces to be distributed; k is the k-th AGV cart, k = 1, 2, …, M; M is the total number of AGV carts.
Each transportation scheme is a scheme of transporting the workpieces to be transported to a destination by adopting different trolleys under the current environment.
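The min–max objective f = min_k max_i L_ik can be evaluated directly once candidate schemes are enumerated; the completion times below are invented for illustration:

```python
def makespan(scheme):
    """Longest single-workpiece completion time, max_i L_ik, within one
    scheme; `scheme` maps each cart to the completion times of its workpieces."""
    return max(t for times in scheme.values() for t in times)

def best_scheme(schemes):
    """f = min over schemes of (max over workpieces of L_ik)."""
    return min(schemes, key=makespan)

# Two invented candidate schemes (completion times are placeholders).
schemes = [
    {"agv1": [8, 12], "agv2": [15]},      # slowest workpiece finishes at t = 15
    {"agv1": [9],     "agv2": [10, 11]},  # slowest workpiece finishes at t = 11
]
best = best_scheme(schemes)
```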
The constraint of the objective function is:
L_ik - T_ik ≥ L_ih (5)
L_jk - L_ik ≥ T_jk (6)
Equation (5) describes the precedence constraint on workpiece delivery: suppose AGV cart h delivers workpiece i before cart k; then L_ik denotes the time at which the delivery of workpiece i on cart k is completed (the time of reaching the target point), L_ih denotes the completion time of workpiece i delivered on cart h (the time of reaching the target point), and T_ik denotes the delivery time (the time the delivery takes) of workpiece i on cart k.
Equation (6) describes the priority constraint on AGV selection: suppose workpieces i and j both need to be delivered by cart k at a certain moment; if the task of workpiece i is issued before that of workpiece j, workpiece i is delivered first. Here L_ik denotes the completion time of workpiece i on cart k, T_ik the delivery time of workpiece i on cart k, L_ih the completion time of workpiece i on cart h, L_jk the completion time of workpiece j on cart k, and T_jk the delivery time of workpiece j on cart k; M is the number of carts and N the number of workpieces.
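Constraints (5) and (6) translate into two straightforward feasibility checks; the variable names follow the patent's notation:

```python
def precedence_ok(L_ik, T_ik, L_ih):
    """Constraint (5): the delivery of workpiece i on cart k (finishing at
    L_ik, taking T_ik) cannot start before its earlier delivery on cart h
    has finished at L_ih."""
    return L_ik - T_ik >= L_ih

def cart_order_ok(L_jk, L_ik, T_jk):
    """Constraint (6): on the same cart k, workpiece j (taking T_jk) cannot
    start before workpiece i has finished at L_ik."""
    return L_jk - L_ik >= T_jk
```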
A simulation experiment is carried out, taking the 6 × 10 scheduling problem as an example, with the following parameters: workpiece delivery tasks J_i = {1, 2, 3, 4, 5, 6}, where i = 1, 2, …, 6; AGV cart numbers M_j = {1, 2, …, 10}, where j = 1, 2, …, 10; and each workpiece has a final target point and a set of delivery sequence points along the way to that target. Fig. 5 shows the delivery order of each workpiece and the time spent on each delivery.
Through repeated simulations, with a population size of 100, selection probability 0.8, crossover probability 0.8, mutation probability 0.6 and 50 iterations, the fitness is optimal and the optimal solution is 42.
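A compact genetic-algorithm sketch for the 6 × 10 assignment using the reported parameters (population 100, crossover 0.8, mutation 0.6, 50 iterations). The per-workpiece delivery times are invented, and interpreting the reported "selection probability 0.8" as a binary-tournament win probability is an assumption:

```python
import random

random.seed(1)

N_JOBS, N_CARTS = 6, 10            # the 6 x 10 instance from the simulation
TIMES = [3, 5, 2, 7, 4, 6]         # assumed per-workpiece delivery times (not published)
POP, GENS = 100, 50                # population size and iteration count from the patent
P_SEL, P_CROSS, P_MUT = 0.8, 0.8, 0.6   # reported GA probabilities

def fitness(chrom):
    """Makespan of an assignment: chrom[job] is the cart carrying that job."""
    load = [0.0] * N_CARTS
    for job, cart in enumerate(chrom):
        load[cart] += TIMES[job]
    return max(load)               # lower is better

def select(pop):
    """Binary tournament; the fitter individual wins with probability P_SEL."""
    a, b = random.sample(pop, 2)
    fitter, other = (a, b) if fitness(a) <= fitness(b) else (b, a)
    return fitter if random.random() < P_SEL else other

def crossover(p1, p2):
    if random.random() < P_CROSS:
        cut = random.randrange(1, N_JOBS)
        return p1[:cut] + p2[cut:]
    return p1[:]

def mutate(chrom):
    if random.random() < P_MUT:
        chrom[random.randrange(N_JOBS)] = random.randrange(N_CARTS)
    return chrom

pop = [[random.randrange(N_CARTS) for _ in range(N_JOBS)] for _ in range(POP)]
for _ in range(GENS):
    elite = min(pop, key=fitness)  # elitism keeps the best schedule alive
    pop = [elite] + [mutate(crossover(select(pop), select(pop)))
                     for _ in range(POP - 1)]
best = min(pop, key=fitness)
```

With these invented times the optimum spreads the six jobs over six different carts, so the best makespan equals the longest single delivery (7).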
In cloud-edge cooperation, the central cloud issues instructions to robots at different positions, and the robots fetch the goods and send them back to the designated place. To complete autonomous obstacle avoidance during distribution, an improved artificial potential field method is used: topology switching during travel is taken into account, and the control protocol is improved so that the obstacle-avoidance algorithm remains optimized while the AGV cart cluster completes its formation tasks.
In the artificial potential field method, virtual potential fields are introduced among the intelligent AGV carts and between the carts and the surrounding environment. When the distance between two carts, or between a cart and an obstacle, is smaller than the desired distance, the field acts as a repulsive force that drives them apart; when the distance between carts is larger than the desired distance, it acts as an attractive force that draws them together. Each intelligent AGV is treated as a mass point, ignoring the complex mechanical model and aerodynamics, with communication radius r, desired inter-AGV distance d_α, and desired AGV-obstacle distance d_β. The force on an intelligent AGV is then the resultant of the forces from the other intelligent AGVs within its communication radius and the repulsive forces of obstacles; when the resultant force is 0, the AGV reaches an equilibrium state. When all intelligent AGV carts are in the equilibrium state, safe obstacle-avoiding formation travel is realized.
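The equilibrium condition above can be sketched with a spring-like pairwise force. The gain K and the numeric distances are assumptions, since the patent does not give the potential functions:

```python
import math

R_COMM = 5.0    # communication radius r (assumed value)
D_ALPHA = 2.0   # desired inter-AGV distance d_alpha (assumed value)
D_BETA = 1.5    # desired AGV-obstacle distance d_beta (assumed value)
K = 1.0         # force gain (assumed; the patent gives no potential function)

def pair_force(p, q, desired, attract=True):
    """Spring-like force on the agent at p exerted by the point q:
    repulsive inside the desired distance, attractive beyond it
    (attraction applies only between AGVs, never to obstacles)."""
    dx, dy = q[0] - p[0], q[1] - p[1]
    d = math.hypot(dx, dy)
    if d == 0 or d > R_COMM:
        return (0.0, 0.0)           # outside the communication radius
    mag = K * (d - desired)         # < 0 inside the desired distance
    if not attract:
        mag = min(mag, 0.0)         # obstacles only repel
    return (mag * dx / d, mag * dy / d)

def resultant(p, agvs, obstacles):
    """Resultant force on the AGV at p; (0, 0) means equilibrium."""
    fx = fy = 0.0
    for q in agvs:
        dfx, dfy = pair_force(p, q, D_ALPHA, attract=True)
        fx, fy = fx + dfx, fy + dfy
    for q in obstacles:
        dfx, dfy = pair_force(p, q, D_BETA, attract=False)
        fx, fy = fx + dfx, fy + dfy
    return (fx, fy)
```

A cart whose only neighbor sits exactly at the desired distance d_α experiences zero resultant force, i.e. the equilibrium state described above.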
After cloud edge cooperative computing is used, logistics distribution of the AGV is achieved, and the description of goods distribution, motion state conditions and environment conditions of the intelligent AGV is as follows:
(1) The neighborhood of an intelligent AGV is the area around its center that contains the nearby obstacles and the other intelligent AGVs.
(2) Dynamic obstacles in the environment can be detected by the intelligent AGV.
(3) Every intelligent AGV cart can receive the information of the intelligent AGV carts and the obstacles within its field, including position, velocity, obstacle radius, etc.
(4) One cart leads the team and guides the others to realize autonomous dynamic obstacle avoidance while driving, recording in real time the current motion states of the carts, the obstacle state of the surrounding environment, and so on.
(5) After a cart finishes its first goods-distribution task, it starts the second distribution task from its current position; the cart can complete the delivery task to the target point from any starting point.
The embodiment of the invention provides a construction method of an AGV real-time scheduling model integrating online learning, which optimizes the real-time performance of AGV scheduling and realizes the function of reasonably distributing tasks for a plurality of AGV trolleys. The genetic algorithm and the reinforcement learning system are combined, the genetic algorithm is used for solving the problem of dispatching the AGV, the reinforcement learning system is used for solving the problems of AGV avoidance and path planning in the dispatching process, and tasks are reasonably distributed to the AGV. And a cloud edge cooperative computing technology is introduced to complete the tasks of trolley scheduling and path planning of an aeronautical processing workshop, and the aims of reasonably matching resources and requirements and minimizing the path planning distance of the trolley are fulfilled in the scheduling process. The method reduces the complexity of the algorithm, improves the working efficiency of the AGV trolley, and can bring greater economic benefit for companies.
The above description is only a preferred embodiment of the present invention and is not intended to limit the present invention, and various modifications and changes may be made by those skilled in the art. Any modification, equivalent replacement, or improvement made within the spirit and principle of the present invention should be included in the protection scope of the present invention.

Claims (9)

1. An AGV real-time scheduling method fused with online learning is characterized by comprising the following steps:
s1, acquiring position information, environment information and path planning information of the on-site AGV;
s2, matching the most suitable trolley by using a genetic algorithm to complete a distribution task according to the information obtained in the S1;
s3, adopting the most suitable trolley matched in the S2 to execute a transportation task; in the process of executing the transportation task, planning a shortest path for the trolley by adopting a reinforcement learning algorithm;
and S4, the trolley completes the transportation task according to the shortest path planned in the S3.
2. The AGV real-time scheduling method integrating online learning of claim 1, wherein in S4, the improved artificial potential field method is used to realize autonomous obstacle avoidance during the transportation task of the car.
3. The AGV real-time scheduling method integrating online learning according to claim 1, wherein the position information, the environment information and the path planning information of the field AGV in S1 specifically comprise: the position information of the AGV car refers to a random initial position set for the AGV car;
the environment information refers to the random initial positions of a plurality of obstacles in the area where the AGV trolley can reach and the random initial positions of a plurality of workpieces needing to be transported;
the path planning information refers to a three-point initial path formed by the current position of the AGV trolley, the position of the workpiece to be transported and the destination position to which the workpiece is to be transported.
4. The AGV real-time scheduling method integrating online learning according to claim 1, wherein in the process of matching the most suitable cart by using the genetic algorithm in S2, the objective function is designed as: f = min_{1≤k≤M}{ max_{1≤i≤N}{ L_ik } };
wherein f is the objective function, representing, for each transportation scheme, the longest time needed to complete the delivery of any single workpiece; among all transportation schemes, the one whose longest time is minimal is selected as the optimal transportation scheme;
L is the workpiece delivery time, and L_ik denotes the time at which the delivery of workpiece i on AGV cart k is completed; i is a delivery target point, i = 1, 2, …, N; N is the total number of workpieces to be distributed; k is the k-th AGV cart, k = 1, 2, …, M; M is the total number of AGV carts.
5. The AGV real-time scheduling method integrating online learning according to claim 1, wherein in S3, in the process of executing the transportation task, a process of planning a shortest path for the car by using a reinforcement learning algorithm specifically includes:
s31, building a grid map with a preset size as a two-dimensional simulation environment;
s32, setting the motion of the AGV trolley in the environment established in the two-dimensional grid map;
s33, representing the instant reward function by a nonlinear piecewise function, representing by a scalar R, and feeding back the current state and reward value when the AGV encounters different objects through the design of the reward function so as to change the behavior of the AGV;
s34, initializing an environment state and starting an exploration environment, starting from a starting point by the trolley, acquiring a Q value corresponding to the current state, finding out an action corresponding to the Q value through greedy decision, and recording the coordinate of the current state; judging whether the currently selected action is favorable or not through the advantage function, if the action obtains a positive reward value, executing the action, transferring to the next state to obtain the reward value, and storing the reward value in a sample playback cache region;
and S35, evaluating whether the current strategy is the optimal strategy or the optimal path according to the average reward value of the scene number.
6. The AGV real-time scheduling method integrating online learning according to claim 5, wherein in S32, the motion of the AGV car in the environment established in the two-dimensional grid map is set, specifically:
defining the action space model of the AGV cart as four discrete actions, namely A = [0,1; 0,-1; -1,0; 1,0]; the AGV cart is treated as a particle, represented by a circle, and the target points are represented by boxes.
7. The AGV real-time scheduling method with integrated online learning of claim 5, wherein in S34,
when the trolley is in the exploration state, if the current state coordinate has an obstacle, the reward value is-1; if no barrier exists on the current state coordinate, returning the reward value to be 1, and entering the next state; if the current state is the target point, the reward value is returned to be 2, and a final path is planned.
8. The AGV real-time scheduling method integrating online learning according to claim 2, wherein in S4, autonomous obstacle avoidance is implemented using the artificial potential field method, which specifically comprises:
introducing virtual potential fields between the AGVs themselves and between the AGVs and surrounding obstacles;
when the distance between two AGVs, or between an AGV and an obstacle, is smaller than the desired distance, a repulsive force drives them apart;
when the distance between two AGVs is greater than the desired distance, an attractive force draws them closer together.
9. The AGV real-time scheduling method integrating online learning according to claim 8, wherein the implementation of autonomous obstacle avoidance using the artificial potential field method specifically comprises:
regarding each AGV as a particle with communication radius r, desired inter-AGV distance dα, and desired AGV-to-obstacle distance dβ; the force on an AGV is the resultant of the forces from the other AGVs within its communication radius and the repulsive forces of obstacles, and when the resultant force is 0 the AGV reaches an equilibrium state;
autonomous obstacle avoidance is achieved when all AGVs are in the equilibrium state.
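A sketch of the resultant-force and equilibrium condition in claims 8 and 9. The gains, distances, and the spring-like force law are assumed for illustration; the patent does not disclose its exact potential functions:

```python
import math

R_COMM = 5.0    # communication radius r (assumed value)
D_ALPHA = 2.0   # desired inter-AGV distance d_alpha (assumed)
D_BETA = 1.5    # desired AGV-obstacle distance d_beta (assumed)
K = 0.5         # force gain (assumed)

def pair_force(p, q, desired):
    """Spring-like force on p from q: repulsive inside the desired
    distance, attractive outside it, zero exactly at it."""
    dx, dy = q[0] - p[0], q[1] - p[1]
    d = math.hypot(dx, dy)
    if d == 0:
        return (0.0, 0.0)
    mag = K * (d - desired)          # > 0 attracts, < 0 repels
    return (mag * dx / d, mag * dy / d)

def net_force(p, neighbours, obstacles):
    """Resultant force on one AGV: neighbour terms inside the
    communication radius plus purely repulsive obstacle terms (claim 9).
    Equilibrium is reached when this resultant is (0, 0)."""
    fx = fy = 0.0
    for q in neighbours:
        if math.hypot(q[0] - p[0], q[1] - p[1]) <= R_COMM:
            f = pair_force(p, q, D_ALPHA)
            fx, fy = fx + f[0], fy + f[1]
    for o in obstacles:
        dx, dy = p[0] - o[0], p[1] - o[1]
        d = math.hypot(dx, dy)
        if 0 < d < D_BETA:           # obstacles only ever push away
            mag = K * (D_BETA - d)
            fx, fy = fx + mag * dx / d, fy + mag * dy / d
    return (fx, fy)
```

An AGV sitting exactly at the desired distance from its only neighbour, with no obstacle in range, feels zero resultant force, which is the equilibrium state the claim describes.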
CN202111158873.1A 2021-09-30 2021-09-30 AGV real-time scheduling method integrating online learning Active CN114296440B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202111158873.1A CN114296440B (en) 2021-09-30 2021-09-30 AGV real-time scheduling method integrating online learning

Publications (2)

Publication Number Publication Date
CN114296440A true CN114296440A (en) 2022-04-08
CN114296440B CN114296440B (en) 2024-04-09

Family

ID=80964371

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202111158873.1A Active CN114296440B (en) 2021-09-30 2021-09-30 AGV real-time scheduling method integrating online learning

Country Status (1)

Country Link
CN (1) CN114296440B (en)

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20090327011A1 (en) * 2008-06-30 2009-12-31 Autonomous Solutions, Inc. Vehicle dispatching method and system
CN108776483A (en) * 2018-08-16 2018-11-09 圆通速递有限公司 AGV paths planning methods and system based on ant group algorithm and multiple agent Q study
CN110443412A (en) * 2019-07-18 2019-11-12 华中科技大学 The intensified learning method of Logistic Scheduling and path planning in dynamic optimization process
CN112232548A (en) * 2020-09-14 2021-01-15 中国船舶重工集团公司第七一六研究所 Machining workshop intelligent scheduling optimization method and system based on genetic algorithm
CN112344944A (en) * 2020-11-24 2021-02-09 湖北汽车工业学院 Reinforced learning path planning method introducing artificial potential field
JP2021034050A (en) * 2019-08-21 2021-03-01 哈爾浜工程大学 Auv action plan and operation control method based on reinforcement learning

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
WANG Bennian et al.: "RLGA: A Genetic Algorithm Based on a Reinforcement Learning Mechanism", Acta Electronica Sinica, vol. 34, no. 5, pages 856 - 866 *

Cited By (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114952839A (en) * 2022-05-27 2022-08-30 西南交通大学 Cloud edge cooperation-based two-stage robot motion decision technology framework
CN114952839B (en) * 2022-05-27 2024-02-06 西南交通大学 Cloud edge cooperation-based two-stage robot motion decision system
CN115145285A (en) * 2022-07-29 2022-10-04 陕西科技大学 Multi-point goods taking and delivering optimal path planning method and system for storage AGV
CN115640986A (en) * 2022-12-13 2023-01-24 北京云迹科技股份有限公司 Robot scheduling method, device, equipment and medium based on rewards
CN115660386A (en) * 2022-12-13 2023-01-31 北京云迹科技股份有限公司 Robot scheduling reward method and device, electronic equipment and storage medium

Also Published As

Publication number Publication date
CN114296440B (en) 2024-04-09

Similar Documents

Publication Publication Date Title
CN114296440A (en) AGV real-time scheduling method integrating online learning
Yang et al. Multi‐robot path planning based on a deep reinforcement learning DQN algorithm
Fu et al. An improved A* algorithm for the industrial robot path planning with high success rate and short length
Xinwei et al. A review on carrier aircraft dispatch path planning and control on deck
CN110989576B (en) Target following and dynamic obstacle avoidance control method for differential slip steering vehicle
Liu et al. Coordinated motion planning for multiple mobile robots along designed paths with formation requirement
Cai et al. Mobile robot path planning in dynamic environments: A survey
CN109978272B (en) Path planning system and method based on multiple omnidirectional mobile robots
CN113687659B (en) Optimal trajectory generation method and system based on digital twinning
CN109581987B (en) AGV (automatic guided vehicle) scheduling path planning method and system based on particle swarm optimization
Liao et al. AGV path planning model based on reinforcement learning
CN112147960A (en) Optimized scheduling method and device for flexible manufacturing system
CN108897316A (en) A kind of cluster storage robot system control method based on pheromones navigation
Chen et al. Path planning for vehicle-borne system consisting of multi air–ground robots
Ju et al. A hybrid systems-based hierarchical control architecture for heterogeneous field robot teams
CN117093009A (en) Logistics AGV trolley navigation control method and system based on machine vision
Hani et al. Simulation based optimization of a train maintenance facility
Zhang et al. Application of Automated Guided Vehicles in Smart Automated Warehouse Systems: A Survey.
Hong et al. Logistics in the sky: A two-phase optimization approach for the drone package pickup and delivery system
Jie et al. A homogenization-planning-tracking method to solve cooperative autonomous motion control for heterogeneous carrier dispatch systems
Tubaileh Layout of flexible manufacturing systems based on kinematic constraints of the autonomous material handling system
Suemitsu et al. Fast simulation-based order sequence optimization assisted by pre-trained bayesian recurrent neural network
CN113485418B (en) Flexible rope system constraint multi-robot track generation method
Jungbluth et al. Reinforcement Learning-based Scheduling of a Job-Shop Process with Distributedly Controlled Robotic Manipulators for Transport Operations
CN114537435A (en) Real-time whole vehicle track planning method in automatic driving

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant