CN117114524B - Logistics sorting method based on reinforcement learning and digital twin - Google Patents

Logistics sorting method based on reinforcement learning and digital twin

Info

Publication number
CN117114524B
Authority
CN
China
Prior art keywords
sorting
grid
package card
package
logistics
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202311369261.6A
Other languages
Chinese (zh)
Other versions
CN117114524A (en
Inventor
黄川�
崔曙光
张崴
李然
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Chinese University of Hong Kong Shenzhen
Original Assignee
Chinese University of Hong Kong Shenzhen
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Chinese University of Hong Kong Shenzhen
Priority to CN202311369261.6A
Publication of CN117114524A
Application granted
Publication of CN117114524B
Legal status: Active
Anticipated expiration

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q10/00 Administration; Management
    • G06Q10/08 Logistics, e.g. warehousing, loading or distribution; Inventory or stock management
    • B PERFORMING OPERATIONS; TRANSPORTING
    • B07 SEPARATING SOLIDS FROM SOLIDS; SORTING
    • B07C POSTAL SORTING; SORTING INDIVIDUAL ARTICLES, OR BULK MATERIAL FIT TO BE SORTED PIECE-MEAL, e.g. BY PICKING
    • B07C3/00 Sorting according to destination
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/23 Clustering techniques
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N5/00 Computing arrangements using knowledge-based models
    • G06N5/01 Dynamic search techniques; Heuristics; Dynamic trees; Branch-and-bound
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02P CLIMATE CHANGE MITIGATION TECHNOLOGIES IN THE PRODUCTION OR PROCESSING OF GOODS
    • Y02P90/00 Enabling technologies with a potential contribution to greenhouse gas [GHG] emissions mitigation
    • Y02P90/30 Computing systems specially adapted for manufacturing

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Physics & Mathematics (AREA)
  • Business, Economics & Management (AREA)
  • General Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Economics (AREA)
  • Evolutionary Computation (AREA)
  • Artificial Intelligence (AREA)
  • Operations Research (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • General Business, Economics & Management (AREA)
  • Strategic Management (AREA)
  • Quality & Reliability (AREA)
  • Marketing (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Human Resources & Organizations (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Tourism & Hospitality (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Evolutionary Biology (AREA)
  • Entrepreneurship & Innovation (AREA)
  • Development Economics (AREA)
  • Computational Linguistics (AREA)
  • Computing Systems (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)

Abstract

The invention discloses a logistics sorting method based on reinforcement learning and digital twins, comprising the following steps: S1, acquiring historical cargo data in a logistics sorting system; S2, collecting historical sorting data of the sorting grids of a sorting machine in the logistics sorting system, and fitting a grid processing efficiency function; S3, integrating package card information through a clustering algorithm to obtain a package card category similarity matrix and a transition probability matrix; S4, designing a reinforcement learning policy network and value network, and constructing the leaf nodes of a Monte Carlo tree; S5, obtaining an optimal grid sorting strategy by expanding the leaf nodes of the Monte Carlo tree; S6, constructing a digital twin for the logistics sorting systems of different logistics transfer sites, and obtaining the optimal grid sorting strategy. The invention separately counts the number of packages at each grid lock and the grid-locking time, and adopts a Monte Carlo tree search reinforcement learning algorithm, which improves the generalization of the sorting plan and makes the method suitable for transfer site logistics sorting systems with different site conditions.

Description

Logistics sorting method based on reinforcement learning and digital twin
Technical Field
The invention relates to the field of logistics sorting, in particular to a logistics sorting method based on reinforcement learning and digital twin.
Background
Existing logistics sorting methods generally rely on rules drawn from manual experience with sorting machines, and build their models by analyzing historical packing rules, grid distribution and historical sorting data. Such modeling methods have a number of drawbacks. First, when the efficiency of the conveyor belt and the grids of a sorting machine is evaluated and predicted, the analysis is limited by the efficiency factors of manual sorting, so modeling accuracy and prediction accuracy are not high and modeling efficiency is low. Second, existing sorting plan optimization can only target specific historical shifts of specific sites at a time, and the sorting plan is generally adjusted by human experience, so the digital twin model generalizes poorly and is difficult to adapt to the different shift conditions of multiple transfer sites.
Disclosure of Invention
The invention aims to overcome the defects of the prior art and provide a logistics sorting method based on reinforcement learning and digital twins.
The aim of the invention is realized by the following technical scheme: a logistics sorting method based on reinforcement learning and digital twins, comprising the steps of:
S1, acquiring historical cargo data in a logistics sorting system;
S2, collecting historical sorting data of the sorting grids of a sorting machine in the logistics sorting system, and fitting a grid processing efficiency function;
S3, integrating package card information through a clustering algorithm to obtain a package card category similarity matrix and a transition probability matrix;
S4, designing a reinforcement learning policy network and value network based on the similarity matrix and the transition probability matrix of the package card categories, and constructing the leaf nodes of a Monte Carlo tree;
S5, obtaining an optimal grid sorting strategy by expanding the leaf nodes of the Monte Carlo tree;
S6, constructing a digital twin for the logistics sorting systems of different logistics transfer sites, simulating in the digital twin to obtain the historical cargo data and historical sorting data of each logistics transfer site, and dynamically adjusting the Monte Carlo tree according to steps S1-S5 to obtain the optimal grid sorting strategy of the logistics sorting system at the current logistics transfer site.
The beneficial effects of the invention are as follows: based on the historical sorting data and the current sorting data of the logistics transfer site, the number of packages at each grid lock and the grid-locking time are counted separately, and the sorting efficiency function of each grid is fitted from the historical grid sorting information. The method does not require a detailed analysis of how personnel are distributed at each site; it only collects the historical sorting information of the grids, converting uncertain manual sorting efficiency into quantifiable per-grid data and improving modeling efficiency. Adopting the Monte Carlo tree search reinforcement learning algorithm improves the generalization of the optimized sorting plan, so that it suits transfer site logistics sorting systems with different site conditions.
Drawings
FIG. 1 is a flow chart of the method of the present invention.
Detailed Description
The technical solution of the present invention will be described in further detail with reference to the accompanying drawings, but the scope of the present invention is not limited to the following description.
To handle the constraint conditions of different transfer sites and multiple sorting optimization targets, the invention adopts a logistics sorting method based on reinforcement learning and digital twins. The sorting grid is the minimum unit of sorting and bagging in the logistics sorting machine, the package card is the cargo cluster that the transfer site defines from cargo flow direction and aging information, and the mapping between sorting grids and package cards greatly influences the overall logistics sorting efficiency. The method collects a large amount of historical sorting data from the sorting grids of the logistics sorting machine, fits a grid processing efficiency function, and models the sorting system directly by combining information such as the conveying speed of the sorting machine's conveyor belt and the package loading speed. In the logistics sorting process, each sorting machine grid has a storage space of fixed capacity; when the storage space reaches its upper limit, the grid is manually locked, packed and emptied. The locking efficiency of the sorting grids therefore determines the cargo handling efficiency of the logistics sorting system. As shown in fig. 1, the method comprises the steps of:
S1, acquiring historical cargo data in the logistics sorting system;
The historical cargo data in the logistics sorting system comprises, for each package in the logistics sorting system within a set time: package card number information, package card flow direction information and package card aging information; the package card flow direction information comprises the package flow direction city code; the package card aging information distinguishes aviation parts from land transportation parts.
The historical cargo data may further be preprocessed by preliminarily classifying the packages according to the aging information and flow direction information of each package card:
first, the packages are classified by aging information into two types, aviation parts (fast) and land transportation parts (slow);
then, within each type, the packages are classified by flow direction according to the city codes in the package flow direction information.
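The two-level preliminary classification above can be sketched as follows. This is a minimal illustration only: the record layout `(card_number, aging, city_code)`, the labels `"air"` and `"land"`, and the function name `classify_packages` are assumptions made for the example and do not come from the patent.

```python
from collections import defaultdict

# Hypothetical record layout: (package_card_number, aging, city_code).
# "air" stands for aviation parts (fast), "land" for land transportation parts (slow).
def classify_packages(packages):
    """Group packages by aging class first, then by flow-direction city code."""
    groups = defaultdict(lambda: defaultdict(list))
    for card_number, aging, city_code in packages:
        groups[aging][city_code].append(card_number)
    # Convert to plain dicts so the result compares and prints cleanly.
    return {aging: dict(by_city) for aging, by_city in groups.items()}

sample = [
    ("P001", "air", "755"),
    ("P002", "land", "755"),
    ("P003", "air", "010"),
    ("P004", "air", "755"),
]
classified = classify_packages(sample)
```

Here `classified["air"]["755"]` collects the aviation packages bound for city code 755, which is the kind of group that the later clustering step works on.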
S2, collecting historical sorting data of sorting grids of a sorting machine in a logistics sorting system, and fitting a grid processing efficiency function;
The historical sorting data of the sorting grids of the sorting machine in step S2 includes, within the set time T: the number of each sorting grid, the number of times each sorting grid was locked, and, for each lock, the grid-locking time of the sorting grid and the number of packages the sorting grid contained when locked;
the step S2 includes:
S201, based on the historical grid-locking time and grid-locking count data, fitting a grid-locking count function and a grid-locking time function for the grids, where each sorting grid is locked M times within the time T;
A1, at the moment of the i-th lock, counting the number of packages contained in each numbered sorting grid, and fitting the relation function between the number of packages and the sorting grid number using a discount approximation method; in this function, x denotes the sorting grid number, x = 1, 2, …, X, where X is the number of sorting grids, and the function value is the number of packages corresponding to the grid numbered x, so the function represents the fitted relation between the number of packages and the sorting grid number;
counting the grid-locking time of each numbered sorting grid, and fitting the relation function between the grid-locking time and the sorting grid number using a recurrence approximation method; in this function, x denotes the sorting grid number, x = 1, 2, …, X, where X is the number of sorting grids, and the function value is the grid-locking duration corresponding to the grid numbered x, so the function represents the fitted relation between the grid-locking duration and the sorting grid number;
A2, repeating A1 for i = 1, 2, …, M to obtain the two fitted functions for every lock, and averaging them over the M locks to obtain the relation function between the number of packages and the sorting grid number and the relation function between the grid-locking duration and the sorting grid number;
S202, calculating the grid processing efficiency function from the two averaged relation functions.
The set time T is one historical week;
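As a rough illustration of steps A1, A2 and S202, the sketch below tabulates, for each grid number x, the package count and lock duration averaged over its lock events. The patent's fitted functions and its exact efficiency formula did not survive in this text, so two things here are assumptions: the per-grid table stands in for the fitted relation functions, and efficiency is taken as packages per unit of lock time. The name `fit_grid_functions` and the sample data are hypothetical.

```python
from statistics import mean

def fit_grid_functions(lock_events):
    """lock_events: {grid_number: [(package_count, lock_duration_s), ...]}.

    Returns per-grid averages over all lock events, plus an efficiency value.
    """
    avg_count, avg_time, efficiency = {}, {}, {}
    for x, events in lock_events.items():
        avg_count[x] = mean(c for c, _ in events)   # averaged package count
        avg_time[x] = mean(t for _, t in events)    # averaged lock duration
        # Assumption: efficiency = packages handled per second of lock time.
        efficiency[x] = avg_count[x] / avg_time[x]
    return avg_count, avg_time, efficiency

# Hypothetical history: two lock events for each of two grids.
history = {
    1: [(40, 100.0), (60, 100.0)],
    2: [(30, 60.0), (30, 90.0)],
}
avg_count, avg_time, efficiency = fit_grid_functions(history)
```

Keeping the result keyed by grid number mirrors the text's point that manual sorting efficiency is converted into per-grid quantities without modelling the staff at each site.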
S3, integrating package card information through a clustering algorithm to obtain a package card category similarity matrix and a transition probability matrix;
the step S3 includes:
S301, using the aging information and flow direction information in the package card information of each package as a feature vector: let the aging feature of the package card be z and the flow direction feature be w, so that the feature vector of the package card is (z, w);
Setting the number of clusters N equal to the number of grids X, and clustering the package card feature vectors with a clustering algorithm (typically the unsupervised k-means++ algorithm) to obtain N package card categories and the cluster center of each package card category, and counting the proportion of the packages of each package card category in the total packages;
defining the feature vector of each package card category: the feature vector of the n-th package card category consists of the aging feature contained in the cluster center of the n-th category, the flow direction feature of the cluster center of the n-th category, and the proportion of the packages of the n-th category in the total packages;
S302, calculating the Euclidean distance between the feature vectors of the package card categories to construct the similarity matrix between the n-th package card category and all package card categories; this is a 1×N matrix whose k-th column is the Euclidean distance between the feature vectors of the n-th and k-th categories, and when n = k the calculated Euclidean distance is 0;
after the similarity matrix is normalized by its modulus, the state transition matrix between the current package card category and all package card categories is obtained, which is also a 1×N matrix;
S303, repeating step S302 for n = 1, 2, …, N to obtain the similarity matrix and state transition matrix corresponding to each package card category.
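Steps S302 and S303 can be sketched as below, starting from cluster centers that are already known (the clustering itself is omitted). The normalization formula is not recoverable from this text; dividing each row by its Euclidean norm is an assumption based on the mention of a modulus, and a row normalized this way does not sum to 1. The function names and the sample feature vectors are hypothetical.

```python
import math

def similarity_row(centers, n):
    """Euclidean distance from category n's feature vector to every category."""
    return [math.dist(centers[n], c) for c in centers]

def transition_row(sim_row):
    """Assumption: normalize the row by its Euclidean norm (its modulus)."""
    norm = math.sqrt(sum(d * d for d in sim_row))
    return [d / norm for d in sim_row] if norm > 0 else list(sim_row)

# Hypothetical category feature vectors:
# (aging feature z, flow direction feature w, package proportion).
centers = [(0.0, 0.0, 0.5), (3.0, 4.0, 0.5)]

# S303: repeat the row construction for every category n.
sim = [similarity_row(centers, n) for n in range(len(centers))]
trans = [transition_row(row) for row in sim]
```

Each entry of `sim` is one 1×N row as described in S302, with a zero on the diagonal position where n = k.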
S4, designing a reinforcement learning strategy and a value network based on the similarity and the transition probability matrix of the package card category, and constructing leaf nodes of the Monte Carlo tree;
S401, designing the reinforcement learning policy network according to the similarity matrix and the state transition matrix of the package card categories:
designing the state-action pair (n, a), where the state n is the package card category corresponding to the current logistics sorting grid, and the action a is the package card category selected for placement at the next logistics sorting grid number;
for the current state n, the policy network gives the estimated capacity value of the current logistics sorting grid arrangement and the transition probability of the package card category selected for the next logistics sorting grid number, each expressed in terms of the k-th column elements of the corresponding matrix;
S402, designing the estimated capacity value network of the logistics sorting grids: for the state-action pair (n, a), with the current logistics sorting grid in package card category state n, calculating the estimated capacity value of the value network:
in the estimated capacity value network, first traversing the values of k and computing one term at each value of k; the terms for all values of k are summed, and the result is recorded as the estimated capacity value of the value network;
S403, constructing the leaf nodes of the Monte Carlo tree:
each leaf node corresponds to one sorting grid, and each sorting grid holds packages of only one package card category; each leaf node records the prior transition probability P(n, a), whose value is given by the policy network; the initial evaluation value before the current leaf node of the Monte Carlo tree is expanded; the reward function estimated after the current leaf node is expanded; the number of visits before the current leaf node is expanded; the number of selections after the current leaf node is expanded; and Q(n, a), the weighted value average of the current node,
in which the weighting coefficient is a preset weight;
when the maximum estimated capacity value is estimated for each Monte Carlo tree node, the logistics sorting grid allocation is simulated a fixed number of times, or within a fixed time limit, under the current package card category state n, so that the policy network's estimated capacity value of the logistics sorting grids under state n approaches the value network's estimated capacity value of the logistics sorting grids under state n.
S5, obtaining an optimal grid sorting strategy by expanding leaf nodes of the Monte Carlo tree;
in the embodiment of the present application, in step S5, the nodes of the monte carlo tree are expanded according to the procedures of selecting, expanding, evaluating and backtracking, where:
the selection flow is as follows:
given the position of the root node, the child node information is searched, and the current optimal child node is obtained according to the polynomial upper confidence tree (PUCT) selection rule,
where a constant in the rule determines the degree of exploration when selecting child nodes: nodes with higher selection probability are chosen early in the selection process, while nodes with higher value are gradually chosen as the Monte Carlo tree is expanded;
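The selection behavior described above can be illustrated with a short sketch. The patent's selection formula is not recoverable from this text, so the score below uses the widely known AlphaZero-style PUCT form, Q + c · P · sqrt(total visits) / (1 + N), as an assumption. The `Node` class, the `c_puct` parameter and the sample values are hypothetical.

```python
import math
from dataclasses import dataclass

@dataclass
class Node:
    prior: float          # P(n, a): prior transition probability
    visits: int = 0       # visit count of this child
    q_value: float = 0.0  # Q(n, a): weighted value average

def select_child(children, c_puct=1.0):
    """Return the action whose child maximizes Q + U (assumed PUCT form)."""
    total_visits = sum(child.visits for child in children.values())

    def score(child):
        # Exploration bonus: large for high-prior, rarely visited children.
        u = c_puct * child.prior * math.sqrt(total_visits) / (1 + child.visits)
        return child.q_value + u

    return max(children, key=lambda action: score(children[action]))

children = {
    "category_A": Node(prior=0.7, visits=10, q_value=0.2),
    "category_B": Node(prior=0.3, visits=0, q_value=0.0),
}
best = select_child(children, c_puct=1.0)
greedy = select_child(children, c_puct=0.0)
```

With `c_puct=1.0` the unvisited child wins on its exploration bonus, while `c_puct=0.0` falls back to the child with the higher Q value, which matches the shift from probability-driven to value-driven choices described in the text.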
the expansion flow is as follows:
while the Monte Carlo tree is being expanded, the policy network from S4 is used: the policy network gives the prior transition probability P(n, a) according to the package card category transition matrix, and expansion is executed until the next leaf node of the Monte Carlo tree is reached or the Monte Carlo tree is fully expanded;
the evaluation flow is as follows:
starting from the leaf node, the Monte Carlo tree is expanded until it is fully expanded; when it is fully expanded, the reward value of the current branch is obtained from the package-to-grid allocation of the current branch and the reward function;
the reward function is constructed on the basis of queuing theory: the average speed v on the conveyor belt is obtained from the physical data of the hardware facilities of the transfer site sorting system; based on the proportion of each package card category in the total packages, and assuming that these proportions are independent and identically distributed, the arrival rate at the grid corresponding to each package card category is obtained;
based on the grid processing efficiency function obtained in S2 from the historical grid-locking time and grid-locking count data, and on the grid x assigned to each package card category, the queue length of each package card category in logistics sorting is obtained;
constructing the reward function greatly accelerates the evaluation of each branch of the Monte Carlo tree, greatly reduces the number of simulation calls to the transfer site logistics sorting digital twin system, and improves the speed of sorting optimization based on historical data;
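A queuing-based reward of this kind can be sketched as follows. The patent's arrival-rate and queue-length formulas are missing from this text, so the example makes three labeled assumptions: the arrival rate of a category is the belt rate multiplied by its package proportion; the queue length uses the simple M/M/1 mean number in system, ρ / (1 − ρ), as a stand-in for the patent's own queuing model; and the reward is the negative total queue length. All names and numbers are hypothetical.

```python
def arrival_rates(belt_rate, proportions):
    """Assumption: category n arrives at rate v * p_n (packages per second)."""
    return [belt_rate * p for p in proportions]

def queue_lengths(rates, service_rates):
    """Swapped-in M/M/1 mean number in system, L = rho / (1 - rho)."""
    lengths = []
    for lam, mu in zip(rates, service_rates):
        rho = lam / mu  # utilization of the grid serving this category
        lengths.append(float("inf") if rho >= 1 else rho / (1 - rho))
    return lengths

def reward(lengths):
    """Assumption: a shorter total queue is better, so negate the sum."""
    return -sum(lengths)

# Hypothetical inputs: belt delivers 2 packages/s, split 25% / 75%;
# service rates would come from the grid processing efficiency function.
rates = arrival_rates(2.0, [0.25, 0.75])
lengths = queue_lengths(rates, service_rates=[1.0, 3.0])
branch_reward = reward(lengths)
```

Because the reward is a closed-form expression, a branch can be scored without running a full simulation, which is the speed-up the text attributes to the reward function.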
the backtracking process is as follows:
after the Monte Carlo tree is fully expanded, the selection counts, together with the corresponding leaf node evaluation values and reward values under the expansion policy, are updated upward from the lowest node of the Monte Carlo tree until the current leaf node is reached, and the weighted value average Q(n, a) of the current leaf node is calculated through the backtracking flow.
After multiple rounds of expansion and exploration of the Monte Carlo tree, when the change in the accumulated reward value of all current nodes falls below a preset threshold, the expansion path of the Monte Carlo tree from the current root node is the optimal path, that is, the optimal grid sorting strategy.
Because each leaf node corresponds to one sorting grid, the optimal path yields the optimal grid sorting strategy.
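The backtracking update and the stopping test can be sketched as below. The exact weighted-average formula for Q(n, a) is not recoverable from this text; the sketch assumes the evaluation of a branch mixes the value estimate V and the reward R as (1 − w)·V + w·R with a preset weight w, and that Q is the running mean of those evaluations. The dictionary-based node layout and function names are hypothetical.

```python
def backup(path, value_estimate, reward, weight=0.5):
    """Update visit counts and Q along the path from the deepest node upward.

    path: list of dicts with keys "visits" and "q", ordered root first.
    Assumption: evaluation = (1 - w) * V + w * R, and Q is the running mean.
    """
    evaluation = (1 - weight) * value_estimate + weight * reward
    for node in reversed(path):
        node["visits"] += 1
        # Incremental mean: move Q toward the new evaluation.
        node["q"] += (evaluation - node["q"]) / node["visits"]
    return evaluation

def converged(previous_total, current_total, threshold=1e-3):
    """Stop when the accumulated reward changes by less than the threshold."""
    return abs(current_total - previous_total) < threshold

path = [{"visits": 1, "q": 0.0}, {"visits": 0, "q": 0.0}]
evaluation = backup(path, value_estimate=0.0, reward=2.0, weight=0.5)
```

After this call the deepest node has been visited once with Q equal to the evaluation, and the root's Q has moved halfway toward it.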
S6, constructing a digital twin for the logistics sorting systems of different logistics transfer sites, simulating in the digital twin to obtain the historical cargo data and historical sorting data of each logistics transfer site, and dynamically adjusting the Monte Carlo tree according to steps S1-S5 to obtain the optimal grid sorting strategy of the logistics sorting system at the current logistics transfer site.
The logistics sorting digital twin of a logistics transfer site is constructed from historical data (such as the grid-locking duration of the logistics sorting grids and the number of packages at each lock) fused in series with the common operation mechanism models of the transfer site sorting machine, which include: a conveyor belt model, a sorting grid model, a package loading shelf model, a package scanning model, and so on. Because site conditions differ from one logistics transfer site to another, variable parameters are added to the common operation mechanism models when a digital twin is constructed for a given transfer site, such as: the number of grids, the number of package loading shelves (which affects the package loading rate), the fixed conveyor belt speed (which affects the rate at which packages fall from the conveyor belt into the sorting grids), and so on. A grid sorting strategy obtained by Monte Carlo tree search at one fixed site does not carry over directly to other sites. The reward function designed for the Monte Carlo tree search algorithm is based on the M/G/n queuing theory model, and its fixed parameters change with the differences between logistics transfer sites.
Simulation is performed by applying the test indicators of actual logistics sorting to the logistics sorting digital twin, for example peak capacity, the proportion of packages that drop into a grid within half a loop, and average capacity. These indicators are used to dynamically adjust the corresponding parameters of the Monte Carlo tree search algorithm, such as: the total number n of package card categories, which corresponds to the number of grids; the package loading rate, which corresponds to the number of package loading shelves; and the package arrival rate at each logistics sorting grid, which corresponds to the fixed conveyor belt speed. The corresponding optimal sorting strategy is thereby obtained.
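The correspondence between site parameters and search parameters described above can be written down as a small configuration sketch. The class name, field names, the per-shelf loading rate and the sample values are all hypothetical; the only thing taken from the text is which site parameter drives which search parameter.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class TransferSiteParams:
    num_grids: int           # number of sorting grids at the site
    num_loading_shelves: int # number of package loading shelves
    belt_speed: float        # fixed conveyor belt speed

def search_settings(site, loading_rate_per_shelf):
    """Map site-specific parameters to the search parameters they drive."""
    return {
        # Number of package card categories follows the number of grids.
        "num_categories": site.num_grids,
        # Package loading rate follows the number of loading shelves.
        "loading_rate": site.num_loading_shelves * loading_rate_per_shelf,
        # Arrival rate at each grid is derived from the belt speed.
        "belt_speed": site.belt_speed,
    }

site_a = TransferSiteParams(num_grids=120, num_loading_shelves=4, belt_speed=2.0)
settings_a = search_settings(site_a, loading_rate_per_shelf=0.5)
```

Keeping the site description separate from the search settings makes it straightforward to rerun steps S1-S5 for another transfer site by swapping in a different `TransferSiteParams` instance.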
The above embodiments are only for illustrating the technical solution of the present invention, and are not limiting; although the invention has been described in detail with reference to the foregoing embodiments, it will be understood by those of ordinary skill in the art that: the technical scheme described in the foregoing embodiments can be modified or some technical features thereof can be replaced by equivalents; such modifications and substitutions do not depart from the spirit and scope of the technical solutions of the embodiments of the present invention.

Claims (6)

1. A logistics sorting method based on reinforcement learning and digital twins, characterized in that the method comprises the following steps:
S1, acquiring historical cargo data in a logistics sorting system;
S2, collecting historical sorting data of the sorting grids of a sorting machine in the logistics sorting system, and fitting a grid processing efficiency function;
S3, integrating package card information through a clustering algorithm to obtain a package card category similarity matrix and a transition probability matrix;
S4, designing a reinforcement learning policy network and value network based on the similarity matrix and the transition probability matrix of the package card categories, and constructing the leaf nodes of a Monte Carlo tree;
the step S4 includes:
S401, designing the reinforcement learning policy network according to the similarity matrix and the state transition matrix of the package card categories:
designing the state-action pair (n, a), where the state n is the package card category corresponding to the current logistics sorting grid, and the action a is the package card category selected for placement at the next logistics sorting grid number;
for the current state n, the policy network gives the estimated capacity value of the current logistics sorting grid arrangement and the transition probability of the package card category selected for the next logistics sorting grid number, each expressed in terms of the k-th column elements of the corresponding matrix;
S402, designing the estimated capacity value network of the logistics sorting grids: for the state-action pair (n, a), with the current logistics sorting grid in package card category state n, calculating the estimated capacity value of the value network:
in the estimated capacity value network, first traversing the values of k and computing one term at each value of k; the terms for all values of k are summed, and the result is recorded as the estimated capacity value of the value network;
S403, constructing the leaf nodes of the Monte Carlo tree:
each leaf node corresponds to one sorting grid, and each sorting grid holds packages of only one package card category; each leaf node records the prior transition probability P(n, a), whose value is given by the policy network; the initial evaluation value before the current leaf node of the Monte Carlo tree is expanded; the reward function estimated after the current leaf node is expanded; the number of visits before the current leaf node is expanded; the number of selections after the current leaf node is expanded; and Q(n, a), the weighted value average of the current node,
in which the weighting coefficient is a preset weight;
when the maximum estimated capacity value is estimated for each Monte Carlo tree node, the logistics sorting grid allocation is simulated a fixed number of times, or within a fixed time limit, under the current package card category state n, so that the policy network's estimated capacity value of the logistics sorting grids under state n approaches the value network's estimated capacity value of the logistics sorting grids under state n;
S5, obtaining an optimal grid sorting strategy by expanding the leaf nodes of the Monte Carlo tree;
S6, constructing a digital twin for the logistics sorting systems of different logistics transfer sites, simulating in the digital twin to obtain the historical cargo data and historical sorting data of each logistics transfer site, and dynamically adjusting the Monte Carlo tree according to steps S1-S5 to obtain the optimal grid sorting strategy of the logistics sorting system at the current logistics transfer site.
2. The logistics sorting method based on reinforcement learning and digital twins according to claim 1, characterized in that the historical cargo data in the logistics sorting system in step S1 includes, for each package in the logistics sorting system within a set time T: package card number information, package card flow direction information and package card aging information; the package card flow direction information comprises the package flow direction city code; the package card aging information distinguishes aviation parts from land transportation parts.
3. The logistics sorting method based on reinforcement learning and digital twins according to claim 1, characterized in that the historical sorting data of the sorting grids of the sorting machine in step S2 includes, within the set time T: the number of each sorting grid, the number of times each sorting grid was locked, and, for each lock, the grid-locking time of the sorting grid and the number of packages the sorting grid contained when locked.
4. A reinforcement learning and digital twins based stream sorting method according to claim 1, characterized by: the step S2 includes:
s201, based on historical grid locking time and grid locking number data, fitting grid locking number functions and grid locking time functions of all grids respectively: setting the grid locking times of each sorting grid opening to be M in the time T;
a1, counting the number of packages contained in the sorting grid openings with each number at the moment when the ith grid is locked, fitting a relation function between the number of packages and the number of the sorting grid openings by using a discount approximation method, and marking asIn the function +.>Indicates sorting lattice number,/->Is numbered->The number of packages corresponding to the grid openings,x=1,2,…,XXindicates the number of sorting grids, +.>The method is used for representing the relation between the number of packages obtained by fitting and the sorting grid number;
counting the grid locking duration of the sorting grid with each number, and fitting a relation function between the grid locking duration and the sorting grid number by a recurrence approximation method; in the function, x denotes the sorting grid number, x = 1, 2, …, X, X denotes the number of sorting grids, and the function value at x is the grid locking duration corresponding to the sorting grid numbered x; the function represents the fitted relation between the grid locking duration and the sorting grid number;
A2, for i = 1, 2, …, M, repeatedly executing A1 to obtain the package number function and the grid locking duration function for each grid lock, and averaging the M functions of each kind to obtain the relation function between the number of packages and the sorting grid number and the relation function between the grid locking duration and the sorting grid number;
S202, calculating a grid processing efficiency function
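Steps A1, A2 and S202 can be illustrated as follows. This is a minimal sketch under stated assumptions: the fitting step is read here as plain piecewise-linear interpolation through the observed points, the helper names (`polyline`, `average`, `efficiency`) and the sample numbers are hypothetical, and the efficiency function is assumed to be packages per unit of locking time because the source does not show its formula.

```python
from bisect import bisect_right


def polyline(points):
    """Return f(x) that linearly interpolates (x, y) points with distinct x."""
    pts = sorted(points)
    xs = [p[0] for p in pts]
    ys = [p[1] for p in pts]

    def f(x):
        if x <= xs[0]:
            return ys[0]
        if x >= xs[-1]:
            return ys[-1]
        j = bisect_right(xs, x)  # xs[j - 1] <= x < xs[j]
        t = (x - xs[j - 1]) / (xs[j] - xs[j - 1])
        return ys[j - 1] + t * (ys[j] - ys[j - 1])

    return f


def average(funcs):
    """Pointwise average of a list of functions (step A2)."""
    return lambda x: sum(f(x) for f in funcs) / len(funcs)


# Made-up data: M = 2 grid locks, X = 3 sorting grids, as (grid number, value).
counts_per_lock = [[(1, 40), (2, 50), (3, 30)], [(1, 44), (2, 54), (3, 34)]]
durations_per_lock = [[(1, 300.0), (2, 600.0), (3, 200.0)],
                      [(1, 340.0), (2, 600.0), (3, 240.0)]]

n_avg = average([polyline(p) for p in counts_per_lock])      # packages vs grid number
t_avg = average([polyline(p) for p in durations_per_lock])   # duration vs grid number


def efficiency(x):
    """ASSUMPTION: efficiency = average packages / average locking duration."""
    return n_avg(x) / t_avg(x)


for x in (1, 2, 3):
    print(x, n_avg(x), t_avg(x), efficiency(x))
```

Averaging the fitted functions pointwise, rather than averaging raw data before fitting, mirrors the order of A1 and A2 in the claim.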
5. The logistics sorting method based on reinforcement learning and digital twin according to claim 2 or 4, characterized in that: the set time T is a historical period of one week.
6. The logistics sorting method based on reinforcement learning and digital twin according to claim 1, characterized in that: step S3 comprises:
S301, using the aging information and flow direction information in the package card information of each package as a feature vector: assuming that the aging feature of the package card is z and the flow direction feature is w, the feature vector of the package card is (z, w);
setting the number of clusters N equal to the number of sorting grids X, and clustering the package card feature vectors with a clustering algorithm to obtain N package card categories; recording the cluster center of each package card category, and counting the proportion of the packages of each package card category to the total packages;
defining the feature vector of each package card category, wherein the feature vector of the n-th package card category consists of the aging feature contained in the cluster center of the n-th package card category, the flow direction feature of the cluster center of the n-th package card category, and the proportion of the packages of the n-th package card category to the total packages;
S302, constructing a similarity matrix between the n-th package card category and all package card categories by calculating the Euclidean distances between the feature vectors of the package card categories; the similarity matrix is a 1×N matrix, and the k-th column of the matrix is the Euclidean distance between the feature vector of the n-th package card category and the feature vector of the k-th package card category; when n = k, the calculated Euclidean distance is 0;
after the similarity matrix is normalized by its modulus, a state transition matrix between the current package card category and all package card categories is obtained; the state transition matrix is also a 1×N matrix;
S303, for n = 1, 2, …, N, repeatedly executing step S302 to obtain the similarity matrix and the state transition matrix corresponding to each package card category.
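Steps S302 and S303 can be illustrated as follows, starting from category feature vectors that are already available. This is a minimal sketch under stated assumptions: the function names (`similarity_row`, `transition_row`) and the sample feature values are hypothetical, the features are assumed to be numeric already, and the normalization is assumed to divide by the L1 norm so that each row sums to 1 (the source only says the row is normalized by its modulus; an L2 norm would be another reading).

```python
import math


def similarity_row(features, n):
    """1 x N row of Euclidean distances from category n to every category."""
    return [math.dist(features[n], other) for other in features]


def transition_row(row):
    """Normalize a similarity row. ASSUMPTION: divide by the L1 norm."""
    total = sum(abs(v) for v in row)
    if total == 0:
        return [0.0] * len(row)
    return [v / total for v in row]


# Made-up category feature vectors: (aging feature, flow direction feature,
# proportion of packages). Values are for illustration only.
features = [
    (0.0, 0.0, 1 / 3),
    (3.0, 4.0, 1 / 3),
    (6.0, 8.0, 1 / 3),
]

# S303: repeat S302 for every category index n.
similarity = [similarity_row(features, n) for n in range(len(features))]
transition = [transition_row(row) for row in similarity]

for n, (s, p) in enumerate(zip(similarity, transition)):
    print(n, s, p)
```

Note that the row follows the claim literally, so a larger distance gives a larger entry in the normalized row; the diagonal entry (n = k) is always 0.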

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202311369261.6A CN117114524B (en) 2023-10-23 2023-10-23 Logistics sorting method based on reinforcement learning and digital twin

Publications (2)

Publication Number Publication Date
CN117114524A CN117114524A (en) 2023-11-24
CN117114524B CN117114524B (en) 2024-01-26

Family

ID=88796944

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202311369261.6A Active CN117114524B (en) 2023-10-23 2023-10-23 Logistics sorting method based on reinforcement learning and digital twin

Country Status (1)

Country Link
CN (1) CN117114524B (en)

Citations (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109711624A (en) * 2018-12-28 2019-05-03 深圳蓝胖子机器人有限公司 Packing method, equipment and computer readable storage medium
CN113761702A (en) * 2020-11-03 2021-12-07 北京京东振世信息技术有限公司 Sorting scheme evaluation method and device
CN113770037A (en) * 2021-02-23 2021-12-10 北京京东乾石科技有限公司 Sorting method and sorting device
CN114186727A (en) * 2021-12-02 2022-03-15 交通运输部水运科学研究所 Multi-cycle logistics network planning method and system
CN115291576A (en) * 2022-08-18 2022-11-04 南京邮电大学 RFID modeling method for virtual simulation in warehouse logistics environment
CN115860401A (en) * 2022-12-14 2023-03-28 武汉理工大学 Intelligent logistics distribution system and method driven by digital twin
WO2023103692A1 (en) * 2021-12-07 2023-06-15 阿里巴巴达摩院(杭州)科技有限公司 Decision planning method for autonomous driving, electronic device, and computer storage medium
CN116304593A (en) * 2023-03-02 2023-06-23 深圳数位大数据科技有限公司 Follow-up brand research based on improved AGNES clustering algorithm
CN116384853A (en) * 2023-03-01 2023-07-04 湖北普罗格科技股份有限公司 Digital twin intelligent logistics management method and device

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US11900306B2 (en) * 2017-07-05 2024-02-13 United Parcel Service Of America, Inc. Verifiable parcel distributed ledger shipping and tracking system

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
快递联盟服务站收件***仿真分析方法;石章鹏;赵小勇;黄志鹏;周子一;;物流技术(16);140-143 *

Also Published As

Publication number Publication date
CN117114524A (en) 2023-11-24

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant