CN111353625B - Method, device, computer equipment and storage medium for predicting net point quantity

Info

Publication number: CN111353625B
Application number: CN201811572809.6A
Authority: CN
Inventors: 刘曙铭; 王本玉; 湛长兰; 李凤; 肖沙沙; 吴敏礽; 金晶
Original assignee: SF Technology Co Ltd
Current assignee: SF Technology Co Ltd
Priority date: 2018-12-21
Filing date: 2018-12-21
Publication date: 2024-06-25
Anticipated expiration: 2038-12-21
Also published as: CN111353625A

Abstract

The application discloses a method, a device, equipment and a readable storage medium for predicting net point quantity. The method comprises the following steps: constructing a time-consuming prediction model; acquiring real-time routing data and historical routing data corresponding to the waybill data of the network points; predicting the piece quantity of each network point within a fixed time period by combining the real-time routing data with the time-consuming model; and calculating the route missing rate from the historical routing data to determine the final piece quantity of each network point within the fixed time period. By building a new time-consuming model, the method addresses the inaccurate results and long calculation period of the machine learning algorithms in the prior art and improves the accuracy of piece quantity flood peak prediction.

Description

Method, device, computer equipment and storage medium for predicting net point quantity

Technical Field

The invention relates to the technical field of logistics, in particular to a method and a device for predicting net point quantity, computer equipment and a storage medium.

Background

In the field of express logistics, accurately predicting piece quantity is an important part of improving an enterprise's service. Predicting the daily delivery quantity of each network point is the basis for optimizing the delivery scheme; in the actual delivery process, however, the time at which an express item finally arrives at the network point is affected by various factors.

At present, piece quantity prediction models use a machine learning algorithm to learn the growth trend of piece quantity along the network point dimension and the time dimension and perform time series prediction. Time series prediction relies on recognizing "long-term trends", "seasonal variations" and "cyclic variations", and it predicts "irregular variations" poorly. For example, the industry currently sees large errors in peak predictions for certain specific dates (e.g., 618 and Double 11), mainly because piece quantity surges to a flood peak on these dates and its growth trend is difficult to learn from historical data. In addition, dispatch prediction for each network point does not make good use of routing information and historical time-consuming information, so the dispatch flood peak cannot be perceived in time, which also makes predictions inaccurate.

How to solve the phenomenon of inaccurate prediction of the flood peak of the piece quantity caused by a special date is a problem to be solved urgently.

Disclosure of Invention

In view of the foregoing drawbacks or shortcomings of the prior art, it is desirable to provide a method, apparatus, computer device, and storage medium for predicting a net point piece amount to improve accuracy of the net point predicted piece amount.

In a first aspect, an embodiment of the present invention provides a method for predicting a net point quantity, where the method includes:

Constructing a time-consuming prediction model, wherein the time-consuming prediction model is trained based on an XGBoost model;

Acquiring real-time routing data and historical routing data corresponding to the waybill data of the network points;

Predicting the piece quantity of each network point within a fixed time period by combining the real-time routing data with the time-consuming model;

Calculating the route missing rate according to the historical routing data, and determining the final piece quantity of the network point within the fixed time period.

In a second aspect, an embodiment of the present invention provides a device for predicting net point quantity, including:

The model construction module, used for constructing a time-consuming prediction model which is trained based on an XGBoost model;

The data acquisition module, used for acquiring real-time routing data and historical routing data corresponding to the waybill data of the network points;

The piece quantity prediction module, used for predicting the piece quantity of each network point within a fixed time period by combining the real-time routing data with the time-consuming model;

The piece quantity determining module, used for calculating the route missing rate according to the historical routing data and determining the final piece quantity of the network point within the fixed time period.

In a third aspect, embodiments of the present application provide a computer device comprising a memory, a processor and a computer program stored on the memory and executable on the processor, the processor implementing the method described in the embodiments of the present application when executing the program.

In a fourth aspect, embodiments of the present application provide a computer-readable storage medium having a computer program stored thereon, wherein the computer program, when executed by a processor, implements the method described in the embodiments of the present application.

According to the method for predicting net point quantity provided by the embodiments of the present application, a new, optimized time-consuming model is constructed, which addresses the inaccurate results and long calculation period of the machine learning algorithms in the prior art and improves the accuracy of piece quantity flood peak prediction.

Drawings

Other features, objects and advantages of the present application will become more apparent upon reading of the detailed description of non-limiting embodiments, made with reference to the accompanying drawings in which:

FIG. 1 is a schematic flow chart of a method for predicting net point quantity according to an embodiment of the present application;

FIG. 2 is a schematic block diagram of a device for predicting net point quantity according to an embodiment of the present application;

FIG. 3 is a schematic diagram of a computer system suitable for implementing the terminal device of an embodiment of the present application.

Detailed Description

The application is described in further detail below with reference to the drawings and examples. It is to be understood that the specific embodiments described herein are merely illustrative of the application and are not limiting of the application. It should be noted that, for convenience of description, only the portions related to the application are shown in the drawings.

It should be noted that, without conflict, the embodiments of the present application and features of the embodiments may be combined with each other. The application will be described in detail below with reference to the drawings in connection with embodiments.

The system in which the device operates may comprise terminal equipment, a network, a server and the like. The terminal equipment may be, for example, a computer device. The network provides the communication link between the terminal equipment and the server, and its transmission medium may be wireless, wired, optical fiber or the like.

Referring to fig. 1, fig. 1 is a schematic flow chart illustrating a method for predicting a net point quantity according to an embodiment of the present application. The method may be performed at the server side.

As shown in fig. 1, the method includes:

In step 110, a time-consuming prediction model is constructed, which is trained based on an XGBoost model. In an embodiment of the present application, the time-consuming prediction model is built as an ensemble of many CART trees based on XGBoost. XGBoost is short for Extreme Gradient Boosting; it is similar to the gradient boosting framework but more efficient, and it includes both a linear model solver and a tree learning algorithm. The output of a classification tree is a class. Classification and regression trees are in fact the same kind of model; the difference is that a classification result is discrete while a regression result is continuous.

The time-consuming model is built using the XGBoost algorithm on the Spark computing engine, and the time-consuming table is updated hourly. Spark is a memory-based distributed computing engine: most of the computation is carried out in memory and intermediate results are kept in memory rather than being written to and read from disk, so model training is faster when the data volume is large and the update frequency can be raised to once per hour. For the time-consuming model, new data reveals the latest trend, which directly affects the model's accuracy; raising the model's update frequency therefore improves the accuracy of the calculated time consumption.

Step 120, acquiring real-time routing data and historical routing data corresponding to the waybill data of the network points.

In the embodiment of the application, the real-time routing table contains data for the whole life cycle of a waybill: from the courier's pickup to the network point, through each transit station, land transportation, air transportation and the like, to final delivery into the user's hands, every node in the process uploads routing data. By analyzing this routing data, the time consumed between transit stations in each segment can be obtained from the historical data. The routing data covers many operations, such as the door-to-door pickup route, the arrival-at-network-point route, the departure route, the arrival-at-transit-station route and the arrival-at-destination route; by screening out part of the routing data, the time consumption of a path can be obtained. For example, if an express item takes one hour from transit station A to transit station B, that figure is obtained by taking the time at which its routing data shows arrival at transit station A as the start time and the time of arrival at transit station B as the end time; the time consumed is the end time minus the start time.
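The end-time-minus-start-time calculation described above can be sketched as follows. This is a minimal illustration only: the record layout, the operation name `arrive_transit` and the sample waybill `SF001` are assumptions made for the example and do not come from the application.

```python
from datetime import datetime

# Hypothetical routing records for one waybill:
# (waybill_no, operation, location, timestamp). The layout is an assumption.
ROUTES = [
    ("SF001", "arrive_transit", "A", "2018-11-11 08:00:00"),
    ("SF001", "depart", "A", "2018-11-11 08:20:00"),
    ("SF001", "arrive_transit", "B", "2018-11-11 09:00:00"),
]


def segment_minutes(routes, waybill_no, start_loc, end_loc, op="arrive_transit"):
    """Minutes between arrival at start_loc and arrival at end_loc."""
    fmt = "%Y-%m-%d %H:%M:%S"
    # Keep only the arrival records of the requested waybill, keyed by location.
    arrivals = {
        loc: datetime.strptime(ts, fmt)
        for no, operation, loc, ts in routes
        if no == waybill_no and operation == op
    }
    if start_loc not in arrivals or end_loc not in arrivals:
        return None  # the routing record for one end of the segment is missing
    # Time consumed = end time - start time.
    return (arrivals[end_loc] - arrivals[start_loc]).total_seconds() / 60.0


elapsed = segment_minutes(ROUTES, "SF001", "A", "B")
```

Returning `None` when one end of the segment has no record keeps missing routing data visible to the caller instead of silently producing a wrong duration.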

De-duplication, cleaning and completion operations are performed on the real-time routing data and historical routing data corresponding to the waybill data of each network point, and the result is used as the data source for model input. Misoperation, business system faults and other human factors cause problems during data collection, so the routing table contains duplicated, missing or abnormal data. To address this, the raw data is first de-duplicated to ensure that each record is unique. Outliers must also be identified: for example, if the routing data shows that the trip from place A to place B normally takes one hour, records for which a day or even longer is calculated far exceed the normal value and are clearly outliers; they must be processed so that the data source is stable and accurate.

Step 130, predicting the piece quantity of each network point within a fixed time period by combining the real-time routing data with the time-consuming model.

In the embodiment of the application, key routing data is selected from the real-time routing table and then cleaned and preprocessed; because the routing table holds a huge amount of data, the routing data that represents the exact position of a waybill must be selected. The main data access process is as follows: routing data of all kinds is transmitted to a server in real time as it is generated, and the real-time routing data is fed into the big data processing platform through a Kafka component. Kafka is a data buffering component. In a real big data application scenario, data can grow explosively within a certain period; if the real-time data were written directly into the big data application system, the system might be unable to process the surge. A buffering mechanism is therefore needed that acts as a buffer zone and passes the surge on to the big data system gradually, ensuring that no data is lost.

Specifically, the real-time routing data received through the Kafka component is analyzed in real time using Spark Streaming. Spark Streaming is a high-level component built on Spark that is mainly used for real-time computing. Its main function is to process continuous data in batches with a batch interval as short as 1 s: the data that arrived within one interval is processed, then the data from the next interval, and so on, so that the real-time system runs continuously, 24 hours a day, 7 days a week. The time at which a waybill will reach the network point is obtained by combining the waybill table with the time-consuming table; each routing update shortens the period that has to be predicted, which improves prediction accuracy.

Step 140, calculating the route missing rate according to the historical routing data, and determining the final piece quantity of the network point within the fixed time period.
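The batch-by-interval processing described above can be illustrated in plain Python. The sketch below does not use the Spark Streaming or Kafka APIs; it only shows the idea of grouping a continuous stream into consecutive fixed-length batches, and the event format `(timestamp_seconds, payload)` is an assumption made for the example.

```python
from collections import defaultdict


def micro_batches(events, batch_seconds=1.0):
    """Group (timestamp_seconds, payload) events into consecutive batches.

    Each batch holds the payloads whose timestamps fall into the same
    interval of length batch_seconds; intervals with no events are skipped.
    """
    batches = defaultdict(list)
    for ts, payload in events:
        # Index of the interval this event belongs to.
        batches[int(ts // batch_seconds)].append(payload)
    return [batches[k] for k in sorted(batches)]


# Hypothetical stream of routing events with arrival times in seconds.
stream = [(0.1, "a"), (0.9, "b"), (1.2, "c"), (3.5, "d")]
result = micro_batches(stream)
```

In a real deployment the batching would be done by the streaming engine itself; the function here only makes the 1 s grouping concrete.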

In the embodiment of the application, because part of the real-time routing data is missing, a piece quantity flood peak predicted only from the incomplete routing data is biased and comes out too low. The real-time predicted piece quantity is therefore divided by the proportion calculated from the historical routing data to obtain the final piece quantity flood peak.

Optionally, constructing the time-consuming prediction model includes the steps of:

S11, constructing a feature set based on the time dimension, the service dimension and the geographic dimension.

Specifically, factors such as the date, the time period, rush hours, staff scheduling at the transit stations, weather and road conditions are all related to the time-consuming model, so these features are extracted and feature engineering is performed. The feature engineering is built mainly from three dimensions: the time dimension, the geographic dimension and the service dimension.

S12, acquiring the XGBoost ensemble algorithm and establishing a time-consuming model using the Spark computing engine.

Specifically, the ensemble algorithm in the embodiment of the present application may be XGBoost, which is based mainly on CART regression trees, a kind of binary tree. Features are split repeatedly: for example, the current tree node is split on the value of the j-th feature, samples whose feature value is smaller than S go to the left subtree, and samples whose feature value is larger than S go to the right subtree.

A CART regression tree essentially partitions the sample space along the feature dimensions. Optimizing this partition is an NP-hard problem, so the decision tree model uses a heuristic method to find the optimal splitting feature and split point, finally yielding a regression tree.
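A single split of the kind described above can be sketched for one feature as follows. This is a simplified illustration, not XGBoost's actual split-finding routine: it tries every midpoint between adjacent feature values and keeps the threshold that minimizes the summed squared error of the two sides. The sample data (distance in km against minutes consumed) is invented for the example.

```python
def sse(values):
    """Sum of squared errors of values around their mean."""
    if not values:
        return 0.0
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values)


def best_split(xs, ys):
    """Return (threshold, left_mean, right_mean) for one feature.

    Samples with x < threshold go to the left subtree, the rest to the right.
    """
    pairs = sorted(zip(xs, ys))
    best = None
    for i in range(1, len(pairs)):
        if pairs[i - 1][0] == pairs[i][0]:
            continue  # no threshold can separate identical feature values
        s = (pairs[i - 1][0] + pairs[i][0]) / 2.0
        left = [y for x, y in pairs if x < s]
        right = [y for x, y in pairs if x >= s]
        cost = sse(left) + sse(right)
        if best is None or cost < best[0]:
            best = (cost, s, sum(left) / len(left), sum(right) / len(right))
    if best is None:
        raise ValueError("need at least two distinct feature values")
    return best[1:]


# Hypothetical samples: distance (km) and time consumed (minutes).
xs = [1, 2, 3, 10, 11, 12]
ys = [30, 32, 31, 90, 92, 91]
threshold, left_mean, right_mean = best_split(xs, ys)
```

Trying only the midpoints between sorted neighbours is the usual heuristic shortcut: any other threshold between the same two values produces the same partition.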

Boosting is a form of ensemble learning: multiple learners are constructed to predict the data, and their predictions are then combined by some strategy to give the final prediction. XGBoost is one such boosting method. It is in essence a GBDT, but strives to maximize speed and efficiency. The core idea is to keep adding trees, each grown by repeated feature splitting; every added tree learns a new function that fits the residual of the previous prediction. Compared with a conventional decision tree algorithm, XGBoost additionally learns the residual, and the model is optimized step by step through it. For example, suppose a person's actual age is 28 and the first tree predicts 25. The residual is the actual value minus the predicted value, i.e. 3, which shows that the first regression tree's prediction is too low. The second tree is then trained to predict this residual of 3; if it outputs 4, the two trees together predict 29, which is 1 more than the true value. Residuals continue to be learned in this way, and training ends when the number of trees reaches the maximum or the iterations are completed. Because of this residual-learning mechanism, XGBoost is more accurate than a conventional decision tree.
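The age example can be traced numerically with a few lines of Python. The sketch only adds up given tree outputs and reports the residual after each one; it does not train any trees. The first two outputs (25 and 4) come from the example above, while the third output (-1) is a hypothetical value added to show the residual reaching zero.

```python
def residual_trace(actual, tree_outputs):
    """Return the final prediction and the residual after each tree.

    The ensemble prediction is the sum of the tree outputs, and the
    residual is the actual value minus the cumulative prediction.
    """
    prediction = 0.0
    trace = []
    for output in tree_outputs:
        prediction += output
        trace.append(actual - prediction)
    return prediction, trace


# Trees 1 and 2 follow the age example; tree 3 (-1) is hypothetical.
prediction, trace = residual_trace(28, [25, 4, -1])
```

After the first tree the residual is 3, after the second it is -1 (the combined prediction of 29 overshoots by 1), and the hypothetical third tree brings it to 0.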

The computing engine is Spark, an open source cluster computing framework suited to real-time processing. For real-time data analysis, Spark provides the high-level component Spark Streaming, which handles real-time computation well and is the most commonly used of the available solutions. Because Spark runs tasks on a cluster, the whole Spark cluster is divided into a Master node and Worker nodes. A Master daemon process and a Driver process reside on the Master node; the Master is responsible for turning serial tasks into task sets (Tasks) that can be executed in parallel, as well as for error handling and the like. A Worker daemon process resides on each Worker node. The two kinds of node have different duties: the Master node manages all the Worker nodes, and the Worker nodes execute tasks. The Master is primarily responsible for allocating resources: each time a task runs on the Spark cluster, the Master calls on the cluster's resources to perform it, much like a team leader. The Workers are more like front-line staff: when a new task arrives, the Master assigns it to the Workers, which carry out the corresponding computation. Each time a Spark task is started, a Driver process is started on the Master, and this process is responsible for executing the whole task.

S13, inputting the feature set into the time-consuming model for training to form a time-consuming prediction model.

Specifically, the feature set covers three aspects: the natural dimension, the time dimension and the service dimension. The time-dimension features mainly include the year, month, day, hour, working day or non-working day, holiday and the like, all of which can affect the timeliness of express transportation. The time an express item takes from place A to place B differs between weekends and weekdays, on holidays and across time periods, so the influence weights must be obtained by analyzing, over the historical data of the last three months, how the different time dimensions affect time consumption.
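The time-dimension features listed above could be derived from a timestamp roughly as follows. This is a sketch under assumptions: the feature names, the 0/1 encoding and the one-entry holiday calendar are invented for illustration and are not specified by the application.

```python
from datetime import date, datetime

# Hypothetical holiday calendar; a real one would come from an external source.
HOLIDAYS = {date(2018, 10, 1)}


def time_features(ts, holidays=HOLIDAYS):
    """Build time-dimension features for one routing timestamp."""
    is_holiday = ts.date() in holidays
    return {
        "year": ts.year,
        "month": ts.month,
        "day": ts.day,
        "hour": ts.hour,
        "weekday": ts.weekday(),  # Monday is 0, Sunday is 6
        # A working day is Monday to Friday and not a holiday.
        "is_workday": int(ts.weekday() < 5 and not is_holiday),
        "is_holiday": int(is_holiday),
    }


double11 = time_features(datetime(2018, 11, 11, 9, 30))
national_day = time_features(datetime(2018, 10, 1, 8, 0))
```

Keeping `is_workday` and `is_holiday` as separate flags lets a tree model distinguish an ordinary weekend from a holiday that falls on a weekday.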

The regional and weather dimensions matter because the transit of an express item from place A to place B is also affected by the regions involved: traffic conditions differ from region to region, and convenient traffic reduces time consumption. Weather also affects time consumption; rain or extreme weather, for example, can greatly hinder the transportation of express items and increase the time taken. Data on the geographic dimension of a network point can be obtained by accessing an external data source or by crawling, and is used to construct the feature weights.

In the business dimension, the transportation of an express item is affected by human factors. The number of staff at a transit station directly affects how efficiently express items are transferred and transported, so the staff scheduling situation is collected, a business weight is constructed, and it is used in calculating time consumption.

Further, the collected characteristic data is input into a time-consuming prediction model for training, so that the model has the capability of predicting the arrival time of the express mail at the key station.

Optionally, the time-consuming model is built using the XGBoost algorithm on the Spark computing engine, and the time-consuming table is updated hourly, so that changes in the time-consuming trend are detected promptly and the accuracy of the time consumption is improved.

Optionally, the real-time routing data and historical routing data corresponding to the waybill data of the network points include any one or any combination of two or more of the following: the operation list number, the operation time, the operation code, the operation place, the operation type and the transfer data.

Specifically, the real-time routing table contains data for the whole life cycle of a waybill: from the courier's pickup to the network point, through each transit station, land transportation, air transportation and the like, to final dispatch to the user, every node in the process uploads routing data. The time an express item takes between transit stations can be obtained from this kind of data.

Furthermore, records are de-duplicated by waybill number, operation code and operation time to ensure that each record is unique, and dirty data is then removed. Dirty data consists mainly of records with important fields missing: when the waybill number, the time or another important field is missing, the record has no value and must be removed. Records with abnormal values must also be cleaned out, chiefly those whose waybill time is too long and far beyond a reasonable range.
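The de-duplication and cleaning rules above can be sketched as follows. The dictionary keys (`waybill_no`, `op_code`, `op_time`, `elapsed_hours`) and the 12-hour outlier threshold are assumptions chosen for the example; the application does not state a concrete threshold.

```python
def clean_routes(records, max_hours=12.0):
    """De-duplicate routing records and drop dirty ones.

    A record is dropped when an important field is missing, when it repeats
    an earlier (waybill_no, op_code, op_time) key, or when its elapsed time
    is negative or above max_hours (a hypothetical threshold).
    """
    seen = set()
    cleaned = []
    for rec in records:
        # Important fields missing: the record has no value.
        if not rec.get("waybill_no") or not rec.get("op_time"):
            continue
        key = (rec["waybill_no"], rec.get("op_code"), rec["op_time"])
        if key in seen:
            continue  # duplicate of an earlier record
        seen.add(key)
        hours = rec.get("elapsed_hours")
        if hours is not None and (hours < 0 or hours > max_hours):
            continue  # abnormal value far outside a reasonable range
        cleaned.append(rec)
    return cleaned


# Hypothetical raw records covering each rule.
RAW = [
    {"waybill_no": "W1", "op_code": "30", "op_time": "2018-11-11 08:00:00", "elapsed_hours": 1.0},
    {"waybill_no": "W1", "op_code": "30", "op_time": "2018-11-11 08:00:00", "elapsed_hours": 1.0},
    {"waybill_no": None, "op_code": "30", "op_time": "2018-11-11 08:05:00", "elapsed_hours": 1.2},
    {"waybill_no": "W2", "op_code": "30", "op_time": "2018-11-11 08:10:00", "elapsed_hours": 30.0},
    {"waybill_no": "W3", "op_code": "31", "op_time": "2018-11-11 08:15:00", "elapsed_hours": 0.9},
]
cleaned = clean_routes(RAW)
```

Of the five sample records, the duplicate, the one without a waybill number and the 30-hour outlier are dropped, leaving W1 and W3.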

Optionally, predicting the piece quantity of each network point within a fixed time period by combining the real-time routing data with the time-consuming model includes the following steps:

S21, determining the network point at which the express item is located according to the real-time routing data.

Specifically, the real-time location of an express item can be determined from the real-time routing data. During the Double 11 sales promotion, for example, the quantity of express items is large and traffic is seriously affected, so analyzing or predicting the time at which an express item will arrive at the final network point from the real-time routing data alone is inaccurate.

S22, predicting, using the time-consuming model, the time at which the express item at that network point will reach the final network point.

Specifically, based on the time-consuming model established with the XGBoost machine learning model, the features that influence express delivery are collected, and the time-consuming model is used to predict the time at which the express item at the network point will reach its final destination.

S23, determining the total predicted piece quantity of each final network point within the fixed time period according to the predicted arrival times at the final network point.

Specifically, the predictions of the time-consuming model are combined with the real-time routing data: the position of the express item is located precisely through the real-time route, its destination and route are obtained from the waybill, and its arrival time is predicted by the time-consuming model, so that the piece quantity flood peak of the network point can be predicted. As the real-time route is continuously updated, the predicted arrival time of the express item at the network point becomes more accurate.
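Steps S21 to S23 can be sketched as a small counting routine. The sketch replaces the trained time-consuming model with a hypothetical lookup table of predicted minutes per (current location, destination) pair; the item tuple layout, the location names and the table values are all assumptions made for the example.

```python
from collections import Counter
from datetime import datetime, timedelta


def predict_piece_counts(items, time_table, window_start, window_end):
    """Count items predicted to arrive at each final network point in a window.

    items: (waybill_no, current_location, last_route_time, destination)
    time_table: {(current_location, destination): predicted minutes}, a
    stand-in for the output of the time-consuming model.
    """
    counts = Counter()
    for _waybill_no, location, seen_at, destination in items:
        minutes = time_table.get((location, destination))
        if minutes is None:
            continue  # no time estimate available for this segment
        eta = seen_at + timedelta(minutes=minutes)  # S22: predicted arrival
        if window_start <= eta < window_end:        # S23: count per window
            counts[destination] += 1
    return counts


# Hypothetical items located via real-time routing data (S21).
ITEMS = [
    ("W1", "A", datetime(2018, 11, 11, 8, 0), "N1"),
    ("W2", "B", datetime(2018, 11, 11, 8, 30), "N1"),
    ("W3", "A", datetime(2018, 11, 11, 8, 0), "N2"),
    ("W4", "C", datetime(2018, 11, 11, 9, 0), "N1"),
]
TIME_TABLE = {("A", "N1"): 90, ("B", "N1"): 45, ("A", "N2"): 200}

counts = predict_piece_counts(
    ITEMS, TIME_TABLE,
    datetime(2018, 11, 11, 9, 0), datetime(2018, 11, 11, 10, 0),
)
```

Re-running the routine whenever an item's routing record is updated moves its `seen_at` and `location` forward, which is how each update shortens the span that has to be predicted.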

Optionally, calculating the route missing rate according to the historical routing data and determining the final piece quantity of the network point within the fixed time period includes:

S31, analyzing the missing rate of the historical routing data of each network point using a statistical method.

Specifically, because part of the real-time routing data is missing, the route missing rate is calculated from the historical routing data and the piece quantity of the network point is completed in proportion. The estimated delivery quantity and the actual delivery quantity of the network point are obtained, and the estimated delivery quantity is divided by the actual delivery quantity to obtain the route missing rate.

S32, completing the predicted piece quantity of each network point based on the missing rate.

Specifically, the predicted piece quantity flood peak of the network point is divided by the calculated average missing rate to obtain the final predicted piece quantity flood peak of the network point.
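The completion step can be worked through numerically. Following the definition given above, the "missing rate" is computed as estimated quantity divided by actual quantity, so it behaves as a coverage ratio below 1 when routing data is missing. The historical figures and function names in this sketch are invented for illustration.

```python
def average_missing_rate(history):
    """Average of estimated / actual over past periods.

    history: list of (estimated_quantity, actual_quantity) pairs for one
    network point; periods with no actual deliveries are ignored.
    """
    ratios = [estimated / actual for estimated, actual in history if actual > 0]
    if not ratios:
        raise ValueError("no usable historical periods")
    return sum(ratios) / len(ratios)


def complete_prediction(predicted_quantity, missing_rate):
    """Divide the routing-based prediction by the rate to complete it."""
    return predicted_quantity / missing_rate


# Hypothetical history: routing data accounted for 80%, 90% and 85% of pieces.
rate = average_missing_rate([(80, 100), (90, 100), (85, 100)])
final_quantity = complete_prediction(850, rate)
```

With an average rate of about 0.85, a routing-based prediction of 850 pieces is completed to roughly 1000 pieces.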

In a second aspect, an embodiment of the present application provides a device for predicting net point quantity. FIG. 2 shows a block diagram of the device according to an embodiment of the present application; the device includes:

The model construction module 210, configured to construct a time-consuming prediction model, where the time-consuming prediction model is trained based on an XGBoost model.

The data acquisition module 220, configured to acquire real-time routing data and historical routing data corresponding to the waybill data of the network points.

The piece quantity prediction module 230, configured to predict the piece quantity of each network point within a fixed time period by combining the real-time routing data with the time-consuming model.

The piece quantity determining module 240, configured to calculate the route missing rate according to the historical routing data and determine the final piece quantity of the network point within the fixed time period.

Optionally, the model construction module 210 specifically includes:

A feature set unit, configured to construct a feature set based on the time dimension, the service dimension and the geographic dimension.

A time-consuming model unit, configured to acquire the XGBoost ensemble algorithm and establish a time-consuming model using the Spark computing engine.

Optionally, the piece quantity prediction module 230 specifically includes:

A determining unit, configured to determine the network point at which the express item is located according to the real-time routing data.

A first prediction unit, configured to predict, using the time-consuming model, the time at which the express item at that network point will reach the final network point.

A second prediction unit, configured to determine the total predicted piece quantity of each final network point within the fixed time period according to the predicted arrival times at the final network point.

It should be understood that the units or modules described in the above apparatus correspond to the individual steps in the method described with reference to fig. 1. Thus, the operations and features described above with respect to the method are equally applicable to the apparatus described above and the units comprised therein, and are not further described herein.

Referring now to FIG. 3, there is illustrated a schematic diagram of a computer system 600 suitable for use in implementing a server of an embodiment of the present application.

As shown in fig. 3, the computer system 600 includes a Central Processing Unit (CPU) 601, which can perform various appropriate actions and processes according to a program stored in a Read Only Memory (ROM) 602 or a program loaded from a storage section 608 into a Random Access Memory (RAM) 603. In the RAM 603, various programs and data required for the operation of the system 600 are also stored. The CPU 601, ROM 602, and RAM 603 are connected to each other through a bus 604. An input/output (I/O) interface 605 is also connected to bus 604.

The following components are connected to the I/O interface 605: an input portion 606 including a keyboard, mouse, etc.; an output portion 607 including a Cathode Ray Tube (CRT), a Liquid Crystal Display (LCD), and the like, a speaker, and the like; a storage section 608 including a hard disk and the like; and a communication section 609 including a network interface card such as a LAN card, a modem, or the like. The communication section 609 performs communication processing via a network such as the internet. The drive 610 is also connected to the I/O interface 605 as needed. Removable media 611 such as a magnetic disk, an optical disk, a magneto-optical disk, a semiconductor memory, or the like is installed as needed on drive 610 so that a computer program read therefrom is installed as needed into storage section 608.

In particular, according to embodiments of the present disclosure, the process described above with reference to fig. 1 may be implemented as a computer software program. For example, embodiments of the present disclosure include a computer program product comprising a computer program tangibly embodied on a machine-readable medium, the computer program comprising program code for performing the method of fig. 1. In such an embodiment, the computer program may be downloaded and installed from a network through the communication portion 609, and/or installed from the removable medium 611.

The flowcharts and block diagrams in the figures illustrate the architecture, functionality, and operation of possible implementations of systems, methods and computer program products according to various embodiments of the present invention. In this regard, each block in the flowchart or block diagrams may represent a module, segment, or portion of code, which comprises one or more executable instructions for implementing the specified logical function(s). It should also be noted that, in some alternative implementations, the functions noted in the block may occur out of the order noted in the figures. For example, two blocks shown in succession may, in fact, be executed substantially concurrently, or the blocks may sometimes be executed in the reverse order, depending upon the functionality involved. It will also be noted that each block of the block diagrams and/or flowchart illustration, and combinations of blocks in the block diagrams and/or flowchart illustration, can be implemented by special purpose hardware-based systems which perform the specified functions or acts, or combinations of special purpose hardware and computer instructions.

The units or modules involved in the embodiments of the present application may be implemented in software or in hardware. The described units or modules may also be provided in a processor.

As another aspect, the present application also provides a computer-readable storage medium, which may be a computer-readable medium contained in the apparatus described in the above embodiment; or may be a computer readable medium, alone, that is not assembled into a device. The computer-readable medium stores one or more programs that, when executed by one of the electronic devices, cause the electronic devices to implement the dot count prediction method as described in the above embodiments.

The above description covers only the preferred embodiments of the present application and the technical principles employed. Persons skilled in the art will appreciate that the scope of the invention referred to in the present application is not limited to technical solutions formed by the specific combinations of the technical features described above, but also covers other technical solutions formed by any combination of those technical features or their equivalents without departing from the inventive concept, for example, technical solutions formed by replacing the above features with (but not limited to) technical features having similar functions.

From the above description, those skilled in the art will appreciate that the present application relies on a hardware platform for its implementation. Based on this understanding, the technical solution of the present application, or the part of it that contributes over the prior art, may essentially be embodied in the form of a computer software program that includes several instructions causing a computer device (a personal computer, a server, a network device, etc.) to execute the methods described in some parts of the embodiments of the present application.

Claims

1. A method for predicting net point quantity, the method comprising:

constructing a time-consuming prediction model, wherein the time-consuming prediction model is trained based on an XGBoost model, and wherein constructing the time-consuming prediction model comprises:

constructing a feature set based on a time dimension, a service dimension and a geographic dimension;

acquiring the XGBoost ensemble algorithm and establishing a time-consuming model using the Spark computing engine;

inputting the feature set into the time-consuming model for training to form the time-consuming prediction model; and

predicting the piece quantity of each net point within a fixed time period by combining the real-time routing data and the time-consuming model, which comprises:

determining, according to the real-time routing data, the net point at which an express item is currently located;

predicting, using the time-consuming model, the time at which the express item at that net point will reach its final net point; and

determining the total predicted piece quantity of each final net point within the fixed time period based on the predicted time of arrival at the final net point.
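As an illustration only, the prediction steps of claim 1 can be sketched as follows. The sketch assumes the trained time-consuming model has been reduced to a lookup table keyed by (current net point, final net point). The table contents, the function name `predict_piece_quantity`, and the record layout are hypothetical and are not taken from the application.

```python
from collections import Counter
from datetime import datetime, timedelta

# Hypothetical time-consuming table: (current net point, final net point) -> hours.
# In the claimed method these values would come from the trained model.
TIME_TABLE = {
    ("A", "X"): 5.0,
    ("B", "X"): 1.5,
    ("B", "Y"): 3.0,
}


def predict_piece_quantity(routes, time_table, period_hours=1):
    """Count predicted arrivals per (final net point, period start).

    routes: iterable of (current net point, final net point, scan time).
    period_hours: length of the fixed time period; assumed to divide 24.
    """
    counts = Counter()
    for current, final, scanned_at in routes:
        # Predicted arrival = last scan time + predicted transit time.
        arrival = scanned_at + timedelta(hours=time_table[(current, final)])
        # Floor the arrival time to the start of its fixed time period.
        bucket_hour = arrival.hour - arrival.hour % period_hours
        bucket = arrival.replace(hour=bucket_hour, minute=0, second=0, microsecond=0)
        counts[(final, bucket)] += 1
    return counts


routes = [
    ("A", "X", datetime(2018, 12, 21, 8, 0)),    # predicted arrival 13:00
    ("B", "X", datetime(2018, 12, 21, 11, 45)),  # predicted arrival 13:15
    ("B", "Y", datetime(2018, 12, 21, 9, 30)),   # predicted arrival 12:30
]
piece_quantity = predict_piece_quantity(routes, TIME_TABLE)
```

Here both items bound for net point X fall into the 13:00 period, so that period's predicted piece quantity is 2, while net point Y receives 1 item in the 12:00 period.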

2. The method for predicting net point quantity according to claim 1, wherein constructing the feature set based on the time dimension, the service dimension and the geographic dimension comprises:

the time dimension feature set includes: one or more of year, month, day, hour, working day, non-working day, holiday;

the service dimension feature set includes: manual scheduling and/or vehicle scheduling conditions;

the geographic dimension feature set includes: weather conditions and/or intersection factors.
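A minimal sketch of how one feature record spanning the three dimensions might be assembled is given below. The field names, the numeric weather encoding, and the holiday calendar are assumptions made for illustration; the claim does not specify an encoding.

```python
from datetime import date, datetime

# Hypothetical holiday calendar; a real system would load an official one.
HOLIDAYS = {date(2018, 10, 1)}


def build_features(ts, staff_on_shift, vehicles_scheduled, weather_code):
    """Build one feature record from time, service and geographic inputs."""
    is_holiday = ts.date() in HOLIDAYS
    # Monday..Friday count as working days unless they are holidays.
    is_working_day = ts.weekday() < 5 and not is_holiday
    return {
        # Time dimension
        "year": ts.year,
        "month": ts.month,
        "day": ts.day,
        "hour": ts.hour,
        "working_day": int(is_working_day),
        "holiday": int(is_holiday),
        # Service dimension (assumed numeric summaries of the schedules)
        "staff_on_shift": staff_on_shift,
        "vehicles_scheduled": vehicles_scheduled,
        # Geographic dimension (assumed integer code for the weather condition)
        "weather_code": weather_code,
    }


features = build_features(datetime(2018, 12, 21, 9, 0), 12, 4, 2)
```

Records of this shape could then be collected into a training table for the time-consuming model.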

3. The method for predicting net point quantity according to claim 2, wherein the time-consuming prediction model is updated at fixed time intervals based on a newly constructed feature set, and a time-consuming table is output.

4. The method for predicting net point quantity according to claim 3, wherein the acquired real-time routing data and historical routing data corresponding to the net point waybill data comprise any one of, or a combination of any two or more of, the following:

the operation waybill number, the operation time, the operation code, the operation place, the operation type and the transfer data.

5. The method for predicting net point quantity according to claim 4, wherein calculating a route missing rate according to the historical routing data and determining the final piece quantity of the net point within the fixed time period comprises:

analyzing the missing rate of the historical routing data of each net point by using a statistical method;

and supplementing the numerical value of the predicted piece quantity of each net point based on the missing rate.
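The compensation step of claim 5 can be illustrated with a short sketch. It assumes, purely for illustration, that the missing rate is the share of historically delivered pieces that had no routing record, and that the prediction is scaled up by dividing by (1 - missing rate). Neither the formula nor the function names below are stated in the claims.

```python
def missing_rate(routed_pieces, actual_pieces):
    """Share of actual pieces that had no routing record (assumed definition)."""
    if actual_pieces <= 0:
        return 0.0
    return 1.0 - routed_pieces / actual_pieces


def supplement(predicted, rate):
    """Scale a route-based prediction up to cover pieces with missing routes."""
    if not 0.0 <= rate < 1.0:
        raise ValueError("missing rate must be in [0, 1)")
    return predicted / (1.0 - rate)


# Historical data for one net point: 900 of 1000 delivered pieces had routes.
rate = missing_rate(routed_pieces=900, actual_pieces=1000)
# A route-based prediction of 450 pieces is scaled up to about 500.
final_quantity = supplement(predicted=450, rate=rate)
```

Under this assumption, a net point whose historical routing data covers 90% of its pieces has each route-based prediction increased by roughly one ninth.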

6. A net point piece quantity predicting apparatus, characterized by comprising:

a model construction module, configured to construct a time-consuming prediction model trained based on an XGBoost model, the model construction module specifically comprising:

a feature set unit, configured to construct a feature set based on a time dimension, a service dimension and a geographic dimension;

a time-consuming model unit, configured to acquire the XGBoost ensemble algorithm and establish a time-consuming model using the Spark computing engine;

a training unit, configured to input the feature set into the time-consuming model for training so as to form the time-consuming prediction model;

a piece quantity prediction module, configured to predict the piece quantity of each net point within a fixed time period by combining the real-time routing data and the time-consuming model, the piece quantity prediction module specifically comprising:

a determining unit, configured to determine, according to the real-time routing data, the net point at which an express item is currently located;

a first prediction unit, configured to predict, using the time-consuming model, the time at which the express item at that net point will reach its final net point;

a second prediction unit, configured to determine the total predicted piece quantity of each final net point within the fixed time period based on the predicted time of arrival at the final net point.

7. A computer device, comprising: at least one processor, at least one memory, and computer program instructions stored in the memory, which when executed by the processor, implement the method of any one of claims 1-5.

8. A computer readable storage medium having stored thereon computer program instructions, which when executed by a processor, perform the method according to any of claims 1-5.