CN114548565A - Express prediction method based on random forest - Google Patents
- Publication number
- CN114548565A (application CN202210173732.5A)
- Authority
- CN
- China
- Prior art keywords
- express
- delivery
- random forest
- day
- feature
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06Q—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
- G06Q10/00—Administration; Management
- G06Q10/04—Forecasting or optimisation specially adapted for administrative or management purposes, e.g. linear programming or "cutting stock problem"
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/24—Classification techniques
- G06F18/243—Classification techniques relating to the number of classes
- G06F18/24323—Tree-organised classifiers
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06Q—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
- G06Q10/00—Administration; Management
- G06Q10/08—Logistics, e.g. warehousing, loading or distribution; Inventory or stock management
- G06Q10/083—Shipping
-
- Y—GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
- Y02—TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
- Y02P—CLIMATE CHANGE MITIGATION TECHNOLOGIES IN THE PRODUCTION OR PROCESSING OF GOODS
- Y02P90/00—Enabling technologies with a potential contribution to greenhouse gas [GHG] emissions mitigation
- Y02P90/30—Computing systems specially adapted for manufacturing
Landscapes
- Engineering & Computer Science (AREA)
- Business, Economics & Management (AREA)
- Economics (AREA)
- Theoretical Computer Science (AREA)
- Physics & Mathematics (AREA)
- Strategic Management (AREA)
- Human Resources & Organizations (AREA)
- General Physics & Mathematics (AREA)
- Marketing (AREA)
- Entrepreneurship & Innovation (AREA)
- General Business, Economics & Management (AREA)
- Tourism & Hospitality (AREA)
- Data Mining & Analysis (AREA)
- Quality & Reliability (AREA)
- Development Economics (AREA)
- Operations Research (AREA)
- Bioinformatics & Computational Biology (AREA)
- Artificial Intelligence (AREA)
- Life Sciences & Earth Sciences (AREA)
- Bioinformatics & Cheminformatics (AREA)
- Evolutionary Computation (AREA)
- Computer Vision & Pattern Recognition (AREA)
- Evolutionary Biology (AREA)
- General Engineering & Computer Science (AREA)
- Game Theory and Decision Science (AREA)
- Management, Administration, Business Operations System, And Electronic Commerce (AREA)
Abstract
The invention discloses an express prediction method based on a random forest, comprising the following steps: collecting historical data that influence the number of express orders to form a sample set; screening feature data from the sample set to form a feature set; constructing an express daily delivery prediction function using a regression prediction method; and solving for the daily express delivery volume using a random forest method based on the feature set and the express daily delivery prediction function. The method is mainly applied to an express service platform. By training a model on the feature data, the number of express orders can be predicted quickly and at low cost, and the random forest algorithm allows the average daily express volume to be predicted quickly and accurately. The platform can thus better understand user characteristics and needs and dispatch couriers and express vehicles flexibly, which shortens the time users wait for a delivery to complete and saves the platform labor and time costs.
Description
Technical Field
The invention belongs to the technical field of artificial intelligence, and particularly relates to an express prediction method based on a random forest.
Background
With the development of social modernization and informatization, the volume of data generated by information management within express platforms keeps growing. As technologies such as artificial intelligence and big data analysis mature, large organizations increasingly rely on intelligent products for business management and on mass data to improve business handling efficiency. However, in the prior art, and especially inside express platforms, much business handling remains mechanical and blind. How to use intelligent products within the platform to reduce the consumption of manpower, material, and financial resources deserves in-depth research. In addition, when facing mass data, decision makers need to find effective information quickly, discover patterns in the data, and grasp the important information and data details that affect key business characteristics, so as to avoid inefficient or even mistaken judgments.
Disclosure of Invention
Aiming at the problem that express resources such as manpower, material, and financial resources are wastefully allocated in existing large organizations, the invention provides an express prediction method based on a random forest, which can predict the number of express orders quickly and at low cost. To solve the above technical problem, the invention adopts the following technical solution:
An express prediction method based on a random forest comprises the following steps:
S1, collecting historical data that influence the number of express orders to form a sample set;
S2, screening feature data from the sample set to form a feature set;
S3, constructing an express daily delivery prediction function using a regression prediction method;
S4, solving for the daily express delivery volume using a random forest method based on the feature set established in step S2 and the express daily delivery prediction function established in step S3.
The step S1 includes the following steps:
S1.1, collecting the express delivery data set of a given unit over a number of days;
S1.2, cleaning the sample set collected in step S1.1 using a mean value substitution method;
S1.3, based on the cleaned sample set, summing the number of express items sent in each time period of each day to obtain the unit's daily delivery number.
The features in the feature set include order status, sender department, and sender unit.
In step S3, the expression of the express day delivery prediction function is:
In the formula, $\alpha_n$ is the coefficient of feature $x_n$ in the feature set, $e$ is a random error, and the function value is the final predicted express daily delivery amount.
The step S4 includes the following steps:
S4.1, using the bootstrap method, randomly drawing m samples from the M samples in the sample set as a sub-training set for constructing a decision tree, where M > m;
S4.2, synchronously constructing T-1 decision trees using the method of step S4.1;
S4.3, randomly selecting p features from the n features of the feature set as the node-splitting subset, selecting the one feature among the p with the smallest squared error as the node-splitting feature, and continuing to split nodes until the decision tree cannot be split further, where n > p;
S4.4, splitting the T decision trees according to the method of step S4.3 to form a random forest;
S4.5, based on the express daily delivery prediction function, training each split decision tree on its m randomly drawn samples to obtain the express daily delivery value corresponding to each decision tree;
S4.6, averaging the express daily delivery values of all decision trees to obtain the predicted daily express delivery volume.
The invention has the beneficial effects that:
The method is mainly applied to an express service platform. By training a model on the feature data, the number of express orders can be predicted quickly and at low cost, and the random forest algorithm allows the average daily express volume to be predicted quickly and accurately, so that resources can be prepared for the service in advance based on the estimated number of orders. The prediction of the number of express orders performs well, and the platform can better understand user characteristics and needs and dispatch couriers and express vehicles flexibly, which shortens the time users wait for a delivery to complete and saves the platform labor and time costs.
Drawings
In order to more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, the drawings used in the description of the embodiments or the prior art will be briefly described below, it is obvious that the drawings in the following description are only some embodiments of the present invention, and for those skilled in the art, other drawings can be obtained according to the drawings without creative efforts.
FIG. 1 is a schematic diagram of a prior art random forest.
FIG. 2 is a graph of the predicted results of the present invention.
FIG. 3 is a graph of the predicted results of a model based on a logistic regression algorithm.
FIG. 4 is a graph of the predicted results of a model based on the least absolute shrinkage and selection operator (LASSO) algorithm.
FIG. 5 is a schematic flow chart of the present invention.
Detailed Description
The technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are only a part of the embodiments of the present invention, and not all of the embodiments. All other embodiments, which can be obtained by a person skilled in the art without inventive effort based on the embodiments of the present invention, are within the scope of the present invention.
An express prediction method based on a random forest, as shown in FIG. 5, comprises the following steps:
S1, collecting historical data that influence the number of express orders to form a sample set;
S1.1, collecting the express delivery data set of a given unit over a number of days;
The express delivery data set comprises the sender unit, the sender department, the number of express items sent in each time period of each day, and the order status of each express item. The sender unit is the name of the organization to which the sender belongs, and the sender department is the department in which the sender works; the order status indicates whether the item has been sent, is in transit, or the order has been completed.
S1.2, cleaning the sample set collected in step S1.1 using a mean value substitution method;
The cleaning comprises handling missing values, removing content inconsistent with the original data type, removing unnecessary data, and removing logically erroneous data. Logically erroneous data refers to problem data that can be identified through simple logical reasoning.
S1.3, based on the cleaned sample set, summing the number of express items sent in each time period of each day to obtain the unit's daily delivery number.
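Steps S1.2 and S1.3 can be illustrated with a minimal Python sketch. The record layout, the field values, and the helper names `fill_missing_with_mean` and `daily_totals` are assumptions made for illustration; the patent does not specify a data format, and it only names mean value substitution without detailing how the mean is taken.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical records: (date, time period, number of express items sent).
# None marks a missing count that needs cleaning.
records = [
    ("2021-05-01", "morning", 4),
    ("2021-05-01", "afternoon", None),
    ("2021-05-01", "evening", 2),
    ("2021-05-02", "morning", 6),
    ("2021-05-02", "afternoon", 3),
]

def fill_missing_with_mean(rows):
    """Replace missing counts with the mean of all observed counts (assumed rule)."""
    observed = [count for _, _, count in rows if count is not None]
    fill_value = mean(observed)
    return [(day, period, fill_value if count is None else count)
            for day, period, count in rows]

def daily_totals(rows):
    """Sum the per-period counts to get one delivery total per day."""
    totals = defaultdict(float)
    for day, _, count in rows:
        totals[day] += count
    return dict(totals)

cleaned = fill_missing_with_mean(records)
totals = daily_totals(cleaned)
print(totals)
```

Here the global mean of the observed counts is used as the substitute value; a per-day or per-period mean would be an equally plausible reading of the cleaning step.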
In this embodiment, Excel is used to organize the historical data. The raw data comprise 7271 records with 10 fields and cover two units, including express deliveries within a unit and between the two units, over a time span from May 2021 to October 2021.
S2, screening feature data from the sample set to form a feature set, where the feature set is denoted $feature_i$ and expressed as:
$feature_i = (x_1, x_2, \dots, x_n)$;
In the formula, $x_n$ denotes one feature in the feature set and $n$ denotes the number of features in the feature set. In this embodiment, $n = 3$: $x_1$ is the order status, $x_2$ is the sender department, and $x_3$ is the sender unit.
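Because the three features are categorical, they have to be turned into numbers before a regression tree can split on them. The sketch below shows one possible encoding (integer codes in first-seen order). The field names, the category values, and the helper `build_index` are hypothetical; the patent does not state how the features are encoded.

```python
def build_index(values):
    """Assign each distinct category an integer code in first-seen order."""
    index = {}
    for value in values:
        index.setdefault(value, len(index))
    return index

# Hypothetical cleaned records; the field values are invented for illustration.
orders = [
    {"order_status": "completed", "sender_department": "finance", "sender_unit": "unit_a"},
    {"order_status": "in_transit", "sender_department": "logistics", "sender_unit": "unit_b"},
    {"order_status": "completed", "sender_department": "logistics", "sender_unit": "unit_a"},
]

FEATURES = ("order_status", "sender_department", "sender_unit")  # x1, x2, x3
indexes = {name: build_index(o[name] for o in orders) for name in FEATURES}

# Each record becomes one feature vector (x1, x2, x3).
feature_set = [tuple(indexes[name][o[name]] for name in FEATURES) for o in orders]
print(feature_set)
```

Integer codes impose an arbitrary order on the categories; one-hot encoding would be a reasonable alternative if that ordering is a concern.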
S3, constructing an express day delivery prediction function by using a regression prediction method;
The expression of the express daily delivery prediction function is as follows:
In the formula, $\alpha_n$ is the coefficient of feature $x_n$ in the feature set, $e$ is a random error, and the function value is the final predicted express daily delivery amount.
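The expression itself is not reproduced in this text. As an assumption only, a linear regression form that matches the symbols defined above would look like the following, where $\hat{y}$ is a placeholder name for the predicted daily delivery amount and the absence of an intercept term is likewise assumed:

```latex
\hat{y} = \alpha_1 x_1 + \alpha_2 x_2 + \cdots + \alpha_n x_n + e
```

With $n = 3$ as in this embodiment, this would reduce to $\hat{y} = \alpha_1 x_1 + \alpha_2 x_2 + \alpha_3 x_3 + e$.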
S4, as shown in FIG. 1, solving for the daily express delivery volume using a random forest method based on the feature set established in step S2 and the express daily delivery prediction function established in step S3, comprising the following steps:
S4.1, using the bootstrap method, randomly drawing m samples from the M samples in the sample set as a sub-training set for constructing a decision tree, where M > m;
S4.2, synchronously constructing T-1 decision trees using the method of step S4.1;
S4.3, randomly selecting p features from the n features of the feature set as the node-splitting subset, selecting the one feature among the p with the smallest squared error as the node-splitting feature, and continuing to split nodes until the decision tree cannot be split further;
In this embodiment, the 3 features form the root node and the internal nodes, the daily delivery number is the output, i.e., the leaf nodes, and n > p.
S4.4, splitting the T decision trees according to the method of step S4.3 to form a random forest;
S4.5, based on the express daily delivery prediction function, training each split decision tree on its m randomly drawn samples to obtain the express daily delivery value corresponding to each decision tree;
S4.6, averaging the express daily delivery values of all decision trees to obtain the predicted daily express delivery volume.
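Steps S4.1 to S4.6 can be sketched in plain Python as follows. This is a simplified illustration under stated assumptions, not the patented implementation: the toy data, the parameter values (T = 5, m = 6, p = 2), and all function names are hypothetical, features are assumed to be integer-coded, and each leaf simply stores the mean daily delivery count of its samples.

```python
import random
from statistics import mean

def sse(values):
    """Sum of squared errors of the values around their mean."""
    mu = mean(values)
    return sum((v - mu) ** 2 for v in values)

def best_split(rows, targets, feature_ids):
    """Return the (feature, threshold) with the smallest squared error, or None."""
    best, best_err = None, sse(targets)
    for f in feature_ids:
        for t in sorted({r[f] for r in rows}):
            left = [y for r, y in zip(rows, targets) if r[f] <= t]
            right = [y for r, y in zip(rows, targets) if r[f] > t]
            if not left or not right:
                continue
            err = sse(left) + sse(right)
            if err < best_err - 1e-12:
                best, best_err = (f, t), err
    return best

def build_tree(rows, targets, p, rng):
    """S4.3: at each node, sample p of the n features and split on the best one."""
    n = len(rows[0])
    split = best_split(rows, targets, rng.sample(range(n), p))
    if split is None:
        return mean(targets)  # leaf: mean daily delivery count
    f, t = split
    left = [(r, y) for r, y in zip(rows, targets) if r[f] <= t]
    right = [(r, y) for r, y in zip(rows, targets) if r[f] > t]
    return (f, t,
            build_tree([r for r, _ in left], [y for _, y in left], p, rng),
            build_tree([r for r, _ in right], [y for _, y in right], p, rng))

def predict_tree(node, row):
    """Walk from the root to a leaf and return the leaf value."""
    while isinstance(node, tuple):
        f, t, left, right = node
        node = left if row[f] <= t else right
    return node

def build_forest(rows, targets, T, m, p, seed=0):
    """S4.1, S4.2, S4.4: grow T trees, each on m samples drawn with replacement."""
    rng = random.Random(seed)
    forest = []
    for _ in range(T):
        idx = [rng.randrange(len(rows)) for _ in range(m)]
        forest.append(build_tree([rows[i] for i in idx],
                                 [targets[i] for i in idx], p, rng))
    return forest

def predict_forest(forest, row):
    """S4.6: average the per-tree predictions."""
    return mean([predict_tree(tree, row) for tree in forest])

# Toy data: M = 8 samples, n = 3 integer-coded features, daily delivery counts.
rows = [(0, 0, 0), (0, 1, 0), (1, 0, 1), (1, 1, 1),
        (0, 0, 1), (1, 1, 0), (0, 1, 1), (1, 0, 0)]
targets = [12.0, 15.0, 7.0, 9.0, 11.0, 10.0, 14.0, 8.0]

forest = build_forest(rows, targets, T=5, m=6, p=2, seed=42)
prediction = predict_forest(forest, (0, 1, 0))
print(prediction)
```

In this sketch a node stops splitting when none of the p sampled features reduces the squared error, which is a slightly stricter stopping rule than "cannot be split further". The trees are built in a loop rather than synchronously to keep the example short.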
A random forest comprises multiple decision trees, and the set of decision trees is constructed using the bootstrap idea, that is, each training set is formed by sampling with replacement. A random forest is insensitive to noise in the training set and trains quickly; because its trees can be trained in parallel, training speeds up further, which achieves fast training and prediction. Since a random forest is based on multiple decision trees, it is more robust than a single decision tree.
The loss function is constructed from the mean squared error (MSE) and the mean absolute error (MAE). The loss function measures the degree of difference between the predictions and the actual data, and thus the quality of the model.
The mean squared error is calculated as follows:
$$loss_{MSE} = \frac{1}{m}\sum_{i=1}^{m}\left(z_i - \hat{y}_i\right)^2$$
where $loss_{MSE}$ denotes the mean squared error loss function, $m$ denotes the total number of samples, $z_i$ denotes the actual daily number of express orders for the $i$-th sample, and $\hat{y}_i$ denotes the corresponding predicted value;
The mean absolute error is calculated as follows:
$$loss_{MAE} = \frac{1}{m}\sum_{i=1}^{m}\left|z_i - \hat{y}_i\right|$$
where $loss_{MAE}$ denotes the mean absolute error loss function.
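The two loss functions follow their standard definitions and can be computed as in the short sketch below. The function names and the sample values are illustrative only and do not come from the patent's experiments.

```python
def mse(actual, predicted):
    """Mean squared error between actual and predicted daily order counts."""
    m = len(actual)
    return sum((z - y) ** 2 for z, y in zip(actual, predicted)) / m

def mae(actual, predicted):
    """Mean absolute error between actual and predicted daily order counts."""
    m = len(actual)
    return sum(abs(z - y) for z, y in zip(actual, predicted)) / m

# Invented example values, not data from the patent.
actual = [10.0, 12.0, 8.0, 15.0]
predicted = [11.0, 12.0, 6.0, 14.0]

print(mse(actual, predicted), mae(actual, predicted))
```

MSE penalizes large errors more heavily than MAE, so reporting both gives a fuller picture of how a model misses.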
As shown in FIGS. 2 to 4, the present application is compared with logistic regression (LR) and the least absolute shrinkage and selection operator (LASSO) algorithm. As shown in Table 1 below, the experiment shows that the present application performs best on both the MSE and MAE metrics.
TABLE 1 comparison of the results
The present application is applied to a workflow in which a sender requests express delivery through the express platform, a courier accepts the order, collects the item and delivers it to the designated area, and the recipient signs for it to complete the order. Taking the Tianjin University comprehensive service platform developed in this way as an example, over the period from 2019 to 2020 a total of 874 orders had accumulated by 2020. With the appointment waiting time of recipient and courier reduced by 5 minutes per order, about 72 hours were saved in total, which greatly reduces the platform's labor and time costs.
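The time-saving figure can be checked with simple arithmetic, assuming the 5 minutes are saved once per order (the variable names are illustrative):

```python
orders = 874                  # accumulated orders reported for 2020
minutes_saved_per_order = 5   # assumed to apply once per order

total_minutes = orders * minutes_saved_per_order
total_hours = total_minutes / 60
print(total_minutes, round(total_hours, 1))
```

The result is a little under 73 hours, consistent with the "about 72 hours" stated above.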
The above description is only for the purpose of illustrating the preferred embodiments of the present invention and is not to be construed as limiting the invention, and any modifications, equivalents, improvements and the like that fall within the spirit and principle of the present invention are intended to be included therein.
Claims (5)
1. An express prediction method based on a random forest is characterized by comprising the following steps:
S1, collecting historical data that influence the number of express orders to form a sample set;
S2, screening feature data from the sample set to form a feature set;
S3, constructing an express daily delivery prediction function using a regression prediction method;
S4, solving for the daily express delivery volume using a random forest method based on the feature set established in step S2 and the express daily delivery prediction function established in step S3.
2. The random forest based express prediction method of claim 1, wherein the step S1 comprises the following steps:
S1.1, collecting the express delivery data set of a given unit over a number of days;
S1.2, cleaning the sample set collected in step S1.1 using a mean value substitution method;
S1.3, based on the cleaned sample set, summing the number of express items sent in each time period of each day to obtain the unit's daily delivery number.
3. The random forest based express prediction method of claim 1, wherein the features in the feature set include order status, sender department, and sender unit.
5. The random forest based express prediction method of claim 1, wherein the step S4 comprises the following steps:
S4.1, using the bootstrap method, randomly drawing m samples from the M samples in the sample set as a sub-training set for constructing a decision tree, where M > m;
S4.2, synchronously constructing T-1 decision trees using the method of step S4.1;
S4.3, randomly selecting p features from the n features of the feature set as the node-splitting subset, selecting the one feature among the p with the smallest squared error as the node-splitting feature, and continuing to split nodes until the decision tree cannot be split further, where n > p;
S4.4, splitting the T decision trees according to the method of step S4.3 to form a random forest;
S4.5, based on the express daily delivery prediction function, training each split decision tree on its m randomly drawn samples to obtain the express daily delivery value corresponding to each decision tree;
S4.6, averaging the express daily delivery values of all decision trees to obtain the predicted daily express delivery volume.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202210173732.5A CN114548565A (en) | 2022-02-24 | 2022-02-24 | Express prediction method based on random forest |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202210173732.5A CN114548565A (en) | 2022-02-24 | 2022-02-24 | Express prediction method based on random forest |
Publications (1)
Publication Number | Publication Date |
---|---|
CN114548565A true CN114548565A (en) | 2022-05-27 |
Family
ID=81677175
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202210173732.5A Pending CN114548565A (en) | 2022-02-24 | 2022-02-24 | Express prediction method based on random forest |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN114548565A (en) |
Citations (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN109961165A (en) * | 2017-12-25 | 2019-07-02 | 顺丰科技有限公司 | Part amount prediction technique, device, equipment and its storage medium |
CN112434847A (en) * | 2020-11-17 | 2021-03-02 | 上海东普信息科技有限公司 | Express delivery quantity prediction method, device, equipment and storage medium based on LSTM model |
CN113240185A (en) * | 2021-05-25 | 2021-08-10 | 天津大学 | County carbon emission prediction method based on random forest |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||