CN114548565A - Express prediction method based on random forest - Google Patents

Publication number
CN114548565A
Authority
CN
China
Prior art keywords
express
delivery
random forest
day
feature
Legal status
Pending
Application number
CN202210173732.5A
Other languages
Chinese (zh)
Inventor
李武
张仲
王晓飞
狄筝
Current Assignee
Tianjin University
Original Assignee
Tianjin University
Application filed by Tianjin University
Priority to CN202210173732.5A
Publication of CN114548565A

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q10/00 Administration; Management
    • G06Q10/04 Forecasting or optimisation specially adapted for administrative or management purposes, e.g. linear programming or "cutting stock problem"
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/24 Classification techniques
    • G06F18/243 Classification techniques relating to the number of classes
    • G06F18/24323 Tree-organised classifiers
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q10/00 Administration; Management
    • G06Q10/08 Logistics, e.g. warehousing, loading or distribution; Inventory or stock management
    • G06Q10/083 Shipping
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02P CLIMATE CHANGE MITIGATION TECHNOLOGIES IN THE PRODUCTION OR PROCESSING OF GOODS
    • Y02P90/00 Enabling technologies with a potential contribution to greenhouse gas [GHG] emissions mitigation
    • Y02P90/30 Computing systems specially adapted for manufacturing

Landscapes

  • Engineering & Computer Science (AREA)
  • Business, Economics & Management (AREA)
  • Economics (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Strategic Management (AREA)
  • Human Resources & Organizations (AREA)
  • General Physics & Mathematics (AREA)
  • Marketing (AREA)
  • Entrepreneurship & Innovation (AREA)
  • General Business, Economics & Management (AREA)
  • Tourism & Hospitality (AREA)
  • Data Mining & Analysis (AREA)
  • Quality & Reliability (AREA)
  • Development Economics (AREA)
  • Operations Research (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Artificial Intelligence (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Evolutionary Computation (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Evolutionary Biology (AREA)
  • General Engineering & Computer Science (AREA)
  • Game Theory and Decision Science (AREA)
  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)

Abstract

The invention discloses an express prediction method based on a random forest, comprising the following steps: collecting historical data that influence the number of express orders to form a sample set; screening feature data out of the sample set to form a feature set; constructing a daily express delivery prediction function by a regression prediction method; and solving for the daily express delivery volume by a random forest method based on the feature set and the daily express delivery prediction function. The method is mainly applied to an express service platform. By training a model on the feature data, the number of express orders can be predicted quickly and at low cost, and the random forest algorithm allows the average daily express volume to be predicted quickly and accurately. The platform thereby gains a better understanding of user characteristics and demand and can dispatch couriers and express vehicles flexibly, which shortens the time users wait for a delivery to be completed and saves the platform labor and time costs.

Description

Express prediction method based on random forest
Technical Field
The invention belongs to the technical field of artificial intelligence, and particularly relates to an express prediction method based on a random forest.
Background
With the modernization and informatization of society, the volume of data generated by information management within express platforms keeps growing. As technologies such as artificial intelligence and big data analysis mature, large organizations pay increasing attention to using intelligent products for business management and to using massive data to improve the efficiency of business handling. In the prior art, however, and especially inside express platforms, business handling still suffers from many mechanical and blind pain points. How to use intelligent products within the platform to reduce the consumption of manpower, material and financial resources deserves in-depth study. In addition, a decision maker facing massive data needs to find effective information quickly, discover the patterns in the data, and grasp the important information and data details that affect key business characteristics, so as to avoid inefficient or even mistaken judgments.
Disclosure of Invention
Aiming at the waste in allocating express resources such as manpower, material and financial resources in existing large organizations, the invention provides an express prediction method based on a random forest that can predict the number of express orders quickly and at low cost. To solve the above technical problem, the invention adopts the following technical solution:
An express prediction method based on a random forest, comprising the following steps:
S1, collecting historical data that influence the number of express orders to form a sample set;
S2, screening feature data out of the sample set to form a feature set;
S3, constructing a daily express delivery prediction function by a regression prediction method;
S4, solving for the daily express delivery volume by a random forest method based on the feature set established in step S2 and the daily express delivery prediction function established in step S3.
Step S1 comprises the following steps:
S1.1, collecting the express delivery data set of a unit over a number of days;
S1.2, cleaning the sample set collected in step S1.1 by the mean-value substitution method;
S1.3, summing the numbers of express deliveries in the time periods of each day, according to the cleaned sample set, to obtain the unit's daily delivery count.
The features in the feature set include order status, sender department, and sender unit.
In step S3, the daily express delivery prediction function is expressed as:
$$\hat{y} = \alpha_1 x_1 + \alpha_2 x_2 + \cdots + \alpha_n x_n + e$$
where $\alpha_n$ is the coefficient of feature $x_n$ in the feature set, $e$ is a random error, and $\hat{y}$ is the final predicted daily express delivery volume.
Step S4 comprises the following steps:
S4.1, randomly drawing m samples from the M samples in the sample set by the bootstrap method to serve as a sub-training set for constructing a decision tree, where M > m;
S4.2, synchronously constructing T-1 further decision trees by the method of step S4.1;
S4.3, randomly selecting p features from the n features of the feature set as the node-splitting subset, selecting from these p features the one with the smallest squared error as the node-splitting feature, and continuing to split nodes until the decision tree can no longer be split, where n > p;
S4.4, splitting the T decision trees by the method of step S4.3 to form a random forest;
S4.5, training each split decision tree on its randomly drawn samples based on the daily express delivery prediction function to obtain the daily express delivery value corresponding to each decision tree;
S4.6, averaging the daily express delivery values of all decision trees to obtain the predicted daily express delivery volume.
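The two random draws in steps S4.1 and S4.3 can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the helper names `bootstrap_sample` and `random_feature_subset`, the stand-in sample set of ten integers, and the values m = 6 and p = 2 are hypothetical and do not come from the patent.

```python
import random

def bootstrap_sample(samples, m, rng):
    """Draw m samples with replacement from the full sample set (step S4.1)."""
    return [rng.choice(samples) for _ in range(m)]

def random_feature_subset(n, p, rng):
    """Pick p distinct feature indices out of n for a node split (step S4.3)."""
    return rng.sample(range(n), p)

rng = random.Random(0)        # fixed seed, only to make the sketch repeatable
samples = list(range(10))     # hypothetical stand-in for M = 10 sample records

sub_training_set = bootstrap_sample(samples, m=6, rng=rng)    # M > m
split_candidates = random_feature_subset(n=3, p=2, rng=rng)   # n > p
```

Because the draw in `bootstrap_sample` is made with replacement, the same record may appear more than once in a sub-training set, which is what makes the trees differ from one another.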
The invention has the following beneficial effects:
The method is mainly applied to an express service platform. By training a model on the feature data, the number of express orders can be predicted quickly and at low cost; the random forest algorithm allows the average daily express volume to be predicted quickly and accurately; and estimating the number of express orders in advance lets resources be prepared for the service ahead of time. Because the order count is predicted well, the platform gains a better understanding of user characteristics and demand and can dispatch couriers and express vehicles flexibly, which shortens the time users wait for a delivery to be completed and saves the platform labor and time costs.
Drawings
In order to more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, the drawings used in the description of the embodiments or the prior art will be briefly described below, it is obvious that the drawings in the following description are only some embodiments of the present invention, and for those skilled in the art, other drawings can be obtained according to the drawings without creative efforts.
FIG. 1 is a schematic diagram of a prior art random forest.
FIG. 2 is a graph of the predicted results of the present invention.
FIG. 3 is a graph of the predicted results of a model based on a logistic regression algorithm.
FIG. 4 is a graph of the predicted results of a model based on the least absolute shrinkage and selection operator (LASSO) algorithm.
FIG. 5 is a schematic flow chart of the present invention.
Detailed Description
The technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are only a part of the embodiments of the present invention, and not all of the embodiments. All other embodiments, which can be obtained by a person skilled in the art without inventive effort based on the embodiments of the present invention, are within the scope of the present invention.
An express prediction method based on a random forest, as shown in FIG. 5, comprises the following steps:
S1, collecting historical data that influence the number of express orders to form a sample set;
S1.1, collecting the express delivery data set of a unit over a number of days;
The express delivery data set includes the sender unit, the sender department, the number of express deliveries in each time period of each day, and the order status of each delivery. The sender unit is the name of the unit to which the sender belongs, and the sender department is the department where the sender works; the order status indicates whether the item has been sent, is in transit, or the order has been completed.
S1.2, cleaning the sample set collected in step S1.1 by the mean-value substitution method;
The cleaning includes handling missing values, removing content inconsistent with the original data type, removing unnecessary data, and removing logically erroneous data. Logically erroneous data are problem data that can be identified by simple logical reasoning.
S1.3, summing the numbers of express deliveries in the time periods of each day, according to the cleaned sample set, to obtain the unit's daily delivery count.
In this embodiment, Excel is used to organize the historical data. The raw data comprise 7271 records with 10 fields and cover two units, including express delivery within a unit and express delivery between the two units, over a time span from May 2021 to October 2021.
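Steps S1.2 and S1.3 can be illustrated with a short Python sketch. It assumes a simplified record layout of (date, time period, delivery count) in which `None` marks a missing count; the layout, the sample values and the function names are hypothetical, and only the missing-value part of the cleaning is shown.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical records: (date, time period, delivery count); None = missing value.
records = [
    ("2021-05-01", "morning", 4),
    ("2021-05-01", "afternoon", None),
    ("2021-05-02", "morning", 6),
    ("2021-05-02", "afternoon", 8),
]

def fill_missing_with_mean(rows):
    """Replace missing counts with the mean of the observed counts (step S1.2)."""
    observed = [count for _, _, count in rows if count is not None]
    fill = mean(observed)
    return [(day, period, fill if count is None else count)
            for day, period, count in rows]

def daily_totals(rows):
    """Sum the counts of all time periods of each day (step S1.3)."""
    totals = defaultdict(float)
    for day, _, count in rows:
        totals[day] += count
    return dict(totals)

cleaned = fill_missing_with_mean(records)
totals = daily_totals(cleaned)
```

Here the missing afternoon count is replaced by the mean of 4, 6 and 8, so the two daily totals become 10 and 14.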
S2, screening feature data out of the sample set to form a feature set, wherein the feature set is denoted $feature_i$ and is expressed as:
$$feature_i = (x_1, x_2, \ldots, x_n);$$
where $x_n$ denotes one feature in the feature set and $n$ denotes the number of features in the feature set. In this embodiment, $n = 3$, $x_1$ is the order status, $x_2$ is the sender department, and $x_3$ is the sender unit.
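All three features are categorical, so they have to be turned into numbers before a regression model can use them. The patent does not say how this is done; the sketch below assumes simple integer coding in order of first appearance, and the rows, category names and the helper `encode_column` are hypothetical.

```python
def encode_column(values):
    """Map each distinct category to an integer code, in order of first appearance."""
    codes = {}
    encoded = [codes.setdefault(value, len(codes)) for value in values]
    return encoded, codes

# Hypothetical rows: (order status x1, sender department x2, sender unit x3)
rows = [
    ("completed", "Finance", "Unit A"),
    ("in transit", "Library", "Unit A"),
    ("completed", "Finance", "Unit B"),
]

columns = list(zip(*rows))                                  # one tuple per feature
encoded_columns = [encode_column(col)[0] for col in columns]
feature_vectors = [tuple(vec) for vec in zip(*encoded_columns)]
```

Each element of `feature_vectors` then plays the role of one $(x_1, x_2, x_3)$ tuple. Integer codes impose an arbitrary order on the categories; one-hot coding would be an alternative, at the cost of more than three columns.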
S3, constructing a daily express delivery prediction function by a regression prediction method;
The daily express delivery prediction function is expressed as:
$$\hat{y} = \alpha_1 x_1 + \alpha_2 x_2 + \cdots + \alpha_n x_n + e$$
where $\alpha_n$ is the coefficient of feature $x_n$ in the feature set, $e$ is a random error, and $\hat{y}$ is the final predicted daily express delivery volume.
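A prediction function of this kind can be sketched as follows, assuming it is linear in the features, with one coefficient per feature and an additive error term. The function name `predict_daily_volume` and the coefficient and feature values are hypothetical and chosen only to make the arithmetic easy to follow.

```python
def predict_daily_volume(alphas, features, error=0.0):
    """Weighted sum of the feature values plus an additive error term."""
    if len(alphas) != len(features):
        raise ValueError("one coefficient is needed per feature")
    return sum(a * x for a, x in zip(alphas, features)) + error

# Hypothetical coefficients and encoded feature values for n = 3 features.
y_hat = predict_daily_volume(alphas=[2.0, 0.5, 1.0], features=[1, 4, 3])
```

With these values the result is 2.0·1 + 0.5·4 + 1.0·3 = 7.0.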
S4, as shown in FIG. 1, solving for the daily express delivery volume by a random forest method based on the feature set established in step S2 and the daily express delivery prediction function established in step S3, comprising the following steps:
S4.1, randomly drawing m samples from the M samples in the sample set by the bootstrap method to serve as a sub-training set for constructing a decision tree, where M > m;
S4.2, synchronously constructing T-1 further decision trees by the method of step S4.1;
S4.3, randomly selecting p features from the n features of the feature set as the node-splitting subset, selecting from these p features the one with the smallest squared error as the node-splitting feature, and continuing to split nodes until the decision tree can no longer be split;
In this embodiment, the 3 features serve as the root node and the internal nodes, the daily delivery count is the output, i.e., the leaf nodes, and n > p.
S4.4, splitting the T decision trees by the method of step S4.3 to form a random forest;
S4.5, training each split decision tree on its randomly drawn samples based on the daily express delivery prediction function to obtain the daily express delivery value corresponding to each decision tree;
S4.6, averaging the daily express delivery values of all decision trees to obtain the predicted daily express delivery volume.
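Steps S4.1 to S4.6 can be sketched end to end in plain Python. This is a minimal illustration, not the patented implementation: it assumes integer-coded features, threshold splits of the form "value <= t", leaves that store the mean of their targets, and toy data; all function names and values are hypothetical.

```python
import random
from statistics import mean

def squared_error(ys):
    """Sum of squared deviations from the mean (0 for an empty list)."""
    if not ys:
        return 0.0
    mu = mean(ys)
    return sum((y - mu) ** 2 for y in ys)

def best_split(X, y, feature_ids):
    """Return (error, feature, threshold) of the lowest-error split, or None."""
    best = None
    for f in feature_ids:
        for t in sorted({row[f] for row in X})[:-1]:      # candidate thresholds
            left = [v for row, v in zip(X, y) if row[f] <= t]
            right = [v for row, v in zip(X, y) if row[f] > t]
            err = squared_error(left) + squared_error(right)
            if best is None or err < best[0]:
                best = (err, f, t)
    return best

def build_tree(X, y, p, rng):
    """Grow a regression tree until a node is pure or cannot be split (S4.3)."""
    if len(set(y)) == 1:
        return y[0]                                       # leaf: single value
    split = best_split(X, y, rng.sample(range(len(X[0])), p))
    if split is None:
        return mean(y)                                    # leaf: no usable split
    _, f, t = split
    left = [(r, v) for r, v in zip(X, y) if r[f] <= t]
    right = [(r, v) for r, v in zip(X, y) if r[f] > t]
    return (f, t,
            build_tree([r for r, _ in left], [v for _, v in left], p, rng),
            build_tree([r for r, _ in right], [v for _, v in right], p, rng))

def predict_tree(node, row):
    """Walk from the root to a leaf and return the leaf value."""
    while isinstance(node, tuple):
        f, t, left, right = node
        node = left if row[f] <= t else right
    return node

def fit_forest(X, y, T, m, p, seed=0):
    """Build T trees, each on m samples drawn with replacement (S4.1, S4.2, S4.4)."""
    rng = random.Random(seed)
    forest = []
    for _ in range(T):
        idx = [rng.randrange(len(X)) for _ in range(m)]
        forest.append(build_tree([X[i] for i in idx], [y[i] for i in idx], p, rng))
    return forest

def predict_forest(forest, row):
    """Average the per-tree predictions (S4.6)."""
    return mean([predict_tree(tree, row) for tree in forest])

# Toy data: M = 6 samples, n = 3 encoded features, daily delivery counts as targets.
X = [(0, 0, 0), (1, 0, 0), (0, 1, 1), (1, 1, 1), (0, 0, 1), (1, 1, 0)]
y = [10, 12, 20, 22, 11, 21]
forest = fit_forest(X, y, T=5, m=4, p=2)      # M > m and n > p
prediction = predict_forest(forest, (0, 1, 1))
```

Since every leaf holds either one training target or the mean of several, the averaged forest prediction always lies between the smallest and largest target seen in training.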
A random forest comprises multiple decision trees, and the set of trees is constructed using the bootstrap idea, that is, samples drawn with replacement form each training set. A random forest is insensitive to noise in the training set and trains quickly; its trees can be trained in parallel, which further speeds up training and gives fast training and prediction. Because a random forest is based on multiple decision trees, it is more robust than a single decision tree.
The loss functions are constructed from the mean squared error (MSE) and the mean absolute error (MAE). They measure the degree of difference between the predictions of the algorithm and the actual data, and thus the quality of the model.
The mean squared error is calculated as follows:
$$loss_{MSE} = \frac{1}{m}\sum_{i=1}^{m}\left(z_i - \hat{y}_i\right)^2$$
where $loss_{MSE}$ denotes the mean squared error loss function, $m$ denotes the total number of samples, $z_i$ denotes the actual daily express delivery count corresponding to the $i$-th sample, and $\hat{y}_i$ denotes the corresponding predicted value;
The mean absolute error is calculated as follows:
$$loss_{MAE} = \frac{1}{m}\sum_{i=1}^{m}\left|z_i - \hat{y}_i\right|$$
where $loss_{MAE}$ denotes the mean absolute error loss function.
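Both loss functions follow their standard definitions and can be computed directly. The sketch below uses hypothetical actual and predicted daily counts; the function names `mse` and `mae` are illustrative.

```python
def mse(actual, predicted):
    """Mean squared error between actual and predicted values."""
    return sum((z - y) ** 2 for z, y in zip(actual, predicted)) / len(actual)

def mae(actual, predicted):
    """Mean absolute error between actual and predicted values."""
    return sum(abs(z - y) for z, y in zip(actual, predicted)) / len(actual)

# Hypothetical actual and predicted daily delivery counts for m = 3 samples.
actual = [10, 12, 20]
predicted = [11, 12, 17]

loss_mse = mse(actual, predicted)   # (1 + 0 + 9) / 3
loss_mae = mae(actual, predicted)   # (1 + 0 + 3) / 3
```

MSE penalizes the single large miss (20 versus 17) much more heavily than MAE does, which is why reporting both gives a fuller picture of model quality.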
As shown in FIG. 2 to FIG. 4, the present application is compared with the logistic regression (LR) algorithm and the least absolute shrinkage and selection operator (LASSO) algorithm. As shown in Table 1 below, the experiments show that the present application performs best on both the MSE and MAE indexes.
Figure BDA0003518308980000043
TABLE 1 comparison of the results
In application, a sender requests an express delivery through the express platform, a courier accepts the order, collects the parcel and delivers it to the designated area, and the recipient signs for it to complete the order. Taking the Tianjin University comprehensive service platform developed in this manner as an example, over 2019 to 2020 a total of 874 orders had accumulated by 2020. With about 5 minutes of appointment waiting time saved per order for the recipient and the courier, the total time saved is about 72 hours (874 × 5 min ≈ 4370 min), which greatly reduces the platform's labor and time costs.
The above description is only for the purpose of illustrating the preferred embodiments of the present invention and is not to be construed as limiting the invention, and any modifications, equivalents, improvements and the like that fall within the spirit and principle of the present invention are intended to be included therein.

Claims (5)

1. An express prediction method based on a random forest, characterized by comprising the following steps:
S1, collecting historical data that influence the number of express orders to form a sample set;
S2, screening feature data out of the sample set to form a feature set;
S3, constructing a daily express delivery prediction function by a regression prediction method;
S4, solving for the daily express delivery volume by a random forest method based on the feature set established in step S2 and the daily express delivery prediction function established in step S3.
2. The random forest based express prediction method of claim 1, wherein step S1 comprises the following steps:
S1.1, collecting the express delivery data set of a unit over a number of days;
S1.2, cleaning the sample set collected in step S1.1 by the mean-value substitution method;
S1.3, summing the numbers of express deliveries in the time periods of each day, according to the cleaned sample set, to obtain the unit's daily delivery count.
3. The random forest based express prediction method of claim 1, wherein the features in the feature set include order status, sender department, and sender unit.
4. The random forest based express prediction method of claim 1, wherein in step S3 the daily express delivery prediction function is expressed as:
$$\hat{y} = \alpha_1 x_1 + \alpha_2 x_2 + \cdots + \alpha_n x_n + e$$
where $\alpha_n$ is the coefficient of feature $x_n$ in the feature set, $e$ is a random error, and $\hat{y}$ is the final predicted daily express delivery volume.
5. The random forest based express prediction method of claim 1, wherein step S4 comprises the following steps:
S4.1, randomly drawing m samples from the M samples in the sample set by the bootstrap method to serve as a sub-training set for constructing a decision tree, where M > m;
S4.2, synchronously constructing T-1 further decision trees by the method of step S4.1;
S4.3, randomly selecting p features from the n features of the feature set as the node-splitting subset, selecting from these p features the one with the smallest squared error as the node-splitting feature, and continuing to split nodes until the decision tree can no longer be split, where n > p;
S4.4, splitting the T decision trees by the method of step S4.3 to form a random forest;
S4.5, training each split decision tree on its randomly drawn samples based on the daily express delivery prediction function to obtain the daily express delivery value corresponding to each decision tree;
S4.6, averaging the daily express delivery values of all decision trees to obtain the predicted daily express delivery volume.
CN202210173732.5A 2022-02-24 2022-02-24 Express prediction method based on random forest Pending CN114548565A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210173732.5A CN114548565A (en) 2022-02-24 2022-02-24 Express prediction method based on random forest

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202210173732.5A CN114548565A (en) 2022-02-24 2022-02-24 Express prediction method based on random forest

Publications (1)

Publication Number Publication Date
CN114548565A true CN114548565A (en) 2022-05-27

Family

ID=81677175

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210173732.5A Pending CN114548565A (en) 2022-02-24 2022-02-24 Express prediction method based on random forest

Country Status (1)

Country Link
CN (1) CN114548565A (en)

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109961165A (en) * 2017-12-25 2019-07-02 顺丰科技有限公司 Part amount prediction technique, device, equipment and its storage medium
CN112434847A (en) * 2020-11-17 2021-03-02 上海东普信息科技有限公司 Express delivery quantity prediction method, device, equipment and storage medium based on LSTM model
CN113240185A (en) * 2021-05-25 2021-08-10 天津大学 County carbon emission prediction method based on random forest

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination