CN114037122A - Flight delay prediction method based on big data mining processing analysis - Google Patents

Flight delay prediction method based on big data mining processing analysis

Info

Publication number
CN114037122A
Authority
CN
China
Prior art keywords
flight
data
flight delay
model
delay prediction
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202111201726.8A
Other languages
Chinese (zh)
Inventor
张健翔
宋文贤
钟丹阳
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Qingdao Civil Aviation Cares Co ltd
Original Assignee
Qingdao Civil Aviation Cares Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Qingdao Civil Aviation Cares Co ltd
Priority to CN202111201726.8A
Publication of CN114037122A
Legal status: Pending

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q10/00 Administration; Management
    • G06Q10/04 Forecasting or optimisation specially adapted for administrative or management purposes, e.g. linear programming or "cutting stock problem"
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/21 Design, administration or maintenance of databases
    • G06F16/215 Improving data quality; Data cleansing, e.g. de-duplication, removing invalid entries or correcting typographical errors
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/24 Querying
    • G06F16/245 Query processing
    • G06F16/2458 Special types of queries, e.g. statistical queries, fuzzy queries or distributed queries
    • G06F16/2462 Approximate or statistical queries
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/24 Querying
    • G06F16/245 Query processing
    • G06F16/2458 Special types of queries, e.g. statistical queries, fuzzy queries or distributed queries
    • G06F16/2465 Query processing support for facilitating data mining operations in structured databases
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/24 Classification techniques
    • G06F18/243 Classification techniques relating to the number of classes
    • G06F18/24323 Tree-organised classifiers
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q50/00 Information and communication technology [ICT] specially adapted for implementation of business processes of specific business sectors, e.g. utilities or tourism
    • G06Q50/40 Business processes related to the transportation industry

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • Databases & Information Systems (AREA)
  • Business, Economics & Management (AREA)
  • General Engineering & Computer Science (AREA)
  • Economics (AREA)
  • Probability & Statistics with Applications (AREA)
  • Human Resources & Organizations (AREA)
  • Strategic Management (AREA)
  • Fuzzy Systems (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Computational Linguistics (AREA)
  • General Business, Economics & Management (AREA)
  • Marketing (AREA)
  • Tourism & Hospitality (AREA)
  • Quality & Reliability (AREA)
  • Evolutionary Computation (AREA)
  • Operations Research (AREA)
  • Entrepreneurship & Innovation (AREA)
  • Game Theory and Decision Science (AREA)
  • Development Economics (AREA)
  • Evolutionary Biology (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Artificial Intelligence (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Health & Medical Sciences (AREA)
  • General Health & Medical Sciences (AREA)
  • Primary Health Care (AREA)
  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)

Abstract

The invention discloses a flight delay prediction method based on big data mining processing analysis, comprising the following steps: A. collecting multi-dimensional raw data; B. combining preliminary feature variables to construct feature engineering data, obtaining the importance weight of each feature with a random forest, and deriving a feature set; C. selecting a model parameter combination adapted and optimized to the data of different airports by combining multiple grid searches with K-fold cross-validation; D. inputting a sample to be predicted into the final flight delay prediction model to obtain flight delay data. The method collects multi-dimensional raw data comprehensively, selects the feature combination with the highest importance weights, adapts the optimized model parameter combination to the data of different airports through multiple grid searches combined with K-fold cross-validation, and trains the prediction model with the CatBoost algorithm to obtain the final flight delay prediction model; a sample to be predicted is then input to obtain an accurate flight delay prediction.

Description

Flight delay prediction method based on big data mining processing analysis
Technical Field
The invention relates to the field of flight delay management, in particular to a flight delay prediction method based on big data mining processing analysis.
Background
In recent years, with the rapid growth in demand for flights in China, flight delays have become increasingly serious. Airports and air traffic control departments urgently need to anticipate large-scale flight delays and take effective measures in time to reduce the resulting losses; accurate and effective flight delay prediction is therefore of great significance to operations departments. Flight delays are affected by many interacting factors, so delay data are irregularly distributed and delay times are difficult to predict accurately with traditional statistics. Most current flight delay prediction techniques rely on models such as time series, autoregression, dynamic optimization and simulation. Although these models can predict flight operations, they consider few factors and rest on idealized assumptions, so they do not reflect the actual situation well: flight delays have many potential influencing factors, and a mathematical model built on fixed assumptions that ignores certain conditions is of limited use for delay prediction. In addition, traditional methods mostly handle categorical variables with simple one-hot encoding, label encoding or binary encoding, which leaves the information in those variables largely unexploited; no complete feature engineering pipeline is established, and the model algorithms suffer from overfitting.
Disclosure of Invention
In view of the shortcomings of the prior art, the invention aims to provide a flight delay prediction method based on big data mining processing analysis. The method comprehensively collects multi-dimensional raw data, selects the feature combination with the highest importance weights, adapts and optimizes the model parameter combination to the data of different airports by combining multiple grid searches with K-fold cross-validation, and trains the prediction model with the CatBoost algorithm to obtain the final flight delay prediction model; inputting a sample to be predicted then yields an accurate flight delay prediction.
The purpose of the invention is realized by the following technical scheme:
A flight delay prediction method based on big data mining processing analysis comprises the following steps:
A. data preprocessing: collecting multi-dimensional raw data comprising flight air position information, weather data, basic flight information, flight statistical information and airport-related information; cleaning the raw data; dividing the cleaned data into a training set and a validation set; and performing frequency statistics, proportion calculation and labeling on the cleaned data to obtain preliminary feature variables;
B. combining the preliminary feature variables to construct feature engineering data comprising a plurality of features; constructing a flight delay prediction model in which a random forest is trained on the cleaned data, according to the feature engineering data, to compute the importance weight of each feature; sorting the features by importance weight from highest to lowest and selecting them in that order until the sum of the importance weights of the selected features is greater than or equal to 95%; the selected features forming a feature set;
C. selecting the best flight delay prediction model by combining multiple grid searches with K-fold cross-validation over a model parameter combination adapted and optimized to the data of different airports; training the selected flight delay prediction model with the CatBoost algorithm, which weakens overfitting, on the feature set selected according to step B (sum of feature importance weights greater than or equal to 95%), thereby obtaining the final flight delay prediction model;
D. inputting a sample to be predicted into the final flight delay prediction model to obtain flight delay data comprising the flight delay time and the delay probability.
Preferably, the feature engineering data is organized into five categories, namely flight air position information, weather data, basic flight information, flight statistical information and airport-related information, as follows: the flight air position features are longitude, latitude, altitude, heading and speed; the weather features are wind speed, visibility, temperature, humidity, weather description and cloud height; the basic flight information features are scheduled departure time, scheduled arrival time, actual departure time, actual landing time, estimated departure time, estimated arrival time, flight duration, flight distance, three-letter code of the departure airport, three-letter code of the arrival airport, airline, aircraft age, aircraft type, route information and waypoints passed; the flight statistical features are the flight on-time rate, the passenger load factor, the routes and airways frequently flown by the physical aircraft, and the number of flights sharing the same route and airway; the airport-related features are the longitude, latitude and altitude of the departure or arrival airport and its runway information.
Preferably, the CatBoost algorithm in step C of the present invention comprises the following method:
C1. the CatBoost algorithm iterates its base learners serially; the strong learner obtained in the previous iteration is $F_{t-1}(x)$ and the loss function is $L(y, F_{t-1}(x))$; the current iteration finds a weak learner $h_t$, a CART regression tree, that minimizes the loss of the current round;
weak learner:
[Formula image BDA0003305209290000031 not reproduced]
the negative gradient of the loss function is used to fit an approximation of the loss in each round:
[Formula image BDA0003305209290000032 not reproduced]
giving the strong learner of the current round: $F_t(x) = F_{t-1}(x) + h_t$
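The two formula images of step C1 are not available in this text. As a hedged reconstruction, the standard gradient boosting formulation that matches the surrounding wording would read as follows; the function class $H$, the expectation $\mathbb{E}$ and the gradient symbol $g_t$ are assumptions taken from the general gradient boosting literature, not from the source.

```latex
% Weak learner: the tree that minimizes the loss of the current round
h_t = \arg\min_{h \in H} \; \mathbb{E}\, L\bigl(y,\; F_{t-1}(x) + h(x)\bigr)

% Gradient of the loss at the previous strong learner
g_t(x, y) = \left. \frac{\partial L(y, s)}{\partial s} \right|_{s = F_{t-1}(x)}

% Negative-gradient approximation: least-squares fit of h to -g_t
h_t = \arg\min_{h \in H} \; \mathbb{E}\,\bigl(-g_t(x, y) - h(x)\bigr)^2

% Strong learner of the current round
F_t(x) = F_{t-1}(x) + h_t(x)
```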
Preferably, the CatBoost algorithm in step C of the present invention further comprises the following method:
C2. the CatBoost algorithm generates $s+1$ different random permutations of the training set, wherein $\sigma_1, \sigma_2, \ldots, \sigma_s$ are used to evaluate the node splits that define the tree structure, and $\sigma_0$ is used to choose the leaf values of the resulting tree structure.
Preferably, the importance weight of the features in step B of the present invention is calculated as follows:
B1. training the random forest and, for each decision tree, computing the out-of-bag error on its out-of-bag data, denoted $err_1$;
B2. randomly perturbing feature i in the out-of-bag samples by adding Gaussian white noise, computing the out-of-bag error again, and denoting it $err_2$;
B3. obtaining the importance of feature i by the following formula:
[Formula image BDA0003305209290000033 not reproduced]
B4. further obtaining the importance weight of feature i:
[Formula image BDA0003305209290000034 not reproduced]
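The formula images for steps B3 and B4 are not available in this text. A common form of out-of-bag importance that is consistent with steps B1 and B2 is sketched below; the number of trees $N$, the per-tree superscript $(k)$, the symbol $X_i$ and the feature count $M$ are assumed notation, and the normalization in the second line is inferred from the requirement that the weights can be summed up to 95%.

```latex
% Importance of feature i: mean increase in out-of-bag error over N trees
X_i = \frac{1}{N} \sum_{k=1}^{N} \bigl( err_2^{(k)} - err_1^{(k)} \bigr)

% Importance weight of feature i: share of the total importance over M features
w_i = \frac{X_i}{\sum_{j=1}^{M} X_j}
```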
Preferably, the model parameter combination in step C of the present invention comprises model parameters and overfitting-prevention parameters; the model parameters comprise loss_function, iterations and learning_rate, and the overfitting-prevention parameters comprise l2_leaf_reg, early_stopping_rounds and use_best_model; the selected flight delay prediction model is the one that, with the model parameters and overfitting-prevention parameters configured and evaluated on the validation set, exhibits the lowest degree of overfitting.
The invention addresses two technical problems: first, the need for feature engineering that is general and complete; second, the need for a model algorithm that can mine categorical variables in depth and overcome overfitting. For the first problem, the invention constructs feature engineering for flight delay influencing factors covering five aspects: flight air position data, weather data, basic flight information, flight statistical information and airport-related information. For the second problem, the invention adopts the CatBoost algorithm, which mines categorical variables in depth by means of target statistics, feature crossing, numerical statistics, mean encoding and the like, effectively prevents overfitting through ordered boosting, and further addresses overfitting by adding feature selection, L2 regularization, early stopping and the like during model construction. Following this approach, the implementation comprises five steps, as shown in FIG. 1: data processing, feature engineering, feature selection, automatic parameter setting and CatBoost prediction, which finally yield an optimized model that computes the flight delay condition.
Compared with the prior art, the invention has the following advantages and beneficial effects:
The method collects multi-dimensional raw data comprehensively, selects the feature combination with the highest importance weights, adapts the optimized model parameter combination to the data of different airports through multiple grid searches combined with K-fold cross-validation, and trains the prediction model with the CatBoost algorithm to obtain the final flight delay prediction model; a sample to be predicted is then input to obtain an accurate flight delay prediction.
Drawings
FIG. 1 is a schematic flow chart of the present embodiment;
FIG. 2 is a flow chart of the ordered boosting algorithm in the embodiment;
FIG. 3 is a flow chart of the algorithm by which the base classifier of the CatBoost algorithm weakens overfitting in the embodiment;
FIG. 4 is a schematic diagram of the tree-building procedure of the CatBoost algorithm in the embodiment.
Detailed Description
The present invention will be described in further detail with reference to the following examples:
examples
As shown in fig. 1 to 4, a flight delay prediction method based on big data mining processing analysis includes the following steps:
A. data preprocessing: collecting multi-dimensional raw data comprising flight air position information, weather data, basic flight information, flight statistical information and airport-related information; cleaning the raw data; dividing the cleaned data into a training set and a validation set; and performing frequency statistics, proportion calculation and labeling on the cleaned data to obtain preliminary feature variables;
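The frequency statistics, proportion calculation and labeling of step A can be illustrated with a minimal sketch. The record fields (`route`, `airline`, `delay_min`), the sample values and the 15-minute labeling threshold are all assumptions made for illustration; the source does not specify them.

```python
from collections import Counter

# Hypothetical cleaned flight records; field names and values are illustrative only.
flights = [
    {"route": "TAO-PEK", "airline": "CA", "delay_min": 42},
    {"route": "TAO-PEK", "airline": "MU", "delay_min": 0},
    {"route": "TAO-SHA", "airline": "MU", "delay_min": 20},
    {"route": "TAO-PEK", "airline": "CA", "delay_min": 5},
]

DELAY_THRESHOLD_MIN = 15  # assumed labeling threshold, not taken from the source


def preliminary_features(records, threshold=DELAY_THRESHOLD_MIN):
    """Add frequency, proportion and label columns to each record."""
    route_counts = Counter(r["route"] for r in records)  # frequency statistics
    total = len(records)
    out = []
    for r in records:
        row = dict(r)
        row["route_freq"] = route_counts[r["route"]]
        row["route_share"] = route_counts[r["route"]] / total  # proportion
        row["delayed"] = int(r["delay_min"] >= threshold)      # labeling
        out.append(row)
    return out


features = preliminary_features(flights)
```

Each output row keeps the original fields and gains three preliminary feature variables that later steps can combine.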
B. combining the preliminary feature variables to construct feature engineering data comprising a plurality of features. The feature engineering data is organized into five categories, namely flight air position information, weather data, basic flight information, flight statistical information and airport-related information, as follows: the flight air position features are longitude, latitude, altitude, heading and speed; the weather features are wind speed, visibility, temperature, humidity, weather description (such as light rain or snowstorm) and cloud height; the basic flight information features are scheduled departure time, scheduled arrival time, actual departure time, actual landing time, estimated departure time, estimated arrival time, flight duration, flight distance, three-letter code of the departure airport, three-letter code of the arrival airport, airline, aircraft age, aircraft type, route information and waypoints passed; the flight statistical features are the flight on-time rate, the passenger load factor, the routes and airways frequently flown by the physical aircraft, and the number of flights sharing the same route and airway; the airport-related features are the longitude, latitude and altitude of the departure or arrival airport and its runway information.
A flight delay prediction model is then constructed, in which a random forest is trained on the cleaned data, according to the feature engineering data, to compute the importance weight of each feature; the features are sorted by importance weight from highest to lowest and selected in that order until the sum of the importance weights of the selected features is greater than or equal to 95%; the selected features form the feature set.
In a preferred implementation of this embodiment, the importance weights in step B are calculated as follows:
B1. training the random forest and, for each decision tree, computing the out-of-bag error on its out-of-bag data (that is, the data not selected when the tree was built), denoted $err_1$;
B2. randomly perturbing feature i in the out-of-bag samples by adding Gaussian white noise, computing the out-of-bag error again, and denoting it $err_2$;
B3. obtaining the importance of feature i by the following formula:
[Formula image BDA0003305209290000061 not reproduced]
B4. further obtaining the importance weight of feature i:
[Formula image BDA0003305209290000062 not reproduced]
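The selection rule of step B (sort by importance weight, then take features until the cumulative weight reaches 95%) can be sketched as follows. The raw importance values and feature names are hypothetical, and the sketch assumes all raw importances are non-negative so that normalization yields valid weights.

```python
def importance_weights(raw_importance):
    """Normalize raw importances so that the weights sum to 1."""
    total = sum(raw_importance.values())
    return {name: value / total for name, value in raw_importance.items()}


def select_features(weights, threshold=0.95):
    """Pick features in descending weight order until the cumulative weight reaches the threshold."""
    selected, cumulative = [], 0.0
    for name, w in sorted(weights.items(), key=lambda kv: kv[1], reverse=True):
        selected.append(name)
        cumulative += w
        if cumulative >= threshold:
            break
    return selected


# Hypothetical raw importances (for example, mean increase in out-of-bag error).
raw = {"visibility": 50.0, "wind_speed": 30.0, "airline": 16.0, "aircraft_age": 4.0}
weights = importance_weights(raw)
chosen = select_features(weights)
```

With these illustrative numbers the first three features already carry 96% of the total weight, so the lowest-ranked feature is dropped.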
C. selecting the best flight delay prediction model by combining multiple grid searches with K-fold cross-validation over a model parameter combination adapted and optimized to the data of different airports. (Preferably, the model parameter combination of this embodiment comprises model parameters and overfitting-prevention parameters; the model parameters comprise loss_function, iterations and learning_rate, and the overfitting-prevention parameters comprise l2_leaf_reg, early_stopping_rounds and use_best_model; the selected flight delay prediction model is the one that, with the model parameters and overfitting-prevention parameters configured and evaluated on the validation set, exhibits the lowest degree of overfitting.) The selected flight delay prediction model is trained with the CatBoost algorithm, which weakens overfitting, on the feature set selected according to step B (sum of feature importance weights greater than or equal to 95%), thereby obtaining the final flight delay prediction model.
The method optimizes the CatBoost parameters automatically. Specifically, model parameters such as loss_function, iterations and learning_rate are adapted and optimized to the data of different airports by combining multiple grid searches with K-fold cross-validation, and the optimal parameter combination is obtained over multiple searches. In particular, for the overfitting-prevention parameters l2_leaf_reg, early_stopping_rounds and use_best_model, evaluation on the validation set is used to obtain the model with the lowest degree of overfitting during training, so that overfitting is avoided as far as possible. Faced with the many features produced by feature engineering, traditional methods mostly encode categorical features with one-hot encoding, label encoding and the like, and cannot mine in depth the relationships among categorical features, between categorical and numerical features, or between categorical variables and the target. To address this, the CatBoost algorithm mines the information in categorical features through strategies such as target statistics and feature combination, which also reduces overfitting.
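The combination of grid search and K-fold cross-validation described for step C can be sketched in plain Python. The parameter names follow the source; the candidate values, the contiguous fold layout and the `toy_score` stand-in (which replaces actual model training and validation) are assumptions. A real run would train a model on each training fold and return its validation error, and the "multiple" searches could repeat this with a refined grid around the previous best.

```python
from itertools import product


def k_fold_indices(n_samples, k):
    """Yield (train_idx, valid_idx) pairs for k contiguous folds."""
    fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        valid = list(range(start, start + size))
        train = [i for i in range(n_samples) if i < start or i >= start + size]
        yield train, valid
        start += size


def grid_search_cv(param_grid, n_samples, k, score_fn):
    """Return (best_params, best_mean_score); a lower score is better."""
    names = list(param_grid)
    best_params, best_score = None, float("inf")
    for values in product(*(param_grid[n] for n in names)):
        params = dict(zip(names, values))
        scores = [score_fn(params, tr, va) for tr, va in k_fold_indices(n_samples, k)]
        mean_score = sum(scores) / len(scores)
        if mean_score < best_score:
            best_params, best_score = params, mean_score
    return best_params, best_score


# Parameter names follow the source; the candidate values are assumptions.
param_grid = {
    "iterations": [200, 500],
    "learning_rate": [0.03, 0.1],
    "l2_leaf_reg": [1, 3, 10],
}


def toy_score(params, train_idx, valid_idx):
    """Stand-in for 'train on train_idx, return validation error on valid_idx'."""
    return abs(params["learning_rate"] - 0.1) + abs(params["l2_leaf_reg"] - 3) / 10


best_params, best_score = grid_search_cv(param_grid, n_samples=10, k=5, score_fn=toy_score)
```

Running the search separately on each airport's data would give the per-airport parameter combinations that the method calls for.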
D. inputting a sample to be predicted into the final flight delay prediction model to obtain flight delay data comprising the flight delay time and the delay probability.
In one implementation of this embodiment, the CatBoost algorithm in step C comprises the following method:
C1. the CatBoost algorithm iterates its base learners serially; the strong learner obtained in the previous iteration is $F_{t-1}(x)$ and the loss function is $L(y, F_{t-1}(x))$; the current iteration finds a weak learner $h_t$, a CART regression tree, that minimizes the loss of the current round;
weak learner:
[Formula image BDA0003305209290000071 not reproduced]
the negative gradient of the loss function is used to fit an approximation of the loss in each round:
[Formula image BDA0003305209290000072 not reproduced]
giving the strong learner of the current round: $F_t(x) = F_{t-1}(x) + h_t$
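The additive update of step C1 can be illustrated with a small gradient boosting loop. This is a sketch of plain gradient boosting with squared loss and one-split regression stumps on a single numeric feature, not of CatBoost's ordered boosting; the learning rate, the stump learner and the toy data are assumptions added for illustration.

```python
def fit_stump(xs, residuals):
    """Fit a one-split regression stump to the residuals by least squares."""
    best = None
    for threshold in sorted(set(xs))[:-1]:
        left = [r for x, r in zip(xs, residuals) if x <= threshold]
        right = [r for x, r in zip(xs, residuals) if x > threshold]
        lv, rv = sum(left) / len(left), sum(right) / len(right)
        sse = sum((r - lv) ** 2 for r in left) + sum((r - rv) ** 2 for r in right)
        if best is None or sse < best[0]:
            best = (sse, threshold, lv, rv)
    _, threshold, lv, rv = best
    return lambda x: lv if x <= threshold else rv


def boost(xs, ys, rounds=20, learning_rate=0.5):
    """Build F_t = F_{t-1} + learning_rate * h_t for squared loss."""
    base = sum(ys) / len(ys)
    learners = []

    def predict(x):
        return base + learning_rate * sum(h(x) for h in learners)

    for _ in range(rounds):
        # For L = (y - F)^2 / 2 the negative gradient is the residual y - F.
        residuals = [y - predict(x) for x, y in zip(xs, ys)]
        learners.append(fit_stump(xs, residuals))
    return predict


# Hypothetical data: hour bucket versus delay in minutes.
xs = [1, 2, 3, 4, 5, 6]
ys = [5, 6, 5, 30, 32, 31]
model = boost(xs, ys)

mean_y = sum(ys) / len(ys)
mse_baseline = sum((y - mean_y) ** 2 for y in ys) / len(ys)
mse_boosted = sum((y - model(x)) ** 2 for x, y in zip(xs, ys)) / len(ys)
```

Each round fits the weak learner to the negative gradient of the previous strong learner, so the training error of the boosted model falls well below that of the constant baseline.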
In each iteration, a conventional gradient boosting algorithm uses the same data set both to compute the gradient of the loss for the current model and to train the base learner on that gradient. This biases the gradient estimate and can cause the model to overfit. The CatBoost algorithm replaces the conventional gradient estimation with ordered boosting, which reduces the bias of the gradient estimate and improves the generalization ability of the model. The flow of the ordered boosting algorithm is shown in FIG. 2.
In one implementation of this embodiment, the CatBoost algorithm in step C further comprises the following method:
C2. the CatBoost algorithm generates $s+1$ different random permutations of the training set, wherein $\sigma_1, \sigma_2, \ldots, \sigma_s$ are used to evaluate the node splits that define the tree structure, and $\sigma_0$ is used to choose the leaf values of the resulting tree structure.
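Random permutations also underlie the target statistics that the method relies on for categorical features. The sketch below shows an ordered target statistic under a single permutation: each sample is encoded from the targets of same-category samples that precede it in the permutation, smoothed by a prior. The function name, the smoothing weight `a`, the prior value and the sample data are assumptions for illustration.

```python
import random


def ordered_target_statistic(categories, targets, prior, a=1.0, seed=0):
    """Encode each categorical value using only samples that precede it in a random permutation."""
    order = list(range(len(categories)))
    random.Random(seed).shuffle(order)            # the random permutation
    sums, counts = {}, {}
    encoded = [0.0] * len(categories)
    for idx in order:
        c = categories[idx]
        s, n = sums.get(c, 0.0), counts.get(c, 0)
        encoded[idx] = (s + a * prior) / (n + a)  # uses preceding samples only
        sums[c] = s + targets[idx]                # update after encoding
        counts[c] = n + 1
    return encoded


# Hypothetical airline codes and delay labels (1 = delayed).
airlines = ["CA", "MU", "CA", "CA"]
delayed = [1, 0, 1, 0]
encoded = ordered_target_statistic(airlines, delayed, prior=0.5)
```

Because a sample never sees its own target, the encoding avoids the target leakage that a plain per-category mean would introduce.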
In the CatBoost algorithm the base learner is a symmetric tree; because the tree is balanced, it weakens overfitting and speeds up prediction. The overall procedure is shown in FIG. 3, and the tree-building procedure of the CatBoost algorithm is shown in FIG. 4.
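Why a symmetric tree predicts quickly can be seen in a short sketch: every node at the same depth applies the same test, so the leaf index is just a sequence of bits. The feature names, thresholds and leaf values below are hypothetical.

```python
def oblivious_tree_predict(x, splits, leaf_values):
    """Predict with a symmetric tree: one (feature, threshold) test per depth level."""
    index = 0
    for feature, threshold in splits:
        index = (index << 1) | int(x[feature] > threshold)
    return leaf_values[index]


# Hypothetical depth-2 tree: level 0 tests visibility, level 1 tests wind speed.
splits = [("visibility_km", 3.0), ("wind_speed_ms", 10.0)]
leaf_values = [35.0, 60.0, 5.0, 20.0]  # delay minutes, 2**depth leaves

sample = {"visibility_km": 8.0, "wind_speed_ms": 12.0}
prediction = oblivious_tree_predict(sample, splits, leaf_values)
```

A tree of depth d needs only d comparisons and one table lookup per sample, and the shared split per level limits how finely the tree can carve up the training data.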
The above description is only for the purpose of illustrating the preferred embodiments of the present invention and is not to be construed as limiting the invention, and any modifications, equivalents and improvements made within the spirit and principle of the present invention are intended to be included within the scope of the present invention.

Claims (6)

1. A flight delay prediction method based on big data mining processing analysis, characterized by comprising the following steps:
A. data preprocessing: collecting multi-dimensional raw data comprising flight air position information, weather data, basic flight information, flight statistical information and airport-related information; cleaning the raw data; dividing the cleaned data into a training set and a validation set; and performing frequency statistics, proportion calculation and labeling on the cleaned data to obtain preliminary feature variables;
B. combining the preliminary feature variables to construct feature engineering data comprising a plurality of features; constructing a flight delay prediction model in which a random forest is trained on the cleaned data, according to the feature engineering data, to compute the importance weight of each feature; sorting the features by importance weight from highest to lowest and selecting them in that order until the sum of the importance weights of the selected features is greater than or equal to 95%; the selected features forming a feature set;
C. selecting the best flight delay prediction model by combining multiple grid searches with K-fold cross-validation over a model parameter combination adapted and optimized to the data of different airports; training the selected flight delay prediction model with the CatBoost algorithm, which weakens overfitting, on the feature set selected according to step B (sum of feature importance weights greater than or equal to 95%), thereby obtaining the final flight delay prediction model;
D. inputting a sample to be predicted into the final flight delay prediction model to obtain flight delay data comprising the flight delay time and the delay probability.
2. The flight delay prediction method based on big data mining processing analysis according to claim 1, characterized in that: the feature engineering data is organized into five categories, namely flight air position information, weather data, basic flight information, flight statistical information and airport-related information, as follows: the flight air position features are longitude, latitude, altitude, heading and speed; the weather features are wind speed, visibility, temperature, humidity, weather description and cloud height; the basic flight information features are scheduled departure time, scheduled arrival time, actual departure time, actual landing time, estimated departure time, estimated arrival time, flight duration, flight distance, three-letter code of the departure airport, three-letter code of the arrival airport, airline, aircraft age, aircraft type, route information and waypoints passed; the flight statistical features are the flight on-time rate, the passenger load factor, the routes and airways frequently flown by the physical aircraft, and the number of flights sharing the same route and airway; the airport-related features are the longitude, latitude and altitude of the departure or arrival airport and its runway information.
3. The flight delay prediction method based on big data mining processing analysis according to claim 1, characterized in that: the CatBoost algorithm in step C comprises the following method:
C1. the CatBoost algorithm iterates its base learners serially; the strong learner obtained in the previous iteration is $F_{t-1}(x)$ and the loss function is $L(y, F_{t-1}(x))$; the current iteration finds a weak learner $h_t$, a CART regression tree, that minimizes the loss of the current round;
weak learner:
[Formula image FDA0003305209280000021 not reproduced]
the negative gradient of the loss function is used to fit an approximation of the loss in each round:
[Formula image FDA0003305209280000022 not reproduced]
giving the strong learner of the current round: $F_t(x) = F_{t-1}(x) + h_t$
4. The flight delay prediction method based on big data mining processing analysis according to claim 3, characterized in that: the CatBoost algorithm in step C further comprises the following method:
C2. the CatBoost algorithm generates $s+1$ different random permutations of the training set, wherein $\sigma_1, \sigma_2, \ldots, \sigma_s$ are used to evaluate the node splits that define the tree structure, and $\sigma_0$ is used to choose the leaf values of the resulting tree structure.
5. The flight delay prediction method based on big data mining processing analysis according to claim 1, characterized in that: the importance weights of the features in step B are calculated as follows:
B1, training a random forest and, for each decision tree, calculating the out-of-bag error on that tree's out-of-bag data, denoted $err_1$;
B2, randomly adding Gaussian white noise to feature i of the out-of-bag samples and calculating the out-of-bag error again, denoted $err_2$;
B3, obtaining the importance of feature i by the following formula:
Figure FDA0003305209280000023
B4, further obtaining the importance weight of feature i:
Figure FDA0003305209280000031
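The formulas for steps B3 and B4 appear here only as figure references. The sketch below assumes the common form of this calculation: the importance of a feature is the mean per-tree increase in out-of-bag error, and the weight is that importance divided by the sum over all features. The function name, the clipping of negative values to zero and the example numbers are assumptions made for illustration, not details from the claim.

```python
def importance_weights(err1, err2_by_feature):
    """Turn per-tree out-of-bag errors into normalized feature weights.

    err1: list of baseline out-of-bag errors, one per tree.
    err2_by_feature: dict mapping feature name -> list of out-of-bag errors
        after noise was added to that feature, one per tree.
    """
    n_trees = len(err1)
    if n_trees == 0:
        raise ValueError("at least one tree is required")
    importance = {}
    for name, err2 in err2_by_feature.items():
        if len(err2) != n_trees:
            raise ValueError(f"feature {name!r}: expected {n_trees} errors")
        # Mean increase in out-of-bag error caused by perturbing the feature.
        mean_increase = sum(e2 - e1 for e1, e2 in zip(err1, err2)) / n_trees
        # Assumption: a feature whose perturbation does not hurt gets zero.
        importance[name] = max(mean_increase, 0.0)
    total = sum(importance.values())
    if total == 0:
        raise ValueError("no feature increased the out-of-bag error")
    weights = {name: value / total for name, value in importance.items()}
    return importance, weights


# Made-up errors for two trees and two features.
err1 = [0.10, 0.20]
err2 = {"wind_speed": [0.30, 0.40], "humidity": [0.20, 0.30]}
importance, weights = importance_weights(err1, err2)
```

With these made-up numbers, perturbing `wind_speed` raises the error twice as much as perturbing `humidity`, so it receives two thirds of the total weight.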
6. The flight delay prediction method based on big data mining processing analysis according to any one of claims 1 to 3, characterized in that: the model parameter combination in step C comprises model parameters and overfitting-prevention parameters, wherein the model parameters comprise loss_function, iterations and learning_rate, and the overfitting-prevention parameters comprise l2_leaf_reg, early_stopping_rounds and use_best_model; the flight delay prediction model is selected according to the model parameter configuration and the overfitting-prevention configuration evaluated on the validation set, such that the resulting model has the lowest degree of overfitting.
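The parameter combination search of claim 6 might be organized as in the following sketch. The six parameter names come from the claim and match CatBoost training parameters; the candidate values, the helper names `parameter_combinations` and `select_best`, and the stand-in score `toy_gap` are all hypothetical. The claim does not define how the degree of overfitting is measured, so in practice the scoring function would train a model with each combination and return some gap between training and validation error.

```python
from itertools import product

# Hypothetical candidate values; the claim names the parameters, not values.
model_grid = {
    "loss_function": ["RMSE"],
    "iterations": [500, 1000],
    "learning_rate": [0.03, 0.1],
}
overfit_grid = {
    "l2_leaf_reg": [3, 10],
    "early_stopping_rounds": [50],
    "use_best_model": [True],
}


def parameter_combinations(*grids):
    """Yield one dict per combination of the candidate values in the grids."""
    merged = {}
    for grid in grids:
        merged.update(grid)
    names = list(merged)
    for values in product(*(merged[name] for name in names)):
        yield dict(zip(names, values))


def select_best(combinations, overfit_gap):
    """Return the combination with the smallest overfitting gap."""
    return min(combinations, key=overfit_gap)


combos = list(parameter_combinations(model_grid, overfit_grid))


def toy_gap(params):
    # Stand-in score so the sketch runs without training a model.
    return params["learning_rate"] / params["l2_leaf_reg"]


best = select_best(combos, toy_gap)
```

Separating the two grids keeps the distinction the claim draws between model parameters and overfitting-prevention parameters, while the search itself treats them as one combined configuration.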
CN202111201726.8A 2021-10-15 2021-10-15 Flight delay prediction method based on big data mining processing analysis Pending CN114037122A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202111201726.8A CN114037122A (en) 2021-10-15 2021-10-15 Flight delay prediction method based on big data mining processing analysis

Publications (1)

Publication Number Publication Date
CN114037122A (en) 2022-02-11

Family

ID=80135034

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202111201726.8A Pending CN114037122A (en) 2021-10-15 2021-10-15 Flight delay prediction method based on big data mining processing analysis

Country Status (1)

Country Link
CN (1) CN114037122A (en)

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109345332A (en) * 2018-08-27 2019-02-15 中国民航信息网络股份有限公司 A kind of intelligent detecting method of Airline reservation malicious act
CN111612628A (en) * 2020-05-28 2020-09-01 深圳博普科技有限公司 Method and system for classifying unbalanced data sets
CN111652427A (en) * 2020-05-29 2020-09-11 航科院中宇(北京)新技术发展有限公司 Flight arrival time prediction method and system based on data mining analysis
CN112365095A (en) * 2020-12-03 2021-02-12 浙江汉德瑞智能科技有限公司 Flight delay analysis and prediction method based on weather and flow control influence

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
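Zhou Jiemin et al.: "Flight delay time prediction based on an elastic neural network", Aeronautical Computing Technique *
Miao Fengshun et al.: "Diabetes prediction method based on the CatBoost algorithm", 《计算机***应用》 *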

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114493053A (en) * 2022-04-18 2022-05-13 北京航空航天大学 Aviation network sweep effect inference method based on two-stage regression
CN114493053B (en) * 2022-04-18 2022-07-08 北京航空航天大学 Aviation network sweep effect inference method based on two-stage regression
CN118114576A (en) * 2024-01-19 2024-05-31 中国民用航空总局第二研究所 Flight delay figure model detection method based on Nataf transformation independence test

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
CB03 Change of inventor or designer information

Inventor after: Liu Xiaojiang, Ding Jicun, Zhang Jianxiang, Song Wenxian, Zhong Danyang
Inventor before: Zhang Jianxiang, Song Wenxian, Zhong Danyang

RJ01 Rejection of invention patent application after publication

Application publication date: 20220211