Disclosure of Invention
The invention aims to overcome the defect in the prior art that flight delay early warning can only give a prediction result a few hours before take-off, and provides a flight delay early warning method, a flight delay early warning system, an electronic device and a medium.
The invention solves the technical problems through the following technical scheme:
the invention provides a flight delay early warning method, which comprises the following steps:
constructing a flight delay data set;
constructing feature data according to the flight delay data set;
generating an early warning model according to the feature data by fusing a gradient boosting decision tree (GBDT) and a logistic regression (LR) model;
and generating early warning information according to the early warning model.
Preferably, the step of constructing the flight delay data set comprises:
constructing a raw flight delay data set, and performing data cleaning on the raw flight delay data set to generate the flight delay data set.
Preferably, the step of constructing the feature data according to the flight delay data set comprises:
extracting basic flight attributes, delay information within a preset period, preceding-flight features and weather features from the flight delay data set to construct the feature data.
Preferably, the gradient boosting decision tree adopts the XGBoost system framework, and the step of generating the early warning model according to the feature data by fusing the gradient boosting decision tree and the logistic regression model comprises:
the XGBoost system framework constructs a binary vector according to the feature data, each element of the binary vector corresponding to a leaf node of a tree in the XGBoost system framework, and the logistic regression model is trained on the feature data and the binary vector to obtain the early warning model.
The invention also provides a flight delay early warning system which comprises a data set construction unit, a characteristic construction unit, a model generation unit and an early warning information generation unit;
the data set construction unit is used for constructing a flight delay data set;
the feature construction unit is used for constructing feature data according to the flight delay data set;
the model generation unit is used for generating an early warning model according to the feature data by fusing a gradient boosting decision tree and a logistic regression model;
the early warning information generating unit is used for generating early warning information according to the early warning model.
Preferably, the data set construction unit is further configured to construct a raw flight delay data set, and to perform data cleaning on the raw flight delay data set to generate the flight delay data set.
Preferably, the feature construction unit is further configured to extract basic flight attributes, delay information within a preset period, preceding-flight features and weather features from the flight delay data set to construct the feature data.
Preferably, the gradient boosting decision tree adopts the XGBoost system framework, and the model generation unit is further configured to use the XGBoost system framework to construct a binary vector according to the feature data, each element of the binary vector corresponding to a leaf node of a tree in the XGBoost system framework, and to train a logistic regression model on the feature data and the binary vector to obtain the early warning model.
The invention also provides electronic equipment which comprises a memory, a processor and a computer program which is stored on the memory and can run on the processor, wherein when the processor executes the computer program, the flight delay early warning method is realized.
The invention also provides a computer readable storage medium having stored thereon a computer program which, when executed by a processor, carries out the steps of the flight delay warning method of the invention.
The positive effect of the invention is as follows: the invention makes it possible to predict delays of flights that depart over a longer period in the future.
Detailed Description
The invention is further illustrated by the following examples, which are not intended to limit the scope of the invention.
Example 1
The embodiment provides a flight delay early warning method. Referring to fig. 1, the flight delay early warning method includes the following steps:
Step S101: constructing a flight delay data set.
Step S102: constructing feature data according to the flight delay data set.
Step S103: generating an early warning model according to the feature data by fusing a gradient boosting decision tree and a logistic regression model.
Step S104: generating early warning information according to the early warning model.
In a specific implementation, in step S101, a raw flight delay data set is constructed and cleaned to generate the flight delay data set. In the raw flight delay data set, a flight whose arrival delay exceeds 30 minutes is considered delayed and is marked as a positive sample to be predicted and identified; a flight whose arrival delay is less than or equal to 30 minutes is marked as a negative sample. The data cleaning comprises: codeshare flight normalization, that is, where some flights have codeshare flights, only the records of the main (operating) carrier's flights are kept and the records of the codeshare flights are deleted. The data cleaning further comprises: cancelled flight handling, that is, the records of cancelled flights are deleted. The data cleaning further comprises: diversion and return handling, that is, the records of flights that diverted to an alternate airport or returned are kept and marked as positive samples.
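The labeling and cleaning rules of step S101 can be sketched as follows. This is a minimal illustration, not the implementation of the embodiment: the `FlightRecord` type, its field names and the `clean_and_label` function are hypothetical; only the 30-minute threshold and the three cleaning rules come from the text.

```python
from dataclasses import dataclass, replace
from typing import List, Optional

# From the text: an arrival delay of more than 30 minutes is a positive sample.
DELAY_THRESHOLD_MIN = 30


@dataclass(frozen=True)
class FlightRecord:
    """Hypothetical record layout; the field names are assumptions."""
    flight_no: str
    arrival_delay_min: float
    is_codeshare: bool = False             # codeshare listing, not the main carrier's record
    is_cancelled: bool = False
    is_diverted_or_returned: bool = False  # diverted to an alternate airport or returned
    label: Optional[int] = None            # 1 = delayed (positive), 0 = not delayed (negative)


def clean_and_label(records: List[FlightRecord]) -> List[FlightRecord]:
    """Apply the three cleaning rules, then label the remaining records."""
    cleaned = []
    for rec in records:
        # Codeshare normalization and cancelled-flight handling: drop the record.
        if rec.is_codeshare or rec.is_cancelled:
            continue
        # Diverted and returned flights are kept and always marked positive.
        if rec.is_diverted_or_returned:
            label = 1
        else:
            label = 1 if rec.arrival_delay_min > DELAY_THRESHOLD_MIN else 0
        cleaned.append(replace(rec, label=label))
    return cleaned
```

Note that a delay of exactly 30 minutes is labeled negative, since the text uses a strict "more than 30 minutes" condition for positive samples.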
In step S102, the causes of flight delay are classified into 3 categories: severe weather, late arrival of the preceding flight, and airspace flow control. Combined with professional knowledge of the civil aviation industry, the following 4 types of features are constructed: (1) basic flight attributes, such as flight number, departure and arrival airports, airline, month, etc.; (2) recent delay conditions, such as the delay situation at the airport over the last 14 days; (3) preceding-flight features, such as the interval between the scheduled departure time of the flight and the scheduled arrival time of the preceding flight; (4) weather features, such as airport visibility, rainfall, wind speed and wind direction, etc.
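Two of the features named in step S102 can be sketched as follows. This is an illustrative sketch under assumptions: the function names, the `(flight_date, label)` history layout, and the choice of a delay rate as the summary of the 14-day delay situation are all hypothetical, since the text does not specify how these features are computed.

```python
from datetime import date, datetime, timedelta
from typing import List, Tuple


def turnaround_minutes(prev_sched_arrival: datetime, sched_departure: datetime) -> float:
    """Interval between the scheduled arrival of the preceding flight and the
    scheduled departure of this flight, in minutes."""
    return (sched_departure - prev_sched_arrival).total_seconds() / 60.0


def airport_delay_rate(history: List[Tuple[date, int]], as_of: date,
                       window_days: int = 14) -> float:
    """Share of delayed flights at one airport in the window_days before as_of.

    history holds (flight_date, label) pairs with label 1 = delayed.
    The as_of day itself is excluded so the feature uses only past data.
    """
    start = as_of - timedelta(days=window_days)
    labels = [label for day, label in history if start <= day < as_of]
    return sum(labels) / len(labels) if labels else 0.0
```

Excluding the `as_of` day is a design choice of this sketch, made to avoid using information that would not be available at prediction time.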
In step S103, in order to predict delays of flights that depart over a longer period in the future, a model prediction date (predictdate), a flight planned departure date (takeoffdate), and a prediction gap (gap = takeoffdate - predictdate) are defined. According to the prediction gap, the model is divided into 3 sub-models: sub-model I (0 ≤ gap ≤ 2), sub-model II (3 ≤ gap ≤ 10) and sub-model III (gap ≥ 11), wherein sub-model I uses hour-level weather features, sub-model II uses day-level weather features, and sub-model III uses no weather features.
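The routing of a flight to one of the three sub-models can be sketched as follows. The sketch assumes that the prediction gap is the number of days from predictdate to takeoffdate, which is how the definitions above are read here; the function names and the returned labels are hypothetical.

```python
from datetime import date


def prediction_gap(predictdate: date, takeoffdate: date) -> int:
    """Assumed definition: days from the prediction date to the planned departure date."""
    return (takeoffdate - predictdate).days


def select_submodel(gap: int) -> str:
    """Map a prediction gap to the sub-model named in the text."""
    if gap < 0:
        raise ValueError("takeoffdate must not be earlier than predictdate")
    if gap <= 2:
        return "I"    # 0 <= gap <= 2: hour-level weather features
    if gap <= 10:
        return "II"   # 3 <= gap <= 10: day-level weather features
    return "III"      # gap >= 11: no weather features
```

For example, a flight departing 11 days after the prediction date is routed to sub-model III, which uses no weather features.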
In step S103, a gradient boosting decision tree (GBDT) is fused with a logistic regression (LR) model, wherein the GBDT uses the XGBoost system framework. The specific implementation is as follows. First, the original features are input into XGBoost, and new features are constructed from the trees learned by XGBoost; the new features form a binary vector, each element of which corresponds to a leaf node of a tree in XGBoost. Second, the original features and the new features are put into the LR model for training to obtain the final early warning model.
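The construction of the binary vector from tree leaves can be sketched as follows. This sketch covers only the encoding step and does not train XGBoost or LR; the function names and the input layout are hypothetical. With XGBoost, the leaf reached in each tree can be obtained with `Booster.predict(..., pred_leaf=True)`; the sketch assumes those values have already been remapped to 0-based positions among each tree's leaves.

```python
from typing import List, Sequence


def leaves_to_binary_vector(leaf_ids: Sequence[int],
                            leaves_per_tree: Sequence[int]) -> List[int]:
    """One-hot encode the leaf reached in each tree and concatenate the results.

    leaf_ids[t] is the 0-based position, among the leaves of tree t, of the leaf
    the sample falls into; leaves_per_tree[t] is the number of leaves of tree t.
    Each element of the output corresponds to one leaf of one tree.
    """
    if len(leaf_ids) != len(leaves_per_tree):
        raise ValueError("exactly one leaf id is required per tree")
    vector: List[int] = []
    for leaf, n_leaves in zip(leaf_ids, leaves_per_tree):
        if not 0 <= leaf < n_leaves:
            raise ValueError("leaf id out of range for its tree")
        one_hot = [0] * n_leaves
        one_hot[leaf] = 1
        vector.extend(one_hot)
    return vector


def build_lr_input(original: Sequence[float], leaf_ids: Sequence[int],
                   leaves_per_tree: Sequence[int]) -> List[float]:
    """Concatenate the original features with the binary leaf vector for LR training."""
    binary = leaves_to_binary_vector(leaf_ids, leaves_per_tree)
    return list(original) + [float(bit) for bit in binary]
```

With two trees of 3 and 2 leaves, a sample that reaches leaf 1 of the first tree and leaf 0 of the second yields the binary vector `[0, 1, 0, 1, 0]`, which is appended to the original features before LR training.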
The early warning model is an offline model. The trained early warning model is deployed on the Zeus big data development platform; the input of sub-model I is updated once every 4 hours, the inputs of sub-model II and sub-model III are updated once a day, and the prediction results are aggregated and written to a Hive (a data warehouse tool) table for production use.
This embodiment fuses data from multiple channels and multiple dimensions and applies big data and machine learning techniques to predict delays of flights that depart over a longer period in the future. It helps Ctrip users plan their trips better, reduces losses caused by flight delays, and improves user experience, for example: (1) when a passenger purchases a ticket on Ctrip, the delay probability of the flight is displayed to the passenger; (2) when a passenger rebooks because of a flight change, flights with a low delay probability are recommended to the passenger.
Example 2
The invention also provides a flight delay early warning system. Referring to fig. 2, the flight delay early warning system includes a data set constructing unit 201, a feature constructing unit 202, a model generating unit 203, and an early warning information generating unit 204.
The data set construction unit 201 is used to construct a flight delay data set. The feature construction unit 202 is configured to construct feature data according to the flight delay data set. The model generation unit 203 is configured to generate an early warning model according to the feature data by fusing a gradient boosting decision tree and a logistic regression model. The early warning information generation unit 204 is configured to generate early warning information according to the early warning model.
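The cooperation of the four units can be sketched as a simple pipeline. This is only a structural illustration: the class name, the callable signatures and the `run` method are hypothetical, and each unit is represented by a placeholder callable rather than a real implementation.

```python
from dataclasses import dataclass
from typing import Any, Callable


@dataclass
class FlightDelayWarningSystem:
    """Hypothetical wiring of the four units; each field stands in for one unit."""
    build_dataset: Callable[[Any], Any]                            # data set construction unit 201
    build_features: Callable[[Any], Any]                           # feature construction unit 202
    generate_model: Callable[[Any], Callable[[Any], Any]]          # model generation unit 203
    generate_warning: Callable[[Callable[[Any], Any], Any], Any]   # warning information unit 204

    def run(self, raw_records: Any, new_flight: Any) -> Any:
        """Chain the units in the order described in the text."""
        dataset = self.build_dataset(raw_records)
        features = self.build_features(dataset)
        model = self.generate_model(features)
        return self.generate_warning(model, new_flight)
```

Passing the units in as callables keeps the sketch independent of any particular cleaning, feature or model code, in line with the statement that the division into units is exemplary.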
In a specific implementation, the data set construction unit 201 constructs a raw flight delay data set and performs data cleaning on it to generate the flight delay data set. In the raw flight delay data set, a flight whose arrival delay exceeds 30 minutes is considered delayed and is marked as a positive sample to be predicted and identified; a flight whose arrival delay is less than or equal to 30 minutes is marked as a negative sample. The data cleaning comprises: codeshare flight normalization, that is, where some flights have codeshare flights, only the records of the main (operating) carrier's flights are kept and the records of the codeshare flights are deleted. The data cleaning further comprises: cancelled flight handling, that is, the records of cancelled flights are deleted. The data cleaning further comprises: diversion and return handling, that is, the records of flights that diverted to an alternate airport or returned are kept and marked as positive samples.
The feature construction unit 202 classifies the causes of flight delay into 3 categories: severe weather, late arrival of the preceding flight, and airspace flow control. Combined with professional knowledge of the civil aviation industry, it constructs the following 4 types of features: (1) basic flight attributes, such as flight number, departure and arrival airports, airline, month, etc.; (2) recent delay conditions, such as the delay situation at the airport over the last 14 days; (3) preceding-flight features, such as the interval between the scheduled departure time of the flight and the scheduled arrival time of the preceding flight; (4) weather features, such as airport visibility, rainfall, wind speed and wind direction, etc.
In order to predict delays of flights that depart over a longer period in the future, the model generation unit 203 defines a model prediction date (predictdate), a flight planned departure date (takeoffdate), and a prediction gap (gap = takeoffdate - predictdate). According to the prediction gap, the model is divided into 3 sub-models: sub-model I (0 ≤ gap ≤ 2), sub-model II (3 ≤ gap ≤ 10) and sub-model III (gap ≥ 11), wherein sub-model I uses hour-level weather features, sub-model II uses day-level weather features, and sub-model III uses no weather features.
The model generation unit 203 fuses a gradient boosting decision tree (GBDT) with a logistic regression (LR) model, wherein the GBDT uses the XGBoost system framework. The specific implementation is as follows. First, the original features are input into XGBoost, and new features are constructed from the trees learned by XGBoost; the new features form a binary vector, each element of which corresponds to a leaf node of a tree in XGBoost. Second, the original features and the new features are put into the LR model for training to obtain the final early warning model.
The early warning model is an offline model. The trained early warning model is deployed on the Zeus big data development platform; the input of sub-model I is updated once every 4 hours, the inputs of sub-model II and sub-model III are updated once a day, and the prediction results are aggregated and written to a Hive table for production use.
This embodiment fuses data from multiple channels and multiple dimensions and applies big data and machine learning techniques to predict delays of flights that depart over a longer period in the future. It helps Ctrip users plan their trips better, reduces losses caused by flight delays, and improves user experience, for example: (1) when a passenger purchases a ticket on Ctrip, the delay probability of the flight is displayed to the passenger; (2) when a passenger rebooks because of a flight change, flights with a low delay probability are recommended to the passenger.
Example 3
Fig. 3 is a schematic structural diagram of an electronic device provided in this embodiment. The electronic device comprises a memory, a processor and a computer program stored on the memory and capable of running on the processor, wherein the processor executes the program to realize the flight delay early warning method of embodiment 1. The electronic device 30 shown in fig. 3 is only an example, and should not bring any limitation to the functions and the scope of use of the embodiment of the present invention.
The electronic device 30 may be embodied in the form of a general purpose computing device, which may be, for example, a server device. The components of the electronic device 30 may include, but are not limited to: the at least one processor 31, the at least one memory 32, and a bus 33 connecting the various system components (including the memory 32 and the processor 31).
The bus 33 includes a data bus, an address bus, and a control bus.
The memory 32 may include volatile memory, such as random access memory (RAM) 321 and/or cache memory 322, and may further include read-only memory (ROM) 323.
Memory 32 may also include a program/utility 325 having a set (at least one) of program modules 324, such program modules 324 including, but not limited to: an operating system, one or more application programs, other program modules, and program data, each of which, or some combination thereof, may comprise an implementation of a network environment.
The processor 31 executes various functional applications and data processing, such as the flight delay warning method according to embodiment 1 of the present invention, by executing the computer program stored in the memory 32.
The electronic device 30 may also communicate with one or more external devices 34 (e.g., a keyboard, a pointing device, etc.). Such communication may take place through input/output (I/O) interfaces 35. The electronic device 30 may also communicate with one or more networks (e.g., a local area network (LAN), a wide area network (WAN), and/or a public network such as the Internet) via a network adapter 36. As shown, the network adapter 36 communicates with the other modules of the electronic device 30 via the bus 33. It should be understood that, although not shown in the figures, other hardware and/or software modules may be used in conjunction with the electronic device 30, including but not limited to: microcode, device drivers, redundant processors, external disk drive arrays, RAID (disk array) systems, tape drives, and data backup storage systems, etc.
It should be noted that although several units/modules or sub-units/modules of the electronic device are mentioned in the above detailed description, such a division is merely exemplary and not mandatory. Indeed, according to embodiments of the invention, the features and functions of two or more of the units/modules described above may be embodied in one unit/module. Conversely, the features and functions of one unit/module described above may be further divided so as to be embodied by a plurality of units/modules.
Example 4
The present embodiment provides a computer-readable storage medium on which a computer program is stored, which when executed by a processor, implements the steps of the flight delay warning method of embodiment 1.
More specific examples of the readable storage medium include, but are not limited to: a portable disk, a hard disk, random access memory, read-only memory, erasable programmable read-only memory, an optical storage device, a magnetic storage device, or any suitable combination of the foregoing.
In a possible implementation, the invention can also be implemented in the form of a program product, which includes program code for causing a terminal device to execute the steps of the flight delay early warning method of embodiment 1 when the program product runs on the terminal device.
Where program code for carrying out the invention is written in any combination of one or more programming languages, the program code may be executed entirely on the user device, partly on the user device, as a stand-alone software package, partly on the user device and partly on a remote device or entirely on the remote device.
While specific embodiments of the invention have been described above, it will be appreciated by those skilled in the art that this is by way of example only, and that the scope of the invention is defined by the appended claims. Various changes and modifications to these embodiments may be made by those skilled in the art without departing from the spirit and scope of the invention, and these changes and modifications are within the scope of the invention.