CN119903967A

CN119903967A - Method and system for predicting risk behavior of freight insurance products on online freight platforms based on explainable machine learning

Info

Publication number: CN119903967A
Application number: CN202510087674.8A
Authority: CN
Inventors: 李明政; 高倩; 余玉刚; 冯雷
Original assignee: HEFEI WEITIAN YUNTONG INFORMATION TECHNOLOGY CO LTD; Institute Of International Finance University Of Science And Technology Of China; University of Science and Technology of China USTC
Current assignee: HEFEI WEITIAN YUNTONG INFORMATION TECHNOLOGY CO LTD; Institute Of International Finance University Of Science And Technology Of China; University of Science and Technology of China USTC
Priority date: 2025-01-20
Filing date: 2025-01-20
Publication date: 2025-04-29

Abstract

The method for predicting the risk behavior of freight protection products on online freight platforms based on explainable machine learning proposed by the present invention includes the following steps: S1, obtaining original sample data and corresponding historical risk results from the online freight platform; S2, preprocessing the original sample data and dividing the data set; S3, undersampling processing based on the preprocessed data; S4, feature selection based on the processed prediction sample data; S5, construction, training and testing of multiple machine learning prediction models; S6, comparative evaluation of multiple machine learning prediction models to determine the optimal prediction model; S7, processing the prediction request by the optimal prediction model, and returning the prediction result. The system for predicting the risk behavior of freight protection products on online freight platforms based on explainable machine learning disclosed by the present invention includes: a data receiving module and a prediction module, and the prediction module contains the optimal prediction model obtained in the above step S6.

Description

Method and system for predicting network freight platform freight guarantee product dangerous behavior based on interpretable machine learning

Technical Field

The invention relates to the technical field of predicting the risk behavior (claim occurrence) of freight guarantee products on network freight platforms, and in particular to a method and system, based on interpretable machine learning, for predicting such risk behavior.

Background

With the application of Internet technology and big data, logistics industry resources have been integrated and the development of network freight platforms has accelerated. On a network freight platform, cargo owners, logistics enterprises and drivers can match with one another and set prices autonomously. During cargo transportation, small-amount cargo losses often occur because of the driver's driving and the characteristics of the cargo, whereas conventional insurance, constrained by its deductible, generally covers only larger cargo losses, which leaves platform drivers to bear the small losses. To reduce drivers' cargo-loss costs, establish a healthy cooperative relationship between drivers and logistics enterprises, and build a healthy logistics ecosystem centered on truck drivers, Luge Group, one of the largest network freight platforms in the country, together with insurance companies launched a small-amount cargo-loss guarantee service for drivers, namely the freight guarantee insurance. The platform integrates data on cargo owners, logistics enterprises and drivers; the insurance product can be priced by monitoring drivers' behavior data on the platform, and drivers decide whether to purchase the product according to their own needs. If the driver purchases the insurance, the platform pays for the cargo loss; otherwise the driver bears the cargo loss.

Although the freight guarantee service is beneficial, its operation currently faces two problems:

1. The risk factors that affect claims on small-amount cargo-loss guarantee orders have not been identified;

2. The risk behavior of the vehicle on a given order cannot be predicted.

In addition, current vehicle risk prediction uses historical data and statistical methods to predict the probability that a vehicle will have an insurance claim event within a future period. Current research is mainly concentrated in the fields of data mining, machine learning and big data analysis. Researchers typically use multi-source data such as vehicles' historical claim records, driver behavior data, vehicle state information, environmental factors and traffic conditions, and identify high-risk individuals or groups by constructing risk assessment models (patent documents CN109784586A, CN112381314A and CN118395262A). These models may include logistic regression, decision trees, random forests, support vector machines, neural networks and ensemble learning methods.

Thus, current vehicle risk prediction methods generally rely on multi-source data such as vehicles' historical claim records, driver behavior data, vehicle state information, environmental factors and traffic conditions, and identify high-risk individuals or groups by constructing a risk assessment model. Obtaining such data presupposes that the vehicle is already fitted with a device that detects and records driving behavior, yet in practice only a few vehicles carry such recording devices, so enterprises offering freight guarantee services find it difficult to acquire the relevant data and therefore to predict vehicle risk. Moreover, current vehicle risk prediction does not consider platform data features such as project features, cargo features and driver history features, so the resulting prediction cannot fully describe risk behavior that may arise from order-context factors such as project or cargo characteristics.

Disclosure of Invention

In order to solve the technical problems in the background art, the invention provides the network freight platform freight guarantee product risk-escaping behavior prediction method and system based on the interpretable machine learning.

The invention provides a network freight platform freight guarantee product risk-escaping behavior prediction method based on interpretable machine learning, which comprises the following steps:

S1, acquiring original sample data and corresponding historical risk results from a network freight platform, and classifying the acquired original sample data to serve as candidate fields for the subsequent feature selection process;

S2, preprocessing original sample data and dividing data sets

Extracting orders of purchased freight guarantee products according to the original sample data obtained in the step S1, cleaning and filtering the extracted order data to delete the order data with missing values, performing numerical type conversion processing on the data in the form of text classification to enable the data to be accepted by a machine learning algorithm, and performing binary classification marking processing on corresponding historical dangerous results;

S3, performing undersampling based on the preprocessed data

Random undersampling is performed so that the ratio of the number of samples with a claim to the number of samples without a claim is 1:1; the result forms a prediction sample data set with complete data and a standard format; part of the sample data is extracted from this data set as a training set, and the remaining sample data is used as a test set;

S4, performing feature selection based on the processed prediction sample data

Constructing a prediction model by adopting a plurality of machine learning algorithms based on the prediction sample data set, calculating marginal contribution of each characteristic parameter to a prediction result of the machine learning model, obtaining characteristic importance ranking, and selecting a plurality of characteristics with highest importance to be reserved as input characteristics of a subsequent prediction model;

S5, constructing, training and testing various machine learning prediction models

Extracting a new prediction sample data set based on the characteristics selected in the step S4, extracting part of sample data from the new prediction sample data set to serve as a new training set, and taking the rest part of sample data as a new testing set;

S6, comparing and evaluating multiple machine learning prediction models

According to a preset model evaluation method, the prediction accuracy and generalization of various machine learning prediction models are compared and analyzed, and an optimal prediction model is determined;

S7, processing a prediction request by the optimal prediction model obtained in the step S6, and returning a prediction result, wherein the returned prediction result comprises prediction of an order risk-out result and independent prediction of each characteristic risk-out behavior factor in the order.

Preferably, in S1, the feature classification includes at least three major categories of "driver feature", "vehicle feature", "order feature".

Preferably, the "driver feature" category at least comprises the driver mobile phone number and driver name fields; the "vehicle feature" category at least comprises the license plate number and vehicle type fields; and the "order feature" category at least comprises the project ID, cargo secondary classification, origin, destination, mileage and order creation time fields.

Preferably, in S2, for categorical features in text form, LabelEncoder is used to encode string-typed categorical data as numerical data so that they can be accepted by the machine learning algorithm.

Preferably, in S3, random undersampling is performed using the RandomUnderSampler class in the imbalanced-learn library.

Preferably, in S3, the extraction of the training set sample data is completed by using a train_test_split function.

Preferably, the proportion of the sample data assigned to the training set is greater than the proportion assigned to the test set.

Preferably, in S4, the marginal contribution of each feature parameter to the prediction result of the machine learning model is calculated using the model's built-in feature_importances_ attribute.

Preferably, in S5, the extraction of the new training set sample data is done using the train_test_split function.

Preferably, the new training set accounts for a larger proportion of the new sample data than the new test set.

Preferably, in S6, the model evaluation method includes:

A plurality of prediction results are obtained from the plurality of machine learning prediction models; the accuracy, precision, recall and F1 score of each prediction model are determined from those results; and the optimal prediction model is determined based on these four metrics.

Preferably, the method also comprises S8, multi-method interpretability comparison analysis and risk factor weighting based on an optimal prediction model, wherein the specific method is as follows:

The method comprises the steps of respectively obtaining importance sorting results of all characteristic parameters in the aspect of predicting the risk of the freight guarantee product by using a plurality of interpretability analysis methods, comparing and analyzing, and if the comparison analysis results meet a fixed condition, determining that the importance sorting results of the characteristic parameters obtained by different methods are similar, thereby determining a final importance sorting result of the characteristic parameters, generating a risk factor weighting result of the risk of the freight guarantee product on the basis of the final importance sorting result, and feeding back to a pricing strategy formulation department of a network freight platform for feedback guidance based on formulation optimization of the integral basic price of the freight guarantee product on the project side.

The invention discloses a system for predicting the risk behavior of network freight platform freight guarantee products based on interpretable machine learning, which comprises a data receiving module and a prediction module;

The data receiving module is used for receiving data input by a user, checking the validity of the received data and transmitting the checked valid data to the prediction module;

The prediction module comprises the optimal prediction model obtained in the step S6, and the prediction module automatically calls the prediction model in the prediction module to process a prediction request according to the received data and returns a prediction result, wherein the returned prediction result comprises the prediction of the order risk-out result and the independent prediction of each characteristic risk-out behavior factor in the order.

The invention discloses a freight guarantee product risk-escaping behavior prediction program development method based on an optimal prediction model, which comprises the following steps:

The optimal prediction model obtained in step S6 is saved as a model file, and the model file is loaded using HTML program code to obtain a web-based freight guarantee product risk prediction program. The program automatically calls the saved prediction model on the parameter data input by the user and, after the model's calculation, obtains the freight guarantee product risk prediction result, which comprises a prediction of the order's risk-occurrence result and a separate prediction for each feature's risk behavior factor in the order.

Based on the order features recorded by the platform, including project ID, order ID, driver ID, the project-driver historical cumulative transport volume, cargo name, cargo secondary classification, vehicle type, mileage, order creation time, origin and destination, the invention uses machine learning algorithms to identify risk factors and predict the claim risk of an order, and interprets and analyzes the importance of the relevant features, so that the platform can accurately identify the risk of an order. Compared with the prior art, the invention has the following advantages:

1. The quick and accurate prediction of the network freight platform freight guarantee product risk of the order can be realized by using a machine learning method, other experiments are not needed, the risk of the order can be determined, and the labor cost and the time cost are greatly saved;

2. The characteristic parameters recorded on the platform level are subjected to characteristic selection by utilizing the machine learning characteristic engineering, so that the requirement of special hardware equipment for recording the characteristic data of drivers and vehicles in the traditional risk prediction scheme is avoided, and the method has higher practical value;

3. Aiming at the problem of weak interpretability of the traditional machine learning method, the invention combines the freight guarantee product risk prediction model with various interpretability analysis methods, overcomes the problem of weak interpretability of the traditional machine learning method, fully understands the network freight platform freight guarantee product prediction result of the machine learning model, acquires the importance ranking of the characteristic parameters and enhances the interpretability of the model;

4. The importance degree of the order feature in the prediction model can be determined from a qualitative angle, so that effective feedback guidance is carried out on the optimization of the pricing strategy of the network freight platform in practical application.

Drawings

FIG. 1 is a flow chart of a method for predicting the risk-escaping behavior of a network freight platform freight guarantee product based on interpretable machine learning;

FIG. 2 is a schematic diagram of a system for predicting the risk-escaping behavior of a network freight platform freight guarantee product based on interpretable machine learning;

FIG. 3 is a schematic diagram of the working principle of the network freight platform freight guarantee product risk behavior prediction system based on interpretable machine learning.

Detailed Description

Referring to fig. 1, the network freight platform freight guarantee product risk-escaping behavior prediction method based on interpretable machine learning provided by the invention comprises the following steps:

S1, acquiring original sample data and the corresponding historical risk-occurrence (claim) results from the network freight platform, and dividing the data fields into three major categories: "driver features", "vehicle features" and "order features". The "driver features" category at least comprises fields such as the driver's mobile phone number and the driver's name; the "vehicle features" category at least comprises fields such as the license plate number and vehicle type; the "order features" category at least comprises fields such as project ID, cargo secondary classification, origin, destination, mileage and order creation time. These data fields serve as candidate fields for the subsequent feature selection process. The original sample data acquired from the network freight platform form the original sample data for predicting the risk behavior of freight guarantee products, and the collected historical claim results serve as the target parameter data, i.e., the data labels, of the data set.

S2, preprocessing original sample data

Orders for which the freight guarantee product was purchased are extracted from the original sample data obtained in step S1, then cleaned and filtered, and order records containing missing values are deleted. For categorical features in text form, LabelEncoder is used to encode string-typed categorical data as numerical data so that they can be accepted by machine learning algorithms, and the corresponding historical risk results are given binary class labels (an order on which a claim occurred is marked as 1; an order with no claim is marked as 0).
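
The preprocessing in step S2 can be sketched as follows. This is a minimal illustration under stated assumptions, not the platform's actual pipeline: the record layout, the field names (vehicle_type, cargo_class, mileage, claim, insured) and the sample values are all hypothetical; only the use of scikit-learn's LabelEncoder comes from the description above.

```python
from sklearn.preprocessing import LabelEncoder

# Hypothetical raw order records; field names and values are illustrative only.
raw_orders = [
    {"vehicle_type": "van", "cargo_class": "steel", "mileage": 120.0, "claim": "yes", "insured": True},
    {"vehicle_type": "flatbed", "cargo_class": "grain", "mileage": 450.0, "claim": "no", "insured": True},
    {"vehicle_type": "van", "cargo_class": None, "mileage": 80.0, "claim": "no", "insured": True},
    {"vehicle_type": "flatbed", "cargo_class": "steel", "mileage": 300.0, "claim": "no", "insured": False},
]

# Keep only orders that purchased the product, then drop any record with a missing value.
insured = [o for o in raw_orders if o["insured"]]
clean = [dict(o) for o in insured if all(v is not None for v in o.values())]

# Encode each text-valued categorical field as integer codes with LabelEncoder.
encoders = {}
for field in ("vehicle_type", "cargo_class"):
    enc = LabelEncoder()
    codes = enc.fit_transform([o[field] for o in clean])
    encoders[field] = enc  # kept so the same mapping can be reused at prediction time
    for order, code in zip(clean, codes):
        order[field] = int(code)

# Binary label: 1 if a claim occurred on the order, 0 otherwise.
labels = [1 if o["claim"] == "yes" else 0 for o in clean]
features = [[o["vehicle_type"], o["cargo_class"], o["mileage"]] for o in clean]
```

Keeping the fitted encoders is a design choice of this sketch: a prediction service would need the same string-to-integer mapping when encoding new requests.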

S3, undersampling processing and data set division are carried out based on the preprocessed data

Because the distribution of risk behavior for the platform's freight guarantee product is markedly imbalanced, with far fewer samples with a claim (2.65%) than samples without a claim (97.35%), the RandomUnderSampler class in the imbalanced-learn library is used for random undersampling so that the ratio of samples with a claim to samples without a claim is 1:1, and the sampled data form a prediction sample data set with complete data and a standard format.

Using the train_test_split function, 70% of the sample data is extracted from the prediction sample data set as a training set and the remaining 30% is used as a test set. The train_test_split function automates data set division, which saves considerable time, and makes the division random yet reproducible through a random seed, avoiding human bias in the split. The train_test_split call takes the following form:
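
The effect of 1:1 random undersampling can be illustrated with a small NumPy stand-in. This is not the imbalanced-learn implementation named above; it is a hand-written sketch of the same idea so that it runs with NumPy alone, and the function name random_undersample and the synthetic data are assumptions made for illustration. With imbalanced-learn installed, the corresponding call would be RandomUnderSampler(sampling_strategy=1.0, random_state=42).fit_resample(X, y).

```python
import numpy as np

def random_undersample(X, y, seed=42):
    """Randomly drop majority-class rows until both classes have equal counts."""
    rng = np.random.default_rng(seed)
    X, y = np.asarray(X), np.asarray(y)
    pos_idx = np.flatnonzero(y == 1)
    neg_idx = np.flatnonzero(y == 0)
    n_keep = min(len(pos_idx), len(neg_idx))
    keep = np.concatenate([
        rng.choice(pos_idx, size=n_keep, replace=False),
        rng.choice(neg_idx, size=n_keep, replace=False),
    ])
    keep.sort()  # preserve the original row order
    return X[keep], y[keep]

# Synthetic labels mimicking the reported imbalance: 53 of 2000 rows (2.65%) are positive.
y = np.array([1] * 53 + [0] * 1947)
X = np.arange(len(y)).reshape(-1, 1)  # placeholder feature column
X_bal, y_bal = random_undersample(X, y)
```

Undersampling discards most of the majority class, so the balanced set here holds only 106 of the 2000 rows; that trade-off is inherent to the technique rather than to this sketch.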

x_train, x_test, y_train, y_test = train_test_split(train_data, train_target, test_size=test_size, random_state=random_state, shuffle=shuffle)

Here, x_train is the feature parameters of the training set after division; x_test is the feature parameters of the test set after division; y_train is the target parameter of the training set after division; y_test is the target parameter of the test set after division; train_data is the sample feature parameter data set to be divided; train_target is the sample target parameter data set to be divided. test_size is the size of the test set: a floating-point number between 0 and 1 gives the proportion of the data placed in the test set, an integer gives the number of test samples directly, and the remaining data go into the training set. random_state is the random seed that determines how the data set is shuffled and divided: with a different seed each time, train_test_split divides the data differently, while the same seed always yields the same training and test sets. shuffle indicates whether the data are shuffled before division: if True, the data are shuffled; if False, they are divided in their original order.

In this embodiment, the training set and the test set are divided by adopting the train_test_split function method, and the related parameter setting conditions of the functions are as follows:

train_data is the feature parameter data selected from the data set;

the train_target is target parameter data, namely a historical risk-out result;

test_size is 0.3 and random_state is 42.
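
With the parameter settings listed above, the division can be sketched as follows. The data here are synthetic placeholders (100 samples with 4 features); only test_size=0.3 and random_state=42 come from this embodiment.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Hypothetical balanced data set: 100 samples, 4 encoded features, binary labels.
rng = np.random.default_rng(0)
train_data = rng.normal(size=(100, 4))
train_target = np.array([0, 1] * 50)

# 70/30 split; the fixed seed makes the division reproducible.
x_train, x_test, y_train, y_test = train_test_split(
    train_data, train_target, test_size=0.3, random_state=42, shuffle=True
)
```

Calling the function again with the same random_state returns the same training and test sets, which is what makes later model comparisons repeatable.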

S4, performing feature selection based on the processed experimental sample data

Based on the processed prediction sample data set, prediction models are constructed using the decision tree (Decision Tree), random forest (RandomForest), K-nearest neighbor (KNN) and extreme gradient boosting (XGBoost) algorithms respectively. The marginal contribution of each feature parameter to the model's prediction result is calculated using the model's built-in feature_importances_ attribute to obtain a feature importance ranking; the feature importances given by the four prediction models are averaged, and the several features with the highest average importance are retained as input features of the subsequent prediction models. The number of features selected may be 3, 4, 5 or 6.
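
The averaging of feature importances across models can be sketched as below. Several points are assumptions of this sketch rather than details from the description: the data are synthetic, scikit-learn's GradientBoostingClassifier stands in for XGBoost so that only scikit-learn is required, and the KNN model is left out because scikit-learn's KNeighborsClassifier does not expose a feature_importances_ attribute.

```python
import numpy as np
from sklearn.ensemble import GradientBoostingClassifier, RandomForestClassifier
from sklearn.tree import DecisionTreeClassifier

# Synthetic data: only features 0 and 1 drive the label; features 2-4 are noise.
rng = np.random.default_rng(0)
X = rng.normal(size=(400, 5))
y = (X[:, 0] + X[:, 1] > 0).astype(int)

models = [
    DecisionTreeClassifier(random_state=0),
    RandomForestClassifier(n_estimators=50, random_state=0),
    GradientBoostingClassifier(random_state=0),  # stand-in for XGBoost
]

# Average the per-feature importances reported by each fitted model.
importances = np.mean([m.fit(X, y).feature_importances_ for m in models], axis=0)

top_k = 2  # the description suggests keeping 3 to 6 features on real data
selected = np.argsort(importances)[::-1][:top_k]
```

On this toy data the two informative features are selected; on platform data the ranking would depend on the actual fields.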

S5, constructing, training and testing multiple machine learning prediction models

A new prediction sample data set is extracted based on the features selected in step S4. Using the train_test_split function, 70% (or 80%) of the sample data is extracted from the new prediction sample data set as a new training set, and the remaining 30% (or 20%) is used as a new test set. Multiple machine learning prediction models are constructed based on the new training set and new test set, including a decision tree (Decision Tree) prediction model, a random forest (RandomForest) prediction model, a K-nearest neighbor (KNN) prediction model and an extreme gradient boosting (XGBoost) prediction model. When determining the optimal hyperparameter combination for each model, grid search hyperparameter optimization is used, and the model performance for each hyperparameter combination is evaluated with ten-fold cross-validation. Each prediction model is then trained and tested with its optimal hyperparameter combination, and finally each model outputs its freight guarantee product risk prediction result.
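
Grid search combined with ten-fold cross-validation can be sketched for one of the models as follows. The hyperparameter grid, the F1 scoring choice and the synthetic data are assumptions made for illustration; the description does not state which hyperparameters were searched.

```python
import numpy as np
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in for the reduced feature set produced by step S4.
rng = np.random.default_rng(0)
X = rng.normal(size=(300, 4))
y = (X[:, 0] - X[:, 2] > 0).astype(int)
x_train, x_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)

# Hypothetical grid; each combination is scored by 10-fold cross-validation.
param_grid = {"max_depth": [2, 4, 6], "min_samples_leaf": [1, 5]}
search = GridSearchCV(DecisionTreeClassifier(random_state=0), param_grid, cv=10, scoring="f1")
search.fit(x_train, y_train)

# GridSearchCV refits the best combination on the whole training set by default.
best_model = search.best_estimator_
test_accuracy = best_model.score(x_test, y_test)
```

The same pattern would be repeated for each candidate model, with the held-out test set used only after the hyperparameters are fixed.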

S6, comparing and evaluating multiple machine learning prediction models

According to evaluation methods for binary classification models, the prediction accuracy and generalization of the machine learning prediction models are compared and analyzed to determine the optimal prediction model. The model evaluation metrics include:

Accuracy: the proportion of correctly classified samples among all samples.

Precision: the ratio of the number of samples correctly classified as positive by the classifier to the total number of samples the classifier outputs as positive.

Recall: the ratio of the number of samples correctly classified as positive by the classifier to the total number of samples that are actually positive.

F1 score (F1-Score): the harmonic mean of precision and recall, which takes both into account.

The evaluation index calculating method comprises the following steps:

For a binary classification problem, the confusion matrix is 2x2 with classes 0 and 1, where each row represents the true class and each column represents the predicted class.
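
The four metrics follow from the confusion matrix entries by their standard definitions, which can be sketched in plain Python as below. The function names and the sample labels are illustrative; the row and column convention matches the one just described (rows are true classes, columns are predicted classes).

```python
def confusion_matrix(y_true, y_pred):
    """Return [[TN, FP], [FN, TP]]: rows are true classes 0/1, columns predicted 0/1."""
    m = [[0, 0], [0, 0]]
    for t, p in zip(y_true, y_pred):
        m[t][p] += 1
    return m

def binary_metrics(y_true, y_pred):
    (tn, fp), (fn, tp) = confusion_matrix(y_true, y_pred)
    accuracy = (tp + tn) / (tp + tn + fp + fn)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    # F1 is the harmonic mean of precision and recall.
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}

y_true = [1, 1, 1, 0, 0, 0, 0, 1]
y_pred = [1, 0, 1, 0, 1, 0, 0, 1]
metrics = binary_metrics(y_true, y_pred)
```

In this example TP = 3, TN = 3, FP = 1 and FN = 1, so all four metrics equal 0.75.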

And selecting the prediction model with the highest comprehensive performance of the evaluation index as an optimal prediction model.

The traditional machine learning method is more similar to a black box model, and the problem of opacity and weak interpretability exists in the prediction result, so the method is used for carrying out the interpretability analysis on the prediction model, and has very important significance for users to know the specific prediction process of the model and determine the importance ranking of each characteristic parameter.

In model interpretability analysis, the key to determining the importance ranking of the feature parameters is to understand how much each feature parameter contributes to the model prediction. This contribution is measured by different metrics, which correspond to different model interpretability analysis methods. The methods used here are the model's built-in feature_importances_ attribute and the SHAP (SHapley Additive exPlanations) method.

The feature_importances_ approach is commonly used for tree-based models such as decision trees, random forests and gradient boosting trees. These models compute feature importance during training from measures such as Gini importance or information gain, so after training the importance ranking of the feature parameters can be read from the feature_importances_ attribute, identifying the feature parameters that most influence the model's predictions and helping to understand the model's prediction process.

The SHAP method is a game-theory-based method for interpreting the model output of an individual prediction. It defines the marginal contribution of each feature parameter to the model's prediction result by calculating the SHAP value of each feature parameter in the sample data set; a SHAP value is the value attributed to one feature parameter of one sample. For each sample in the data set, the prediction model generates a predicted value equal to the baseline value plus the sum of the SHAP values of that sample's feature parameters. The SHAP values therefore satisfy the following expression:

y_i = y_base + f(x_i1) + f(x_i2) + ... + f(x_ik)

where y_i is the final model prediction for the i-th sample; y_base is the baseline value of the model, i.e., the mean of all sample target parameter data in the sample data set; x_i1 is the 1st feature parameter of the i-th sample, x_i2 is the 2nd feature parameter of the i-th sample, and so on up to x_ik, the k-th; f(x_i1) is the SHAP value of x_i1, i.e., the contribution of the 1st feature parameter of the i-th sample to the final model prediction; f(x_i2) is the SHAP value of x_i2, i.e., the contribution of the 2nd feature parameter of the i-th sample to the final model prediction; and so on.

One advantage of the SHAP method is that it gives an intuitive account of how strongly a feature parameter influences the model prediction and also the sign of that influence: f(x_ik) > 0 indicates that the feature parameter pushes the final model prediction up, whereas f(x_ik) < 0 indicates that it pushes the prediction down. In addition, SHAP can provide explanations at both the global and local levels and is applicable to many types of model, including tree models, linear models and neural networks.

In this embodiment, the determined freight guarantee product risk prediction model is the extreme gradient boosting (XGBoost) model, which calculates feature importance during training and exposes it through the built-in feature_importances_ attribute. Therefore two methods, feature_importances_ and SHAP, are selected for the interpretability analysis of the model, helping to quickly determine the risk of an order and the importance ranking of the feature parameters, and guiding the formulation of the pricing strategy for the freight guarantee product based on the results.
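
The additive relationship between the baseline value, the per-feature contributions and the prediction can be checked numerically with an exact Shapley computation on a tiny model. This is a hand-written illustration, not the SHAP library: the scoring function, the background rows and the sample x_i are hypothetical, and the baseline here is taken as the mean model output over the background rows, which is an assumption of this sketch.

```python
from itertools import combinations
from math import factorial

def model(x):
    """Hypothetical scoring function with an interaction term."""
    return 0.5 * x[0] + 0.2 * x[1] * x[2] - 0.1 * x[2]

def coalition_value(x, background, subset):
    """Mean model output when features in `subset` come from x, the rest from background rows."""
    total = 0.0
    for row in background:
        mixed = [x[j] if j in subset else row[j] for j in range(len(x))]
        total += model(mixed)
    return total / len(background)

def shapley_values(x, background):
    """Exact Shapley values: weighted average marginal gain of each feature over all coalitions."""
    k = len(x)
    phi = [0.0] * k
    for j in range(k):
        others = [f for f in range(k) if f != j]
        for size in range(k):
            weight = factorial(size) * factorial(k - size - 1) / factorial(k)
            for subset in combinations(others, size):
                s = set(subset)
                gain = coalition_value(x, background, s | {j}) - coalition_value(x, background, s)
                phi[j] += weight * gain
    return phi

background = [[0.0, 1.0, 2.0], [1.0, 0.0, 1.0], [2.0, 2.0, 0.0]]
x_i = [1.0, 3.0, 2.0]
y_base = coalition_value(x_i, background, set())  # mean prediction over the background rows
phi = shapley_values(x_i, background)
y_i = model(x_i)
```

Here y_base + sum(phi) reproduces y_i up to floating-point error, and the sign of each entry of phi shows whether that feature raises or lowers the prediction. Exact enumeration grows exponentially with the number of features, which is why practical tools rely on approximations or model-specific algorithms.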

S8, multi-method interpretability comparison analysis and risk factor weighting based on an optimal prediction model, wherein the specific method is as follows:

Referring to fig. 2-3, the network freight platform freight guarantee product risk-escaping behavior prediction system based on interpretable machine learning disclosed by the invention comprises a data receiving module and a prediction module;

The data receiving module is used for receiving data input by a user, checking the validity of the received data and transmitting the checked valid data to the prediction module, wherein the data input by the user comprises one or more of driver characteristic data, vehicle characteristic data and order characteristic data.

The optimal prediction model obtained in step S6 is saved as a model file and loaded using HTML program code to obtain a web-based freight guarantee product risk prediction program. The specific process is as follows:

A Web application and API interface are created using the lightweight Web framework Flask, and the trained optimal prediction model is saved into the back-end service framework for handling prediction requests. The prediction interface is defined with the @app.route decorator and handles POST requests: it obtains the feature data input by the user, performs validity checking and encoding, runs the prediction with the trained model, and returns the prediction result. The program automatically calls the prediction model on the parameter data input by the user and, after calculation, obtains the freight guarantee product risk prediction result, which comprises a prediction of the order's risk-occurrence result and a separate prediction for each feature's risk behavior factor in the order.
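
A prediction endpoint of this kind might look like the following minimal sketch. The route path /predict, the field names, the response keys and the StubModel class are all hypothetical; a real service would load the saved optimal model from its model file, and this sketch returns only the order-level prediction, not the per-feature factors.

```python
from flask import Flask, jsonify, request

app = Flask(__name__)

REQUIRED_FIELDS = ("vehicle_type", "cargo_class", "mileage")  # hypothetical feature names

class StubModel:
    """Placeholder for the saved optimal model; a real service would load the model file."""

    def predict_proba_one(self, row):
        # Toy rule for illustration only: longer trips score higher.
        return min(1.0, row[2] / 1000.0)

model = StubModel()

@app.route("/predict", methods=["POST"])
def predict():
    payload = request.get_json(silent=True)
    # Validity check: the body must be a JSON object with a numeric value for every field.
    if not isinstance(payload, dict):
        return jsonify(error="expected a JSON object"), 400
    missing = [f for f in REQUIRED_FIELDS if f not in payload]
    if missing:
        return jsonify(error=f"missing fields: {missing}"), 400
    try:
        row = [float(payload[f]) for f in REQUIRED_FIELDS]
    except (TypeError, ValueError):
        return jsonify(error="all fields must be numeric codes"), 400
    probability = model.predict_proba_one(row)
    return jsonify(claim_probability=probability, predicted_claim=int(probability >= 0.5))
```

The endpoint can be exercised without starting a server through Flask's test client, for example app.test_client().post("/predict", json={"vehicle_type": 1, "cargo_class": 0, "mileage": 800}).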

Because the model's reliability has been verified in the above steps, the developed freight guarantee product risk prediction program serves as an effective prediction tool. It provides the network freight platform with an intuitive and convenient interface, helps the platform quickly determine the risk of an order and the importance ranking of the feature parameters, and guides the formulation of the pricing strategy for the freight guarantee product based on the results. The prediction system is also convenient, fast and accurate, and because the model does not need to be retrained for each request, it saves a large amount of time.

In summary, based on the order features recorded by the platform, including project ID, order ID, driver ID, item_driver_history (cumulative shipment volume), cargo name, cargo secondary classification, vehicle type, mileage, order creation time, origin, destination and the like, the invention uses machine learning algorithms to identify risk factors, predict the risk of an order, and interpret and analyze the importance of the related features, so that the platform can accurately identify the risk of the order. Compared with the prior art, the invention has the following advantages:

1. Based on feature engineering, other feature data from the order records collected by the network freight platform are selected to predict order risk, which solves the problem that vehicle driving data are difficult to collect in traditional prediction methods;

2. The machine learning method enables fast and accurate prediction of the freight guarantee product risk of an order on the network freight platform; the risk of an order can be determined without additional experiments, which greatly saves labor and time costs;

3. Feature selection is performed on the characteristic parameters recorded at the platform level by means of machine learning feature engineering, which avoids the need in traditional risk prediction schemes for special hardware equipment to record driver and vehicle characteristic data, and therefore has higher practical value;

4. To address the weak interpretability of traditional machine learning methods, the invention combines the freight guarantee product risk prediction model with multiple interpretability analysis methods, so that the prediction results of the machine learning model for the network freight platform freight guarantee product can be fully understood, the importance ranking of the characteristic parameters is obtained, and the interpretability of the model is enhanced;

5. The importance of the order features in the prediction model can be determined from a qualitative perspective, which provides effective feedback guidance for optimizing the pricing strategy of the network freight platform in practical applications;

6. The HTML program for freight guarantee product risk prediction can automatically predict the risk of an order with one click from the imported order characteristic data, which avoids redundant experiments and facilitates the popularization and application of the trained prediction model in industrial practice.

The foregoing is only a preferred embodiment of the present invention, but the scope of protection of the present invention is not limited thereto. Any equivalent substitution or modification made by a person skilled in the art, within the technical scope disclosed by the present invention and according to the technical scheme and inventive concept of the present invention, shall be covered by the scope of protection of the present invention.

Claims

1. A method for predicting risk behavior of freight protection products on a network freight platform based on explainable machine learning, characterized in that it includes the following steps:

S1. Obtain original sample data and corresponding historical accident results from the online freight platform, and classify the obtained original sample data into features as candidate fields for the subsequent feature selection process;

S2. Preprocessing of original sample data and division of data sets

Extract the orders for which the freight insurance product was purchased from the original sample data obtained in step S1, clean and filter the extracted order data to delete order data with missing values, convert text categorical data into numeric type so that it can be accepted by the machine learning algorithm, and apply binary classification labels to the corresponding historical accident results;

S3. Undersampling based on preprocessed data

Perform random undersampling so that the ratio of accident samples to non-accident samples is 1:1, and use the above results to form a complete and standard-formatted prediction sample data set, and extract part of the sample data from the formed prediction sample data set as a training set, and use the rest of the sample data as a test set;

S4. Feature selection based on the processed prediction sample data

Based on the above prediction sample data set, a variety of machine learning algorithms are used to build a prediction model, and the marginal contribution of each feature parameter to the prediction result of the machine learning model is calculated to obtain the feature importance ranking, and the most important features are selected to be retained as the input features of the subsequent prediction model;

S5. Construction, training and testing of various machine learning prediction models

A new prediction sample data set is extracted based on the features selected in step S4, and part of the sample data is extracted from the new prediction sample data set as a new training set, and the rest of the sample data is used as a new test set; and based on the new training set and test set, multiple machine learning prediction models are constructed, and then the optimal hyperparameter combination of each prediction model is determined, and then the training and testing of each prediction model are respectively performed, and finally the prediction results of the freight insurance product of each model are output;

S6. Comparative evaluation of multiple machine learning prediction models

Based on the preset model evaluation method, compare and analyze the prediction accuracy and generalization of various machine learning prediction models to determine the optimal prediction model;

S7. The optimal prediction model obtained in step S6 processes the prediction request and returns the prediction result. The returned prediction result includes the prediction of the accident outcome of the order and the separate prediction of each characteristic accident behavior factor in the order.

2. The method for predicting risk behavior of freight protection products on online freight platforms based on explainable machine learning according to claim 1 is characterized in that, in S1, the feature classification includes at least three categories: "driver features", "vehicle features" and "order features";

Preferably, the "driver characteristics" class includes at least the following fields: driver's mobile phone number, driver's name; the "vehicle characteristics" class includes at least the following fields: license plate number, vehicle model; the "order characteristics" class includes at least the following fields: project ID, secondary classification of goods, origin, destination, mileage, and order creation time.

3. The method for predicting risk behavior of freight insurance products on an online freight platform based on explainable machine learning according to claim 1, characterized in that, in S2, categorical features in text form are encoded with LabelEncoder, which converts string-type categorical data into numerical data so that it can be accepted by the machine learning algorithm.
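As an illustration of the preprocessing in S2, the sketch below drops incomplete orders, encodes string columns with scikit-learn's `LabelEncoder`, and builds a binary label. The column names and the four example rows are invented for the example and do not come from the platform data.

```python
import pandas as pd
from sklearn.preprocessing import LabelEncoder

# Hypothetical order records; the third row has a missing vehicle type.
orders = pd.DataFrame({
    "vehicle_type": ["van", "flatbed", None, "van"],
    "origin": ["Hefei", "Nanjing", "Hefei", "Wuhan"],
    "mileage_km": [120.0, 450.0, 80.0, 300.0],
    "risk_result": ["claim", "no claim", "no claim", "claim"],
})

# Delete order data with missing values.
clean = orders.dropna().reset_index(drop=True)

# Encode each string column to integer codes. The fitted encoders are kept so
# the same mapping can be applied to new orders at prediction time.
encoders = {}
for column in ["vehicle_type", "origin"]:
    encoders[column] = LabelEncoder()
    clean[column] = encoders[column].fit_transform(clean[column])

# Binary classification label: 1 for an order with a risk event, 0 otherwise.
clean["risk_result"] = (clean["risk_result"] == "claim").astype(int)
```

`LabelEncoder` assigns codes in sorted order of the class names, so the codes carry no ordinal meaning; this is usually acceptable for tree-based models but may not suit linear ones.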

4. The method for predicting risk behavior of freight guarantee products on a network freight platform based on explainable machine learning according to claim 1 is characterized in that, in S3, random undersampling is completed using the RandomUnderSampler function in the imbalanced-learn library;

Preferably, the extraction of training set sample data is completed using the train_test_split function;

Preferably, the proportion of the training set in the sample data is greater than the proportion of the test set in the sample data.
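The sampling and splitting in S3 can be illustrated as below. To keep the example dependent only on NumPy and scikit-learn, the undersampling is written by hand as a stand-in; in imbalanced-learn, `RandomUnderSampler(sampling_strategy=1.0).fit_resample(X, y)` performs the equivalent 1:1 resampling. The synthetic data and the 80/20 split ratio are assumptions.

```python
import numpy as np
from sklearn.model_selection import train_test_split


def random_undersample(X, y, seed=0):
    """Randomly drop majority-class rows until both classes have equal counts."""
    rng = np.random.default_rng(seed)
    y = np.asarray(y)
    idx_pos = np.flatnonzero(y == 1)
    idx_neg = np.flatnonzero(y == 0)
    n = min(len(idx_pos), len(idx_neg))
    keep = np.concatenate([
        rng.choice(idx_pos, size=n, replace=False),
        rng.choice(idx_neg, size=n, replace=False),
    ])
    keep.sort()
    return np.asarray(X)[keep], y[keep]


# Synthetic data: 20 accident orders among 200 (illustrative values only).
data_rng = np.random.default_rng(42)
X = data_rng.normal(size=(200, 4))
y = np.array([1] * 20 + [0] * 180)

X_bal, y_bal = random_undersample(X, y)

# 80/20 split (assumed ratio); the claim only requires that the training
# share be larger than the test share.
X_train, X_test, y_train, y_test = train_test_split(
    X_bal, y_bal, test_size=0.2, stratify=y_bal, random_state=0
)
```

Passing `stratify=y_bal` keeps the 1:1 class ratio in both subsets, which is a design choice of this sketch rather than something the claim specifies.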

5. The method for predicting risk behavior of freight insurance products on an online freight platform based on explainable machine learning according to claim 1, characterized in that, in S4, the marginal contribution of each feature parameter to the prediction result of the machine learning model is calculated from the feature_importances_ attribute built into the model.
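A minimal sketch of the feature selection in S4 is given below, assuming a random forest as the model; the claim does not fix the algorithm. In scikit-learn, `feature_importances_` is a fitted attribute, and for tree ensembles it holds impurity-based importances normalized to sum to 1. The feature names, the synthetic data and the cut-off `top_k` are hypothetical.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

feature_names = ["mileage_km", "vehicle_type", "cargo_class", "order_hour"]

# Synthetic data in which the label depends mainly on the first feature.
rng = np.random.default_rng(0)
X = rng.normal(size=(400, 4))
y = (X[:, 0] + 0.1 * rng.normal(size=400) > 0).astype(int)

model = RandomForestClassifier(n_estimators=100, random_state=0).fit(X, y)

# Rank features from most to least important.
order = np.argsort(model.feature_importances_)[::-1]
ranking = [(feature_names[i], float(model.feature_importances_[i])) for i in order]

# Keep the top-k features as inputs for the subsequent models (k is assumed).
top_k = 2
selected = [name for name, _ in ranking[:top_k]]
```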

6. The method for predicting risk behavior of freight insurance products on a network freight platform based on explainable machine learning according to claim 1 is characterized in that, in S5, the extraction of new training set sample data is completed using the train_test_split function;

Preferably, the proportion of the new training set in the new sample data is greater than the proportion of the new test set in the new sample data.

7. The method for predicting risk behavior of freight protection products on a network freight platform based on explainable machine learning according to claim 1, characterized in that in S6, the model evaluation method includes:

Obtain multiple prediction results of multiple machine learning prediction models; determine the accuracy, precision, recall and F1 score of each prediction model based on the multiple prediction results; determine the optimal prediction model based on the accuracy, precision, recall and F1 score of each prediction model.
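The evaluation in S6 might be sketched as follows. The two candidate models, the synthetic data set and the selection rule (highest F1 score, ties broken by accuracy) are assumptions; the claim names the four metrics but not how they are combined.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, f1_score, precision_score, recall_score
from sklearn.model_selection import train_test_split

# Synthetic stand-in for the prediction sample data set.
X, y = make_classification(n_samples=600, n_features=8, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0
)

# Placeholder candidates; any classifiers with fit/predict could be compared.
candidates = {
    "logistic_regression": LogisticRegression(max_iter=1000),
    "random_forest": RandomForestClassifier(n_estimators=100, random_state=0),
}

scores = {}
for name, model in candidates.items():
    y_pred = model.fit(X_train, y_train).predict(X_test)
    scores[name] = {
        "accuracy": accuracy_score(y_test, y_pred),
        "precision": precision_score(y_test, y_pred),
        "recall": recall_score(y_test, y_pred),
        "f1": f1_score(y_test, y_pred),
    }

# Assumed selection rule: highest F1, ties broken by accuracy.
best_name = max(scores, key=lambda n: (scores[n]["f1"], scores[n]["accuracy"]))
```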

8. The method for predicting risk behavior of freight protection products on a network freight platform based on interpretable machine learning according to claim 1 is characterized in that it also includes: S8, multi-method interpretability comparative analysis and risk factor weighting based on the optimal prediction model, the specific method is as follows:

A variety of explainable analysis methods are used to obtain the importance ranking results of each feature parameter in predicting the risk of freight insurance products, and a comparative analysis is performed. If the comparative analysis results meet certain conditions, it is determined that the importance ranking results of feature parameters obtained by different methods are similar, thereby determining a final feature parameter importance ranking result, and based on this result, the weighted result of the risk factor of the freight insurance product risk behavior is generated and fed back to the pricing strategy formulation department of the online freight platform for feedback guidance on the formulation and optimization of the overall basic price of freight insurance products based on the project side.
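One possible reading of the comparative analysis in S8 is sketched below. Everything specific in it is an assumption: Spearman rank correlation as the similarity measure, the threshold of 0.7 as the "certain condition", the importance scores of the two methods, and averaging the scores to obtain risk factor weights. The claim itself does not state these details.

```python
import numpy as np


def ranks(scores):
    """Rank 1 is the most important feature. Assumes no tied scores."""
    order = np.argsort(-np.asarray(scores, dtype=float))
    result = np.empty(len(order), dtype=int)
    result[order] = np.arange(1, len(order) + 1)
    return result


def spearman(a, b):
    """Spearman rank correlation for two untied score vectors."""
    ra, rb = ranks(a), ranks(b)
    n = len(ra)
    d2 = int(((ra - rb) ** 2).sum())
    return 1.0 - 6.0 * d2 / (n * (n * n - 1))


features = ["mileage_km", "cargo_class", "vehicle_type", "order_hour"]
# Hypothetical importance scores from two interpretability methods.
method_a = [0.50, 0.25, 0.15, 0.10]
method_b = [0.45, 0.20, 0.25, 0.10]

rho = spearman(method_a, method_b)
SIMILARITY_THRESHOLD = 0.7  # assumed; the source does not state the condition

if rho >= SIMILARITY_THRESHOLD:
    # Rankings agree closely enough: average and renormalize to get weights.
    mean = (np.array(method_a) + np.array(method_b)) / 2
    weights = dict(zip(features, (mean / mean.sum()).round(3).tolist()))
else:
    weights = None
```

In this example the two methods disagree only on the order of the second and third features, so the rankings are treated as similar and a single set of weights is produced.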

9. A network freight platform freight insurance product risk behavior prediction system based on explainable machine learning, characterized in that it includes: a data receiving module and a prediction module;

The data receiving module is used to receive the data input by the user, and to check the validity of the received data, and to transmit the checked valid data to the prediction module;

The prediction module contains the optimal prediction model obtained in step S6 of the method for predicting the risk behavior of freight insurance products on an online freight platform based on explainable machine learning as described in any of claims 1-7. The prediction module automatically calls its internal prediction model to process the prediction request according to the received data, and returns the prediction result. The returned prediction result includes the prediction of the risk outcome of the order and the separate prediction of each characteristic risk behavior factor in the order.

10. A method for developing a risk behavior prediction program for a freight insurance product based on an optimal prediction model, characterized in that it comprises the following steps:

The optimal prediction model obtained in step S6 of the method for predicting the risk behavior of freight insurance products on a network freight platform based on explainable machine learning as described in any of claims 1-7 is saved as a model file, and loaded with HTML program code to obtain a freight insurance product risk prediction web page program, and the program is enabled to automatically call the saved prediction model according to the parameter data input by the user, and obtain the freight insurance product risk prediction result after calculation by the prediction model, and the obtained freight insurance product risk prediction result includes the prediction of the risk result of the order and the separate prediction of each characteristic risk behavior factor in the order.
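Saving the selected model as a model file and loading it again can be illustrated as follows. The use of `joblib`, the file name `risk_model.joblib` and the random forest model are assumptions, since the claim does not name a serialization format. In a typical deployment the file would be loaded by the Python back end that serves the web page.

```python
import tempfile
from pathlib import Path

import joblib
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

# Stand-in for the optimal model selected in S6.
X, y = make_classification(n_samples=200, n_features=5, random_state=0)
best_model = RandomForestClassifier(n_estimators=50, random_state=0).fit(X, y)

with tempfile.TemporaryDirectory() as tmp:
    model_path = Path(tmp) / "risk_model.joblib"  # hypothetical file name
    # Save the trained model as a model file.
    joblib.dump(best_model, model_path)
    # The prediction program loads the file once, then reuses the model
    # for every request without retraining.
    loaded_model = joblib.load(model_path)

same_predictions = bool((loaded_model.predict(X) == best_model.predict(X)).all())
```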