Disclosure of Invention
The embodiment of the invention provides an intelligent driving simulation test method and system for an internal combustion locomotive, which can solve the problems in the prior art.
In a first aspect of an embodiment of the present invention,
The intelligent driving simulation test method for the diesel locomotive comprises the following steps:
Acquiring scene complexity data, wherein the scene complexity data comprises road curvature change rate, the number of obstacles in a scene and obstacle motion complexity;
Acquiring three-dimensional point cloud data of an obstacle through a virtual laser radar in the virtual simulation environment, acquiring distance information and relative speed information of the obstacle through a virtual millimeter wave radar, simulating measurement errors caused by multipath effects, acquiring image characteristic information of the obstacle through a virtual vision sensor, and simulating imaging quality fluctuation caused by illumination change;
Constructing a deep reinforcement learning network, and setting, for a conventional driving strategy, a safety index as a network constraint condition, wherein the comfort index weight and the smoothness index weight increase as scene complexity increases, and the efficiency index weight is correspondingly reduced;
And inputting the driving state information, the standardized perception data and the scene complexity data into the deep reinforcement learning network to generate driving control parameters.
In an alternative embodiment of the present invention,
The generation of the standardized sensory data includes:
Three-dimensional point cloud data of the obstacle are obtained through the virtual laser radar, and filtering processing is carried out on the three-dimensional point cloud data according to the local point cloud center and the standard deviation, so that filtered point cloud data are obtained;
obtaining distance information and relative speed information of an obstacle through transmitting frequency modulation continuous wave signals by a virtual millimeter wave radar, and performing multipath effect error compensation on the distance information and the relative speed information according to time delay, attenuation coefficients and Doppler frequency shift of a plurality of propagation paths;
Obtaining image characteristic information of the obstacle through a virtual vision sensor, and carrying out illumination change compensation on the image characteristic information according to the local illumination intensity to obtain image characteristic data;
carrying out space-time alignment on the filtered point cloud data, the compensated distance information, the compensated relative speed information and the image characteristic data, and uniformly converting the data into a global coordinate system according to a time stamp;
carrying out multi-mode fusion on the data after space-time alignment, carrying out weighted feature combination according to the feature weights of all modes to obtain fusion features, and carrying out information fusion on the fusion features according to basic probability distribution and conflict factors;
And carrying out standardization processing and quality evaluation on the fused data, calculating information entropy according to the feature probability distribution, and generating standardized perception data.
In an alternative embodiment of the present invention,
The deep reinforcement learning network includes:
the prediction branch network receives position, speed and acceleration information of an obstacle in standardized perception data, track characteristics are extracted through a multi-head self-attention encoder, and a multi-mode prediction track is generated based on a cyclic neural network decoder;
Combining the multi-mode prediction track, the running state information and the scene complexity data into a state space vector and inputting the state space vector into a dual-channel control network, wherein the dual-channel control network comprises a conventional driving channel and an emergency risk avoidance channel;
the conventional driving channel generates a conventional driving strategy based on an Actor-Critic network structure, sets a safety index as a constraint condition of the Actor-Critic network, increases comfort index weight and smoothness index weight along with scene complexity, correspondingly reduces efficiency index weight, and optimizes the conventional driving strategy through time sequence difference errors; the emergency risk avoidance channel generates a risk avoidance driving strategy based on collision risk assessment, and optimizes the risk avoidance driving strategy through safety constraint;
And carrying out weighted fusion on the conventional driving strategy and the risk avoidance driving strategy according to the dynamic fusion weight to generate driving control parameters.
In an alternative embodiment of the present invention,
Evaluating the confidence level of the conventional driving strategy and the risk avoidance driving strategy based on the space-time collision risk and the minimum safety distance, and mapping the confidence level to a dynamic fusion weight comprises:
Calculating time dimension collision probability based on a multi-mode prediction track, substituting the deviation between the time collision point of the multi-mode prediction track and expected collision time into a Gaussian distribution function, and generating a time dimension collision risk value according to track weight;
Calculating a dynamic safety distance according to the speed of the vehicle, the speed of the front vehicle, the reaction time, the maximum deceleration and the minimum deceleration, and determining the ratio of the actual vehicle distance to the dynamic safety distance as a safety margin coefficient;
the time dimension collision risk value, the reverse value of the space dimension collision risk value and the safety margin coefficient are subjected to weighted combination to generate a conventional driving strategy confidence coefficient;
substituting the conventional driving strategy confidence coefficient and the risk avoidance driving strategy confidence coefficient into a Softmax mapping function for normalization processing to generate an initial fusion weight, and performing exponential moving average processing on the initial fusion weight to generate a dynamic fusion weight.
In an alternative embodiment of the present invention,
Training of the deep reinforcement learning network includes:
Constructing training courses based on scene complexity indexes, normalizing the change rate of road curvature, the number of obstacles in the scene and the complexity of obstacle movement to generate scene difficulty scores, classifying training samples according to the scene difficulty scores, and establishing course sequences from simple to complex;
adopting a double-stage training strategy, performing behavior clone training based on demonstration data of a human driver in a pre-training stage to enable a network to obtain basic driving capability, performing training based on a course sequence in a reinforcement learning stage, and introducing an experience playback mechanism;
Constructing a dynamic course adjustment mechanism, setting training efficiency evaluation indexes including convergence speed of a strategy network and evaluation index lifting amplitude, adaptively adjusting course difficulty increasing step length and training turn of each difficulty level based on training efficiency evaluation results, and automatically backing to a training scene with difficulty level reduced by 1 to carry out supplementary training when the training efficiency is lower than a preset efficiency threshold;
setting a multi-dimensional grading standard, evaluating the control stability and the operation efficiency of a conventional driving channel under different scene complexity, evaluating the timeliness and the safety margin of a risk avoiding strategy of the emergency risk avoiding channel, and switching to a training scene with the difficulty level added with 1 when the weighted result of the multi-dimensional grading is larger than a preset grading threshold value.
In an alternative embodiment of the present invention,
The dynamic course adjustment mechanism includes:
constructing a multi-dimensional training efficiency evaluation system, taking convergence speed and evaluation index lifting amplitude as short-term efficiency indexes, and constructing long-term efficiency indexes based on generalized performance and anti-interference capacity of a strategy network in a verification scene;
establishing a distributed training architecture, distributing training tasks of the course sequence to a plurality of parallel training environments, setting different random disturbance parameters in each training environment, carrying out parallel training based on an asynchronous gradient updating mode, and comparing performance performances of strategy networks under different parallel training environments to generate a stability evaluation value of the strategy network;
Determining a course difficulty adjustment direction according to the comprehensive training efficiency value and the stability evaluation value, and increasing the course difficulty level according to the self-adaptive step length when the comprehensive training efficiency value is larger than a preset upper efficiency limit and the stability evaluation value is larger than a preset stability threshold value, and decreasing the course difficulty level by 1 when the comprehensive training efficiency value is smaller than the preset efficiency threshold value;
in the process of performing the supplementary training of the difficulty level minus 1, importance sampling is performed from the experience pool corresponding to the difficulty level to generate a supplementary training sequence;
Setting a course difficulty smooth transition mechanism, calculating the mixed weight of adjacent difficulty grades based on an exponential decay function in the difficulty grade conversion process, and dynamically combining training samples of different difficulty grades according to the mixed weight to generate a transition training sample set.
In an alternative embodiment of the present invention,
The building of the virtual simulation environment comprises the following steps:
determining geometric features and gradient features of the road based on the road parameters, setting type parameters and layout position parameters of the road facility features, setting environmental parameters of weather features;
Establishing a feature combination constraint rule base, wherein the feature combination constraint rule base comprises a matching rule of road geometric features and gradient features, a layout rule of road facility features and a combination rule of weather features;
screening out feature combinations meeting constraint conditions based on the feature combination constraint rule library, and constructing a feature adaptation degree matrix, wherein an adaptation degree value of the feature adaptation degree matrix represents the matching degree of the corresponding feature combinations;
Automatically combining features by adopting a hierarchical combination method, comprising the steps of: selecting the road geometric features and gradient features with the highest adaptation degree value for combination to generate a basic road section, selecting the road facility features with the highest adaptation degree value for layout on the basic road section, and selecting the weather features with the highest adaptation degree value for superposition to generate a complete scene;
And randomly sampling the feature combination result by adopting a Monte Carlo method, and carrying out feature transfer by adopting a Markov chain to generate a composite test scene.
In an alternative embodiment of the present invention,
Randomly sampling the feature combination result by adopting a Monte Carlo method, performing feature transfer by applying a Markov chain, and generating a composite test scene comprises the following steps:
Calculating a sampling temperature parameter according to the scene importance, substituting the scene importance and the sampling temperature parameter into a Softmax function, and generating a scene sampling probability;
performing Monte Carlo sampling on the feature combination result according to the scene sampling probability to acquire scene features; inputting the scene features and scene condition constraints into a conditional variational autoencoder to generate mean value parameters and variance parameters;
establishing a Markov transfer matrix based on the hidden variables, and performing characteristic migration on the hidden variables based on the Markov transfer matrix to generate migrated hidden variables;
And calculating the KL divergence of the migrated scene features and the original scene features, and determining that the migrated scene features are effective composite test scenes when the KL divergence is smaller than a preset divergence threshold value.
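As a minimal sketch of the validity check described above, the following Python snippet computes the KL divergence between the original and migrated latent distributions, assuming both are diagonal Gaussians parameterized by the mean and variance output of the conditional variational autoencoder; the 0.5 threshold is merely a placeholder for the preset divergence threshold.

```python
import numpy as np

def kl_diag_gaussian(mu_q, var_q, mu_p, var_p):
    """KL(q || p) between two diagonal Gaussians given mean and variance vectors."""
    mu_q, var_q, mu_p, var_p = map(np.asarray, (mu_q, var_q, mu_p, var_p))
    return 0.5 * float(np.sum(np.log(var_p / var_q)
                              + (var_q + (mu_q - mu_p) ** 2) / var_p - 1.0))

def is_valid_composite_scene(mu_orig, var_orig, mu_mig, var_mig, threshold=0.5):
    """Accept the migrated scene only if its latent distribution stays close to the original."""
    return kl_diag_gaussian(mu_mig, var_mig, mu_orig, var_orig) < threshold
```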
In a second aspect of an embodiment of the present invention,
The intelligent driving simulation test system of the diesel locomotive comprises:
a first unit, used for constructing a virtual simulation environment, collecting running state information of the internal combustion locomotive, and acquiring scene complexity data, wherein the scene complexity data comprises a road curvature change rate, the number of obstacles in a scene and obstacle motion complexity;
a second unit, used for acquiring three-dimensional point cloud data of an obstacle through a virtual laser radar in the virtual simulation environment, acquiring distance information and relative speed information of the obstacle through a virtual millimeter wave radar, simulating measurement errors caused by multipath effects, acquiring image characteristic information of the obstacle through a virtual vision sensor, simulating imaging quality fluctuation caused by illumination change, and performing multi-mode fusion processing on the three-dimensional point cloud data, the distance information, the relative speed information and the image characteristic information to generate standardized perception data;
a third unit, used for constructing a deep reinforcement learning network comprising a prediction branch network for generating a multi-mode prediction track and a dual-channel control network for generating a driving strategy, wherein for the conventional driving strategy, a safety index is set as a network constraint condition, the comfort index weight and the smoothness index weight increase as scene complexity increases, and the efficiency index weight is correspondingly reduced; for the risk avoidance driving strategy, safety constraint optimization is performed based on collision risk assessment; and the driving state information, the standardized perception data and the scene complexity data are input into the deep reinforcement learning network to generate driving control parameters.
By constructing the virtual simulation environment and the multi-sensor fusion system, the invention can comprehensively and accurately acquire various information of the running environment of the diesel locomotive, improves the accuracy of obstacle perception and scene understanding, and provides a reliable data base for intelligent driving decision.
According to the invention, a deep reinforcement learning network architecture is adopted, and through the cooperative coordination of the prediction branch network and the dual-channel control network, the accurate prediction of the multi-mode track and the dynamic optimization of the driving strategy are realized, so that the system can adaptively adjust the control parameters according to different scene complexity, and the comfort and the efficiency are both ensured while the safety is ensured.
According to the intelligent driving system, the control strategy weight is dynamically adjusted based on the scene complexity, and the collision risk assessment mechanism is introduced to conduct safety constraint optimization, so that the driving safety and adaptability of the diesel locomotive in a complex environment are remarkably improved, meanwhile, the smoothness of driving experience is guaranteed, and the comprehensive balance of the intelligent driving system in the aspects of safety, comfort, efficiency and the like is realized.
Detailed Description
For the purpose of making the objects, technical solutions and advantages of the embodiments of the present invention more apparent, the technical solutions of the embodiments of the present invention will be clearly and completely described below with reference to the accompanying drawings in the embodiments of the present invention, and it is apparent that the described embodiments are only some embodiments of the present invention, not all embodiments. All other embodiments, which can be made by those skilled in the art based on the embodiments of the invention without making any inventive effort, are intended to be within the scope of the invention.
The technical scheme of the invention is described in detail below by specific examples. The following embodiments may be combined with each other, and some embodiments may not be repeated for the same or similar concepts or processes.
Fig. 1 is a flow chart of an intelligent driving simulation test method for an internal combustion locomotive according to an embodiment of the invention, as shown in fig. 1, the method includes:
Acquiring scene complexity data, wherein the scene complexity data comprises road curvature change rate, the number of obstacles in a scene and obstacle motion complexity;
Acquiring three-dimensional point cloud data of an obstacle through a virtual laser radar in the virtual simulation environment, acquiring distance information and relative speed information of the obstacle through a virtual millimeter wave radar, simulating measurement errors caused by multipath effects, acquiring image characteristic information of the obstacle through a virtual vision sensor, and simulating imaging quality fluctuation caused by illumination change;
Constructing a deep reinforcement learning network, and setting, for a conventional driving strategy, a safety index as a network constraint condition, wherein the comfort index weight and the smoothness index weight increase as scene complexity increases, and the efficiency index weight is correspondingly reduced;
And inputting the driving state information, the standardized perception data and the scene complexity data into the deep reinforcement learning network to generate driving control parameters.
In an alternative embodiment, the generating of the normalized perceptual data comprises:
Three-dimensional point cloud data of the obstacle are obtained through the virtual laser radar, and filtering processing is carried out on the three-dimensional point cloud data according to the local point cloud center and the standard deviation, so that filtered point cloud data are obtained;
obtaining distance information and relative speed information of an obstacle through transmitting frequency modulation continuous wave signals by a virtual millimeter wave radar, and performing multipath effect error compensation on the distance information and the relative speed information according to time delay, attenuation coefficients and Doppler frequency shift of a plurality of propagation paths;
Obtaining image characteristic information of the obstacle through a virtual vision sensor, and carrying out illumination change compensation on the image characteristic information according to the local illumination intensity to obtain image characteristic data;
carrying out space-time alignment on the filtered point cloud data, the compensated distance information, the compensated relative speed information and the image characteristic data, and uniformly converting the data into a global coordinate system according to a time stamp;
carrying out multi-mode fusion on the data after space-time alignment, carrying out weighted feature combination according to the feature weights of all modes to obtain fusion features, and carrying out information fusion on the fusion features according to basic probability distribution and conflict factors;
And carrying out standardization processing and quality evaluation on the fused data, calculating information entropy according to the feature probability distribution, and generating standardized perception data.
Illustratively, three-dimensional point cloud data of the obstacle is acquired by a virtual lidar. The virtual laser radar can simulate the scanning process of the actual laser radar, and point cloud data containing information such as the position, the shape and the like of the obstacle is generated. For example, for a moving car, a point cloud containing features of the body contour, windows, wheels, etc. may be obtained. And then filtering the obtained point cloud data. Specifically, the center coordinates and standard deviation of the local point cloud are calculated, and abnormal points which are too far away from the center are removed according to a set threshold value. For example, 2 times the standard deviation may be set as the threshold value, and points that are off-center beyond the threshold value may be eliminated. Therefore, noise points generated by measurement errors or environmental interference can be effectively removed, and the quality of point cloud data is improved.
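A minimal Python sketch of this local-statistics filter is given below; using a single cloud centre and a 2-sigma cut is an illustrative simplification (in practice the statistics could be computed per voxel or per neighbourhood).

```python
import numpy as np

def filter_point_cloud(points, k_sigma=2.0):
    """Remove points farther than k_sigma standard deviations from the point cloud centre.

    points: (N, 3) array of x, y, z coordinates from the virtual lidar.
    """
    center = points.mean(axis=0)                      # local point cloud centre
    dists = np.linalg.norm(points - center, axis=1)   # distance of each point to the centre
    threshold = dists.mean() + k_sigma * dists.std()  # e.g. mean + 2 * standard deviation
    return points[dists <= threshold]
```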
Distance and relative speed information of the obstacle are acquired through the virtual millimeter wave radar. The virtual millimeter wave radar simulates transmitting a frequency modulated continuous wave signal and measures the distance and speed of a target by receiving the echo signal. For example, for a vehicle 100 meters ahead approaching at 60 km/h, measurements of a distance of 100 m and a relative speed of -16.7 m/s can be obtained. Multipath error compensation is then performed on the acquired distance and speed information. Since electromagnetic waves are reflected and scattered during propagation, multiple propagation paths are formed, resulting in measurement errors. The compensation process considers the time delay, attenuation coefficient and Doppler frequency shift of each path and corrects the original measurement result. For example, the primary reflection path may be estimated from the environment model, the error introduced by it calculated, and the error subtracted from the measurement.
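The embodiment does not give the exact compensation formula; the sketch below assumes a crude attenuation-weighted bias model in which each estimated secondary path (extra delay, attenuation, Doppler shift) contributes a range bias and a velocity bias that are subtracted from the raw measurement, with a 77 GHz carrier wavelength assumed.

```python
def compensate_multipath(measured_range, measured_velocity, paths, c=3.0e8, wavelength=3.9e-3):
    """Subtract the bias contributed by secondary propagation paths (assumed simple model).

    paths: iterable of (extra_delay_s, attenuation, doppler_shift_hz) tuples for each
           reflected path, e.g. estimated from the environment model.
    """
    range_bias = sum(att * c * delay / 2.0 for delay, att, _ in paths)      # round-trip delay -> range
    velocity_bias = sum(att * dopp * wavelength / 2.0 for _, att, dopp in paths)  # v = f_d * lambda / 2
    return measured_range - range_bias, measured_velocity - velocity_bias
```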
Image characteristic information of the obstacle is acquired through the virtual vision sensor. The virtual vision sensor simulates a camera capturing an image of the surrounding environment. For the obstacle, the characteristics of color, texture, edge and the like thereof are extracted as image characteristic information. For example, for a red car, features such as car body color, car window position, car lamp shape and the like can be extracted. And performing illumination change compensation on the extracted image features. Since variations in natural illumination conditions affect the brightness and contrast of the image, compensation is required to improve the stability of the features. Specifically, the image can be subjected to brightness equalization processing according to the average brightness of the local area. For example, the image may be segmented and the brightness of each block normalized to maintain its mean within a fixed range.
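A possible form of the block-wise brightness equalization is sketched below; the 32-pixel block size and the target mean of 128 are illustrative values, not taken from the embodiment.

```python
import numpy as np

def equalize_brightness(gray_img, block=32, target_mean=128.0):
    """Shift the mean brightness of each image block toward a fixed target value."""
    out = gray_img.astype(np.float32).copy()
    h, w = out.shape
    for y in range(0, h, block):
        for x in range(0, w, block):
            patch = out[y:y + block, x:x + block]   # view into 'out', modified in place
            patch += target_mean - patch.mean()
    return np.clip(out, 0, 255).astype(np.uint8)
```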
And carrying out space-time alignment on the processed point cloud data, distance speed information and image characteristic data. The data is first aligned to the same instant according to the time stamps of the sensors. And then uniformly converting the data into a global coordinate system according to the installation position and the posture of each sensor relative to the vehicle body. For example, the center of the rear axle of the vehicle may be selected as the origin, a right-hand coordinate system may be established, and all data may be converted into the coordinate system.
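The sketch below illustrates the two operations involved: interpolating each sensor stream to a common timestamp and chaining the sensor-to-body and body-to-world rigid transforms. The rotation matrices and translation vectors are assumed to come from the sensor mounting calibration and the current vehicle pose.

```python
import numpy as np

def interp_to_timestamp(t_query, t_samples, values):
    """Linearly interpolate a per-sensor measurement stream (T, D) to a common timestamp."""
    return np.array([np.interp(t_query, t_samples, values[:, i])
                     for i in range(values.shape[1])]).T

def to_global(points_sensor, R_sensor_to_body, t_sensor_to_body, R_body_to_world, t_body_to_world):
    """Transform (N, 3) sensor-frame points into the global frame via the vehicle body frame."""
    body = points_sensor @ R_sensor_to_body.T + t_sensor_to_body
    return body @ R_body_to_world.T + t_body_to_world
```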
Multi-mode data fusion is then performed. Weights are determined according to the reliability and accuracy of each modality; for example, weights of 0.5 for the point cloud data, 0.3 for the distance and speed information and 0.2 for the image features may be assigned empirically. The weighted features are then combined to obtain fusion features.
Information fusion is performed on the fusion features using the basic probability distribution method. Basic probabilities are calculated according to the confidence of each feature, and a conflict factor is introduced to handle contradictions between information from different sources. For example, when the point cloud indicates an obstacle ahead but the radar does not detect it, the contradiction can be resolved according to the reliability of both.
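One way to realize this conflict-aware fusion is Dempster's rule over a simple frame {obstacle, free} with an "unknown" mass; this is an assumed concrete instantiation of the basic probability distribution and conflict factor mentioned above, where the conflict factor K quantifies the disagreement between the two sources (e.g. lidar versus radar).

```python
def combine_evidence(m1, m2):
    """Combine two basic probability assignments over {'obstacle', 'free', 'unknown'}
    with Dempster's rule; K is the conflict factor."""
    K = m1['obstacle'] * m2['free'] + m1['free'] * m2['obstacle']
    if K >= 1.0:
        raise ValueError("total conflict between the two sources")
    combined = {}
    for h in ('obstacle', 'free'):
        combined[h] = (m1[h] * m2[h] + m1[h] * m2['unknown'] + m1['unknown'] * m2[h]) / (1.0 - K)
    combined['unknown'] = m1['unknown'] * m2['unknown'] / (1.0 - K)
    return combined, K
```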
And finally, carrying out standardization processing and quality evaluation on the fused data. The dimensional features are normalized to a uniform range of values, such as the [0,1] interval. Information entropy is then calculated from the probability distribution of the features for evaluating the uncertainty of the data. The lower the information entropy, the higher the data quality. For example, data with entropy values less than 0.1 may be marked as high quality, medium quality between 0.1 and 0.5, and low quality greater than 0.5.
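A minimal sketch of the entropy-based quality grading follows, using the 0.1 and 0.5 thresholds from the text; the base-2 logarithm is an assumption.

```python
import numpy as np

def entropy_quality(probs):
    """Shannon entropy (bits) of a feature probability distribution, mapped to a quality label."""
    p = np.asarray(probs, dtype=np.float64)
    p = p / p.sum()
    h = float(-np.sum(p[p > 0] * np.log2(p[p > 0])))
    if h < 0.1:
        return h, "high"
    return h, ("medium" if h <= 0.5 else "low")
```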
Through the steps, standardized perception data is finally generated, and the standardized perception data comprises comprehensive information such as the position, the speed, the shape, the appearance and the like of the obstacle and quality evaluation results of the data.
According to the invention, through multi-sensor virtual simulation and data fusion, the omnibearing and multi-angle sensing of the obstacle is realized, and the comprehensiveness and accuracy of sensing data are improved. The virtual laser radar provides accurate three-dimensional geometric information, the virtual millimeter wave radar provides reliable distance and speed measurement, the virtual vision sensor provides rich appearance characteristics, and the sensing capability is greatly enhanced by fusion of multi-source information. A series of data processing and optimizing technologies are adopted, and the quality and reliability of the perceived data are effectively improved. Abnormal points are removed by point cloud filtering, radar measurement accuracy is improved by multipath effect compensation, and stability of visual characteristics is enhanced by illumination change compensation. These processes ensure that the subsequently fused data is of higher quality. Through space-time alignment, multi-mode fusion and standardization processing, unified expression of heterogeneous data is realized, and development and application of a subsequent algorithm are facilitated. The quality assessment provides a quantization index for the reliability of the data, which is helpful to improve the accuracy and robustness of the decision. Standardized data formats also facilitate data exchange and interoperability between different systems.
In an alternative embodiment, the deep reinforcement learning network includes:
the prediction branch network receives position, speed and acceleration information of an obstacle in standardized perception data, track characteristics are extracted through a multi-head self-attention encoder, and a multi-mode prediction track is generated based on a cyclic neural network decoder;
Combining the multi-mode prediction track, the running state information and the scene complexity data into a state space vector and inputting the state space vector into a dual-channel control network, wherein the dual-channel control network comprises a conventional driving channel and an emergency risk avoidance channel;
the conventional driving channel generates a conventional driving strategy based on an Actor-Critic network structure, sets a safety index as a constraint condition of the Actor-Critic network, increases comfort index weight and smoothness index weight along with scene complexity, correspondingly reduces efficiency index weight, and optimizes the conventional driving strategy through time sequence difference errors; the emergency risk avoidance channel generates a risk avoidance driving strategy based on collision risk assessment, and optimizes the risk avoidance driving strategy through safety constraint;
And carrying out weighted fusion on the conventional driving strategy and the risk avoidance driving strategy according to the dynamic fusion weight to generate driving control parameters.
The invention provides an automatic driving control method based on deep reinforcement learning, which comprises a prediction branch network, a dual-channel control network and a strategy fusion module.
The prediction branch network receives the normalized sensing data, including the position, velocity and acceleration information of surrounding obstacles. The network employs an encoder with a multi-head self-attention mechanism to extract the trajectory characteristics of the obstacle. Specifically, the input data is divided into a plurality of subspaces, each subspace independently calculates attention weights, and the results of the subspaces are then combined. In this way, feature information of different scales can be captured. Next, a plurality of possible predicted trajectories is generated using a recurrent neural network decoder based on long short-term memory (LSTM) cells. For example, for a certain obstacle, trajectory prediction results such as going straight, turning left and turning right may be generated.
The multi-mode prediction track is combined with the running state information of the current vehicle (such as speed, acceleration and steering wheel angle) and the scene complexity data (such as road type, traffic flow, and obstacle motion complexity determined from the curvature of the obstacle trajectory, the weighted standard deviation of speed and acceleration changes, and the inverse of distance) to construct a state space vector. The vector is passed as input to the dual-channel control network, which includes a conventional driving channel and an emergency risk avoidance channel.
The conventional driving channel adopts an Actor-Critic network structure. The Actor network generates action strategies such as acceleration, deceleration and steering according to the current state, and the Critic network evaluates the value of the action. During training, a safety index (e.g., the minimum distance to other vehicles) is set as a constraint condition. As scene complexity increases, the weights of the comfort index (e.g., rate of change of acceleration) and the smoothness index (e.g., rate of change of steering angle) correspondingly increase, while the weight of the efficiency index (e.g., average speed) decreases. The conventional driving strategy is continuously optimized by calculating the temporal difference error between the actual and expected rewards.
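The snippet below sketches how the complexity-dependent index weights, the safety constraint and the temporal difference error could be expressed; the linear weight schedule, base weights and penalty value are illustrative assumptions rather than values given in the embodiment.

```python
def reward_weights(complexity):
    """Shift weight from efficiency toward comfort/smoothness as scene complexity rises (0..1)."""
    w_comfort = 0.2 + 0.3 * complexity
    w_smooth = 0.2 + 0.3 * complexity
    w_eff = max(0.6 - 0.6 * complexity, 0.1)
    return w_comfort, w_smooth, w_eff

def shaped_reward(jerk, steer_rate, speed, target_speed, min_gap, complexity, safe_gap=3.0):
    """Reward combining comfort, smoothness and efficiency, with the safety index as a constraint."""
    w_c, w_s, w_e = reward_weights(complexity)
    r = -w_c * abs(jerk) - w_s * abs(steer_rate) + w_e * (speed / target_speed)
    if min_gap < safe_gap:          # safety constraint violated
        r -= 10.0                   # large penalty (illustrative value)
    return r

def td_error(reward, value_s, value_s_next, gamma=0.99):
    """One-step temporal difference error used to update the Critic."""
    return reward + gamma * value_s_next - value_s
```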
The emergency risk avoidance channel evaluates collision risk based on the current state and generates a risk avoidance driving strategy. Specifically, using the trajectories generated by the prediction branch network, the minimum spatio-temporal distance to surrounding obstacles is calculated. When this distance is smaller than a preset threshold value, generation of the risk avoidance strategy is triggered. The optimization objective of the risk avoidance strategy is to maximize the safe distance while taking vehicle dynamics constraints into account.
The invention realizes multi-scale extraction and multi-mode track prediction of the obstacle motion characteristics through the prediction branch network combining the multi-head self-attention mechanism and the LSTM structure. The network can pay attention to the characteristic information of different time-space scales at the same time and generate a plurality of possible motion tracks including straight running, steering and the like. For example, in a complex intersection scene, the system can predict the possible steering intentions of a plurality of vehicles at the same time, and more comprehensive information support is provided for subsequent decision control. The generation and evaluation of the conventional driving strategy are realized through the Actor-Critic structure by adopting a double-channel control network architecture, and various index weights are dynamically adjusted according to scene complexity. When scene complexity increases, the system can automatically improve the weight of comfort and smoothness indexes, and reduce the efficiency index weight, so that control is more conservative. And meanwhile, an independent emergency risk avoiding channel is arranged, so that a safety-oriented risk avoiding strategy can be rapidly generated under the dangerous condition. A policy fusion mechanism based on space-time collision risk is innovatively proposed.
In an alternative embodiment, evaluating the confidence of the conventional driving strategy and the risk avoidance driving strategy based on the space-time collision risk and the minimum safety distance, and mapping the confidence to the dynamic fusion weight comprises:
Calculating time dimension collision probability based on a multi-mode prediction track, substituting the deviation between the time collision point of the multi-mode prediction track and expected collision time into a Gaussian distribution function, and generating a time dimension collision risk value according to track weight;
Calculating a dynamic safety distance according to the speed of the vehicle, the speed of the front vehicle, the reaction time, the maximum deceleration and the minimum deceleration, and determining the ratio of the actual vehicle distance to the dynamic safety distance as a safety margin coefficient;
the time dimension collision risk value, the reverse value of the space dimension collision risk value and the safety margin coefficient are subjected to weighted combination to generate a conventional driving strategy confidence coefficient;
substituting the conventional driving strategy confidence coefficient and the risk avoidance driving strategy confidence coefficient into a Softmax mapping function for normalization processing to generate an initial fusion weight, and performing exponential moving average processing on the initial fusion weight to generate a dynamic fusion weight.
Illustratively, the time dimension collision risk is evaluated. Trajectory prediction is performed on surrounding vehicles to generate multiple predicted trajectories, each with a weight. For each predicted trajectory, the possible collision time point with the host vehicle is calculated, and the difference between this time point and the expected collision time is substituted into a Gaussian distribution function. In particular, the mean of the Gaussian distribution may be set to 3 seconds and the standard deviation to 1 second. The collision probability of each trajectory is multiplied by the corresponding weight and summed to obtain the time dimension collision risk value. For example, if the predicted collision time of a certain trajectory is 2.5 seconds and its weight is 0.6, substituting these values yields a collision risk of 0.8 for this trajectory.
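A minimal sketch of this weighted Gaussian risk follows, with the 3-second mean and 1-second standard deviation from the text; the unnormalized Gaussian kernel is an assumption, so it does not necessarily reproduce the numeric example above exactly.

```python
import numpy as np

def time_collision_risk(pred_collision_times, traj_weights, t_expected=3.0, sigma=1.0):
    """Weighted Gaussian risk over the predicted collision times of the multi-modal trajectories."""
    t = np.asarray(pred_collision_times, dtype=np.float64)
    w = np.asarray(traj_weights, dtype=np.float64)
    risk_per_traj = np.exp(-((t - t_expected) ** 2) / (2.0 * sigma ** 2))
    return float(np.sum(w * risk_per_traj))
```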
The space dimension collision risk is then calculated. A position probability density function with a two-dimensional Gaussian distribution is established based on the current state of the vehicle, and a speed constraint function is established taking the kinematic constraints of the vehicle into account. The two functions are numerically integrated over the predefined collision risk area to obtain the space dimension collision risk value. In practical applications, the dangerous area may be set to a rectangular area 20 meters in front of the vehicle and 2 meters on each side, and risk calculation is performed when a vehicle is detected in this area.
The dynamic safety distance is then calculated. For example, with an ego speed of 80 kilometers per hour, a front vehicle speed of 60 kilometers per hour, a driver response time of 1 second, a maximum deceleration of 8 meters per second squared and a minimum deceleration of 2 meters per second squared, the safe following distance is calculated to be 50 meters. The ratio of the actual distance to the safety distance is used as the safety margin coefficient; for example, the safety margin coefficient is 0.8 when the actual distance is 40 meters.
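The embodiment lists only the input parameters; the sketch below uses one plausible RSS-style formulation (the ego vehicle travels during the reaction time and then brakes at the minimum deceleration while the front vehicle brakes at the maximum deceleration). It is an assumption and will not necessarily reproduce the 50-meter figure above.

```python
def dynamic_safe_distance(v_ego, v_front, t_react, a_max_brake, a_min_brake):
    """Assumed RSS-style safe following distance. Speeds in m/s, decelerations in m/s^2."""
    d = v_ego * t_react + v_ego ** 2 / (2.0 * a_min_brake) - v_front ** 2 / (2.0 * a_max_brake)
    return max(d, 0.0)

def safety_margin(actual_gap, safe_gap):
    """Ratio of the actual following distance to the dynamic safety distance."""
    return actual_gap / safe_gap
```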
The driving strategy confidences are then generated. The time dimension collision risk value of 0.8, the inverse value of the space dimension collision risk value of 0.3 and the safety margin coefficient of 0.8 are weighted with weights of 0.4, 0.3 and 0.3 respectively, giving a conventional driving strategy confidence of 0.64. The safety margin coefficient is exponentially decayed and weighted-summed with the space-time collision risk values to obtain a risk avoidance driving strategy confidence of 0.75.
Finally, weight mapping is performed. And substituting the confidence coefficient of the two strategies into the Softmax function for normalization to obtain initial fusion weights of 0.45 and 0.55 respectively. And smoothing the initial weight by adopting an exponential sliding average method with the attenuation coefficient of 0.8, and finally obtaining the dynamic fusion weights of 0.42 and 0.58.
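A sketch of the Softmax mapping followed by exponential moving average smoothing is given below; the temperature parameter is an assumption (with temperature 1, the confidences 0.64 and 0.75 give weights of roughly 0.47/0.53 rather than the 0.45/0.55 quoted above).

```python
import numpy as np

def fusion_weights(conf_normal, conf_avoid, prev_weights=None, decay=0.8, temperature=1.0):
    """Softmax the two strategy confidences, then smooth with an exponential moving average."""
    logits = np.array([conf_normal, conf_avoid]) / temperature
    w = np.exp(logits - logits.max())
    w = w / w.sum()                                   # initial fusion weights
    if prev_weights is not None:                      # EMA with the previous dynamic weights
        w = decay * np.asarray(prev_weights) + (1.0 - decay) * w
    return w
```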
Fig. 2 illustrates the simulation effect of the autopilot control method of the present invention in a complex intersection scenario, wherein during the simulation, the left (darker) vehicle represents the host vehicle and the lower (lighter) vehicle represents the surrounding traffic participants. The upper left corner of the figure shows the multi-modal trajectory prediction results, including the different possible travel paths and their corresponding probabilities (straight 0.65, left turn 0.25, right turn 0.10). The upper right corner shows the space-time dimension risk assessment results, with a time dimension collision risk value of 0.32, a space dimension collision risk value of 0.28, and a composite risk score of 0.30. The bottom shows the output of the dual-channel control network and the fusion result: the conventional driving control weight is 0.72, the emergency risk avoidance control weight is 0.28, and the fused control instruction comprises an acceleration of -0.5 m/s² and a steering wheel angle of 2.3 degrees. Simulation results show that the method can accurately identify potential risks in complex scenes, generate safe and stable control strategies, and effectively cope with the challenges of various driving scenarios.
Fig. 3 shows the safety performance comparison of three different autopilot control methods in various scenarios, and the evaluation is performed by using the key index of the collision accident rate (percentage). The graph clearly shows the safety advantages of the dual channel control architecture (square mark) of the present invention over DDPG single control strategy (triangle mark) and SAC single control strategy (circle mark) in three typical scenarios. In a highway environment, the collision accident rate of the control architecture of the present invention is only 0.8%, significantly lower than 2.5% of DDPG and 2.1% of SAC. This advantage is more pronounced as scene complexity increases. Under urban road conditions, the method of the invention maintains a low collision rate of 1.5%, whereas DDPG and SAC are 4.2% and 3.8%, respectively. In the most challenging complex intersection scenario, the collision accident rate of the inventive method is 2.1%, only 28.8% of DDPG method (7.3%) and 32.3% of SAC method (6.5%).
The method realizes accurate quantification of space-time collision risk through multi-mode trajectory prediction and probability density analysis, improving the accuracy and reliability of risk assessment. The calculation method based on the dynamic safety distance and the safety margin coefficient realizes adaptive assessment of the safety state of the driving environment and enhances the adaptability of the system to complex traffic scenes. The strategy confidence mapping and dynamic weight fusion mechanism realizes smooth switching between the conventional driving and risk avoidance driving strategies, improving the safety and comfort of the automatic driving system.
In an alternative embodiment, the training of the deep reinforcement learning network includes:
Constructing training courses based on scene complexity indexes, normalizing the change rate of road curvature, the number of obstacles in the scene and the complexity of obstacle movement to generate scene difficulty scores, classifying training samples according to the scene difficulty scores, and establishing course sequences from simple to complex;
adopting a double-stage training strategy, performing behavior clone training based on demonstration data of a human driver in a pre-training stage to enable a network to obtain basic driving capability, performing training based on a course sequence in a reinforcement learning stage, and introducing an experience playback mechanism;
Constructing a dynamic course adjustment mechanism, setting training efficiency evaluation indexes including convergence speed of a strategy network and evaluation index lifting amplitude, adaptively adjusting course difficulty increasing step length and training turn of each difficulty level based on training efficiency evaluation results, and automatically backing to a training scene with difficulty level reduced by 1 to carry out supplementary training when the training efficiency is lower than a preset efficiency threshold;
setting a multi-dimensional grading standard, evaluating the control stability and the operation efficiency of a conventional driving channel under different scene complexity, evaluating the timeliness and the safety margin of a risk avoiding strategy of the emergency risk avoiding channel, and switching to a training scene with the difficulty level added with 1 when the weighted result of the multi-dimensional grading is larger than a preset grading threshold value.
Illustratively, a scene complexity assessment system is constructed. The road curvature change rate is calculated from the curvature standard deviation in each 100-meter road section and ranges from 0 to 1, where the value of a straight section is close to 0 and that of a sharp-turn section is close to 1. The number of obstacles in the scene is normalized with the average number of obstacles per kilometer as a reference: the value is 0 when there are no obstacles and 1 when the number reaches 20 per kilometer. The obstacle motion complexity is calculated based on the speed change and trajectory curvature of the obstacle; a stationary obstacle has complexity 0, and a fast-moving, frequently turning obstacle approaches 1. The three indexes are weighted and summed to obtain the scene difficulty score, with weights of 0.3, 0.4 and 0.3 respectively.
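The scene difficulty score described above reduces to a weighted sum of the three normalized indicators; a minimal sketch follows, using the 20-obstacles-per-kilometer normalization and the 0.3/0.4/0.3 weights from the text.

```python
import numpy as np

def scene_difficulty(curvature_change_rate, obstacles_per_km, motion_complexity,
                     max_obstacles_per_km=20.0, weights=(0.3, 0.4, 0.3)):
    """Weighted sum of the three normalised complexity indicators, each in [0, 1]."""
    n_obs = min(obstacles_per_km / max_obstacles_per_km, 1.0)
    indicators = np.array([curvature_change_rate, n_obs, motion_complexity])
    return float(np.dot(weights, indicators))
```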
Training samples are divided into five difficulty levels according to the scene difficulty score: 0-0.2 is a primary scene, comprising straight, obstacle-free basic scenes; 0.2-0.4 is an intermediate scene, comprising simple curves and static obstacles; 0.4-0.6 is an advanced scene, comprising continuous curves and low-speed moving obstacles; 0.6-0.8 is an expert scene, comprising sharp curves and fast-moving obstacles; and 0.8-1.0 is a challenge-level scene, comprising complex road conditions and multiple interacting obstacles.
During the pre-training phase, 500 hours of human driver demonstration data are collected for behavioral cloning. The demonstration data cover various typical scenes, including standard driving behaviors such as following on a straight road, overtaking with a lane change, and avoiding pedestrians. Through imitation learning, the network masters basic vehicle control capability; after pre-training, the completion rate in primary scene tests exceeds 90%.
The reinforcement learning stage gradually increases difficulty based on the course sequence. Each difficulty level is provided with 1000 initial training rounds and an experience replay pool capacity of 100,000 frames. The training efficiency evaluation period is 100 rounds, and the evaluation indexes include the average return improvement rate of the strategy network on the validation set and the improvement amplitude of the success rate. When the training efficiency of 3 consecutive evaluation periods is lower than the preset threshold, the training difficulty is automatically reduced and the number of training rounds at the current difficulty level is increased.
Scoring standards for three dimensions, safety, smoothness and efficiency, are set. The safety score is based on the minimum distance from obstacles, which must exceed 3 meters; the smoothness score is based on lateral acceleration, which must be below 2 meters per second squared; and the efficiency score is based on average travel speed, which must be no less than 80% of the target speed. When the weighted score of the three dimensions is greater than 0.85, the training difficulty level is raised.
Through the quantitative evaluation of scene complexity and the course learning strategy, the deep reinforcement learning network progressively masters driving skills from simple to complex, avoiding the training instability that arises when high-difficulty scenes are faced directly, and significantly improving training efficiency and success rate. The two-stage training and dynamic course adjustment mechanism alleviates the problems of slow convergence and easily falling into local optima in reinforcement learning, ensuring the stability of the training process and the reliability of final performance. The training scheme based on the multi-dimensional scoring standard comprehensively considers the performance of the automatic driving system in safety, comfort, efficiency and other aspects, ensuring that the trained driving strategy can cope with complex working conditions and meet practical application requirements, and therefore has high practical value.
In an alternative embodiment, the dynamic course adjustment mechanism includes:
constructing a multi-dimensional training efficiency evaluation system, taking convergence speed and evaluation index lifting amplitude as short-term efficiency indexes, and constructing long-term efficiency indexes based on generalized performance and anti-interference capacity of a strategy network in a verification scene;
establishing a distributed training architecture, distributing training tasks of the course sequence to a plurality of parallel training environments, setting different random disturbance parameters in each training environment, carrying out parallel training based on an asynchronous gradient updating mode, and comparing performance performances of strategy networks under different parallel training environments to generate a stability evaluation value of the strategy network;
Determining a course difficulty adjustment direction according to the comprehensive training efficiency value and the stability evaluation value, and increasing the course difficulty level according to the self-adaptive step length when the comprehensive training efficiency value is larger than a preset upper efficiency limit and the stability evaluation value is larger than a preset stability threshold value, and decreasing the course difficulty level by 1 when the comprehensive training efficiency value is smaller than the preset efficiency threshold value;
in the process of performing the supplementary training of the difficulty level minus 1, importance sampling is performed from the experience pool corresponding to the difficulty level to generate a supplementary training sequence;
Setting a course difficulty smooth transition mechanism, calculating the mixed weight of adjacent difficulty grades based on an exponential decay function in the difficulty grade conversion process, and dynamically combining training samples of different difficulty grades according to the mixed weight to generate a transition training sample set.
Illustratively, a multi-dimensional training efficiency assessment system is constructed. The assessment system includes short-term efficiency indexes and long-term efficiency indexes. The short-term efficiency indexes mainly consider convergence speed and the improvement amplitude of evaluation indexes. Specifically, convergence speed can be measured by calculating the rate of change of the strategy network parameters in each training period, and the percentage improvement of key evaluation indexes (such as reward value and success rate) is recorded. The long-term efficiency indexes are based on the generalization performance and anti-interference capability of the strategy network in verification scenarios. Multiple verification scenarios with different environmental parameters can be set to test the average performance score of the strategy network in these scenarios and the degree of performance fluctuation after random interference is added. The short-term and long-term efficiency indexes are combined by weighted averaging to generate a comprehensive training efficiency value. For example, a short-term index weight of 0.4 and a long-term index weight of 0.6 may be set to obtain the final comprehensive training efficiency value.
And establishing a distributed training architecture. The training tasks of the course sequence are distributed to a plurality of parallel training environments, and each environment is provided with different random disturbance parameters. For example, 5 parallel training environments may be provided, with 5%, 10%, 15%, 20%, 25% random noise added, respectively. And carrying out parallel training by adopting an asynchronous gradient updating mode, independently calculating gradients by each environment and asynchronously updating global strategy network parameters. And (5) periodically comparing performance performances of the strategy network under different parallel training environments, and calculating the performance variance as a stability evaluation value. The smaller the stability assessment value, the more stable the policy network performs under different circumstances.
And determining the course difficulty adjustment direction according to the comprehensive training efficiency value and the stability evaluation value. The upper limit of the preset efficiency is set to 0.8, and the preset stability threshold is set to 0.1. And when the comprehensive training efficiency value is greater than 0.8 and the stability evaluation value is less than 0.1, the course difficulty level is improved according to the self-adaptive step length. The self-adaptive step size can be dynamically adjusted according to the difference between the current efficiency value and the upper limit, and the larger the difference is, the larger the step size is. And when the comprehensive training efficiency value is smaller than the preset lower efficiency limit (such as 0.5), subtracting 1 from the course difficulty level.
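A sketch of the adjustment rule with the thresholds quoted above (efficiency upper limit 0.8, lower limit 0.5, stability threshold 0.1) is given below; the particular adaptive-step formula and the level bounds are illustrative assumptions.

```python
def adjust_difficulty(level, efficiency, stability, eff_upper=0.8, eff_lower=0.5,
                      stab_threshold=0.1, max_level=5, max_step=2):
    """Raise the course level with an adaptive step when training is efficient and stable
    (low performance variance across parallel environments); drop one level when inefficient."""
    if efficiency > eff_upper and stability < stab_threshold:
        step = 1 + min(int((efficiency - eff_upper) * 10), max_step - 1)  # larger gap -> larger step
        return min(level + step, max_level)
    if efficiency < eff_lower:
        return max(level - 1, 1)
    return level
```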
A hierarchical storage structure is built for the experience playback mechanism. Training samples are classified and stored in different experience pool levels according to difficulty level. For example, 5 difficulty levels may be set, corresponding to 5 experience pool levels. During the supplementary training at the difficulty level reduced by 1, importance sampling is performed from the experience pool corresponding to that difficulty level to generate a supplementary training sequence. Importance sampling may be weighted based on the temporal decay coefficient of the sample and the reward value.
A smooth transition mechanism for course difficulty is set. During difficulty level conversion, the mixing weights of adjacent difficulty levels are calculated based on an exponential decay function. For example, if the decay coefficient is set to 0.9, the mixing weights for the nth training batch are 0.9^n and 1-0.9^n. Training samples of different difficulty levels are dynamically combined according to the mixing weights to generate a transition training sample set. In this way, a smooth transition between difficulty levels can be realized, avoiding severe fluctuation of strategy network performance.
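A minimal sketch of the exponential-decay mixing and the resulting transition batch follows; the batch size and the uniform random sampling within each level are illustrative assumptions.

```python
import random

def mixing_weights(batch_index, decay=0.9):
    """Exponential-decay blend: weight of the previous difficulty level is decay**n,
    weight of the new level is 1 - decay**n (n = batch index since the level switch)."""
    w_old = decay ** batch_index
    return w_old, 1.0 - w_old

def transition_batch(old_samples, new_samples, batch_index, batch_size=64, decay=0.9):
    """Draw a transition training batch mixing samples of the two adjacent difficulty levels."""
    w_old, _ = mixing_weights(batch_index, decay)
    n_old = round(batch_size * w_old)
    return (random.sample(old_samples, min(n_old, len(old_samples)))
            + random.sample(new_samples, min(batch_size - n_old, len(new_samples))))
```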
FIG. 4 shows the training-process reward curves and the difficulty adjustment paths, where the upper graph compares the performance of three course learning methods during reinforcement learning training. From the reward curves, the dynamic course adjustment mechanism of the invention maintains higher learning efficiency throughout training and finally reaches a validation reward of 210, significantly better than the 95 of fixed-step course learning and the 120 of performance-threshold-based course learning. In terms of convergence speed, the invention converges in only 124k steps, while the fixed-step and performance-threshold methods require 356k and 298k steps respectively. The difficulty level curve in the lower graph shows that the training method of the invention dynamically adjusts course difficulty according to training efficiency, forming a reasonable difficulty ramp, whereas the fixed-step method adjusts mechanically at preset intervals and the performance-threshold method stalls at the medium difficulty stage. The stability index of the invention reaches 0.87, far higher than the 0.52 and 0.61 of the other two methods, indicating stronger robustness under different training environments.
The dynamic course adjustment mechanism can effectively improve the efficiency and stability of reinforcement learning training. Through the multi-dimensional training efficiency evaluation system, the short-term and long-term training effects of the strategy network are comprehensively measured, providing a reliable basis for course difficulty adjustment. The distributed training architecture and stability assessment mechanism help to enhance the robustness and generalization ability of the strategy network. The adaptive course difficulty adjustment strategy can dynamically adjust the difficulty level according to the actual condition of the training process, preventing training from becoming trapped in local optima or diverging. The layered experience playback and supplementary training mechanisms can purposefully strengthen the performance of the strategy network at different difficulty levels. The course difficulty smooth transition mechanism realizes gradual adjustment of difficulty level, reducing severe fluctuation of strategy network performance and improving training continuity and stability. Overall, this mechanism can remarkably improve the training efficiency, generalization capability and stability of the reinforcement learning algorithm, providing strong support for intelligent decision-making in complex environments.
In an alternative embodiment, building a virtual simulation environment includes:
determining geometric features and gradient features of the road based on the road parameters, setting type parameters and layout position parameters of the road facility features, setting environmental parameters of weather features;
Establishing a feature combination constraint rule base, wherein the feature combination constraint rule base comprises a matching rule of road geometric features and gradient features, a layout rule of road facility features and a combination rule of weather features;
screening out feature combinations that meet the constraint conditions based on the feature combination constraint rule base, and constructing a feature fitness matrix, wherein each fitness value in the matrix represents the matching degree of the corresponding feature combination;
automatically combining the features by a layered combination method, which comprises: selecting the road geometric feature and gradient feature with the highest fitness value for combination to generate a basic road section, selecting the road facility feature with the highest fitness value for layout on the basic road section, and selecting the weather feature with the highest fitness value for superposition to generate a complete scene;
And randomly sampling the feature combination result by adopting a Monte Carlo method, and carrying out feature transfer by adopting a Markov chain to generate a composite test scene.
The invention provides a method for constructing a virtual simulation environment, which creates complex and realistic road scenes through systematic steps. First, the geometric and gradient features of the road are determined based on the road parameters. Road geometric features include road width, number of lanes, curve radius and the like, and gradient features include the longitudinal slope, transverse slope and the like. For example, a two-way four-lane highway may be set with a width of 28 meters, a curve radius of 1000 meters, a longitudinal slope of 2% and a lateral slope of 1.5%.
Next, the type parameters and layout position parameters of the road facility features are set. Road facilities include traffic signs, guardrails, street lamps and the like. For example, a traffic sign with a speed limit of 80 km/h may be placed 500 meters from the start of the road, and street lamps may be placed every 50 meters on both sides of the road. Meanwhile, the environmental parameters of the weather features, such as temperature, humidity and visibility, are set; for example, the temperature may be set to 25°C, the humidity to 60% and the visibility to 1000 meters.
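The parameter setting described above can be pictured with a simple configuration sketch. The class and field names below are purely illustrative (not terms from the invention) and only mirror the example values: a 28-meter two-way four-lane road, a speed-limit sign at 500 meters, lamps every 50 meters, and 25°C / 60% / 1000 m weather.

```python
from dataclasses import dataclass, field

@dataclass
class RoadGeometry:
    width_m: float = 28.0
    lanes: int = 4
    curve_radius_m: float = 1000.0
    longitudinal_slope_pct: float = 2.0
    lateral_slope_pct: float = 1.5

@dataclass
class RoadFacility:
    kind: str                    # e.g. "speed_limit_sign" or "street_lamp"
    position_m: float            # distance from the road start
    value: float | None = None   # e.g. speed limit in km/h

@dataclass
class Weather:
    temperature_c: float = 25.0
    humidity_pct: float = 60.0
    visibility_m: float = 1000.0

@dataclass
class SceneConfig:
    road: RoadGeometry = field(default_factory=RoadGeometry)
    facilities: list[RoadFacility] = field(default_factory=list)
    weather: Weather = field(default_factory=Weather)

# Example scene using the figures quoted in the text.
scene = SceneConfig(facilities=[
    RoadFacility("speed_limit_sign", 500.0, 80.0),
    *[RoadFacility("street_lamp", float(pos)) for pos in range(50, 1001, 50)],
])
```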
To ensure that the generated scenes are reasonable and reliable, a feature combination constraint rule base is established. The rule base comprises matching rules of road geometric features and gradient features, layout rules of road facility features, and combination rules of weather features. For example, the maximum longitudinal slope of an expressway should not exceed 4%, the transverse slope at a curve should match the curve radius, traffic signs should comply with traffic rules (for example, a curve warning sign should be placed before the curve), and visibility in heavy fog should be below 200 meters.
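A minimal sketch of such rule checks is given below, assuming scenes are passed in as plain dictionaries. Only the rules quoted above (4% maximum slope, curve warning sign before the curve, fog visibility below 200 m) come from the text; the 500 m curve radius threshold, the 2% cross-slope requirement and the dictionary keys are assumptions.

```python
def check_constraints(road: dict, weather: dict, facilities: list[dict]) -> list[str]:
    """Return a list of rule violations; an empty list means the combination passes."""
    violations = []
    if road.get("longitudinal_slope_pct", 0.0) > 4.0:
        violations.append("expressway longitudinal slope exceeds 4%")
    # Assumed matching rule: curves tighter than 500 m need at least 2% cross slope.
    if road.get("curve_radius_m", 1e9) < 500.0 and road.get("lateral_slope_pct", 0.0) < 2.0:
        violations.append("transverse slope does not match curve radius")
    if weather.get("condition") == "heavy_fog" and weather.get("visibility_m", 0.0) >= 200.0:
        violations.append("heavy fog requires visibility below 200 m")
    curve_start = road.get("curve_start_m")
    if curve_start is not None:
        warned = any(f["kind"] == "curve_warning_sign" and f["position_m"] < curve_start
                     for f in facilities)
        if not warned:
            violations.append("curve warning sign missing before the curve")
    return violations
```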
Feature combinations meeting the constraint conditions are then screened out based on the rule base, and a feature fitness matrix is constructed. Each fitness value represents the matching degree of the corresponding feature combination and ranges from 0 to 1. For example, in a highway scene, a 28-meter width may have a fitness of 0.9 for a two-way four-lane configuration and 0.6 for a two-way six-lane configuration.
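The fitness matrix can be stored as a small numeric table. In the sketch below, the 0.9 and 0.6 entries are the example values from the text, while the row/column layout and the remaining entries are hypothetical.

```python
import numpy as np

road_widths_m = [26.0, 28.0, 30.0]
lane_configs = ["two_way_4_lane", "two_way_6_lane"]

# fitness[i][j] = matching degree of width i with lane configuration j, in [0, 1]
fitness = np.array([
    [0.70, 0.40],
    [0.90, 0.60],   # 28 m row uses the example values from the text
    [0.80, 0.80],
])

# Combinations that violate the rule base would be screened out (fitness set to 0).
i, j = np.unravel_index(np.argmax(fitness), fitness.shape)
best_combination = (road_widths_m[i], lane_configs[j])   # (28.0, "two_way_4_lane")
```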
The features are then combined automatically by the layered combination method. First, the road geometric feature and gradient feature combination with the highest fitness value is selected to generate a basic road section, for example two-way four lanes, 28-meter width, 1000-meter curve radius, 2% longitudinal slope and 1.5% lateral slope. Next, the road facility features with the highest fitness value are laid out on the basic road section, for example a speed limit sign 500 meters from the start and street lamps every 50 meters on both sides. Finally, the weather features with the highest fitness value are superimposed, for example 25°C temperature, 60% humidity and 1000-meter visibility, generating a complete scene.
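The layered selection can be summarized as repeatedly taking the highest-fitness candidate at each layer. The candidate lists and dictionary keys below are illustrative and simply reuse the example figures from this paragraph.

```python
def highest_fitness(candidates):
    """candidates: list of (feature_dict, fitness) pairs; returns the best feature set."""
    return max(candidates, key=lambda c: c[1])[0]

def layered_combine(road_candidates, facility_candidates, weather_candidates):
    scene = {"road": highest_fitness(road_candidates)}          # 1) basic road section
    scene["facilities"] = highest_fitness(facility_candidates)  # 2) facilities on the section
    scene["weather"] = highest_fitness(weather_candidates)      # 3) weather superimposed last
    return scene

scene = layered_combine(
    road_candidates=[({"lanes": 4, "width_m": 28, "curve_radius_m": 1000,
                       "long_slope_pct": 2, "lat_slope_pct": 1.5}, 0.9),
                     ({"lanes": 6, "width_m": 28}, 0.6)],
    facility_candidates=[({"speed_limit_sign_m": 500, "lamp_spacing_m": 50}, 0.85)],
    weather_candidates=[({"temp_c": 25, "humidity_pct": 60, "visibility_m": 1000}, 0.8)],
)
```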
To increase the diversity and realism of the scenes, the Monte Carlo method is used to randomly sample the feature combination result. For example, the road width may be sampled between 26 and 30 meters and the longitudinal slope between 1% and 3%. Feature transfer is then performed with a Markov chain to generate composite test scenes. The Markov chain models transition probabilities between features, such as 0.3 for a sunny day becoming cloudy and 0.4 for a cloudy day becoming rainy, so that a series of continuously changing scenes can be generated, for example a progression from sunny to cloudy to rainy.
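The sampling and transfer step could look roughly like the following sketch. The transition probabilities 0.3 (sunny to cloudy) and 0.4 (cloudy to rainy) and the sampling ranges are from the text; the remaining row entries of the transition matrix are assumptions chosen only so that each row sums to 1.

```python
import random

WEATHER_TRANSITIONS = {
    "sunny":  {"sunny": 0.7, "cloudy": 0.3, "rainy": 0.0},
    "cloudy": {"sunny": 0.3, "cloudy": 0.3, "rainy": 0.4},
    "rainy":  {"sunny": 0.1, "cloudy": 0.5, "rainy": 0.4},
}

def sample_road():
    # Monte Carlo perturbation of the road parameters within the quoted ranges.
    return {"width_m": random.uniform(26.0, 30.0),
            "long_slope_pct": random.uniform(1.0, 3.0)}

def weather_sequence(start="sunny", steps=5):
    # Markov-chain feature transfer: each weather state depends only on the previous one.
    state, sequence = start, [start]
    for _ in range(steps):
        states = list(WEATHER_TRANSITIONS[state])
        weights = list(WEATHER_TRANSITIONS[state].values())
        state = random.choices(states, weights=weights)[0]
        sequence.append(state)
    return sequence

composite_scenes = [{"road": sample_road(), "weather": w}
                    for w in weather_sequence(steps=4)]
```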
By establishing the feature combination constraint rule base, the invention manages road geometric features, gradient features, road facility features and weather features in a standardized way. The rule base enables rationality checks on feature combinations and ensures that generated scenes comply with engineering practice and traffic specifications; for example, it limits the maximum longitudinal slope of an expressway to 4% and requires the transverse slope at a curve to match the curve radius. The feature fitness matrix provides a quantitative evaluation of feature combinations, and the fitness values guide their optimal combination. The layered combination method preferentially selects high-fitness feature combinations to construct the basic road section and then superimposes road facilities and weather features, ensuring the layering and rationality of scene construction.
In an alternative embodiment, randomly sampling the feature combination result by the Monte Carlo method and performing feature transfer by a Markov chain to generate the composite test scene comprises:
Calculating a sampling temperature parameter according to the scene importance, substituting the scene importance and the sampling temperature parameter into a Softmax function, and generating a scene sampling probability;
performing Monte Carlo sampling on the feature combination result according to the scene sampling probability to acquire scene features; inputting the scene features and scene condition constraints into a conditional variational auto-encoder to generate mean parameters and variance parameters, and performing Gaussian distribution sampling based on the mean parameters and the variance parameters to obtain hidden variables;
establishing a Markov transition matrix based on the hidden variables, performing feature migration on the hidden variables based on the Markov transition matrix to generate migrated hidden variables, and decoding the migrated hidden variables together with the scene condition constraints to generate migrated scene features;
And calculating the KL divergence between the migrated scene features and the original scene features, and determining the migrated scene features to constitute a valid composite test scene when the KL divergence is smaller than a preset divergence threshold.
The scene importance is obtained by summing the product of the scene coverage rate and a first preset weight with the product of the rare event probability and a second preset weight, where a rare event refers to a special scene type whose occurrence frequency in historical test scene data is below a preset threshold, such as extreme weather or sudden obstacles. Specifically, the first preset weight may be set to 0.6 and the second preset weight to 0.4. If the coverage of a scene is 0.8 and its rare event probability is 0.7, the scene importance is 0.8 × 0.6 + 0.7 × 0.4 = 0.76.
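The weighted sum can be written directly, reproducing the worked example above; the function name is illustrative.

```python
def scene_importance(coverage, rare_event_prob, w_coverage=0.6, w_rare=0.4):
    """Importance = coverage * first weight + rare-event probability * second weight."""
    return coverage * w_coverage + rare_event_prob * w_rare

assert abs(scene_importance(0.8, 0.7) - 0.76) < 1e-9  # matches 0.8*0.6 + 0.7*0.4
```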
A sampling temperature parameter is then calculated from the scene importance. The sampling temperature parameter may be determined by a mapping function; for example, scene importance 0–0.3 may be mapped to temperature parameters 1.5–2.0, importance 0.3–0.6 to 1.0–1.5, and importance 0.6–1.0 to 0.5–1.0. In this example the scene importance is 0.76, and the corresponding sampling temperature parameter may be 0.8.
The scene importance and the sampling temperature parameter are substituted into a Softmax function to generate the scene sampling probability; the Softmax function converts its inputs into a probability distribution. Assume that the scene sampling probability obtained from the Softmax calculation is 0.65.
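One possible reading of these two steps is sketched below. The piecewise-linear temperature mapping is chosen so that an importance of 0.76 yields the 0.8 of the example, and the temperature-scaled Softmax over a set of candidate importances is an assumed functional form, since the text does not fix the exact formula.

```python
import math

def temperature_from_importance(s):
    """Piecewise-linear mapping consistent with the example bands
    (0-0.3 -> 2.0-1.5, 0.3-0.6 -> 1.5-1.0, 0.6-1.0 -> 1.0-0.5);
    an importance of 0.76 maps to 0.8 as in the text."""
    bands = [(0.0, 0.3, 2.0, 1.5), (0.3, 0.6, 1.5, 1.0), (0.6, 1.0, 1.0, 0.5)]
    for lo, hi, t_at_lo, t_at_hi in bands:
        if lo <= s <= hi:
            return t_at_lo + (s - lo) / (hi - lo) * (t_at_hi - t_at_lo)
    raise ValueError("importance must lie in [0, 1]")

def sampling_probabilities(importances):
    """Assumed form: Softmax of importance / temperature across candidate scenes."""
    logits = [s / temperature_from_importance(s) for s in importances]
    m = max(logits)                      # subtract max for numerical stability
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

probs = sampling_probabilities([0.76, 0.4, 0.2])  # higher importance -> higher probability
```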
Monte Carlo sampling is then performed on the feature combination result according to the scene sampling probability to acquire scene features; the Monte Carlo method approximates the true distribution through repeated random sampling. Assume that the sampled scene feature is [0.9, 0.3, 0.7, 0.5], representing feature values in four different dimensions.
The scene features and the scene condition constraints are input into a conditional variational auto-encoder to generate the mean parameters and variance parameters, where the scene condition constraints refer to the constraints applied to scene generation, including the allowed range of road geometric parameters, traffic rules, physical laws and the like. The conditional variational auto-encoder is a generative model that learns a latent representation of the data. Assume that the generated mean parameters are [0.8, 0.2, 0.6, 0.4] and the variance parameters are [0.1, 0.05, 0.08, 0.06].
Gaussian distribution sampling is carried out based on the mean parameters and variance parameters to obtain the hidden variables; this sampling increases the diversity of the generated results. Assume that the sampled hidden variables are [0.85, 0.18, 0.65, 0.45].
A Markov transition matrix is then constructed from the hidden variables, and feature migration is carried out on the hidden variables based on this matrix to generate the migrated hidden variables; the Markov transition matrix describes the transition probabilities between states. Assume that after feature migration the new hidden variables are [0.82, 0.25, 0.68, 0.42].
The migrated hidden variables and the scene condition constraints are decoded to generate the migrated scene features; decoding is the inverse of encoding and maps hidden variables back into the original feature space. Assume that the new scene features obtained after decoding are [0.88, 0.28, 0.72, 0.48].
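Below is an end-to-end numeric sketch of the latent-variable chain described in this embodiment, using the example vectors quoted above. The encoder output is taken as given, the decoder and the transition matrix are simple stand-ins for the trained conditional variational auto-encoder components, the KL divergence is computed on the feature vectors normalized into discrete distributions, and the 0.1 threshold is an assumption.

```python
import numpy as np

rng = np.random.default_rng(0)

mean = np.array([0.8, 0.2, 0.6, 0.4])     # encoder mean (example values from the text)
var = np.array([0.1, 0.05, 0.08, 0.06])   # encoder variance (example values)

# Reparameterized Gaussian sample gives the hidden (latent) variables.
latent = mean + np.sqrt(var) * rng.standard_normal(mean.shape)

# Hypothetical row-stochastic Markov transition matrix over latent dimensions.
T = np.array([[0.7, 0.1, 0.1, 0.1],
              [0.1, 0.7, 0.1, 0.1],
              [0.1, 0.1, 0.7, 0.1],
              [0.1, 0.1, 0.1, 0.7]])
migrated_latent = latent @ T

def decode(z, condition_scale=1.05):
    """Placeholder decoder: the real decoder maps latents back to scene
    features under the scene condition constraints."""
    return np.clip(z * condition_scale, 0.0, 1.0)

original = np.array([0.9, 0.3, 0.7, 0.5])  # original scene features (example values)
migrated = decode(migrated_latent)

def kl_divergence(p, q, eps=1e-9):
    # Treat the feature vectors as discrete distributions by normalizing them.
    p = p / p.sum()
    q = q / q.sum()
    return float(np.sum(p * np.log((p + eps) / (q + eps))))

THRESHOLD = 0.1  # assumed divergence threshold
is_valid_scene = kl_divergence(migrated, original) < THRESHOLD
```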
According to the method, the scene importance is determined through a weighted combination of the scene coverage rate and the rare event probability, and the sampling temperature parameter is adaptively adjusted based on this importance, which effectively improves the sampling efficiency of rare scenes. The combination of the conditional variational auto-encoder and Gaussian distribution sampling realizes efficient encoding and diversified generation of scene features. The Markov transition matrix performs feature migration, and the KL divergence threshold controls generation quality, ensuring the continuity and validity of the generated scenes. The method thus keeps the migrated scenes correlated with the original scenes while still producing valid new variations.
In a second aspect of the embodiment of the present invention, an intelligent driving simulation test system for an internal combustion locomotive is provided, the system comprising:
The system comprises a first unit, a second unit and a third unit, wherein the first unit is used for constructing a virtual simulation environment, collecting running state information of the internal combustion locomotive, and acquiring scene complexity data, wherein the scene complexity data comprises a road curvature change rate, the number of obstacles in a scene and obstacle motion complexity;
The second unit is used for acquiring three-dimensional point cloud data of an obstacle through a virtual laser radar in the virtual simulation environment, acquiring distance information and relative speed information of the obstacle through a virtual millimeter wave radar, simulating measurement errors caused by multipath effects, acquiring image characteristic information of the obstacle through a virtual vision sensor, simulating imaging quality fluctuation caused by illumination change, and performing multi-mode fusion processing on the three-dimensional point cloud data, the distance information, the relative speed information and the image characteristic information to generate standardized perception data;
The third unit is used for constructing a deep reinforcement learning network, wherein the deep reinforcement learning network comprises a prediction branch network for generating a multi-mode prediction track and a dual-channel control network for generating a driving strategy; for the conventional driving strategy, a safety index is set as a network constraint condition, the comfort index weight and the smoothness index weight are increased along with the increase of scene complexity, and the efficiency index weight is correspondingly reduced; for the risk avoidance driving strategy, safety constraint optimization is carried out based on collision risk assessment; and the driving state information, the standardized perception data and the scene complexity data are input into the deep reinforcement learning network to generate driving control parameters.
In a third aspect of an embodiment of the present invention,
There is provided an electronic device including:
A processor;
a memory for storing processor-executable instructions;
Wherein the processor is configured to invoke the instructions stored in the memory to perform the method described previously.
In a fourth aspect of an embodiment of the present invention,
There is provided a computer readable storage medium having stored thereon computer program instructions which, when executed by a processor, implement the method as described above.
The present invention may be a method, apparatus, system, and/or computer program product. The computer program product may include a computer readable storage medium having computer readable program instructions embodied thereon for performing various aspects of the present invention.
It should be noted that the above embodiments are merely for illustrating the technical solution of the present invention and not for limiting the same, and although the present invention has been described in detail with reference to the above embodiments, it should be understood by those skilled in the art that the technical solution described in the above embodiments may be modified or some or all of the technical features may be equivalently replaced, and these modifications or substitutions do not make the essence of the corresponding technical solution deviate from the scope of the technical solution of the embodiments of the present invention.