CN118981620A - A reinforcement learning energy management method for fuel cell vehicles considering operating condition prediction - Google Patents
- Publication number
- CN118981620A (application CN202411462619.4A)
- Authority
- CN
- China
- Prior art keywords
- speed
- fuel cell
- lstm
- driving
- energy management
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Classifications
-
- B—PERFORMING OPERATIONS; TRANSPORTING
- B60—VEHICLES IN GENERAL
- B60L—PROPULSION OF ELECTRICALLY-PROPELLED VEHICLES; SUPPLYING ELECTRIC POWER FOR AUXILIARY EQUIPMENT OF ELECTRICALLY-PROPELLED VEHICLES; ELECTRODYNAMIC BRAKE SYSTEMS FOR VEHICLES IN GENERAL; MAGNETIC SUSPENSION OR LEVITATION FOR VEHICLES; MONITORING OPERATING VARIABLES OF ELECTRICALLY-PROPELLED VEHICLES; ELECTRIC SAFETY DEVICES FOR ELECTRICALLY-PROPELLED VEHICLES
- B60L58/00—Methods or circuit arrangements for monitoring or controlling batteries or fuel cells, specially adapted for electric vehicles
- B60L58/30—Methods or circuit arrangements for monitoring or controlling batteries or fuel cells, specially adapted for electric vehicles for monitoring or controlling fuel cells
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/21—Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
- G06F18/214—Generating training patterns; Bootstrap methods, e.g. bagging or boosting
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/29—Graphical models, e.g. Bayesian networks
- G06F18/295—Markov models or related models, e.g. semi-Markov models; Markov random fields; Networks embedding Markov models
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/044—Recurrent networks, e.g. Hopfield networks
- G06N3/0442—Recurrent networks, e.g. Hopfield networks characterised by memory or gating, e.g. long short-term memory [LSTM] or gated recurrent units [GRU]
-
- B—PERFORMING OPERATIONS; TRANSPORTING
- B60—VEHICLES IN GENERAL
- B60L—PROPULSION OF ELECTRICALLY-PROPELLED VEHICLES; SUPPLYING ELECTRIC POWER FOR AUXILIARY EQUIPMENT OF ELECTRICALLY-PROPELLED VEHICLES; ELECTRODYNAMIC BRAKE SYSTEMS FOR VEHICLES IN GENERAL; MAGNETIC SUSPENSION OR LEVITATION FOR VEHICLES; MONITORING OPERATING VARIABLES OF ELECTRICALLY-PROPELLED VEHICLES; ELECTRIC SAFETY DEVICES FOR ELECTRICALLY-PROPELLED VEHICLES
- B60L2240/00—Control parameters of input or output; Target parameters
- B60L2240/40—Drive Train control parameters
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Data Mining & Analysis (AREA)
- Physics & Mathematics (AREA)
- Life Sciences & Earth Sciences (AREA)
- Artificial Intelligence (AREA)
- Evolutionary Computation (AREA)
- General Engineering & Computer Science (AREA)
- General Physics & Mathematics (AREA)
- Computer Vision & Pattern Recognition (AREA)
- Evolutionary Biology (AREA)
- Bioinformatics & Computational Biology (AREA)
- Bioinformatics & Cheminformatics (AREA)
- Biomedical Technology (AREA)
- Mathematical Physics (AREA)
- Biophysics (AREA)
- Computational Linguistics (AREA)
- General Health & Medical Sciences (AREA)
- Molecular Biology (AREA)
- Computing Systems (AREA)
- Health & Medical Sciences (AREA)
- Software Systems (AREA)
- Sustainable Development (AREA)
- Sustainable Energy (AREA)
- Power Engineering (AREA)
- Transportation (AREA)
- Mechanical Engineering (AREA)
- Electric Propulsion And Braking For Vehicles (AREA)
- Fuel Cell (AREA)
Abstract
The invention provides a reinforcement learning energy management method for fuel cell vehicles that considers working condition prediction, comprising the following steps: constructing comprehensive driving condition data from four different types of classical driving cycles, dividing the data into segments, extracting working condition characteristic parameters and performing principal component analysis on them, and clustering the driving cycle segments by the K-means clustering method; after three different types of driving cycle segment sets are obtained, training offline a Bi-LSTM-based working condition identification module and a Markov-based vehicle speed prediction module, wherein the vehicle speed prediction module uses data sets representing three different road characteristics to train three separate Markov transition matrices; extracting working condition characteristic parameters of the predicted vehicle speed sequence, converting them into an equivalent factor adjustment coefficient $\lambda$, and using $\lambda$ as a state input to the DDPG energy management algorithm, with the fuel cell power $P_{fc}$ as the action and an appropriate reward function set to achieve multi-objective optimization.
Description
Technical Field
The invention relates to the technical field of fuel cells, and in particular to a reinforcement learning energy management method for fuel cell vehicles that considers working condition prediction.
Background
Energy depletion and environmental crises are growing more serious worldwide. Automobiles are a major source of energy consumption and air pollution, so improving their energy efficiency and reducing their emissions are active research topics. Hydrogen fuel cells, an emerging power generation technology, offer zero emissions, low noise, high energy efficiency, use of renewable energy and rapid refueling when applied to automobiles. This technical route therefore has broad prospects in the field of new energy vehicles and is one of the viable ways to ease the consumption of non-renewable energy and the emission of pollutant gases.
At present, domestic hydrogen fuel cell vehicle technology still needs improvement, and a technical gap remains in energy management schemes. The energy management system (EMS) is one of the core technologies of a hydrogen fuel cell vehicle; it strongly affects the fuel economy, service life, power performance and comfort of the whole vehicle, and is both a difficulty and a hot spot of research in this field. Fuel cell vehicles generally adopt an all-electric powertrain, but if the fuel cell alone supplies the power of the whole vehicle, its lagging response means that power demand cannot be met in time, and braking energy cannot be recovered. Current fuel cell vehicle powertrain architectures therefore add a battery or a supercapacitor as an auxiliary power source to handle abrupt power demand and to recover braking energy. The powertrain thus changes from a single power source to multiple power sources, and an efficient energy management strategy is needed to distribute the required power among the energy sources so as to ensure the economy and power performance of the whole vehicle.
When the three factors of equivalent hydrogen consumption, SOC deviation and fuel cell service life are considered together, rule-based strategies are calibrated from expert experience and do not achieve global optimality, while optimization-based energy management strategies have poor real-time performance and are limited on multi-objective optimization problems. A new strategy is needed that meets the requirements of multi-objective, highly real-time, optimal energy management.
Because fuel cells are applied to trucks, which must frequently switch among complex working conditions such as urban roads, rural roads and expressways, a single energy management algorithm cannot satisfy these complex working condition requirements well.
Existing energy management strategies are varied, but they focus on the optimization algorithm itself and do not fully exploit the advantages of combining vehicle speed working condition prediction with energy management. If the future vehicle speed is taken into account in the energy management algorithm, and a bidirectional long short-term memory network (Bi-LSTM) is used to consider both preceding and following driving state information, the vehicle gains foresight while driving, and the accuracy of working condition identification can be effectively improved when the vehicle speed working condition changes abruptly, thereby improving energy efficiency. Therefore, the invention provides a reinforcement learning energy management method for fuel cell vehicles that considers working condition prediction.
Disclosure of Invention
The invention provides a fuel cell automobile reinforcement learning energy management method considering working condition prediction, which solves the technical problems mentioned in the background art.
The technical scheme adopted for solving the technical problems is as follows:
A reinforcement learning energy management method for fuel cell vehicles that considers working condition prediction, the method comprising the following steps:
S01, constructing comprehensive driving condition data that reflects the characteristics of various road working conditions from four different types of classical driving cycles, dividing the comprehensive driving condition data into segments of 100 s each, extracting working condition characteristic parameters and performing principal component analysis (PCA) on them, and clustering the driving cycle segments by the K-means clustering method into three working conditions: low speed, medium speed and high speed;
S02, after the three different types of driving cycle segment sets are obtained, training offline the Bi-LSTM-based working condition identification module and the Markov-based vehicle speed prediction module;
S03, extracting working condition characteristic parameters of the predicted vehicle speed sequence, such as the average vehicle speed and the standard deviation of the vehicle speed, converting them into an equivalent factor adjustment coefficient $\lambda$, inputting $\lambda$ as a state to the DDPG energy management algorithm with the fuel cell power $P_{fc}$ as the action, and setting an appropriate reward function covering hydrogen consumption, SOC fluctuation and fuel cell degradation, so as to solve the multi-objective optimization problem comprehensively.
As a further technical scheme of the invention, the step of constructing comprehensive driving condition data that reflects the characteristics of various road working conditions from four different types of classical driving cycles comprises:
preprocessing the working condition data: the four typical driving cycles NEDC, WLTC, CLTC-P and FTP75 are combined into comprehensive driving condition data that reflects the characteristics of various road working conditions, and this data is used to train the subsequent working condition identification and prediction algorithms;
The NEDC (New European Driving Cycle) is the European standard range test cycle and comprises 4 urban cycles and 1 suburban cycle; the urban part lasts 780 seconds with a maximum speed of 50 km/h, and the suburban part lasts 400 seconds with a maximum speed of 120 km/h;
The WLTC standard cycle is closer to actual road driving; a complete test cycle consists of 4 phases (low speed, medium speed, high speed and extra-high speed) with a total duration of 1800 s, of which 235 s is idle time; the distance is 23266 m, the average vehicle speed is 46.5 km/h and the maximum vehicle speed is 131.3 km/h;
The CLTC-P (China Light-duty Vehicle Test Cycle - Passenger Car) passenger car test cycle comprises 3 speed intervals (low speed, medium speed and high speed) with a total duration of 1800 s, a total mileage of 14480 m, a maximum speed of 114 km/h and an average speed of 28.96 km/h;
FTP75 (Federal Test Procedure) is a standard issued by the U.S. Environmental Protection Agency for evaluating the emissions and fuel economy of passenger cars and light trucks under urban conditions; a complete FTP75 cycle lasts 1874 s, with a theoretical driving distance of 17.77 km, an average vehicle speed of 34.12 km/h and a maximum vehicle speed of 91.25 km/h, and comprises 3 parts: a cold-start transient phase, a stabilized phase and a hot-start transient phase.
As a further technical scheme of the invention, the step of dividing the comprehensive driving condition data into segments of 100 s each, extracting working condition characteristic parameters and performing principal component analysis (PCA) comprises:
The comprehensive driving condition data is divided into segments of 100 s each; the working condition characteristic parameters reflect the characteristic information of each driving cycle segment; each segment is described by 14 parameters: average speed, maximum speed, average acceleration, maximum acceleration, average deceleration, maximum deceleration, idle time ratio, acceleration time ratio, deceleration time ratio, constant-speed time ratio, speed standard deviation, acceleration standard deviation, driving mileage and running time;
However, 14 working condition characteristic parameters are too many: the information they carry overlaps and is correlated, and the computational load is large, which hinders analysis. To reduce the complexity of the analysis, principal component analysis (PCA) is used for dimensionality reduction;
For the comprehensive driving condition data divided into 100 s segments, each described by 14 working condition characteristic parameters, the dimension is reduced to 3; the principal component analysis comprises the following steps:
1) Arranging the original data into a matrix $X$ with 100 rows and 14 columns, one row per driving cycle segment and one column per characteristic parameter;
2) Zero-centering each column of $X$, i.e. subtracting the mean of that column;
3) Obtaining the covariance matrix $C = \frac{1}{n} X^{T} X$, where $n = 100$ is the number of rows of $X$;
4) Obtaining the eigenvalues and corresponding eigenvectors of the covariance matrix;
5) Arranging the eigenvectors as the rows of a matrix, from top to bottom in descending order of their eigenvalues, and taking the first 3 rows to form the matrix $P$;
6) $Y = X P^{T}$ is the data after dimension reduction to 3 dimensions;
To preserve the information after dimensionality reduction, the selected principal components must have a cumulative contribution rate of more than 80%; the 3 principal components with the largest contribution rates are finally selected, and their cumulative contribution rate exceeds 80%, so they cover most of the working condition characteristic information.
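The dimension reduction steps above can be sketched in a few lines of NumPy. This is only an illustrative sketch, not the patented implementation: the function name `pca_reduce`, the synthetic placeholder data and the choice to leave the parameters unscaled are assumptions (in practice, parameters with different units are often standardized first).

```python
import numpy as np

def pca_reduce(X, n_components=3):
    """Project the rows of X (segments x parameters) onto the leading principal components."""
    Xc = X - X.mean(axis=0)                    # zero-center each parameter column
    C = Xc.T @ Xc / Xc.shape[0]                # covariance matrix (parameters x parameters)
    eigvals, eigvecs = np.linalg.eigh(C)       # eigh suits symmetric C; eigenvalues ascend
    order = np.argsort(eigvals)[::-1]          # reorder to descending eigenvalues
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    P = eigvecs[:, :n_components].T            # leading eigenvectors as rows
    Y = Xc @ P.T                               # reduced data (segments x n_components)
    cum_ratio = np.cumsum(eigvals) / eigvals.sum()   # cumulative contribution rate
    return Y, P, cum_ratio

# Synthetic placeholder data: 100 segments x 14 correlated parameters (not real cycle data).
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 14)) @ rng.normal(size=(14, 14))
Y, P, cum_ratio = pca_reduce(X)
print(Y.shape, round(float(cum_ratio[2]), 3))
```

The third entry of `cum_ratio` is the cumulative contribution rate of the first 3 components, which is the quantity compared against the 80% threshold.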
As a further technical scheme of the invention, the step of clustering the driving cycle segments by the K-means clustering method comprises:
The driving cycle segments are classified with the K-means clustering algorithm. The number of classes is set to K = 3, an initial cluster center is assigned to each class, and the Euclidean distance from each sample point $x_i$ to each cluster center $\mu_j$ is calculated: $d(x_i, \mu_j) = \lVert x_i - \mu_j \rVert_2$;
Each point is assigned to the class whose center is nearest. After the first clustering is completed, the center of each cluster is recalculated; the new cluster center is the mean of all data points in the cluster, calculated as:
$$\mu_j = \frac{1}{|C_j|} \sum_{x_i \in C_j} x_i$$;
Wherein $C_j$ is the set of data points of the $j$-th cluster and $|C_j|$ is the number of data points in the set; the cluster centers are updated and the points reassigned until the cluster centers stabilize.
Finally, after the working condition characteristic parameters are reduced by PCA to working condition data containing 3 principal components, the 100 road driving cycle segments are divided into 3 classes by the K-means clustering algorithm, and a dedicated Markov transition matrix is trained for each class for vehicle speed working condition prediction.
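A minimal NumPy sketch of the clustering loop described above follows. The helper names `assign` and `kmeans`, the random choice of initial centers from the data, and the synthetic 3-dimensional points standing in for PCA-reduced segments are all assumptions made for illustration.

```python
import numpy as np

def assign(Y, centers):
    """Label each point with the index of its nearest center (Euclidean distance)."""
    d = np.linalg.norm(Y[:, None, :] - centers[None, :, :], axis=2)
    return d.argmin(axis=1)

def kmeans(Y, k=3, n_iter=100, seed=0):
    rng = np.random.default_rng(seed)
    centers = Y[rng.choice(len(Y), size=k, replace=False)]   # initial centers drawn from the data
    for _ in range(n_iter):
        labels = assign(Y, centers)
        # New center = mean of the points in the cluster; keep the old center if a cluster is empty.
        new_centers = np.array([
            Y[labels == j].mean(axis=0) if np.any(labels == j) else centers[j]
            for j in range(k)
        ])
        if np.allclose(new_centers, centers):                # centers have stabilized
            break
        centers = new_centers
    return assign(Y, centers), centers

# Synthetic stand-in for PCA-reduced segments (90 points in 3 dimensions).
rng = np.random.default_rng(1)
Y = np.vstack([rng.normal(loc=c, scale=0.3, size=(30, 3)) for c in (0.0, 5.0, 10.0)])
labels, centers = kmeans(Y)
print(np.bincount(labels, minlength=3))
```

Plain K-means can settle in a local optimum depending on the initial centers, so running it from several seeds and keeping the tightest clustering is a common precaution.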
As a further technical scheme of the invention, after the three different types of driving cycle segment sets are obtained, the step of training offline the Bi-LSTM-based working condition identification module and the Markov-based vehicle speed prediction module comprises:
Bi-LSTM consists of a forward LSTM and a backward LSTM. The forward LSTM processes the input sequence in chronological order and captures information from the past to the future; the backward LSTM processes the input sequence from the latest time point to the earliest and captures information from the future to the past. Each LSTM of the bidirectional structure contains an input gate, a forget gate, an output gate and a cell state. The input of the model is the 14 working condition characteristic parameters, and the output is the working condition mode, which is one of three types: low speed, medium speed and high speed. The LSTM architecture is:
Forget gate: the forget gate decides which information is forgotten, and is expressed as:
$$f_t = \sigma\left(W_f \cdot [h_{t-1}, x_t] + b_f\right)$$;
wherein $\sigma$ is the sigmoid function, $W_f$ is the weight coefficient of the forget gate, $[h_{t-1}, x_t]$ denotes the hidden state of the previous moment $h_{t-1}$ concatenated with the current input $x_t$, where $x_t$ is the 14 working condition characteristic parameters, including the real-time speed and acceleration of the vehicle, and $b_f$ is a bias term;
Input gate: the input gate determines which new information needs to be added to the cell state;
$$i_t = \sigma\left(W_i \cdot [h_{t-1}, x_t] + b_i\right)$$;
$$\tilde{C}_t = \tanh\left(W_C \cdot [h_{t-1}, x_t] + b_C\right)$$;
wherein $W_i$ and $W_C$ are the weight matrices of the input gate, whose dimensions are set according to the number of input parameters, $b_i$ and $b_C$ are bias terms, and $\tilde{C}_t$ is the candidate cell state;
Cell state update: the cell state is updated by combining the results of the forget gate and the input gate;
$$C_t = f_t \odot C_{t-1} + i_t \odot \tilde{C}_t$$;
Output gate: decides what value to output based on the updated cell state;
$$o_t = \sigma\left(W_o \cdot [h_{t-1}, x_t] + b_o\right)$$;
$$h_t = o_t \odot \tanh(C_t)$$;
wherein $W_o$ is the weight parameter of the output gate, $b_o$ is a bias term, and $h_t$ is the output state at the current moment;
Bi-LSTM computes a bidirectional hidden state by running LSTM layers forward and backward along the time axis: the forward LSTM runs from the first element of the sequence to the last, and the backward LSTM runs from the last element to the first. The two hidden states are concatenated to form the final bidirectional hidden state, so the Bi-LSTM captures the context before and after each time step in the sequence and provides a more comprehensive representation;
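To make the gate computations and the bidirectional concatenation concrete, the following NumPy sketch runs one forward and one backward LSTM over a short sequence and joins their hidden states. It is a forward-pass illustration only: the hidden size of 8, the random untrained weights and the helper names (`lstm_step`, `run_lstm`, `init_params`, `bilstm`) are assumptions, and the classification layer and training are omitted.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_step(x_t, h_prev, c_prev, W, b):
    """One LSTM step; W and b hold the forget (f), input (i), candidate (c) and output (o) parameters."""
    z = np.concatenate([h_prev, x_t])          # [h_{t-1}, x_t]
    f = sigmoid(W["f"] @ z + b["f"])           # forget gate
    i = sigmoid(W["i"] @ z + b["i"])           # input gate
    c_tilde = np.tanh(W["c"] @ z + b["c"])     # candidate cell state
    c = f * c_prev + i * c_tilde               # cell state update
    o = sigmoid(W["o"] @ z + b["o"])           # output gate
    h = o * np.tanh(c)                         # hidden (output) state
    return h, c

def run_lstm(xs, W, b, hidden):
    h, c = np.zeros(hidden), np.zeros(hidden)
    outputs = []
    for x_t in xs:
        h, c = lstm_step(x_t, h, c, W, b)
        outputs.append(h)
    return np.stack(outputs)

def init_params(n_in, hidden, rng):
    W = {g: rng.normal(scale=0.1, size=(hidden, hidden + n_in)) for g in "fico"}
    b = {g: np.zeros(hidden) for g in "fico"}
    return W, b

def bilstm(xs, fwd, bwd, hidden):
    h_f = run_lstm(xs, *fwd, hidden)                 # first element -> last element
    h_b = run_lstm(xs[::-1], *bwd, hidden)[::-1]     # last -> first, then re-aligned in time
    return np.concatenate([h_f, h_b], axis=1)        # concatenated bidirectional state

rng = np.random.default_rng(0)
xs = rng.normal(size=(10, 14))                 # 10 time steps x 14 parameters (synthetic)
fwd, bwd = init_params(14, 8, rng), init_params(14, 8, rng)
H = bilstm(xs, fwd, bwd, hidden=8)
print(H.shape)
```

In a classifier, a dense softmax layer over three classes would typically follow the concatenated state; that layer is not shown here.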
In the vehicle speed working condition prediction module, a Markov vehicle speed prediction method is adopted. Working condition identification yields one of three working condition modes (low speed, medium speed and high speed), and each mode corresponds to one Markov transition matrix. The three Markov matrices (the low-speed, medium-speed and high-speed Markov transition matrices) are each trained offline on the corresponding set of driving cycle segments clustered in step S01. The prediction horizon is set to p steps, and the Markov model outputs p steps of predicted vehicle speed information;
After the vehicle speed prediction that takes working condition identification into account is completed, a p-step predicted vehicle speed sequence is output.
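One way to picture the Markov predictor is a first-order chain over discretized speed bins, sketched below. The discretization into equal-width bins, the rule of following the most probable next state, and the names `fit_transition_matrix` and `predict_speeds` are assumptions for illustration; the source does not specify how the states are defined or how the next speed is drawn.

```python
import numpy as np

def fit_transition_matrix(speeds, n_states, v_max):
    """Count transitions between speed bins and normalize each row to probabilities."""
    bins = np.minimum((np.asarray(speeds) / v_max * n_states).astype(int), n_states - 1)
    T = np.zeros((n_states, n_states))
    for a, b in zip(bins[:-1], bins[1:]):
        T[a, b] += 1
    row_sums = T.sum(axis=1, keepdims=True)
    # States never visited in the training data are assumed to stay where they are.
    return np.where(row_sums > 0, T / np.maximum(row_sums, 1), np.eye(n_states))

def predict_speeds(v_now, T, v_max, p):
    """Roll the chain forward p steps, following the most probable next state each time."""
    n_states = T.shape[0]
    width = v_max / n_states
    state = min(int(v_now / v_max * n_states), n_states - 1)
    preds = []
    for _ in range(p):
        state = int(np.argmax(T[state]))
        preds.append((state + 0.5) * width)    # bin center as the predicted speed
    return np.array(preds)

# Synthetic speed trace in km/h, standing in for one cluster of driving cycle segments.
t = np.arange(200)
speeds = 30 + 25 * np.sin(2 * np.pi * t / 100)
T = fit_transition_matrix(speeds, n_states=12, v_max=60.0)
pred = predict_speeds(speeds[-1], T, v_max=60.0, p=5)
print(pred)
```

With three clusters, one such matrix would be fitted per cluster and the matrix matching the identified working condition mode would be selected at prediction time.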
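As a further technical scheme of the present invention, in step S03 the DDPG algorithm is based on the Actor-Critic framework, so it contains an Actor network and a Critic network, each with a corresponding target network. The DDPG algorithm therefore comprises four networks: the Actor network $\mu(s \mid \theta^{\mu})$, the Critic network $Q(s, a \mid \theta^{Q})$, the target Actor network $\mu'(s \mid \theta^{\mu'})$ and the target Critic network $Q'(s, a \mid \theta^{Q'})$; the update process of the DDPG algorithm and of its target networks is as follows;
The DDPG training process includes:
Initializing the critic network $Q(s, a \mid \theta^{Q})$ and the actor network $\mu(s \mid \theta^{\mu})$ with parameters $\theta^{Q}$ and $\theta^{\mu}$;
Initializing the corresponding target network parameters: $\theta^{Q'} \leftarrow \theta^{Q}$, $\theta^{\mu'} \leftarrow \theta^{\mu}$;
Initializing the experience replay buffer D with capacity K, and setting the number of training episodes to M;
Inputting the current state $s_t$ to the current network and selecting an action from the current policy plus exploration noise $\mathcal{N}_t$: $a_t = \mu(s_t \mid \theta^{\mu}) + \mathcal{N}_t$; after executing the action $a_t$, obtaining the reward $r_t$ and moving to the next state $s_{t+1}$, and storing the transition $(s_t, a_t, r_t, s_{t+1})$ in the experience replay buffer D;
Sampling a random minibatch of $N$ transitions $(s_i, a_i, r_i, s_{i+1})$ from D and computing the target value $y_i = r_i + \gamma\, Q'\big(s_{i+1}, \mu'(s_{i+1} \mid \theta^{\mu'}) \mid \theta^{Q'}\big)$;
Wherein $Q'$ is the value function at the next moment, $\mu'$ is the policy function at the next moment, $\theta^{\mu'}$ and $\theta^{Q'}$ are the network weight parameters at the next moment, and $\gamma$ is the discount factor;
Updating the critic parameters by minimizing the loss: $L = \frac{1}{N} \sum_i \big(y_i - Q(s_i, a_i \mid \theta^{Q})\big)^2$;
Updating the actor network with the policy gradient: $\nabla_{\theta^{\mu}} J \approx \frac{1}{N} \sum_i \nabla_a Q(s, a \mid \theta^{Q})\big|_{s = s_i,\, a = \mu(s_i)} \nabla_{\theta^{\mu}} \mu(s \mid \theta^{\mu})\big|_{s = s_i}$;
Soft-updating the target networks: $\theta^{Q'} \leftarrow \tau \theta^{Q} + (1 - \tau)\,\theta^{Q'}$, $\theta^{\mu'} \leftarrow \tau \theta^{\mu} + (1 - \tau)\,\theta^{\mu'}$, where $\tau$ is the soft update coefficient.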
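The bookkeeping parts of this training loop (experience replay, target value, critic loss and soft target update) can be sketched without a deep learning framework. The sketch below assumes standard DDPG conventions; the class and function names, the values of the discount factor and soft update coefficient, and the toy transitions are illustrative assumptions, and the actor gradient step is left out because it requires automatic differentiation.

```python
import random
from collections import deque
import numpy as np

class ReplayBuffer:
    """Fixed-capacity experience replay; the oldest transitions are dropped first."""
    def __init__(self, capacity):
        self.data = deque(maxlen=capacity)

    def push(self, s, a, r, s_next):
        self.data.append((s, a, r, s_next))

    def sample(self, batch_size):
        batch = random.sample(list(self.data), batch_size)
        s, a, r, s_next = map(np.array, zip(*batch))
        return s, a, r, s_next

def td_target(r, q_next, gamma=0.99):
    """Target value: reward plus discounted target-critic value at the next state."""
    return r + gamma * q_next

def critic_loss(y, q):
    """Mean squared error between target values and current critic estimates."""
    return float(np.mean((y - q) ** 2))

def soft_update(target, online, tau=0.005):
    """Blend online parameters into the target parameters with coefficient tau."""
    return {k: tau * online[k] + (1.0 - tau) * target[k] for k in online}

# Toy usage with hypothetical 2-dimensional states and 1-dimensional actions.
buf = ReplayBuffer(capacity=3)
for t in range(5):
    buf.push(np.array([t, 0.5]), np.array([1.0]), -float(t), np.array([t + 1, 0.5]))
s, a, r, s_next = buf.sample(2)
new_target = soft_update({"w": np.zeros(2)}, {"w": np.ones(2)}, tau=0.1)
print(len(buf.data), s.shape, new_target["w"])
```

A small soft update coefficient keeps the target networks changing slowly, which is what stabilizes the target values used in the critic loss.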
As a further technical scheme of the invention, step S03 uses a DDPG deep reinforcement learning energy management strategy that fuses working condition identification and vehicle speed working condition prediction;
Setting the state set:
$$S = \{P_{req},\ SOC,\ FOH,\ \lambda\}$$;
$P_{req}$ is the required power of the whole vehicle, $SOC$ is the state of charge of the power battery, $FOH$ is the life state of the fuel cell, and $\lambda$ is the equivalent factor adjustment coefficient;
Setting the action set:
$$A = \{P_{fc}\}$$;
$P_{fc}$ is the power of the fuel cell;
Setting a reward function:
;
R is the reward function, which depends on the hydrogen consumption rate of the fuel cell, the equivalent factor adjustment coefficient $\lambda$, the output power of the power battery, the lower heating value (LHV) of hydrogen, the fuel cell life degradation value and a weighting factor for fuel cell life degradation. $\lambda$ is calculated from the p-step predicted vehicle speed sequence output by the vehicle speed prediction module, using the standard deviation and the average vehicle speed of the predicted sequence together with a coefficient k; during training, an optimal energy management strategy is reached by selecting a suitable value of k; after training, $\lambda$ is adjusted in real time during actual driving according to the predicted vehicle speed sequence it receives;
In implementing DDPG-based energy management, the environment first inputs the state to the Actor of the agent, and the agent selects the corresponding action, obtains the real-time reward and transitions to the state S' at the next moment. Experience samples are stored in the sample pool, and once the pool reaches its preset size the earliest experiences are removed. The DDPG algorithm keeps letting the agent execute actions, collects the data from its interaction with the environment, including states and rewards, and updates the network parameters, so that the agent gradually learns the optimal action policy and achieves optimal energy efficiency.
The beneficial effects of the invention are as follows: in contrast to rule-based and optimization-based energy management algorithms, the method adopts an energy management strategy based on the deep deterministic policy gradient (DDPG) algorithm. Rule-based algorithms typically rely on preset fixed rules and on engineers' experience, while the objectives and constraints of optimization algorithms are relatively fixed and their real-time performance is relatively poor. The DDPG algorithm adapts its policy through continual interaction with, and learning from, the environment. Energy management involves many mutually coupled factors and uncertainties; the DDPG algorithm can handle a high-dimensional state space and a continuous action space and control the fuel cell output power continuously, so it is well suited to the energy management of fuel cell vehicles. Optimization-based energy management strategies can be limited by problem complexity when solving for the optimum, and in some cases reach only a local optimum. Optimization-based algorithms also require an accurate mathematical model, whereas DDPG-based algorithms can learn and optimize without knowledge of the system model.
Meanwhile, compared with the conventional DDPG algorithm, the method adds a Bi-LSTM-based working condition identification module and a Markov vehicle speed prediction module, so that it can adapt to complex road working conditions. Compared with a conventional LSTM classifier, Bi-LSTM consists of two LSTMs and captures information in both directions, so the identification process can consider past and future vehicle speed information at the same time. For example, when identifying an urban working condition with frequent starts and stops, Bi-LSTM considers both the preceding start-stop behavior and the likely subsequent trend, so the working condition type is identified more accurately and less information is missed. After the working condition type is identified by the Bi-LSTM model, the three trained Markov state transition matrices (low speed, medium speed and high speed) are used to predict the vehicle speed specifically for each road working condition, which improves the vehicle speed prediction accuracy and thereby the energy efficiency of the energy management system.
Drawings
Various other advantages and benefits will become apparent to those of ordinary skill in the art upon reading the following detailed description of the preferred embodiments. The drawings are only for purposes of illustrating the preferred embodiments and are not to be construed as limiting the invention. Also, like reference numerals are used to designate like parts throughout the figures. In the drawings:
FIG. 1 shows a schematic diagram of a fuel cell vehicle power system provided by an embodiment of the present invention.
FIG. 2 shows a diagram of the reinforcement learning energy management strategy incorporating working condition identification and prediction provided by an embodiment of the present invention.
FIG. 3 shows a Bi-LSTM architecture diagram provided by an embodiment of the present invention.
FIG. 4 illustrates a DDPG-based fuel cell vehicle energy management architecture diagram provided by an embodiment of the present invention.
Detailed Description
For the purpose of making the objects, technical solutions and advantages of the embodiments of the present invention more apparent, the technical solutions of the embodiments of the present invention will be clearly and completely described below with reference to the accompanying drawings in the embodiments of the present invention, and it is apparent that the described embodiments are some embodiments of the present invention, but not all embodiments of the present invention. All other embodiments, which can be made by those skilled in the art based on the embodiments of the invention without making any inventive effort, are intended to be within the scope of the invention.
1.1 Fuel cell automobile Power System architecture:
The invention relates to a fuel cell vehicle that is powered jointly by a power battery and a fuel cell and driven by a drive motor. The fuel cell system is connected to the main circuit after being stabilized by a DC/DC converter, and the power battery is connected to the main circuit directly; the power output by the fuel cell and the power battery is delivered through the motor controller, which controls the drive motor to propel the vehicle, as shown in FIG. 1:
1.2 longitudinal dynamics model:
The invention addresses the energy management problem of a fuel cell vehicle; lateral stability is not considered for now, and only the longitudinal dynamics of the vehicle need to be analyzed and modeled. While driving, the vehicle is subject to four kinds of resistance: air resistance, ramp resistance, rolling resistance and acceleration resistance.
The driving force of the whole vehicle $F_t$ can be expressed as:
$$F_t = F_w + F_i + F_f + F_j$$;
wherein $F_w$ is the air resistance, $F_i$ is the ramp resistance, $F_f$ is the rolling resistance and $F_j$ is the acceleration resistance; they are calculated respectively as:
$$F_w = \tfrac{1}{2}\, \rho\, C_D\, A\, v^2$$;
$$F_i = m g \sin\alpha$$;
$$F_f = m g f \cos\alpha$$;
$$F_j = \delta\, m\, \frac{dv}{dt}$$;
wherein $m$ is the mass of the fuel cell vehicle; $g$ is the gravitational acceleration; $\alpha$ is the road grade; $\rho$ is the air density; $A$ is the frontal area of the vehicle; $C_D$ is the air resistance coefficient; $v$ is the speed of the vehicle; $f$ is the rolling resistance coefficient; $\delta$ is the rotating mass conversion coefficient.
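As a worked illustration of the longitudinal model, the sketch below sums the four resistances using their usual textbook forms (aerodynamic drag proportional to speed squared, grade and rolling terms from the weight, inertia term scaled by the rotating mass coefficient). All default parameter values are placeholder assumptions rather than values from the source, and the demand power helper, which multiplies force by speed and ignores driveline losses, is likewise an added assumption.

```python
import math

def traction_force(v, a, m, grade_rad=0.0, rho=1.206, area=2.5,
                   c_d=0.3, f_roll=0.012, delta=1.05, g=9.81):
    """Driving force in N for speed v (m/s) and acceleration a (m/s^2)."""
    f_air = 0.5 * rho * c_d * area * v ** 2          # air resistance
    f_grade = m * g * math.sin(grade_rad)            # ramp resistance
    f_rolling = m * g * f_roll * math.cos(grade_rad) # rolling resistance
    f_accel = delta * m * a                          # acceleration resistance
    return f_air + f_grade + f_rolling + f_accel

def demand_power(v, a, m, **params):
    """Power at the wheels in W, assuming no driveline losses."""
    return traction_force(v, a, m, **params) * v

# Example: a 2000 kg vehicle cruising at 20 m/s on a flat road.
print(round(traction_force(20.0, 0.0, m=2000.0), 2))
print(round(demand_power(20.0, 0.0, m=2000.0) / 1000.0, 2), "kW")
```

The road grade is passed as an angle in radians here; a grade given as a percentage would first need converting with `math.atan`.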
1.3 Lithium battery model:
The lithium battery model adopts a first-order Rint model: the lithium battery is treated as an ideal voltage source in series with an internal resistance, and the polarization of the battery is ignored, so the structure is simple. In this model, the output power of the power battery $P_{bat}$ is:
$$P_{bat} = V_{oc} I_{bat} - I_{bat}^2 R_{bat}$$;
The battery current $I_{bat}$ is:
$$I_{bat} = \frac{V_{oc} - \sqrt{V_{oc}^2 - 4 R_{bat} P_{bat}}}{2 R_{bat}}$$;
The state of charge (SOC) of the battery is updated by the ampere-hour integration method:
$$SOC(t) = SOC(t_0) - \frac{1}{Q_{bat}} \int_{t_0}^{t} I_{bat}\, d\tau$$;
wherein $V_{oc}$ is the open circuit voltage; $I_{bat}$ is the current; $R_{bat}$ is the internal resistance of the battery; $Q_{bat}$ is the battery capacity.
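The internal resistance model reduces to a quadratic in the current: the terminal power equals open circuit voltage times current minus the resistive loss, and the smaller root is the physically meaningful one. A short sketch follows, with hypothetical pack parameters (350 V, 0.1 ohm, 100 Ah) and a discrete ampere-hour update chosen purely for illustration.

```python
import math

def battery_current(p_bat, v_oc, r_int):
    """Current in A (positive when discharging) that delivers p_bat watts."""
    disc = v_oc ** 2 - 4.0 * r_int * p_bat
    if disc < 0:
        raise ValueError("requested power exceeds what the internal resistance model can deliver")
    return (v_oc - math.sqrt(disc)) / (2.0 * r_int)

def soc_step(soc, i_bat, dt, capacity_ah):
    """Ampere-hour integration over one time step of dt seconds."""
    return soc - i_bat * dt / (capacity_ah * 3600.0)

# Hypothetical pack delivering 10 kW for one second.
i = battery_current(10e3, v_oc=350.0, r_int=0.1)
soc = soc_step(0.60, i, dt=1.0, capacity_ah=100.0)
print(round(i, 2), round(soc, 6))
```

A negative power request (regenerative braking) gives a negative current, so the same update raises the state of charge.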
1.4 Fuel cell model:
In constructing the fuel cell hydrogen consumption model, the hydrogen consumption rate of the fuel cell $\dot{m}_{H_2}$ is expressed as:
$$\dot{m}_{H_2} = \frac{P_{fc}}{\eta_{fc} \cdot LHV}$$;
In the above formula, $P_{fc}$ is the power of the fuel cell system, $\eta_{fc}$ is the efficiency of the fuel cell system, and $LHV$ is the lower heating value of hydrogen.
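As a quick numerical check of the hydrogen consumption model, the sketch below assumes the usual relation in which the hydrogen mass flow equals the system power divided by the product of system efficiency and lower heating value. The lower heating value of about 120 MJ/kg is a standard figure; the function name and the example operating point (60 kW at 50% efficiency) are illustrative assumptions.

```python
import math

LHV_H2 = 120e6  # lower heating value of hydrogen in J/kg (approximate)

def hydrogen_rate(p_fc, eta_fc, lhv=LHV_H2):
    """Hydrogen mass flow in kg/s for system power p_fc (W) at efficiency eta_fc."""
    return p_fc / (eta_fc * lhv)

# Example operating point: 60 kW at 50% system efficiency.
rate = hydrogen_rate(60e3, 0.5)
print(round(rate * 1000.0, 3), "g/s")
```

Because the efficiency itself varies with load, a practical model would look it up from an efficiency map instead of using a constant.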
1.5 Fuel cell degradation model:
The degree of degradation of the fuel cell stack is evaluated under four main degradation conditions: dynamic cycling, start/stop cycling, idling and high-power load. It is expressed by the formula:
;
Degradation mechanism under start/stop cycle:
;
degradation mechanism under dynamic cycling:
;
Degradation mechanism under idle conditions:
;
Degradation mechanism under high power load:
;
where a, b, c, and d are constants, namely empirical parameters obtained by experimental calibration. The normalized remaining life of the fuel cell is expressed as:
;
Referring to fig. 2, as an embodiment of the present invention, a reinforcement learning energy management method for fuel cell vehicles considering operating condition prediction is provided, comprising the following steps:
S01, constructing comprehensive driving condition data that reflects the characteristics of various road conditions from four different types of classical driving cycles, dividing the data into segments of 100 s each, extracting the condition characteristic parameters of each segment and performing principal component analysis (PCA) on them, and clustering the driving cycle segments with the K-means clustering method into three conditions: low speed, medium speed, and high speed;
S02, after the three different types of driving cycle segment sets are obtained, training offline a condition recognition module based on Bi-LSTM and a vehicle speed prediction module based on Markov chains;
S03, extracting condition characteristic parameters of the predicted vehicle speed sequence, such as the average vehicle speed and the vehicle speed standard deviation, and converting them into an equivalent factor adjustment coefficient λ; λ is input as a state to the DDPG energy management algorithm, the fuel cell power $P_{fc}$ is taken as the action, and a suitable reward function covering hydrogen consumption, SOC fluctuation, and fuel cell degradation is set, so that the multi-objective optimization problem is solved.
In this embodiment, the step of constructing comprehensive driving condition data that reflects the characteristics of various road conditions from four different types of classical driving cycles includes:
Preprocessing the driving condition data: a vehicle driving cycle is obtained by post-processing actual driving data and reflects how speed changes over time. First, to reflect road driving conditions comprehensively and improve the accuracy of recognizing all road conditions, a sample driving cycle is constructed: four typical driving cycles, NEDC, WLTC, CLTC-P, and FTP75, are combined into comprehensive driving condition data that reflects the characteristics of various road conditions and is used to train the subsequent condition recognition and prediction algorithms;
The NEDC (New European Driving Cycle) is the European standard range test cycle and includes 4 urban cycles and 1 suburban cycle; the urban part lasts 780 s with a maximum speed of 50 km/h, and the suburban part lasts 400 s with a maximum speed of 120 km/h;
The WLTC standard cycle is closer to actual road driving. Its complete test cycle consists of 4 phases, low speed, medium speed, high speed, and extra-high speed, with a total duration of 1800 s, of which 235 s is idle time; the distance is 23266 m, the average vehicle speed is 46.5 km/h, and the maximum vehicle speed is 131.3 km/h;
The CLTC-P (China Light-duty Vehicle Test Cycle - Passenger Car) test cycle comprises 3 speed intervals, low speed, medium speed, and high speed; the total duration is 1800 s, the total mileage is 14480 m, the maximum speed is 114 km/h, and the average speed is 28.96 km/h;
FTP75 (Federal Test Procedure) is a standard issued by the United States Environmental Protection Agency for testing passenger cars under urban conditions, used to assess the emissions and fuel economy of light vehicles and light trucks. The complete FTP75 cycle has a driving time of 1874 s, a theoretical driving distance of 17.77 km, an average vehicle speed of 34.12 km/h, and a maximum vehicle speed of 91.25 km/h, and comprises 3 parts: a cold-start transient phase, a stabilized phase, and a hot-start transient phase.
In this embodiment, the step of performing the segment division on the comprehensive driving condition according to each segment of 100s, and performing the Principal Component Analysis (PCA) by extracting the condition characteristic parameters of the condition includes:
The comprehensive driving condition data constructed from the 4 typical driving cycles cannot be used directly to train the condition recognition model or the Markov transition matrices for vehicle speed prediction; it must first be divided into segments, here assumed to be 100 s each. The condition characteristic parameters reflect the characteristic information of each driving cycle segment. Each segment is described by 14 parameters: average speed, maximum speed, average acceleration, maximum acceleration, average deceleration, maximum deceleration, idle time ratio, acceleration time ratio, deceleration time ratio, constant-speed time ratio, speed standard deviation, acceleration standard deviation, driving mileage, and running time;
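The segmentation and feature extraction can be sketched as follows. The example assumes a speed trace sampled at 1 Hz in m/s, a hypothetical acceleration threshold `a_eps` to separate accelerating, decelerating, and steady samples, and a speed threshold `v_eps` to detect standstill; the patent does not state these thresholds, nor exactly how running time is defined (here: time spent moving).

```python
from statistics import mean, pstdev


def split_segments(speed, seg_len=100):
    """Cut a speed trace into consecutive full segments of seg_len samples."""
    count = len(speed) // seg_len
    return [speed[k * seg_len:(k + 1) * seg_len] for k in range(count)]


def segment_features(v, dt=1.0, a_eps=0.1, v_eps=0.1):
    """Return the 14 characteristic parameters of one segment as a dict."""
    acc = [(v[i + 1] - v[i]) / dt for i in range(len(v) - 1)]
    pos = [a for a in acc if a > a_eps]        # accelerating samples
    neg = [a for a in acc if a < -a_eps]       # decelerating samples
    steady = [i for i, a in enumerate(acc) if abs(a) <= a_eps]
    idle = [i for i in steady if v[i] < v_eps]  # steady and standing still
    n = len(acc)
    return {
        "mean_speed": mean(v),
        "max_speed": max(v),
        "mean_accel": mean(pos) if pos else 0.0,
        "max_accel": max(pos) if pos else 0.0,
        "mean_decel": mean(neg) if neg else 0.0,
        "max_decel": min(neg) if neg else 0.0,
        "idle_ratio": len(idle) / n,
        "accel_ratio": len(pos) / n,
        "decel_ratio": len(neg) / n,
        "cruise_ratio": (len(steady) - len(idle)) / n,
        "speed_std": pstdev(v),
        "accel_std": pstdev(acc),
        "distance": sum(v) * dt,
        "run_time": sum(1 for x in v if x >= v_eps) * dt,
    }
```

Applying `segment_features` to every element returned by `split_segments` yields one 14-dimensional feature vector per 100 s segment.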
However, 14 condition characteristic parameters are too many: they overlap and correlate with one another, and the computational load is large, which hinders analysis. To reduce the complexity of the analysis, principal component analysis (PCA) is used for dimension reduction;
For the 100 driving cycle segments, each described by 14 condition characteristic parameters, the data are reduced to 3 dimensions. The PCA dimension reduction steps are:
1) Arrange the original data by columns into a matrix X of 14 rows and 100 columns, so that each row holds one characteristic parameter and each column one driving cycle segment;
2) Zero-center each row of X, i.e. subtract the mean of that row;
3) Compute the covariance matrix $C = \frac{1}{n} X X^{T}$, where $n = 100$ is the number of segments;
4) Compute the eigenvalues and corresponding eigenvectors of the covariance matrix;
5) Arrange the eigenvectors as rows of a matrix, from top to bottom in descending order of their eigenvalues, and take the first 3 rows to form the matrix P;
6) $Y = PX$ is the data after dimension reduction to 3 dimensions;
To preserve the information after dimension reduction, the selected principal components must have a cumulative contribution rate above 80%. The 3 principal components with the largest contribution rates are finally selected; their cumulative contribution rate exceeds 80%, so they cover most of the characteristic information of the driving conditions.
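The PCA steps can be sketched with NumPy as below. The sketch assumes the feature matrix is laid out with one row per characteristic parameter and one column per segment, so that row-wise centering and a projection of the form Y = PX apply directly; the function name `pca_reduce` is hypothetical, and `numpy.linalg.eigh` is used because the covariance matrix is symmetric.

```python
import numpy as np


def pca_reduce(X, k=3):
    """Project X (n_features x n_samples) onto its top-k principal components.

    Returns the reduced data Y (k x n_samples), the projection matrix P
    (k x n_features), and the cumulative contribution rate of the k components.
    """
    Xc = X - X.mean(axis=1, keepdims=True)      # zero-center each row
    C = Xc @ Xc.T / X.shape[1]                  # covariance matrix
    eigvals, eigvecs = np.linalg.eigh(C)        # ascending eigenvalues
    order = np.argsort(eigvals)[::-1]           # indices in descending order
    P = eigvecs[:, order[:k]].T                 # top-k eigenvectors as rows
    Y = P @ Xc
    ratio = eigvals[order[:k]].sum() / eigvals.sum()
    return Y, P, ratio
```

Because the 14 parameters have different units, it is common to scale each row to unit variance before this step; whether the patent does so is not stated. The returned `ratio` can be compared against the 80% threshold.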
In this embodiment, the step of clustering the driving cycle segments by the K-means clustering method includes:
Assuming the comprehensive driving cycle lasts 10000 s, it can be divided into 100 driving cycle segments. After PCA dimension reduction, the 100 driving cycle segments are divided into three classes of conditions, low speed, medium speed, and high speed, in order to train the three Markov transition matrices in the vehicle speed prediction module. The driving cycle segments are classified with the K-means clustering algorithm: the number of classes is set to K=3, an initial cluster center is assigned to each class, and the Euclidean distance from each sample point $x_i$ to each cluster center $c_j$ is calculated;
$d(x_i, c_j) = \sqrt{\sum_{k=1}^{3} (x_{ik} - c_{jk})^2}$;
Each point is assigned to the class with the nearest center. After the first clustering pass, the center of each cluster is recalculated as the mean of all data points in that cluster:
$c_j = \frac{1}{|S_j|} \sum_{x_i \in S_j} x_i$;
where $S_j$ is the set of data points of the $j$-th cluster and $|S_j|$ is the number of data points in the set; the cluster centers are updated and the points reassigned repeatedly until the cluster centers stabilize.
Finally, after the condition characteristic parameters have been reduced by PCA to data containing 3 principal components, the 100 driving cycle segments are divided into 3 classes by the K-means clustering algorithm, so that a Markov transition matrix can be trained specifically for each class for vehicle speed prediction.
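The assign-and-update loop of K-means is short enough to write out. The sketch below takes the initial cluster centers as an explicit argument, in line with assigning an initial center to each class; the function name `kmeans` and the iteration cap are assumptions.

```python
import numpy as np


def kmeans(points, init_centers, iters=100):
    """Plain K-means on points (n_samples x n_dims).

    init_centers: array (k x n_dims) of starting cluster centers.
    Returns the label of each point and the final cluster centers.
    """
    centers = np.asarray(init_centers, dtype=float).copy()
    k = len(centers)
    for _ in range(iters):
        # Euclidean distance from every point to every center
        dist = np.linalg.norm(points[:, None, :] - centers[None, :, :], axis=2)
        labels = dist.argmin(axis=1)
        # New center is the mean of its members; empty clusters keep their center
        new_centers = np.array([
            points[labels == j].mean(axis=0) if np.any(labels == j) else centers[j]
            for j in range(k)
        ])
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    return labels, centers
```

To name the resulting clusters low, medium, and high speed, one option is to rank them by the mean vehicle speed of their member segments.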
In this embodiment, after the three different types of driving cycle segment sets are obtained, the step of performing offline training on the Bi-LSTM-based condition recognition module and the markov-based vehicle speed prediction module includes:
Given the characteristics of complex road conditions, adding a condition recognition module ahead of the vehicle speed prediction module can effectively improve vehicle speed prediction accuracy. The LSTM network was designed to overcome the vanishing and exploding gradients that a traditional RNN (recurrent neural network) is prone to when processing long sequences. The road condition recognition model of the invention adopts a bidirectional long short-term memory network (Bi-LSTM), whose architecture is shown in fig. 3. A Bi-LSTM consists of a forward LSTM and a backward LSTM: the forward LSTM processes the input sequence in chronological order and captures information flowing from the past to the future, while the backward LSTM processes the input sequence from the latest time point to the earliest and captures information flowing from the future to the past. The model can therefore identify the current vehicle speed condition not only from the past running state but also with reference to later running state information in the sequence, capturing the trends of parameters such as vehicle speed and acceleration more comprehensively, adapting to abrupt changes in road conditions, and making more accurate judgments. Each LSTM of the bidirectional structure contains an input gate, a forget gate, an output gate, and a cell state. The input of the model is the 14 condition characteristic parameters, and the output is the condition mode, which is one of three types: low speed, medium speed, and high speed. Compared with a traditional LSTM, the recognition is more accurate and adapts better to complex road conditions. The LSTM architecture is:
Forget gate: the forget gate decides which information is discarded, and is expressed as:
$f_t = \sigma(W_f \cdot [h_{t-1}, x_t] + b_f)$;
where $\sigma$ is the sigmoid function, $W_f$ is the weight coefficient of the forget gate, $[h_{t-1}, x_t]$ denotes the concatenation of the hidden state of the previous moment $h_{t-1}$ and the current input $x_t$, where $x_t$ is the 14 condition characteristic parameters, including the real-time speed and acceleration of the vehicle, and $b_f$ is a bias term;
Input gate: the input gate determines which new information needs to be added to the cell state;
$i_t = \sigma(W_i \cdot [h_{t-1}, x_t] + b_i)$;
$\tilde{C}_t = \tanh(W_C \cdot [h_{t-1}, x_t] + b_C)$;
where $W_i$ and $W_C$ are the weight matrices of the input gate, whose dimensions are adjusted according to the number of input parameters, $b_i$ and $b_C$ are bias terms, and $\tilde{C}_t$ is the candidate cell state;
Cell state update: the cell state is updated by combining the results of the forget gate and the input gate;
$C_t = f_t \odot C_{t-1} + i_t \odot \tilde{C}_t$;
Output gate: decides what value to output based on the updated cell state;
$o_t = \sigma(W_o \cdot [h_{t-1}, x_t] + b_o)$;
$h_t = o_t \odot \tanh(C_t)$;
where $W_o$ is the weight parameter of the output gate, $b_o$ is a bias term, and $h_t$ is the output state at the current moment;
The Bi-LSTM computes a bidirectional hidden state by running LSTM layers forward and backward along the time axis: the forward LSTM processes the sequence from the first element to the last, and the backward LSTM from the last element to the first. The two hidden states are concatenated to form the final bidirectional hidden state, so the Bi-LSTM captures the context both before and after each time step in the sequence and provides a more comprehensive representation;
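The gate equations and the bidirectional pass can be sketched in NumPy. This is a forward-pass illustration only, assuming the standard LSTM cell formulation, randomly initialized weights, and a hypothetical hidden size; training and the final three-class output layer (for instance a softmax over low, medium, and high speed) are left out.

```python
import numpy as np


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def init_params(n_in, hidden, rng):
    """Random weights and zero biases for the gates f, i, c, o (illustrative)."""
    W = {k: rng.normal(scale=0.1, size=(hidden, hidden + n_in)) for k in "fico"}
    b = {k: np.zeros(hidden) for k in "fico"}
    return W, b


def lstm_step(x_t, h_prev, c_prev, W, b):
    """One LSTM time step on the concatenated vector [h_prev, x_t]."""
    z = np.concatenate([h_prev, x_t])
    f_t = sigmoid(W["f"] @ z + b["f"])        # forget gate
    i_t = sigmoid(W["i"] @ z + b["i"])        # input gate
    c_hat = np.tanh(W["c"] @ z + b["c"])      # candidate cell state
    c_t = f_t * c_prev + i_t * c_hat          # cell state update
    o_t = sigmoid(W["o"] @ z + b["o"])        # output gate
    h_t = o_t * np.tanh(c_t)                  # hidden state
    return h_t, c_t


def run_lstm(xs, W, b, hidden):
    """Run one LSTM over a sequence xs (T x n_in); returns hidden states (T x hidden)."""
    h, c = np.zeros(hidden), np.zeros(hidden)
    out = []
    for x_t in xs:
        h, c = lstm_step(x_t, h, c, W, b)
        out.append(h)
    return np.stack(out)


def bilstm(xs, fwd, bwd, hidden):
    """Concatenate forward states with time-realigned backward states (T x 2*hidden)."""
    h_f = run_lstm(xs, fwd[0], fwd[1], hidden)
    h_b = run_lstm(xs[::-1], bwd[0], bwd[1], hidden)[::-1]
    return np.concatenate([h_f, h_b], axis=1)
```

With 14 input features and a hidden size of 8, a sequence of 5 feature vectors produces a 5 x 16 array of bidirectional hidden states.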
In the vehicle speed prediction module, a Markov vehicle speed prediction method is adopted. Condition recognition distinguishes three condition modes, low speed, medium speed, and high speed, and each mode corresponds to its own Markov transition matrix. The three matrices (the low-speed, medium-speed, and high-speed Markov transition matrices) are trained offline on the three classes of driving cycle segments clustered in step S01. With the prediction horizon set to p steps, the Markov model outputs p steps of predicted vehicle speed;
After the vehicle speed prediction that takes condition recognition into account is completed, a p-step predicted vehicle speed sequence is output.
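A Markov speed predictor of this kind can be sketched by discretizing speed into bins, counting bin-to-bin transitions in the training segments, and then stepping through the most probable transitions. The bin width, the number of states, the choice of the most likely next state (rather than an expected value or a random draw), and the function names are all assumptions made for illustration.

```python
def to_state(v, step, n_states):
    """Map a speed to its bin index, clamped to the last bin."""
    return min(int(v // step), n_states - 1)


def fit_transition_matrix(segments, step=5.0, n_states=30):
    """Estimate a row-stochastic transition matrix from speed segments."""
    counts = [[0] * n_states for _ in range(n_states)]
    for seg in segments:
        for v_now, v_next in zip(seg, seg[1:]):
            i = to_state(v_now, step, n_states)
            j = to_state(v_next, step, n_states)
            counts[i][j] += 1
    matrix = []
    for i, row in enumerate(counts):
        total = sum(row)
        if total == 0:
            # Unvisited state: assume the speed stays in the same bin
            matrix.append([1.0 if j == i else 0.0 for j in range(n_states)])
        else:
            matrix.append([c / total for c in row])
    return matrix


def predict_speeds(v0, matrix, p, step=5.0):
    """Predict p future speeds (bin centers) by following the most likely transitions."""
    n_states = len(matrix)
    state = to_state(v0, step, n_states)
    preds = []
    for _ in range(p):
        row = matrix[state]
        state = max(range(n_states), key=row.__getitem__)
        preds.append((state + 0.5) * step)
    return preds
```

In use, one matrix would be fitted per condition class, and the class returned by the recognition module would select which matrix `predict_speeds` receives.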
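Referring to fig. 4, in this embodiment, the DDPG algorithm in step S03 is based on the Actor-Critic framework, so it contains an Actor network and a Critic network, each with a corresponding target network. The DDPG algorithm therefore includes four networks: the Actor network $\mu(s|\theta^{\mu})$, the Critic network $Q(s,a|\theta^{Q})$, the Target Actor network $\mu'(s|\theta^{\mu'})$, and the Target Critic network $Q'(s,a|\theta^{Q'})$. The update process of the DDPG algorithm and the update mode of the target networks, which are introduced to keep the learning targets stable, are as follows;
The DDPG training process includes:
Initializing the critic network $Q(s,a|\theta^{Q})$ and the actor network $\mu(s|\theta^{\mu})$ with parameters $\theta^{Q}$ and $\theta^{\mu}$;
Initializing the corresponding target network parameters: $\theta^{Q'} \leftarrow \theta^{Q}$, $\theta^{\mu'} \leftarrow \theta^{\mu}$;
Initializing the experience replay buffer D with capacity K, and setting the number of training episodes to M;
Inputting the current state $s_t$ to the current network and selecting an action according to the current policy plus exploration noise, $a_t = \mu(s_t|\theta^{\mu}) + \mathcal{N}_t$; after executing $a_t$, obtaining the reward $r_t$ and the next state $s_{t+1}$, and storing the transition $(s_t, a_t, r_t, s_{t+1})$ in the experience replay buffer D;
Sampling a random batch of N transitions $(s_i, a_i, r_i, s_{i+1})$ from D and computing the target value $y_i = r_i + \gamma\, Q'(s_{i+1}, \mu'(s_{i+1}|\theta^{\mu'})|\theta^{Q'})$;
where $Q'$ is the value function at the next moment, $\mu'$ is the policy function at the next moment, $\theta^{\mu'}$ and $\theta^{Q'}$ are the network weight parameters at the next moment, and $\gamma$ is the discount factor;
Updating the critic parameters by minimizing the loss: $L = \frac{1}{N} \sum_{i} \left(y_i - Q(s_i, a_i|\theta^{Q})\right)^2$;
Updating the actor network with the policy gradient: $\nabla_{\theta^{\mu}} J \approx \frac{1}{N} \sum_{i} \nabla_{a} Q(s, a|\theta^{Q})\big|_{s=s_i,\, a=\mu(s_i)} \nabla_{\theta^{\mu}} \mu(s|\theta^{\mu})\big|_{s=s_i}$;
Soft-updating the target networks: $\theta^{Q'} \leftarrow \tau \theta^{Q} + (1-\tau)\theta^{Q'}$, $\theta^{\mu'} \leftarrow \tau \theta^{\mu} + (1-\tau)\theta^{\mu'}$, where $\tau$ is the soft update coefficient.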
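The arithmetic of one DDPG update can be illustrated without a deep learning framework. The sketch below assumes the standard DDPG formulation and shows only the numerical pieces: noisy action selection clipped to the fuel cell power limits, the temporal-difference target, the critic loss, and the soft target update on lists of parameter arrays. Gaussian exploration noise, the function names, and the default values of `gamma` and `tau` are assumptions; the actor gradient step needs automatic differentiation and is omitted.

```python
import numpy as np


def noisy_action(mu_out, sigma, low, high, rng):
    """Actor output plus Gaussian exploration noise, clipped to [low, high]."""
    return float(np.clip(mu_out + rng.normal(0.0, sigma), low, high))


def td_target(rewards, q_next, gamma=0.99):
    """Target y = r + gamma * Q'(s', mu'(s')) for a batch."""
    return rewards + gamma * q_next


def critic_loss(y, q):
    """Mean squared error between targets and current critic estimates."""
    return float(np.mean((y - q) ** 2))


def soft_update(target_params, online_params, tau=0.005):
    """Blend online parameters into the target parameters with rate tau."""
    return [tau * w + (1.0 - tau) * w_t
            for w, w_t in zip(online_params, target_params)]
```

A small `tau` makes the target networks trail the online networks slowly, which is what keeps the targets `y` from shifting too quickly between updates.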
In the embodiment, in step S03, a DDPG deep reinforcement learning energy management strategy integrating the working condition identification and the vehicle speed working condition prediction is used;
Setting the state set:
;
The state set comprises the power demanded by the whole vehicle, the state of charge (SOC) of the power battery, and the fuel cell life FOH;
Setting the action set:
$A = \{P_{fc}\}$;
where $P_{fc}$ is the fuel cell power. Compared with energy management algorithms based on DQN or on Q-learning, which optimizes a Q-table, the action space of the DDPG-based energy management algorithm is continuous, which better matches the power output requirements of an actual fuel cell;
Setting a reward function:
;
R is the reward function. Its terms are: the hydrogen consumption rate of the fuel cell; λ, the equivalent factor adjustment coefficient, which is calculated from the p-step predicted vehicle speed sequence output by the vehicle speed prediction module, using the vehicle speed standard deviation and the average vehicle speed of that sequence together with a coefficient k; the output power of the power battery; LHV, the lower heating value of hydrogen; the fuel cell life degradation value; and a weighting factor for fuel cell life degradation. During training, the optimal energy management strategy is obtained by selecting a suitable value of k; after training, λ is adjusted in real time during actual driving according to the predicted vehicle speed sequence received;
In the DDPG-based energy management process, the environment first inputs the state to the Actor of the Agent, and the Agent selects the corresponding action; the environment then transitions to the state S' at the next moment and returns a real-time reward, and the experience sample is stored in the sample pool. Once the number of samples in the pool reaches the preset capacity, the earliest experiences are removed. The DDPG algorithm keeps having the Agent execute actions, collects the data from its interaction with the environment, including states and rewards, and updates the network parameters, so that the Agent gradually learns the optimal action strategy and achieves the best energy efficiency.
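The sample pool behaviour described here, where the earliest experience is dropped once a preset capacity is reached, maps naturally onto a bounded double-ended queue. The class below is a minimal sketch using only the Python standard library; the class name `ReplayBuffer` and its method names are hypothetical.

```python
import random
from collections import deque


class ReplayBuffer:
    """Fixed-capacity experience pool that discards the oldest sample first."""

    def __init__(self, capacity):
        # A deque with maxlen drops items from the left when it is full
        self.buffer = deque(maxlen=capacity)

    def push(self, state, action, reward, next_state):
        """Store one transition (s, a, r, s')."""
        self.buffer.append((state, action, reward, next_state))

    def sample(self, batch_size):
        """Draw a random batch of distinct transitions."""
        return random.sample(list(self.buffer), batch_size)

    def __len__(self):
        return len(self.buffer)
```

Sampling uniformly at random breaks the temporal correlation between consecutive driving steps, which is the usual reason for training from a replay pool rather than from the latest transition alone.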
It should be noted that, in this document, the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising one … …" does not exclude the presence of other like elements in a process, method, article, or apparatus that comprises the element. Furthermore, it should be noted that the scope of the methods and apparatus in the embodiments of the present application is not limited to performing the functions in the order shown or discussed, but may also include performing the functions in a substantially simultaneous manner or in an opposite order depending on the functions involved, e.g., the described methods may be performed in an order different from that described, and various steps may be added, omitted, or combined. Additionally, features described with reference to certain examples may be combined in other examples.
The embodiments of the present application have been described above with reference to the accompanying drawings, but the present application is not limited to the above-described embodiments, which are merely illustrative and not restrictive, and many forms may be made by those having ordinary skill in the art without departing from the spirit of the present application and the scope of the claims, which are to be protected by the present application.
Claims (7)
Priority Applications (1)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| CN202411462619.4A CN118981620B (en) | 2024-10-19 | 2024-10-19 | A reinforcement learning energy management method for fuel cell vehicles considering operating condition prediction |
Applications Claiming Priority (1)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| CN202411462619.4A CN118981620B (en) | 2024-10-19 | 2024-10-19 | A reinforcement learning energy management method for fuel cell vehicles considering operating condition prediction |
Publications (2)
| Publication Number | Publication Date |
|---|---|
| CN118981620A (en) | 2024-11-19 |
| CN118981620B (en) | 2025-02-25 |
Family
ID=93453733
Family Applications (1)
| Application Number | Title | Priority Date | Filing Date |
|---|---|---|---|
| CN202411462619.4A Active CN118981620B (en) | 2024-10-19 | 2024-10-19 | A reinforcement learning energy management method for fuel cell vehicles considering operating condition prediction |
Country Status (1)
| Country | Link |
|---|---|
| CN (1) | CN118981620B (en) |
Cited By (3)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| CN119502719A (en) * | 2025-01-13 | 2025-02-25 | 南京恒天领锐汽车有限公司 | A driving and braking control method for a pure electric commercial vehicle driven by dual motors on front and rear axles |
| CN119653617A (en) * | 2025-02-18 | 2025-03-18 | 深圳市盛鸿运科技有限公司 | A method and device for preparing a new energy vehicle power circuit board |
| CN120450386A (en) * | 2025-07-09 | 2025-08-08 | 江西五十铃汽车有限公司 | A new energy vehicle energy control method and system |
Citations (5)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| US20070112475A1 (en) * | 2005-11-17 | 2007-05-17 | Motility Systems, Inc. | Power management systems and devices |
| WO2020165509A1 (en) * | 2019-02-15 | 2020-08-20 | Hutchinson | Electric energy management system |
| KR102327413B1 (en) * | 2021-02-25 | 2021-11-16 | 국민대학교산학협력단 | Power managing apparatus and method for the same |
| CN117922382A (en) * | 2023-06-05 | 2024-04-26 | 吉林大学 | Fuel cell automobile energy management method based on map path planning |
| CN118386952A (en) * | 2024-06-26 | 2024-07-26 | 合肥工业大学 | A fuel cell vehicle energy management method and system based on operating condition identification |
Also Published As
| Publication number | Publication date |
|---|---|
| CN118981620B (en) | 2025-02-25 |
Similar Documents
| Publication | Publication Date | Title |
|---|---|---|
| CN113034210B (en) | Vehicle running cost evaluation method based on data driving scene | |
| CN118981620B (en) | A reinforcement learning energy management method for fuel cell vehicles considering operating condition prediction | |
| Feng et al. | Energy consumption prediction strategy for electric vehicle based on LSTM-transformer framework | |
| Song et al. | Multi-mode energy management strategy for fuel cell electric vehicles based on driving pattern identification using learning vector quantization neural network algorithm | |
| CN112327168A (en) | XGboost-based electric vehicle battery consumption prediction method | |
| CN113715805B (en) | A method of energy management based on rule fusion and deep reinforcement learning based on working condition identification | |
| CN108819934A (en) | A kind of power distribution control method of hybrid vehicle | |
| Soo et al. | Machine learning based battery pack health prediction using real-world data | |
| Shi et al. | A cloud-based energy management strategy for hybrid electric city bus considering real-time passenger load prediction | |
| He et al. | Adaptive energy management strategy for Extended Range Electric Vehicles under complex road conditions based on RF-IGWO and MGO algorithms | |
| Lee et al. | Learning to recognize driving patterns for collectively characterizing electric vehicle driving behaviors | |
| Montazeri-Gh et al. | Driving condition recognition for genetic-fuzzy HEV control | |
| CN119428342A (en) | A multifunctional controller for new energy vehicles and new energy vehicles | |
| CN113552803A (en) | Energy management method based on working condition identification | |
| Hasib et al. | Driving range prediction of electric vehicles: A machine learning approach | |
| CN116946107A (en) | Hybrid system mode decision and power distribution method under energy track following | |
| Nguyen et al. | Optimal energy management strategy based on driving pattern recognition for a dual-motor dual-source electric vehicle | |
| Deptuła et al. | Application of a decision classifier tree to evaluate energy consumption of an electric vehicle under real traffic conditions | |
| CN120207123A (en) | A method for predicting the remaining driving range of pure electric vehicles based on real vehicle data | |
| CN119936665A (en) | A lithium battery charging time prediction method, system, device and storage medium | |
| CN118770009A (en) | A fuel cell vehicle energy management method considering the preceding vehicle following | |
| CN114897065B (en) | Energy-saving driving strategy for high-speed trains based on unit and multivariate fusion prediction model | |
| Li et al. | Prediction of Low-Temperature Energy Consumption and Driving Range of Pure Electric Vehicles Based on the CatBoost Algorithm | |
| Arun et al. | Deep learning-based driving cycle development for Kinta district | |
| Han et al. | Transfer Deep Reinforcement Learning‐Based Energy Management Strategy for Plug‐In Hybrid Electric Heavy‐Duty Trucks under Segmented Usage Scenarios |
Legal Events
| Date | Code | Title | Description |
|---|---|---|---|
| | PB01 | Publication | |
| | SE01 | Entry into force of request for substantive examination | |
| | GR01 | Patent grant | |