CN118981620A - A reinforcement learning energy management method for fuel cell vehicles considering operating condition prediction - Google Patents

Info

Publication number
CN118981620A
Authority
CN
China
Prior art keywords
speed
fuel cell
lstm
driving
energy management
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202411462619.4A
Other languages
Chinese (zh)
Other versions
CN118981620B (en)
Inventor
朱仲文
张梓睿
张梓迟
佟强
李丞
王维志
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Hefei University of Technology
Original Assignee
Hefei University of Technology
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Hefei University of Technology
Priority to CN202411462619.4A
Publication of CN118981620A
Application granted
Publication of CN118981620B
Legal status: Active
Anticipated expiration

Classifications

    • B: PERFORMING OPERATIONS; TRANSPORTING
    • B60: VEHICLES IN GENERAL
    • B60L: PROPULSION OF ELECTRICALLY-PROPELLED VEHICLES; SUPPLYING ELECTRIC POWER FOR AUXILIARY EQUIPMENT OF ELECTRICALLY-PROPELLED VEHICLES; ELECTRODYNAMIC BRAKE SYSTEMS FOR VEHICLES IN GENERAL; MAGNETIC SUSPENSION OR LEVITATION FOR VEHICLES; MONITORING OPERATING VARIABLES OF ELECTRICALLY-PROPELLED VEHICLES; ELECTRIC SAFETY DEVICES FOR ELECTRICALLY-PROPELLED VEHICLES
    • B60L58/00: Methods or circuit arrangements for monitoring or controlling batteries or fuel cells, specially adapted for electric vehicles
    • B60L58/30: Methods or circuit arrangements for monitoring or controlling batteries or fuel cells, specially adapted for electric vehicles for monitoring or controlling fuel cells
    • G: PHYSICS
    • G06: COMPUTING OR CALCULATING; COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00: Pattern recognition
    • G06F18/20: Analysing
    • G06F18/21: Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/214: Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • G: PHYSICS
    • G06: COMPUTING OR CALCULATING; COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00: Pattern recognition
    • G06F18/20: Analysing
    • G06F18/29: Graphical models, e.g. Bayesian networks
    • G06F18/295: Markov models or related models, e.g. semi-Markov models; Markov random fields; Networks embedding Markov models
    • G: PHYSICS
    • G06: COMPUTING OR CALCULATING; COUNTING
    • G06N: COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00: Computing arrangements based on biological models
    • G06N3/02: Neural networks
    • G06N3/04: Architecture, e.g. interconnection topology
    • G06N3/044: Recurrent networks, e.g. Hopfield networks
    • G06N3/0442: Recurrent networks, e.g. Hopfield networks characterised by memory or gating, e.g. long short-term memory [LSTM] or gated recurrent units [GRU]
    • B: PERFORMING OPERATIONS; TRANSPORTING
    • B60: VEHICLES IN GENERAL
    • B60L: PROPULSION OF ELECTRICALLY-PROPELLED VEHICLES; SUPPLYING ELECTRIC POWER FOR AUXILIARY EQUIPMENT OF ELECTRICALLY-PROPELLED VEHICLES; ELECTRODYNAMIC BRAKE SYSTEMS FOR VEHICLES IN GENERAL; MAGNETIC SUSPENSION OR LEVITATION FOR VEHICLES; MONITORING OPERATING VARIABLES OF ELECTRICALLY-PROPELLED VEHICLES; ELECTRIC SAFETY DEVICES FOR ELECTRICALLY-PROPELLED VEHICLES
    • B60L2240/00: Control parameters of input or output; Target parameters
    • B60L2240/40: Drive Train control parameters

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Physics & Mathematics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Evolutionary Computation (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Evolutionary Biology (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Biomedical Technology (AREA)
  • Mathematical Physics (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • General Health & Medical Sciences (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • Health & Medical Sciences (AREA)
  • Software Systems (AREA)
  • Sustainable Development (AREA)
  • Sustainable Energy (AREA)
  • Power Engineering (AREA)
  • Transportation (AREA)
  • Mechanical Engineering (AREA)
  • Electric Propulsion And Braking For Vehicles (AREA)
  • Fuel Cell (AREA)

Abstract

The invention provides a fuel cell automobile reinforcement learning energy management method considering working condition prediction, which comprises the following steps: constructing comprehensive driving condition data from four different types of classical driving conditions, dividing the data into segments, extracting the working condition characteristic parameters of each segment and applying principal component analysis, and clustering the driving cycle segments by the K-means clustering method; after three different types of driving cycle segment sets are obtained, training offline a Bi-LSTM-based working condition identification module and a Markov-based vehicle speed prediction module, wherein the vehicle speed prediction module trains three separate Markov transition matrices on data sets representing three different road characteristics; extracting the working condition characteristic parameters of the predicted vehicle speed sequence and converting them into an equivalent factor adjustment coefficient $\lambda$, using $\lambda$ as a state input to the DDPG energy management algorithm with the fuel cell power $P_{fc}$ as the action, and setting an appropriate reward function so that multiple objectives are optimized together.

Description

Fuel cell automobile reinforcement learning energy management method considering working condition prediction
Technical Field
The invention relates to the technical field of fuel cells, in particular to a reinforcement learning energy management method for fuel cell automobiles that considers working condition prediction.
Background
Energy depletion and environmental crises are growing more serious worldwide, and automobiles are a major source of energy consumption and air pollution, so improving vehicle energy efficiency and reducing emissions is a current research focus. The hydrogen fuel cell is an emerging power generation technology; applied to automobiles, it offers zero emissions, low noise, high energy efficiency, use of renewable energy and rapid refueling. This technical route therefore has broad prospects in the field of new energy automobiles and is one of the viable ways to reduce the consumption of non-renewable energy and the emission of pollutant gases.
At present, domestic hydrogen fuel cell automobile technology still needs improvement, and a technical gap remains in energy management. The energy management system (EMS) is one of the core technologies of the hydrogen fuel cell automobile; it strongly influences the fuel economy, service life, power performance and comfort of the whole vehicle, and is both a difficulty and a focus of research in this field. Fuel cell automobiles generally adopt an all-electric power system, but if the fuel cell alone supplies the power of the whole vehicle, its response lags and braking energy cannot be recovered, so the power demand cannot be met in time. In current fuel cell automobile power system architectures, a battery or a supercapacitor is therefore added as an auxiliary power source to handle abrupt power demand and to recover braking energy. The power system thus changes from a single power source to multiple power sources, and an efficient energy management strategy is needed to distribute the required power among the energy sources so as to ensure the economy and power performance of the whole vehicle.
When equivalent hydrogen consumption, SOC deviation and fuel cell service life are considered together, rule-based strategies are calibrated from expert experience and do not achieve global optimality, while optimization-based energy management strategies have poor real-time performance and are limited on multi-objective problems. A new strategy is needed that meets the requirements of multi-objective, real-time, optimal energy management.
Because a fuel cell truck must frequently switch between complex working conditions such as urban roads, rural roads and expressways, a single energy management algorithm cannot meet these requirements well.
Existing energy management strategies are varied, but they focus on the optimization algorithm and do not fully exploit the combination of vehicle speed working condition prediction with energy management. If the future vehicle speed is considered in the energy management algorithm, and a bidirectional long short-term memory network (Bi-LSTM) is used to take the driving state information both before and after each moment into account, the vehicle gains foresight during driving, and the accuracy of working condition identification under sudden changes of the vehicle speed working condition can be improved, thereby improving energy efficiency. Therefore, the invention provides a fuel cell vehicle reinforcement learning energy management method considering working condition prediction.
Disclosure of Invention
The invention provides a fuel cell automobile reinforcement learning energy management method considering working condition prediction, which solves the technical problems mentioned in the background art.
The technical scheme adopted for solving the technical problems is as follows:
A fuel cell vehicle reinforcement learning energy management method considering working condition prediction, the method comprising the following steps:
S01, constructing comprehensive driving working condition data that reflect the characteristics of various road working conditions from four different types of classical driving working conditions, dividing the data into segments of 100 s each, extracting the working condition characteristic parameters and applying principal component analysis (PCA), and clustering the driving cycle segments by the K-means clustering method into three working conditions: low speed, medium speed and high speed;
S02, after the three different types of driving cycle segment sets are obtained, training offline the Bi-LSTM-based working condition identification module and the Markov-based vehicle speed prediction module;
S03, extracting working condition characteristic parameters of the predicted vehicle speed sequence, such as the average vehicle speed and the standard deviation of the vehicle speed, and converting them into an equivalent factor adjustment coefficient $\lambda$; $\lambda$ is input as a state to the DDPG energy management algorithm, the fuel cell power $P_{fc}$ is the action, and an appropriate reward function covering hydrogen consumption, SOC fluctuation and fuel cell degradation is set so that multiple objectives are optimized together.
As a further technical scheme of the invention, the step of constructing comprehensive driving condition data capable of reflecting various road condition characteristics by adopting four different types of classical driving conditions comprises the following steps of:
preprocessing working condition data: four typical road working conditions, NEDC, WLTC, CLTC-P and FTP75, are combined into comprehensive driving working condition data that reflect the characteristics of various road working conditions, and these data are used to train the subsequent working condition identification and prediction algorithms;
The NEDC (New European Driving Cycle) working condition is the European standard range test working condition and includes 4 urban cycles and 1 suburban cycle; the urban part lasts 780 seconds with a maximum speed of 50 km/h; the suburban part lasts 400 seconds with a maximum speed of 120 km/h;
The WLTC standard working condition is closer to actual road driving; its complete test cycle consists of 4 phases, low speed, medium speed, high speed and extra-high speed, with a total duration of 1800 s, of which the idle time is 235 s; the distance is 23266 m, the average vehicle speed is 46.5 km/h, and the maximum vehicle speed is 131.3 km/h;
The CLTC-P (China Light-duty Vehicle Test Cycle - Passenger Car) test working condition comprises 3 speed intervals, low speed, medium speed and high speed; the total duration is 1800 s, the total mileage is 14480 m, the maximum speed is 114 km/h, and the average speed is 28.96 km/h;
FTP75 (Federal Test Procedure) is a standard issued by the U.S. Environmental Protection Agency for assessing the emissions and fuel economy of light vehicles and light trucks under urban working conditions; a complete FTP75 cycle has a driving time of 1874 s, a theoretical driving distance of 17.77 km, an average vehicle speed of 34.12 km/h and a maximum vehicle speed of 91.25 km/h, and comprises 3 parts: a cold-start transient phase, a stabilized phase and a hot-start transient phase.
As a further technical scheme of the invention, the steps of dividing the comprehensive driving working condition into segments according to each segment of 100s and carrying out Principal Component Analysis (PCA) by extracting working condition characteristic parameters of the working condition comprise the following steps:
The comprehensive driving condition data are divided into segments of 100 s each; the working condition characteristic parameters reflect the characteristic information of each driving cycle segment; each driving cycle segment is described by 14 parameters: average speed, maximum speed, average acceleration, maximum acceleration, average deceleration, maximum deceleration, idle time ratio, acceleration time ratio, deceleration time ratio, constant-speed time ratio, speed standard deviation, acceleration standard deviation, driving mileage and running time;
however, 14 working condition characteristic parameters are too many: they overlap and correlate with one another and increase the amount of calculation, which hinders analysis. To reduce the complexity of the analysis, principal component analysis (PCA) is used for dimension reduction;
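As a minimal sketch of the segmentation and feature extraction described above, the following Python code splits a speed trace into 100 s segments and computes the 14 characteristic parameters of one segment. The 1 Hz sampling rate, the idle and acceleration thresholds, and all function and key names are assumptions made for illustration; the patent does not specify them.

```python
import statistics


def split_segments(speed_kmh, seg_len=100):
    """Split a speed trace (assumed 1 Hz) into full segments of seg_len samples."""
    return [speed_kmh[i:i + seg_len]
            for i in range(0, len(speed_kmh) - seg_len + 1, seg_len)]


def segment_features(speed_kmh, dt=1.0, idle_kmh=0.5, acc_eps=0.1):
    """Compute the 14 characteristic parameters of one driving cycle segment.

    idle_kmh (km/h) and acc_eps (m/s^2) are assumed thresholds.
    """
    v = [s / 3.6 for s in speed_kmh]                          # km/h -> m/s
    a = [(v[i + 1] - v[i]) / dt for i in range(len(v) - 1)]   # m/s^2
    acc = [x for x in a if x > acc_eps]
    dec = [x for x in a if x < -acc_eps]

    # Steps that are neither accelerating nor decelerating are idle or cruising.
    n_steps = len(a)
    n_idle = sum(1 for x, s in zip(a, speed_kmh[1:])
                 if abs(x) <= acc_eps and s < idle_kmh)
    n_cruise = n_steps - len(acc) - len(dec) - n_idle

    return {
        "mean_speed": statistics.fmean(speed_kmh),
        "max_speed": max(speed_kmh),
        "mean_acc": statistics.fmean(acc) if acc else 0.0,
        "max_acc": max(acc, default=0.0),
        "mean_dec": statistics.fmean(dec) if dec else 0.0,
        "max_dec": min(dec, default=0.0),   # strongest deceleration (most negative)
        "idle_ratio": n_idle / n_steps,
        "acc_ratio": len(acc) / n_steps,
        "dec_ratio": len(dec) / n_steps,
        "cruise_ratio": n_cruise / n_steps,
        "speed_std": statistics.pstdev(speed_kmh),
        "acc_std": statistics.pstdev(a),
        "distance_m": sum((v[i] + v[i + 1]) / 2 * dt for i in range(n_steps)),
        "duration_s": len(speed_kmh) * dt,
    }
```

Applying `segment_features` to every segment returned by `split_segments` gives one 14-element feature record per segment, which is the input expected by the dimension reduction step.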
The comprehensive driving condition data, cut into 100 s segments and described by 14 working condition characteristic parameters, are reduced to 3 dimensions. The dimension reduction by principal component analysis comprises the following steps:
1) Arranging the original data by columns into a matrix $X$ with 14 rows (one per characteristic parameter) and $m = 100$ columns (one per driving cycle segment), so that the row-wise operations in steps 2) to 6) act on one parameter at a time;
2) Zero-centering each row of $X$, i.e. subtracting the mean value of that row;
3) Obtaining the covariance matrix $C = \frac{1}{m} X X^{T}$;
4) Obtaining the eigenvalues and corresponding eigenvectors of the covariance matrix;
5) Arranging the eigenvectors as rows of a matrix, from top to bottom in order of decreasing eigenvalue, and taking the first 3 rows to form a matrix $P$;
6) $Y = PX$ is the data after dimension reduction to 3 dimensions;
In this process, to preserve the information after dimension reduction, the selected principal components must reach a cumulative contribution rate of more than 80%; the first 3 principal components with the largest contribution rates are finally selected, and their cumulative contribution rate exceeds 80%, so they cover most of the working condition characteristic information.
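The six PCA steps and the cumulative contribution rate check can be sketched with NumPy as follows. This is an illustrative sketch, not the patent's implementation: the function name `pca_reduce` is hypothetical, and the function assumes its input has one row per driving cycle segment and one column per characteristic parameter, transposing it internally so that each row of `X` holds one parameter as in the steps above.

```python
import numpy as np


def pca_reduce(features, k=3):
    """Project (n_segments, n_features) data onto its first k principal components.

    Returns the reduced data (n_segments, k) and the cumulative contribution
    rate of the k retained components.
    """
    X = np.asarray(features, dtype=float).T        # rows = parameters, cols = segments
    X = X - X.mean(axis=1, keepdims=True)          # step 2: zero-center each row
    C = X @ X.T / X.shape[1]                       # step 3: covariance matrix
    eigvals, eigvecs = np.linalg.eigh(C)           # step 4: C is symmetric
    order = np.argsort(eigvals)[::-1]              # step 5: descending eigenvalues
    P = eigvecs[:, order[:k]].T                    # first k eigenvectors as rows
    Y = P @ X                                      # step 6: reduced data (k, n_segments)
    contribution = eigvals[order[:k]].sum() / eigvals.sum()
    return Y.T, contribution
```

A caller would check that the returned contribution rate exceeds 0.8 before accepting `k = 3`. Because the 14 parameters have different units, standardizing each parameter before PCA is a common additional step; the text above only describes zero-centering, so the sketch does the same.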
As a further technical scheme of the invention, the step of clustering the driving cycle segments by the K-means clustering method comprises the following steps:
The driving cycle segments are classified with the K-means clustering algorithm: the number of classes is set to K = 3, an initial cluster center is assigned to each class, and the Euclidean distance from each sample point to each cluster center is calculated;
Each point is assigned to the class of its nearest center; after the first clustering is completed, the center of each cluster is recalculated as the mean of all data points in the cluster:
$$c_j = \frac{1}{|S_j|} \sum_{x \in S_j} x$$
where $S_j$ is the set of data points of the $j$-th cluster and $|S_j|$ is the number of data points in that set; the cluster centers are updated and the points reassigned until the cluster centers become stable.
Finally, after the working condition characteristic parameters have been reduced by PCA to 3 principal components, the 100 road driving cycle segments are divided into 3 classes by the K-means clustering algorithm, and each class is used to train its own Markov transition matrix for vehicle speed working condition prediction.
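The clustering loop described above (assign each point to the nearest center, recompute each center as the cluster mean, repeat until the centers stop moving) can be sketched as follows. The function name and the choice to pass the initial centers explicitly are assumptions for illustration; the patent only states that an initial center is assigned to each class.

```python
import numpy as np


def kmeans(points, init_centers, max_iter=100):
    """Plain K-means on (n_points, n_dims) data with user-supplied initial centers."""
    points = np.asarray(points, dtype=float)
    centers = np.asarray(init_centers, dtype=float)
    for _ in range(max_iter):
        # Euclidean distance of every point to every center, shape (n_points, k).
        dist = np.linalg.norm(points[:, None, :] - centers[None, :, :], axis=2)
        labels = dist.argmin(axis=1)
        # New center = mean of the points assigned to the cluster (kept if empty).
        new_centers = np.array([
            points[labels == j].mean(axis=0) if np.any(labels == j) else centers[j]
            for j in range(len(centers))
        ])
        if np.allclose(new_centers, centers):   # centers are stable: stop
            break
        centers = new_centers
    return labels, centers
```

With the 3-dimensional PCA output as `points` and three initial centers, the returned labels split the segments into the low-speed, medium-speed and high-speed sets; which label corresponds to which speed class has to be read off from the cluster centers afterwards.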
As a further technical scheme of the invention, after the three different types of driving cycle segment sets are obtained, the step of performing offline training on the Bi-LSTM-based working condition identification module and the Markov-based vehicle speed prediction module comprises the following steps:
The Bi-LSTM consists of a forward LSTM and a backward LSTM: the forward LSTM processes the input sequence in chronological order and captures information flowing from the past to the future, while the backward LSTM processes the input sequence from the latest time point to the earliest and captures information flowing from the future to the past; each LSTM of the bidirectional structure contains an input gate, a forget gate, an output gate and a cell state; the input of the model is the 14 working condition characteristic parameters, and the output is the working condition mode, one of low speed, medium speed and high speed; the LSTM structure is as follows:
Forget gate: the forget gate decides which information is discarded:
$$f_t = \sigma\left(W_f \cdot [h_{t-1}, x_t] + b_f\right)$$
where $\sigma$ is the sigmoid function, $W_f$ is the weight matrix of the forget gate, $[h_{t-1}, x_t]$ is the concatenation of the hidden state of the previous moment $h_{t-1}$ and the current input $x_t$, where $x_t$ contains the 14 working condition characteristic parameters, including the real-time speed and acceleration of the vehicle, and $b_f$ is a bias term;
Input gate: the input gate decides which new information is added to the cell state:
$$i_t = \sigma\left(W_i \cdot [h_{t-1}, x_t] + b_i\right), \qquad \tilde{C}_t = \tanh\left(W_C \cdot [h_{t-1}, x_t] + b_C\right)$$
where $W_i$ and $W_C$ are the weight matrices of the input gate, whose dimensions depend on the number of input parameters, $b_i$ and $b_C$ are bias terms, and $\tilde{C}_t$ is the candidate cell state;
Cell state update: the cell state is updated by combining the results of the forget gate and the input gate:
$$C_t = f_t \odot C_{t-1} + i_t \odot \tilde{C}_t$$
where $C_{t-1}$ is the cell state of the previous moment and $\odot$ denotes element-wise multiplication;
Output gate: the output gate decides what value to output based on the updated cell state:
$$o_t = \sigma\left(W_o \cdot [h_{t-1}, x_t] + b_o\right), \qquad h_t = o_t \odot \tanh(C_t)$$
where $W_o$ is the weight matrix of the output gate, $b_o$ is a bias term, and $h_t$ is the output state at the current moment;
The Bi-LSTM computes a bidirectional hidden state by running LSTM layers forward and backward along the time axis: the forward LSTM proceeds from the first element of the sequence to the last, the backward LSTM from the last element to the first, and the two hidden states are concatenated to form the final bidirectional hidden state; the Bi-LSTM thus captures the context before and after each time step in the sequence and provides a more comprehensive representation;
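The gate equations and the bidirectional concatenation can be sketched with NumPy as follows. This is an untrained forward pass only, written under assumptions: the four gate weight matrices are stacked into a single matrix `W` for brevity, the hidden size is arbitrary, and the final classification layer that maps the bidirectional hidden state to the three working condition modes is omitted. All names are hypothetical.

```python
import numpy as np


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def lstm_step(x_t, h_prev, c_prev, W, b):
    """One LSTM step; W maps [h_prev, x_t] to the four stacked gate pre-activations."""
    z = W @ np.concatenate([h_prev, x_t]) + b
    n = h_prev.size
    f_t = sigmoid(z[0:n])               # forget gate
    i_t = sigmoid(z[n:2 * n])           # input gate
    c_hat = np.tanh(z[2 * n:3 * n])     # candidate cell state
    o_t = sigmoid(z[3 * n:4 * n])       # output gate
    c_t = f_t * c_prev + i_t * c_hat    # cell state update
    h_t = o_t * np.tanh(c_t)            # output (hidden) state
    return h_t, c_t


def run_lstm(xs, W, b, hidden):
    """Run one LSTM over a sequence xs of shape (T, n_in); returns (T, hidden)."""
    h, c = np.zeros(hidden), np.zeros(hidden)
    outputs = []
    for x_t in xs:
        h, c = lstm_step(x_t, h, c, W, b)
        outputs.append(h)
    return np.stack(outputs)


def bilstm(xs, params_fwd, params_bwd, hidden):
    """Concatenate forward and time-reversed backward hidden states per time step."""
    fwd = run_lstm(xs, *params_fwd, hidden)
    bwd = run_lstm(xs[::-1], *params_bwd, hidden)[::-1]   # re-align with time order
    return np.concatenate([fwd, bwd], axis=1)
```

At any time step, the first `hidden` entries of the output depend only on earlier inputs and the last `hidden` entries only on later inputs, which is the property the text relies on for recognizing abrupt working condition changes.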
In the vehicle speed working condition prediction module, a Markov vehicle speed prediction method is adopted. Working condition identification yields one of three modes, low speed, medium speed or high speed, and each mode corresponds to its own Markov transition matrix; the three matrices (the low-speed, medium-speed and high-speed Markov transition matrices) are trained offline on the three sets of driving cycle segments clustered in step S01. The prediction horizon is set to p steps, and the Markov model outputs p steps of predicted vehicle speed;
after the vehicle speed prediction considering working condition identification is completed, a p-step predicted vehicle speed sequence is output.
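A minimal sketch of mode-specific Markov speed prediction is given below. The patent does not state how the Markov states are defined, so this example assumes states are speed bins of 5 km/h and that each prediction step takes the most probable next state; both choices, and all function names, are illustrative assumptions.

```python
import numpy as np


def fit_transition_matrix(speeds_kmh, v_max=140.0, bin_kmh=5.0):
    """Estimate a first-order Markov transition matrix over speed bins."""
    n = int(v_max / bin_kmh) + 1
    idx = np.clip(np.round(np.asarray(speeds_kmh) / bin_kmh).astype(int), 0, n - 1)
    counts = np.zeros((n, n))
    for i, j in zip(idx[:-1], idx[1:]):
        counts[i, j] += 1
    row_sum = counts.sum(axis=1, keepdims=True)
    # Rows never visited in training fall back to "stay in the same bin".
    return np.where(row_sum > 0, counts / np.maximum(row_sum, 1), np.eye(n))


def predict_speed(T, v_now, p, bin_kmh=5.0):
    """Predict p future speeds by following the most probable transition each step."""
    state = int(np.clip(round(v_now / bin_kmh), 0, T.shape[0] - 1))
    predicted = []
    for _ in range(p):
        state = int(np.argmax(T[state]))
        predicted.append(state * bin_kmh)
    return predicted


def predict_for_mode(matrices, mode, v_now, p):
    """Pick the matrix of the identified mode ('low', 'medium' or 'high') and predict."""
    return predict_speed(matrices[mode], v_now, p)
```

In use, one matrix would be fitted per clustered segment set, stored in a dictionary keyed by mode, and `predict_for_mode` called with the mode returned by the working condition identification module.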
As a further technical scheme of the present invention, in step S03 the DDPG algorithm is based on the Actor-Critic framework, so it contains an Actor network and a Critic network, and each has a corresponding target network. The DDPG algorithm therefore comprises four networks: the Actor network $\mu(s \mid \theta^{\mu})$, the Critic network $Q(s, a \mid \theta^{Q})$, the target Actor network $\mu'(s \mid \theta^{\mu'})$ and the target Critic network $Q'(s, a \mid \theta^{Q'})$. The target networks change slowly, which stabilizes the target values used to train the Critic. The update process of the DDPG algorithm, including the way the target networks are updated, is described below;
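The DDPG training process includes:
Initializing the Critic network $Q(s, a \mid \theta^{Q})$ and the Actor network $\mu(s \mid \theta^{\mu})$ with parameters $\theta^{Q}$ and $\theta^{\mu}$;
Initializing the corresponding target network parameters: $\theta^{Q'} \leftarrow \theta^{Q}$, $\theta^{\mu'} \leftarrow \theta^{\mu}$;
Initializing the experience replay buffer D with capacity K, and setting the number of training episodes to M;
Inputting the current state $s_t$ to the current network and selecting the action from the current policy plus exploration noise, $a_t = \mu(s_t \mid \theta^{\mu}) + \mathcal{N}_t$; after executing $a_t$, obtaining the reward $r_t$ and the next state $s_{t+1}$, and storing the transition $(s_t, a_t, r_t, s_{t+1})$ in the experience replay buffer D;
Sampling a random batch of $N$ transitions $(s_i, a_i, r_i, s_{i+1})$ from D and computing the target value
$$y_i = r_i + \gamma\, Q'\left(s_{i+1}, \mu'(s_{i+1} \mid \theta^{\mu'}) \mid \theta^{Q'}\right)$$
where $Q'$ is the value function at the next moment, $\mu'$ is the policy function at the next moment, $\theta^{Q'}$ and $\theta^{\mu'}$ are the network weight parameters at the next moment (those of the target networks), and $\gamma$ is the discount factor;
Updating the Critic parameters by minimizing the loss:
$$L = \frac{1}{N} \sum_{i} \left(y_i - Q(s_i, a_i \mid \theta^{Q})\right)^2$$
Updating the Actor network with the policy gradient:
$$\nabla_{\theta^{\mu}} J \approx \frac{1}{N} \sum_{i} \nabla_{a} Q(s, a \mid \theta^{Q})\big|_{s = s_i,\, a = \mu(s_i)} \, \nabla_{\theta^{\mu}} \mu(s \mid \theta^{\mu})\big|_{s = s_i}$$
Soft-updating the target networks with update coefficient $\tau$:
$$\theta^{Q'} \leftarrow \tau \theta^{Q} + (1 - \tau)\, \theta^{Q'}, \qquad \theta^{\mu'} \leftarrow \tau \theta^{\mu} + (1 - \tau)\, \theta^{\mu'}$$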
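Three of the DDPG update rules, the target value, the Critic loss and the soft target update, need no automatic differentiation and can be sketched with plain NumPy. The function names and the default values of the discount factor and the soft update coefficient are assumptions for illustration; the Actor gradient step is omitted because it requires a differentiable network framework.

```python
import numpy as np


def td_target(reward, q_next, gamma=0.99):
    """Target value: reward plus discounted target-Critic value of the next state."""
    return np.asarray(reward, dtype=float) + gamma * np.asarray(q_next, dtype=float)


def critic_loss(y, q):
    """Mean squared error between target values y and current Critic estimates q."""
    return float(np.mean((np.asarray(y) - np.asarray(q)) ** 2))


def soft_update(target_params, online_params, tau=0.005):
    """Move each target parameter a fraction tau toward the online parameter."""
    return {name: tau * online_params[name] + (1.0 - tau) * target_params[name]
            for name in online_params}
```

A small `tau` makes the target networks trail the online networks slowly, which is what keeps the target values in `td_target` from shifting abruptly between updates.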
As a further technical scheme of the invention, in step S03 a DDPG deep reinforcement learning energy management strategy is built that fuses working condition identification and vehicle speed working condition prediction;
Setting the state set, whose variables are: $P_{req}$, the power required by the whole vehicle; $SOC$, the state of charge of the power battery; and $FOH$, the life of the fuel cell;
Setting the action set:
$$A = \{P_{fc}\}$$
where $P_{fc}$ is the power of the fuel cell;
Setting the reward function $R$, whose quantities are defined as follows:
$\dot{m}_{H_2}$ is the hydrogen consumption rate of the fuel cell; $\lambda$ is the equivalent factor adjustment coefficient, calculated from the p-step predicted vehicle speed sequence output by the vehicle speed prediction module as a function of a gain $k$, the standard deviation $\sigma_v$ of the predicted vehicle speed sequence and its average vehicle speed $\bar{v}$; during training, the optimal energy management strategy is reached by selecting a suitable value of $k$; after training, $\lambda$ is adjusted in real time during actual driving from the predicted vehicle speed sequence received; $P_{bat}$ is the output power of the power battery; $LHV$ is the lower heating value of hydrogen; $D_{fc}$ is the fuel cell life degradation value, and $\omega$ is the weighting factor for fuel cell life degradation;
In DDPG-based energy management, the environment first inputs the state to the Actor of the Agent, and the Agent selects the corresponding action; after the transition to the next moment, the real-time reward and the next state S' are obtained, and the experience sample is stored in the sample pool; once the number of samples in the pool reaches the preset capacity, the earliest experience is removed. The DDPG algorithm repeatedly has the Agent execute actions, collects the data from its interaction with the environment, including states and rewards, and updates the network parameters, so that the Agent gradually learns the optimal action policy and optimal energy efficiency is achieved.
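The sample pool behaviour described above, a fixed capacity with the earliest experience dropped once the pool is full and random batches drawn for training, can be sketched with the Python standard library. The class and method names are hypothetical.

```python
import random
from collections import deque


class ReplayBuffer:
    """Fixed-capacity experience pool that discards the oldest sample when full."""

    def __init__(self, capacity):
        # deque(maxlen=...) drops the earliest element automatically on overflow.
        self.buffer = deque(maxlen=capacity)

    def push(self, state, action, reward, next_state):
        self.buffer.append((state, action, reward, next_state))

    def sample(self, batch_size):
        """Draw batch_size distinct transitions uniformly at random."""
        return random.sample(list(self.buffer), batch_size)

    def __len__(self):
        return len(self.buffer)
```

Training would typically start only once `len(buffer)` is at least the batch size, since `sample` cannot draw more transitions than the pool holds.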
The beneficial effects of the invention are as follows: compared with rule-based and optimization-based energy management algorithms, the method adopts an energy management strategy based on the deep deterministic policy gradient (DDPG) algorithm. Rule-based algorithms typically rely on preset fixed rules and engineers' experience; the objectives and constraints of optimization algorithms are also relatively fixed, and their real-time performance is relatively poor. The DDPG algorithm adapts its policy through continual interaction with and learning from the environment. Energy management involves many mutually coupled factors and uncertainties, and the DDPG algorithm can handle a high-dimensional state space and a continuous action space, controlling the output power of the fuel cell continuously, so it suits the energy management scenario of the fuel cell automobile. Optimization-based energy management strategies may be limited by complexity when solving for the optimum and in some cases obtain only a local optimum. Optimization-based algorithms also require an accurate mathematical model, whereas the DDPG-based algorithm can learn and optimize without knowledge of the system model.
Meanwhile, compared with the conventional DDPG algorithm, the invention adds Bi-LSTM-based working condition identification and a Markov vehicle speed prediction module, so that it can adapt to complex road working conditions. Compared with a conventional LSTM classifier, the Bi-LSTM consists of two LSTMs and captures information in both directions, so the identification process considers both earlier and later vehicle speed information. For example, when identifying an urban working condition with frequent starts and stops, the Bi-LSTM considers both the preceding start-stop behaviour and the likely subsequent trend, which improves identification accuracy and reduces missed information. After the working condition type is identified by the Bi-LSTM model, the trained Markov state transition matrix of the matching type (low speed, medium speed or high speed) is used to predict the vehicle speed for that road working condition, which improves the vehicle speed prediction accuracy and thereby the energy efficiency of the energy management system.
Drawings
Various other advantages and benefits will become apparent to those of ordinary skill in the art upon reading the following detailed description of the preferred embodiments. The drawings are only for purposes of illustrating the preferred embodiments and are not to be construed as limiting the invention. Also, like reference numerals are used to designate like parts throughout the figures. In the drawings:
FIG. 1 shows a schematic diagram of a fuel cell vehicle power system according to an embodiment of the present invention.
FIG. 2 illustrates a reinforcement learning energy management strategy diagram incorporating working condition identification and prediction provided by an embodiment of the present invention.
FIG. 3 shows a Bi-LSTM architecture diagram provided by an embodiment of the present invention.
FIG. 4 illustrates a DDPG-based fuel cell vehicle energy management architecture diagram provided by an embodiment of the present invention.
Detailed Description
For the purpose of making the objects, technical solutions and advantages of the embodiments of the present invention more apparent, the technical solutions of the embodiments of the present invention will be clearly and completely described below with reference to the accompanying drawings in the embodiments of the present invention, and it is apparent that the described embodiments are some embodiments of the present invention, but not all embodiments of the present invention. All other embodiments, which can be made by those skilled in the art based on the embodiments of the invention without making any inventive effort, are intended to be within the scope of the invention.
1.1 Fuel cell automobile Power System architecture:
The fuel cell automobile of the invention is powered by a power battery and a fuel cell and driven by a driving motor. The fuel cell system is connected to the main circuit through a DC/DC converter that stabilizes its voltage, and the power battery is connected to the main circuit directly; the combined output power of the fuel cell and the power battery is delivered to the motor controller, which controls the driving motor to drive the vehicle, as shown in FIG. 1.
1.2 longitudinal dynamics model:
The invention addresses the energy management problem of the fuel cell automobile; lateral stability is not considered for now, and only the longitudinal dynamics of the automobile are analyzed and modeled. During driving the vehicle is subject to four kinds of resistance: air resistance, ramp resistance, rolling resistance and acceleration resistance.
The driving force of the whole vehicle $F_t$ can be expressed as:
$$F_t = F_w + F_i + F_f + F_j$$
where $F_w$ is the air resistance, $F_i$ is the ramp resistance, $F_f$ is the rolling resistance and $F_j$ is the acceleration resistance; they are calculated respectively as:
$$F_w = \tfrac{1}{2} \rho C_D A v^2, \qquad F_i = m g \sin\alpha, \qquad F_f = m g f \cos\alpha, \qquad F_j = \delta m \frac{dv}{dt}$$
where $m$ is the mass of the fuel cell automobile; $g$ is the gravitational acceleration; $\alpha$ is the road grade; $\rho$ is the air density; $A$ is the frontal area of the vehicle; $C_D$ is the air resistance coefficient; $v$ is the vehicle speed; $f$ is the rolling resistance coefficient; and $\delta$ is the rotating mass conversion coefficient.
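The longitudinal dynamics model can be turned into a demand power calculation with a few lines of Python. This sketch assumes the standard form of the four resistances (aerodynamic drag proportional to the square of speed in m/s, grade and rolling resistance from the vehicle weight, acceleration resistance scaled by the rotating mass coefficient); every default parameter value is a placeholder for illustration and is not taken from the patent.

```python
import math


def traction_force(v, a, m, alpha=0.0, rho=1.2, cd=0.6, area=6.0,
                   f=0.01, delta=1.05, g=9.81):
    """Total driving force in N for speed v (m/s) and acceleration a (m/s^2)."""
    f_air = 0.5 * rho * cd * area * v ** 2     # air resistance
    f_ramp = m * g * math.sin(alpha)           # ramp (grade) resistance
    f_roll = m * g * f * math.cos(alpha)       # rolling resistance
    f_acc = delta * m * a                      # acceleration resistance
    return f_air + f_ramp + f_roll + f_acc


def demand_power(v, a, m, **vehicle_params):
    """Power at the wheels in W; negative values indicate braking."""
    return traction_force(v, a, m, **vehicle_params) * v
```

Evaluating `demand_power` along a speed trace gives the vehicle power demand that the energy management strategy has to split between the fuel cell and the power battery; drivetrain and motor efficiencies are left out of this sketch.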
1.3 Lithium battery model:
The lithium battery is modeled with a first-order Rint model: the battery is treated as an ideal voltage source connected in series with an internal resistance, and the polarization of the battery is neglected, which keeps the structure simple. In this model, the output power of the power battery $P_{bat}$ is:
$$P_{bat} = V_{oc} I_{bat} - I_{bat}^2 R_{int}$$
The battery current $I_{bat}$ is:
$$I_{bat} = \frac{V_{oc} - \sqrt{V_{oc}^2 - 4 R_{int} P_{bat}}}{2 R_{int}}$$
The state of charge (SOC) of the battery is updated by the ampere-hour integration method:
$$SOC(t) = SOC(t_0) - \frac{1}{Q_{bat}} \int_{t_0}^{t} I_{bat}\, d\tau$$
wherein $V_{oc}$ is the open-circuit voltage; $I_{bat}$ is the current; $R_{int}$ is the internal resistance of the battery; $Q_{bat}$ is the battery capacity.
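A minimal sketch of the Rint battery model and the ampere-hour SOC update is given below. The open-circuit voltage, internal resistance and capacity defaults are assumed placeholder values, and the sign convention (positive power and current mean discharge) is an assumption of the example.

```python
import math

def battery_current(p_bat, v_oc=350.0, r_int=0.1):
    """Battery current in A for terminal power p_bat in W (positive = discharge)."""
    disc = v_oc ** 2 - 4.0 * r_int * p_bat
    if disc < 0:
        raise ValueError("requested power exceeds what the Rint model can deliver")
    # Smaller root of r_int * I^2 - v_oc * I + p_bat = 0
    return (v_oc - math.sqrt(disc)) / (2.0 * r_int)

def soc_step(soc, p_bat, dt=1.0, capacity_ah=40.0, **battery):
    """One ampere-hour integration step of length dt seconds."""
    current = battery_current(p_bat, **battery)
    return soc - current * dt / (capacity_ah * 3600.0)
```

Calling `soc_step` once per simulation step reproduces the discrete form of the ampere-hour integral; a negative `p_bat` (charging) raises the SOC.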
1.4 Fuel cell model:
When constructing the fuel cell hydrogen consumption model, the hydrogen consumption rate of the fuel cell is expressed as:
$$\dot{m}_{H_2} = \frac{P_{fc}}{\eta_{fc} \cdot LHV}$$
In the above formula, $P_{fc}$ is the power of the fuel cell system, $\eta_{fc}$ is the efficiency of the fuel cell system, and $LHV$ is the lower heating value of hydrogen.
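For illustration, the hydrogen consumption relation can be written as a one-line function. The lower heating value of about 120 MJ/kg is the commonly quoted figure for hydrogen; the function name and the input validation are assumptions of this sketch.

```python
LHV_H2 = 120e6  # J/kg, approximate lower heating value of hydrogen

def hydrogen_rate(p_fc, eta_fc):
    """Hydrogen mass flow in kg/s for system power p_fc (W) and efficiency eta_fc in (0, 1]."""
    if not 0.0 < eta_fc <= 1.0:
        raise ValueError("efficiency must be in (0, 1]")
    return p_fc / (eta_fc * LHV_H2)
```

For example, a 60 kW fuel cell system running at 50% efficiency consumes roughly 1 g of hydrogen per second under this relation.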
1.5 Fuel cell degradation model:
Fuel cell degradation is taken into account by evaluating the degree of degradation of the fuel cell stack under four major degradation conditions: dynamic load cycling, start/stop cycling, idling, and high-power load. These are expressed by the following formulas:
Degradation mechanism under start/stop cycle:
degradation mechanism under dynamic cycling:
Degradation mechanism under idle conditions:
Degradation mechanism under high power load:
Wherein a, b, c, d are constants, which are empirical parameters obtained by calibration in experiments. The remaining life of the fuel cell after normalization is expressed as:
Referring to fig. 2, as an embodiment of the present invention, a reinforcement learning energy management method for fuel cell vehicles considering operating condition prediction is provided, which includes the following steps:
S01, constructing comprehensive driving condition data that reflects the characteristics of various road conditions from four different types of classical driving cycles, dividing the data into segments of 100 s each, extracting the condition characteristic parameters of each segment and applying Principal Component Analysis (PCA) to them, and clustering the driving cycle segments by the K-means clustering method into three operating conditions: low speed, medium speed and high speed;
S02, after the three sets of driving cycle segments are obtained, training the Bi-LSTM-based condition identification module and the Markov-based vehicle speed prediction module offline;
S03, extracting condition characteristic parameters of the predicted vehicle speed sequence, such as the average vehicle speed and the standard deviation of the vehicle speed, converting them into an equivalent factor adjustment coefficient λ, inputting λ as a state into the DDPG energy management algorithm, taking the fuel cell power P_fc as the action, and setting an appropriate reward function covering hydrogen consumption, SOC fluctuation and fuel cell degradation, so as to solve the multi-objective optimization problem comprehensively.
In this embodiment, the step of constructing the comprehensive driving condition data capable of reflecting various road condition features by adopting four different types of classical driving conditions includes:
Preprocessing of driving condition data: a vehicle driving cycle is obtained by post-processing actual driving data and reflects how speed changes over time. To reflect road driving conditions comprehensively and improve the accuracy of identifying all road conditions, a sample driving cycle is first constructed: the four typical driving cycles NEDC, WLTC, CLTC-P and FTP75 are combined into a single set of comprehensive driving condition data that reflects the characteristics of various road conditions and is used to train the subsequent condition identification and prediction algorithms;
The NEDC (New European Driving Cycle) is the European standard range test cycle and consists of 4 urban cycles and 1 suburban cycle; the urban part lasts 780 s in total with a maximum speed of 50 km/h, and the suburban part lasts 400 s with a maximum speed of 120 km/h;
The WLTC standard cycle is closer to actual road driving conditions; its complete test cycle consists of 4 phases (low speed, medium speed, high speed and extra-high speed) and lasts 1800 s in total, with an idle time of 235 s, a distance of 23266 m, an average speed of 46.5 km/h and a maximum speed of 131.3 km/h;
The CLTC-P (China Light-duty Vehicle Test Cycle - Passenger Car) passenger car test cycle comprises 3 speed intervals (low speed, medium speed and high speed), with a total duration of 1800 s, a total mileage of 14480 m, a maximum speed of 114 km/h and an average speed of 28.96 km/h;
FTP75 (Federal Test Procedure) is a standard issued by the U.S. Environmental Protection Agency for testing the fuel economy and emissions of passenger cars under urban conditions, and is used to assess the emissions and fuel economy of light-duty vehicles and light trucks; the complete FTP75 cycle has a driving time of 1874 s, a theoretical driving distance of 17.77 km, an average speed of 34.12 km/h and a maximum speed of 91.25 km/h, and comprises 3 parts: a cold-start transient phase, a stabilized phase and a hot-start transient phase.
In this embodiment, the step of performing the segment division on the comprehensive driving condition according to each segment of 100s, and performing the Principal Component Analysis (PCA) by extracting the condition characteristic parameters of the condition includes:
The comprehensive driving condition data constructed from the 4 typical driving cycles cannot be used directly to train the condition identification model and the Markov transition matrices for vehicle speed prediction; it must first be divided into segments, here assumed to be 100 s each. The condition characteristic parameters reflect the characteristic information of each driving cycle segment; each segment is described by 14 parameters: average speed, maximum speed, average acceleration, maximum acceleration, average deceleration, maximum deceleration, idle time ratio, acceleration time ratio, deceleration time ratio, constant-speed time ratio, speed standard deviation, acceleration standard deviation, driving mileage and running time;
However, 14 condition characteristic parameters are too many: they carry overlapping, correlated information and increase the computational load, which hinders analysis. To reduce the complexity of the analysis, Principal Component Analysis (PCA) is used for dimension reduction;
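The segmentation and the 14 characteristic parameters listed above could be computed along the following lines. This is a sketch under stated assumptions: the speed trace is sampled at 1 Hz in km/h, and the idle and acceleration thresholds (`idle_thr`, `acc_thr`) are hypothetical values that the source does not specify.

```python
import numpy as np

def split_segments(speed_kmh, seg_len=100):
    """Cut a 1 Hz speed trace into consecutive segments of seg_len samples (tail discarded)."""
    v = np.asarray(speed_kmh, dtype=float)
    n = len(v) // seg_len
    return v[: n * seg_len].reshape(n, seg_len)

def segment_features(seg_kmh, dt=1.0, idle_thr=0.1, acc_thr=0.1):
    """Return the 14 characteristic parameters of one segment (thresholds are assumed)."""
    v = np.asarray(seg_kmh, dtype=float) / 3.6   # speed in m/s
    a = np.diff(v) / dt                           # acceleration in m/s^2
    acc, dec = a[a > acc_thr], a[a < -acc_thr]
    steady = np.abs(a) <= acc_thr
    idle = steady & (v[1:] < idle_thr)
    cruise = steady & (v[1:] >= idle_thr)
    n = a.size
    return np.array([
        v.mean() * 3.6,                     # average speed, km/h
        v.max() * 3.6,                      # maximum speed, km/h
        acc.mean() if acc.size else 0.0,    # average acceleration
        acc.max() if acc.size else 0.0,     # maximum acceleration
        dec.mean() if dec.size else 0.0,    # average deceleration
        dec.min() if dec.size else 0.0,     # maximum deceleration (most negative)
        idle.sum() / n,                     # idle time ratio
        acc.size / n,                       # acceleration time ratio
        dec.size / n,                       # deceleration time ratio
        cruise.sum() / n,                   # constant-speed time ratio
        v.std() * 3.6,                      # speed standard deviation, km/h
        a.std(),                            # acceleration standard deviation
        v.sum() * dt / 1000.0,              # driving mileage, km
        v.size * dt,                        # running time, s
    ])
```

Stacking `segment_features` over all rows returned by `split_segments` gives one 14-element feature vector per segment.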
The comprehensive driving condition data, divided into 100 s segments each described by 14 condition characteristic parameters, is reduced to 3 dimensions. For example, for 100 segments, the steps of the PCA dimension reduction include:
1) Arranging the original data column by column into a matrix X with 14 rows and 100 columns, one row per characteristic parameter and one column per segment;
2) Zero-centering each row of X, i.e. subtracting the mean of that row;
3) Computing the covariance matrix $C = \frac{1}{100} X X^{T}$;
4) Computing the eigenvalues of the covariance matrix and the corresponding eigenvectors;
5) Arranging the eigenvectors as the rows of a matrix, from top to bottom in descending order of their eigenvalues, and taking the first 3 rows to form the matrix P;
6) $Y = PX$ is the data after dimension reduction to 3 dimensions;
In this process, to preserve the information after dimension reduction, the selected principal components must have a cumulative contribution rate above 80%. The 3 principal components with the largest contribution rates are finally selected; their cumulative contribution rate exceeds 80%, which covers most of the condition characteristic information.
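The PCA steps above can be sketched with NumPy as follows. The function name and the choice to accept the features with one segment per row (transposing internally so that each row is one characteristic parameter) are assumptions of this example; in practice the parameters would probably also be standardized first because they have different units, which the source does not state.

```python
import numpy as np

def pca_reduce(features, n_components=3):
    """features: array of shape (n_segments, n_params).

    Returns the reduced data of shape (n_segments, n_components) and the
    cumulative contribution rate of the retained components.
    """
    x = np.asarray(features, dtype=float).T       # rows = parameters, columns = segments
    x = x - x.mean(axis=1, keepdims=True)         # zero-center every row
    cov = x @ x.T / x.shape[1]                    # covariance matrix
    eigval, eigvec = np.linalg.eigh(cov)          # eigenvalues in ascending order
    order = np.argsort(eigval)[::-1]              # indices sorted by descending eigenvalue
    p = eigvec[:, order[:n_components]].T         # leading eigenvectors as rows
    y = p @ x                                     # reduced data, Y = P X
    ratio = eigval[order[:n_components]].sum() / eigval.sum()
    return y.T, ratio
```

The returned `ratio` can be compared against the 80% threshold mentioned above to decide whether 3 components are sufficient.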
In this embodiment, the step of clustering the driving cycle segments by the K-means clustering method includes:
Assuming the comprehensive driving condition data lasts 10000 s, it can be divided into 100 driving cycle segments. After PCA dimension reduction, the 100 segments are divided into three classes (low speed, medium speed and high speed) so that three Markov transition matrices can be trained in the vehicle speed prediction module. The segments are classified with the K-means clustering algorithm: the number of classes is set to K=3, an initial cluster center is assigned to each class, and the Euclidean distance from each sample point $x_i$ to each cluster center $c_j$ is calculated:
$$d(x_i, c_j) = \sqrt{\sum_{k=1}^{3} (x_{ik} - c_{jk})^2}$$
Each point is assigned to the class with the minimum distance. After the first clustering pass is completed, the center of each cluster is recalculated as the mean of all data points in that cluster:
$$c_j = \frac{1}{|S_j|} \sum_{x_i \in S_j} x_i$$
Wherein $S_j$ is the set of data points of the $j$-th cluster and $|S_j|$ is the number of data points in the set; the cluster centers are updated and the points reassigned with the above formula until the cluster centers become stable.
Finally, after the condition characteristic parameters are reduced by PCA to 3 principal components, the 100 driving cycle segments are divided into 3 classes by the K-means clustering algorithm, and each class is used to train its own Markov transition matrix for vehicle speed prediction.
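A compact sketch of the K-means procedure described above follows. The optional `init` argument, the random initialisation from the data and the iteration cap are assumptions added for the example; the source only states that an initial center is assigned to each class.

```python
import numpy as np

def kmeans(points, k=3, init=None, n_iter=100, seed=0):
    """Cluster points of shape (n_samples, n_dims); returns (labels, centers)."""
    x = np.asarray(points, dtype=float)
    if init is None:
        rng = np.random.default_rng(seed)
        centers = x[rng.choice(len(x), size=k, replace=False)]
    else:
        centers = np.asarray(init, dtype=float)
    for _ in range(n_iter):
        # Euclidean distance from every point to every center
        dist = np.linalg.norm(x[:, None, :] - centers[None, :, :], axis=2)
        labels = dist.argmin(axis=1)
        # New center = mean of the points assigned to it (kept unchanged if empty)
        new_centers = np.array([
            x[labels == j].mean(axis=0) if np.any(labels == j) else centers[j]
            for j in range(k)
        ])
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    return labels, centers
```

Because the cluster indices are arbitrary, the clusters would still need to be mapped to "low", "medium" and "high" speed, for example by sorting them by the mean speed of their segments.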
In this embodiment, after the three different types of driving cycle segment sets are obtained, the step of performing offline training on the Bi-LSTM-based condition recognition module and the markov-based vehicle speed prediction module includes:
Considering the characteristics of complex road conditions, a condition identification module is added before the vehicle speed prediction module, which can effectively improve the accuracy of vehicle speed prediction. The LSTM network was designed to address the vanishing and exploding gradients that a traditional RNN (recurrent neural network) tends to suffer when processing long sequences. The road condition identification model of the invention uses a bidirectional long short-term memory network (Bi-LSTM), whose architecture is shown in fig. 3. The Bi-LSTM consists of a forward LSTM and a backward LSTM: the forward LSTM processes the input sequence in chronological order and captures information from the past to the future, while the backward LSTM processes the input sequence from the latest time point to the earliest and captures information from the future to the past. The model can therefore identify the current driving condition not only from past driving states but also with reference to later driving state information, capture the trends of parameters such as vehicle speed and acceleration more comprehensively, adapt to abrupt changes in road conditions, and make more accurate judgments. Each LSTM of the bidirectional structure contains an input gate, a forget gate, an output gate and a cell state; the input of the model is the 14 condition characteristic parameters, and the output is the operating mode, which is one of low speed, medium speed and high speed; compared with a conventional LSTM, the identification is more accurate and adapts better to complex road conditions. The LSTM architecture is:
Forget gate: the forget gate decides which information is forgotten, and its expression is:
$$f_t = \sigma(W_f \cdot [h_{t-1}, x_t] + b_f)$$
wherein $\sigma$ is the sigmoid function, $W_f$ is the weight coefficient of the forget gate, $[h_{t-1}, x_t]$ denotes the concatenation of the hidden state $h_{t-1}$ of the previous moment and the current input $x_t$, where $x_t$ is the 14 condition characteristic parameters, including the real-time speed and acceleration of the vehicle, and $b_f$ is a bias term;
Input gate: the input gate determines which new information needs to be added to the cell state;
$$i_t = \sigma(W_i \cdot [h_{t-1}, x_t] + b_i)$$
$$\tilde{C}_t = \tanh(W_C \cdot [h_{t-1}, x_t] + b_C)$$
wherein $W_i$ and $W_C$ are the weight matrices of the input gate, whose dimensions are adjusted according to the number of input parameters, $b_i$ and $b_C$ are bias terms, and $\tilde{C}_t$ is the candidate cell state;
Cell state update: the cell state is updated by combining the results of the forget gate and the input gate;
$$C_t = f_t \odot C_{t-1} + i_t \odot \tilde{C}_t$$
Output gate: the output gate decides what value to output based on the updated cell state;
$$o_t = \sigma(W_o \cdot [h_{t-1}, x_t] + b_o)$$
$$h_t = o_t \odot \tanh(C_t)$$
wherein $W_o$ is the weight parameter of the output gate, $b_o$ is a bias term, and $h_t$ is the output state at the current moment;
The Bi-LSTM computes a bidirectional hidden state by running LSTM layers forward and backward along the time axis: the forward LSTM proceeds from the first element of the sequence to the last, and the backward LSTM proceeds in the opposite direction. The two hidden states are concatenated to form the final bidirectional hidden state, so the Bi-LSTM captures the context before and after each time step in the sequence and thereby provides a more comprehensive feature representation;
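To make the gate equations and the bidirectional concatenation concrete, the following NumPy sketch runs an untrained LSTM forward and backward over a sequence of 14-dimensional feature vectors. The random weight initialisation, the hidden size and the helper names are assumptions; training and the final classification layer (for example a softmax over the three operating modes) are omitted.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def init_params(n_in, n_hidden, rng):
    """One (weight, bias) pair per gate: forget, input, candidate, output."""
    return {g: (rng.normal(0.0, 0.1, (n_hidden, n_hidden + n_in)), np.zeros(n_hidden))
            for g in ("f", "i", "c", "o")}

def lstm_pass(xs, params, n_hidden):
    """Run one LSTM over the rows of xs and return all hidden states."""
    h, c = np.zeros(n_hidden), np.zeros(n_hidden)
    hs = []
    for x in xs:
        z = np.concatenate([h, x])                               # [h_{t-1}, x_t]
        f = sigmoid(params["f"][0] @ z + params["f"][1])         # forget gate
        i = sigmoid(params["i"][0] @ z + params["i"][1])         # input gate
        c_tilde = np.tanh(params["c"][0] @ z + params["c"][1])   # candidate cell state
        c = f * c + i * c_tilde                                  # cell state update
        o = sigmoid(params["o"][0] @ z + params["o"][1])         # output gate
        h = o * np.tanh(c)                                       # hidden state
        hs.append(h)
    return np.stack(hs)

def bilstm(xs, fwd, bwd, n_hidden):
    """Concatenate forward and time-reversed backward hidden states per time step."""
    h_f = lstm_pass(xs, fwd, n_hidden)
    h_b = lstm_pass(xs[::-1], bwd, n_hidden)[::-1]
    return np.concatenate([h_f, h_b], axis=1)
```

Each output row has twice the hidden size, because it holds the forward state (context up to that step) followed by the backward state (context from that step onward).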
In the vehicle speed prediction module, a Markov vehicle speed prediction method is used. The condition identification module distinguishes three operating modes (low speed, medium speed and high speed), and each mode corresponds to one Markov transition matrix. The three matrices (the low-speed, medium-speed and high-speed Markov transition matrices) are trained offline on the three sets of driving cycle segments clustered in step S01. With the prediction horizon set to n steps, the Markov model outputs n steps of predicted vehicle speed information;
After the vehicle speed prediction considering condition identification is completed, a p-step predicted vehicle speed sequence is output.
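One way to train a transition matrix from a set of speed segments and roll it forward is sketched below. The source does not specify the state definition or how the next speed is chosen, so the 1 km/h speed bins, the first-order chain over speed only and the most-likely-next-state prediction are all assumptions of this example.

```python
import numpy as np

def fit_transition_matrix(speed_segments, v_max=140.0, bin_width=1.0):
    """Estimate a first-order Markov transition matrix over discretised speed (km/h)."""
    n = int(v_max / bin_width) + 1
    counts = np.zeros((n, n))
    for seg in speed_segments:
        idx = np.clip(np.round(np.asarray(seg) / bin_width).astype(int), 0, n - 1)
        np.add.at(counts, (idx[:-1], idx[1:]), 1)   # count observed transitions
    row_sum = counts.sum(axis=1, keepdims=True)
    # States never observed keep their speed (identity row) so every row sums to 1
    return np.where(row_sum > 0, counts / np.maximum(row_sum, 1), np.eye(n))

def predict_speed(v0, matrix, n_steps, bin_width=1.0):
    """Predict n_steps future speeds by following the most likely transition each step."""
    state = int(np.clip(round(v0 / bin_width), 0, len(matrix) - 1))
    predicted = []
    for _ in range(n_steps):
        state = int(np.argmax(matrix[state]))
        predicted.append(state * bin_width)
    return predicted
```

In the scheme described above, one such matrix would be fitted per operating mode, and the mode reported by the condition identification module would select which matrix `predict_speed` uses.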
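Referring to fig. 4, in this embodiment, the DDPG algorithm in step S03 is based on the Actor-Critic framework. It contains an Actor network and a Critic network, each with a corresponding target network, so the DDPG algorithm includes four networks: the Actor network $\mu(s|\theta^{\mu})$, the Critic network $Q(s,a|\theta^{Q})$, the Target Actor network $\mu'(s|\theta^{\mu'})$ and the Target Critic network $Q'(s,a|\theta^{Q'})$. The target networks change slowly, which keeps the target values used to train the Critic stable. The update process of the DDPG algorithm and the update of the target networks are as follows;
The DDPG training process includes:
Initializing the parameters $\theta^{Q}$ and $\theta^{\mu}$ of the critic network $Q(s,a|\theta^{Q})$ and the actor network $\mu(s|\theta^{\mu})$;
Initializing the corresponding target network parameters, $\theta^{Q'} \leftarrow \theta^{Q}$, $\theta^{\mu'} \leftarrow \theta^{\mu}$;
Initializing the experience replay buffer D, setting its capacity to K and the number of training rounds to M;
Inputting the current state $s_t$ into the current network and selecting an action according to the policy of the current network and the noise $\mathcal{N}_t$, $a_t = \mu(s_t|\theta^{\mu}) + \mathcal{N}_t$; after the action $a_t$ is executed, obtaining the reward $r_t$, entering the next state $s_{t+1}$, and storing the transition $(s_t, a_t, r_t, s_{t+1})$ in the experience replay buffer D;
Drawing a random batch of N transitions $(s_i, a_i, r_i, s_{i+1})$ from D and computing the target value $y_i = r_i + \gamma Q'(s_{i+1}, \mu'(s_{i+1}|\theta^{\mu'})|\theta^{Q'})$;
Wherein $Q'$ is the value function at the next moment, $\mu'$ is the policy function at the next moment, $\theta^{Q'}$ and $\theta^{\mu'}$ are the network weight parameters at the next moment, and $\gamma$ is the discount factor;
Updating the critic parameters, wherein the loss is:
$$L = \frac{1}{N} \sum_i \left( y_i - Q(s_i, a_i|\theta^{Q}) \right)^2$$
Updating the actor network:
$$\nabla_{\theta^{\mu}} J \approx \frac{1}{N} \sum_i \nabla_a Q(s,a|\theta^{Q})\big|_{s=s_i,\,a=\mu(s_i)} \nabla_{\theta^{\mu}} \mu(s|\theta^{\mu})\big|_{s=s_i}$$
Soft-updating the target networks, where $\tau$ is the soft update coefficient:
$$\theta^{Q'} \leftarrow \tau \theta^{Q} + (1-\tau)\theta^{Q'}, \qquad \theta^{\mu'} \leftarrow \tau \theta^{\mu} + (1-\tau)\theta^{\mu'}$$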
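The target value, critic loss and soft update of the DDPG training loop can be sketched independently of any deep learning framework. In the example below the networks are represented by plain callables and lists of parameter arrays, and the default values of `gamma` and `tau` are assumptions; the actor's policy-gradient step is left out because it needs automatic differentiation.

```python
import numpy as np

def td_target(reward, next_state, target_actor, target_critic, gamma=0.99):
    """y = r + gamma * Q'(s', mu'(s')) for a batch of transitions."""
    next_action = target_actor(next_state)
    return reward + gamma * target_critic(next_state, next_action)

def critic_loss(y, q):
    """Mean squared error between target values y and critic estimates q."""
    return float(np.mean((np.asarray(y) - np.asarray(q)) ** 2))

def soft_update(target_params, online_params, tau=0.005):
    """theta' <- tau * theta + (1 - tau) * theta', applied to each parameter array."""
    return [tau * p + (1.0 - tau) * tp for tp, p in zip(target_params, online_params)]
```

A small `tau` makes the target networks trail the online networks slowly, which is what keeps the targets returned by `td_target` from shifting abruptly between updates.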
In this embodiment, step S03 uses a DDPG deep reinforcement learning energy management strategy that integrates condition identification and vehicle speed prediction;
Setting the state set:
wherein the state quantities include the power demanded by the whole vehicle, the state of the power battery, and the fuel cell life FOH;
Setting the action set:
wherein the action is the fuel cell power P_fc; compared with energy management algorithms based on DQN or on Q-learning, which optimizes a Q table, the action space of the DDPG-based energy management algorithm is continuous and therefore better matches the power output requirements of an actual fuel cell;
Setting the reward function:
R is the reward function, whose terms include the hydrogen consumption rate of the fuel cell; λ is the equivalent factor adjustment coefficient calculated from the p-step predicted vehicle speed sequence output by the vehicle speed prediction part, using the standard deviation of the vehicle speed and the average vehicle speed of the predicted sequence together with a coefficient k. During training, an optimal energy management strategy is reached by selecting a suitable value of k; after training, λ is adjusted in real time during actual operation according to the predicted vehicle speed sequence it receives. The remaining quantities are the output power of the power battery; LHV, the lower heating value of hydrogen; the fuel cell life degradation value; and a weighting factor that accounts for fuel cell life degradation;
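The exact expressions for λ and R are not reproduced in this text, so the sketch below uses assumed forms purely to show how the listed quantities could fit together: λ is taken as k times the coefficient of variation of the predicted speeds, and the reward as the negative sum of hydrogen consumption, an equivalent-consumption term for battery power, and a weighted degradation term. Both forms, the function names and the parameter `omega` are hypothetical and should not be read as the patented formulas.

```python
import statistics

LHV_H2 = 120e6  # J/kg, approximate lower heating value of hydrogen

def equivalent_factor(predicted_speeds, k=1.0):
    """ASSUMED form: lambda = k * (speed standard deviation / average speed)."""
    mean_v = statistics.fmean(predicted_speeds)
    if mean_v <= 0.0:
        return 0.0                       # stationary prediction: no adjustment
    return k * statistics.pstdev(predicted_speeds) / mean_v

def reward(h2_rate, p_bat, fc_degradation, lam, omega=1.0):
    """ASSUMED form: negative cost of hydrogen use, battery energy and degradation.

    h2_rate in kg/s, p_bat in W, fc_degradation as a dimensionless increment.
    """
    battery_equivalent = lam * p_bat / LHV_H2    # battery power expressed as hydrogen flow
    return -(h2_rate + battery_equivalent + omega * fc_degradation)
```

With a form like this, a more variable predicted speed sequence raises λ and therefore makes battery energy more expensive relative to hydrogen; the constant k sets how strongly that happens.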
In implementing DDPG-based energy management, the environment first inputs the state into the Actor of the agent; the agent selects the corresponding action, obtains the real-time reward, and transitions to the state S' at the next moment; the experience sample is stored in the sample pool, and once the number of samples in the pool reaches the preset number, the earliest experiences are removed. The DDPG algorithm repeatedly has the agent execute actions, collects the data from its interaction with the environment, including states and rewards, and updates the network parameters, so that the agent gradually learns the optimal action policy and achieves optimal energy efficiency.
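The sample pool behaviour described above (store each experience, drop the earliest one once the preset capacity is reached, draw random batches) can be sketched with a bounded deque. The class and method names are assumptions of the example.

```python
import random
from collections import deque

class ReplayBuffer:
    """Fixed-capacity sample pool; the oldest experience is discarded when full."""

    def __init__(self, capacity):
        self.buffer = deque(maxlen=capacity)

    def push(self, state, action, reward, next_state):
        self.buffer.append((state, action, reward, next_state))

    def sample(self, batch_size):
        """Return batch_size experiences drawn uniformly without replacement."""
        return random.sample(list(self.buffer), batch_size)

    def __len__(self):
        return len(self.buffer)
```

Sampling uniformly from the pool, rather than training on consecutive steps, reduces the correlation between the samples in each batch.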
It should be noted that, in this document, the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising one … …" does not exclude the presence of other like elements in a process, method, article, or apparatus that comprises the element. Furthermore, it should be noted that the scope of the methods and apparatus in the embodiments of the present application is not limited to performing the functions in the order shown or discussed, but may also include performing the functions in a substantially simultaneous manner or in an opposite order depending on the functions involved, e.g., the described methods may be performed in an order different from that described, and various steps may be added, omitted, or combined. Additionally, features described with reference to certain examples may be combined in other examples.
The embodiments of the present application have been described above with reference to the accompanying drawings, but the present application is not limited to the above-described embodiments, which are merely illustrative and not restrictive, and many forms may be made by those having ordinary skill in the art without departing from the spirit of the present application and the scope of the claims, which are to be protected by the present application.

Claims (7)

1.一种考虑工况预测的燃料电池汽车强化学习能量管理方法,其特征在于,所述方法包括以下步骤;1. A fuel cell vehicle reinforcement learning energy management method considering operating condition prediction, characterized in that the method comprises the following steps; S01、采用四种不同类型的经典行驶工况构建能反应各种道路工况特征的综合行驶工况数据,对综合行驶工况数据按照每段100s进行片段划分,通过提取工况的工况特征参数进行主成分分析,通过K-means聚类方法对驾驶循环片段聚类,聚类成低速、中速、高速工况三种;S01. Four different types of classic driving conditions are used to construct comprehensive driving condition data that can reflect the characteristics of various road conditions. The comprehensive driving condition data is divided into segments of 100 seconds each. The principal component analysis is performed by extracting the characteristic parameters of the driving conditions. The driving cycle segments are clustered by the K-means clustering method and clustered into three types: low-speed, medium-speed, and high-speed conditions. S02、得到三种不同类型的驾驶循环片段集合后,对基于Bi-LSTM的工况识别模块以及基于马尔可夫的车速预测模块进行离线训练;S02, after obtaining three different types of driving cycle segments, offline training is performed on the Bi-LSTM-based working condition recognition module and the Markov-based vehicle speed prediction module; S03、提取车速预测序列的工况特征参数,转化为等效因子调节系数,并将作为一种状态输入到DDPG的能量管理算法当中,燃料电池功率Pfc作为动作,并且设置合适的奖励函数,包括氢气消耗、SOC波动和燃料电池退化,以达到综合考虑实现多目标的优化问题。S03. Extract the working condition characteristic parameters of the vehicle speed prediction sequence and convert them into equivalent factor adjustment coefficients , and As a state input into the energy management algorithm of DDPG, the fuel cell power P fc is used as an action, and a suitable reward function is set, including hydrogen consumption, SOC fluctuation and fuel cell degradation, to achieve a comprehensive consideration of the optimization problem of achieving multiple objectives. 2.根据权利要求1所述的一种考虑工况预测的燃料电池汽车强化学习能量管理方法,其特征在于,所述采用四种不同类型的经典行驶工况构建能反应各种道路工况特征的综合行驶工况数据的步骤包括:2. 
A fuel cell vehicle reinforcement learning energy management method considering operating condition prediction according to claim 1, characterized in that the step of using four different types of classic driving conditions to construct comprehensive driving condition data that can reflect various road condition characteristics comprises: 将NEDC、WLTC、CLTC-P、FTP75四种典型道路工况组合起来,作为一条能反应各种道路工况特征的综合行驶工况数据,用于后续训练工况识别及预测的算法;The four typical road conditions of NEDC, WLTC, CLTC-P and FTP75 are combined as a comprehensive driving condition data that can reflect the characteristics of various road conditions, which is used for subsequent training of condition recognition and prediction algorithms; NEDC工况是欧洲续航标准测试工况,包含4个市区循环和1个郊区循环;其中市区工况共780秒,最高车速50km/h;郊区工况400秒,最高车速120km/h;The NEDC cycle is the European endurance standard test cycle, which includes 4 urban cycles and 1 suburban cycle. The urban cycle lasts for 780 seconds with a maximum speed of 50 km/h; the suburban cycle lasts for 400 seconds with a maximum speed of 120 km/h. WLTC标准工况更接近实际道路驾驶条件,其完整的测试循环由低速、中速、高速、超高速4个阶段组成,总共历时1800s,其中怠速时间235s、行程23266m、平均车速46.5km/h、最高车速131.3km/h;The WLTC standard working condition is closer to the actual road driving conditions. Its complete test cycle consists of four stages: low speed, medium speed, high speed and ultra-high speed, which lasts a total of 1800s, including 235s of idling time, 23266m of travel, an average speed of 46.5km/h and a maximum speed of 131.3km/h. CLTC-P乘用车测试工况,包括低速、中速、高速3个速度区间,时长共计1800s、总里程为14480m、最大速度为114km/h 、平均速度为28.96km/h;CLTC-P passenger car test conditions include three speed ranges: low speed, medium speed and high speed, with a total duration of 1800s, a total mileage of 14480m, a maximum speed of 114km/h and an average speed of 28.96km/h; FTP75是美国能源署颁布的用于测试乘用车在市区工况的经济性和排放的标准,用于评估轻型汽车和轻型货车的排放和燃油经济性;完整的FTP75工况循环驾驶时间为1874s,理论行驶距离17.77km,平均车速34.12km/h,最高车速91.25km/h,包括冷起动瞬态阶段、稳态阶段、热起动瞬态阶段3部分。FTP75 is a standard issued by the U.S. 
Energy Administration for testing the economy and emissions of passenger cars in urban conditions. It is used to evaluate the emissions and fuel economy of light cars and light trucks. The complete FTP75 operating cycle takes 1874 seconds, with a theoretical driving distance of 17.77 km, an average speed of 34.12 km/h, and a maximum speed of 91.25 km/h. It includes three parts: cold start transient stage, steady state stage, and hot start transient stage. 3.根据权利要求1所述的一种考虑工况预测的燃料电池汽车强化学习能量管理方法,其特征在于,所述对综合行驶工况数据按照每段100s进行片段划分,通过提取工况的工况特征参数进行主成分分析的步骤包括:3. A fuel cell vehicle reinforcement learning energy management method considering operating condition prediction according to claim 1, characterized in that the step of dividing the comprehensive driving condition data into segments of 100 seconds each and performing principal component analysis by extracting the operating condition characteristic parameters of the operating condition comprises: 综合行驶工况数据设定为100s为一段;而工况特征参数反应每一个驾驶循环片段的特征信息;用平均速度、最大速度、平均加速度、最大加速度、平均减速度、最大减速度、怠速时间比、加速时间比、减速时间比、匀速时间比、速度标准差、加速度标准差、行驶里程、运行时间这14个参数对每一个驾驶循环片段的特征进行描述;The comprehensive driving condition data is set to 100s per segment; the condition characteristic parameters reflect the characteristic information of each driving cycle segment; the characteristics of each driving cycle segment are described by 14 parameters including average speed, maximum speed, average acceleration, maximum acceleration, average deceleration, maximum deceleration, idle time ratio, acceleration time ratio, deceleration time ratio, uniform speed time ratio, speed standard deviation, acceleration standard deviation, mileage, and running time; 对于100s具有14个工况特征参数的综合行驶工况数据,并降维到3维,主成分分析进行降维处理的步骤包括:For 100s of comprehensive driving condition data with 14 condition characteristic parameters, the dimension is reduced to 3 dimensions. 
The steps of principal component analysis for dimension reduction processing include: 1)、将原始数据按列组成100行14列矩阵X;1) Organize the original data into a 100-row 14-column matrix X; 2)、将X的每一行进行零均值化,即减去这一行的均值;2) Zero-mean each row of X, that is, subtract the mean of this row; 3)、求出协方差矩阵3) Find the covariance matrix ; 4)、求出协方差矩阵的特征值及对应的特征向量;4) Find the eigenvalues and corresponding eigenvectors of the covariance matrix; 5)、将特征向量按对应特征值大小从上到下按行排列成矩阵,取前3行组成矩阵 P;5) Arrange the eigenvectors into a matrix by row from top to bottom according to the corresponding eigenvalues, and take the first 3 rows to form the matrix P; 6)、即为降维到3维后的数据;6) That is, the data after dimension reduction to 3 dimensions; 在这个过程当中,为了保证降维后信息的完整性,所选取的主成分需满足累计贡献率达到80%以上,最后选取前3个贡献率最大的主成分,并且累计贡献率达到80%以上,足以涵盖绝大部分工况特征信息。In this process, in order to ensure the integrity of the information after dimensionality reduction, the selected principal components must satisfy the cumulative contribution rate of more than 80%. Finally, the top three principal components with the largest contribution rates are selected, and the cumulative contribution rate reaches more than 80%, which is sufficient to cover most of the operating condition characteristic information. 4.根据权利要求1所述的一种考虑工况预测的燃料电池汽车强化学习能量管理方法,其特征在于,所述通过K-means聚类方法对驾驶循环片段聚类的步骤包括:4. 
The fuel cell vehicle reinforcement learning energy management method considering operating condition prediction according to claim 1, characterized in that the step of clustering driving cycle segments by using the K-means clustering method comprises: 使用K-means聚类算法对驾驶循环片段进行分类,确定分类数量K=3,并为每类指定一个初始聚类中心点,通过计算每个样本点到聚类中心的欧几里得距离;Use K-means clustering algorithm to classify driving cycle segments, determine the number of categories K=3, and assign an initial cluster center point to each category, by calculating the Euclidean distance from each sample point to the cluster center; ; 按照距离最小原则将每个点划分到各类,并且完成第一次聚类后,对于每个聚类,重新计算其聚类中心,新的聚类中心是该聚类内所有数据点的均值,计算公式如下:Each point is divided into each category according to the minimum distance principle. After the first clustering is completed, for each cluster, its cluster center is recalculated. The new cluster center is the mean of all data points in the cluster. The calculation formula is as follows: ; 其中 ,是第个聚类的数据点集合,是该集合中数据点的数量,上式对各聚类点进行更新,重新分配,直到聚类中心趋于稳定为止。in, It is A set of data points that are clustered, is the number of data points in the set. The above formula updates and redistributes each cluster point until the cluster center tends to be stable. 5.根据权利要求4所述的一种考虑工况预测的燃料电池汽车强化学习能量管理方法,其特征在于,所述得到三种不同类型的驾驶循环片段集合后,对基于Bi-LSTM的工况识别模块以及基于马尔可夫的车速预测模块进行离线训练的步骤包括:5. According to the fuel cell vehicle reinforcement learning energy management method considering operating condition prediction according to claim 4, it is characterized in that after obtaining the three different types of driving cycle segment sets, the step of offline training the Bi-LSTM-based operating condition recognition module and the Markov-based vehicle speed prediction module comprises: Bi-LSTM由前向LSTM和后向LSTM组成,前向 LSTM 按照时间顺序处理输入序列,捕捉从过去到未来的信息,而后向 LSTM 则相反,从最晚的时间点到最早的时间点处理输入序列,捕捉从未来到过去的信息;这种双向结构每一个LSTM内部包含,输入门、遗忘门、输出门和单元状态;模型的输入为14个工况特征参数,输出为工况模式,分低速、中速和高速三种;LSTM的架构为:Bi-LSTM consists of forward LSTM and backward LSTM. 
Forward LSTM processes the input sequence in chronological order and captures information from the past to the future, while backward LSTM processes the input sequence from the latest time point to the earliest time point and captures information from the future to the past. Each LSTM in this bidirectional structure contains an input gate, a forget gate, an output gate, and a unit state. The input of the model is 14 operating condition feature parameters, and the output is the operating mode, which is divided into three types: low speed, medium speed, and high speed. The architecture of LSTM is: 遗忘门:遗忘门决定哪些信息被遗忘,其表达式为:Forget gate: The forget gate determines which information is forgotten. Its expression is: ; 其中,是sigmoid函数,是遗忘门的权重系数,表示前一时刻的隐藏状态和当前输入状态的拼接,其中是14个工况特征参数,为偏置项;in, is the sigmoid function, is the weight coefficient of the forget gate, Represents the hidden state at the previous moment and the current input state splicing, where There are 14 working condition characteristic parameters. is the bias term; 输入门:输入门决定哪些新信息需要被加入到细胞状态中;Input gate: The input gate determines what new information needs to be added to the cell state; ; ; 其中,是输入门的权重矩阵,根据输入的参数个数不同调整维度,是偏置项,是候选细胞状态;in, and is the weight matrix of the input gate, and the dimension is adjusted according to the number of input parameters. 
and is the bias term, is a candidate cell state; 单元状态更新:结合遗忘门和输入门的结果更新细胞状态;Unit state update: Update the cell state by combining the results of the forget gate and the input gate; ; 输出门:基于更新后的细胞状态决定输出什么值;Output gate: determines what value to output based on the updated cell state; ; ; 其中,是输出门的权重参数,是偏置项,是当前时刻的输出状态;in, is the weight parameter of the output gate, is the bias term, is the output state at the current moment; Bi-LSTM通过将LSTM层沿着时间轴前向和后向运行来计算双向隐藏状态,前向LSTM从序列的第一个元素到最后一个元素顺序计算,而后向LSTM则相反,这两个隐藏状态被连接在一起形成最终的双向隐藏状态,双向LSTM捕捉到序列中每个时间步之前和之后的上下文信息,从而提供更全面的特征表示;Bi-LSTM calculates bidirectional hidden states by running the LSTM layer forward and backward along the time axis. The forward LSTM calculates sequentially from the first element to the last element of the sequence, while the backward LSTM does the opposite. The two hidden states are concatenated together to form the final bidirectional hidden state. The bidirectional LSTM captures the contextual information before and after each time step in the sequence, thereby providing a more comprehensive feature representation. 在车速工况预测模块中,采用马尔科夫车速预测方法,工况识别出低速、中速、高速三种工况模式,每一种模式对应一个马尔科夫转移矩阵,再采用步骤S01中聚类的三种驾驶循环片段针对性地离线训练三个马尔可夫矩阵,设定预测步长为n,马尔科夫输出n步的预测车速信息;In the vehicle speed condition prediction module, the Markov vehicle speed prediction method is used to identify the three working conditions of low speed, medium speed and high speed. Each mode corresponds to a Markov transfer matrix. Then, the three driving cycle segments clustered in step S01 are used to train three Markov matrices offline in a targeted manner. The prediction step length is set to n, and Markov outputs the predicted vehicle speed information of n steps. 完成考虑工况识别的车速预测后,会输出一段p步预测车速序列。After completing the vehicle speed prediction considering the working condition identification, a p-step predicted vehicle speed sequence will be output. 
6. The reinforcement learning energy management method for fuel cell vehicles considering operating condition prediction according to claim 1, characterized in that, in step S03, the DDPG algorithm is based on the Actor-Critic framework and contains an Actor network and a Critic network, each with a corresponding target network, giving four networks in total: the Actor network $\mu(s|\theta^{\mu})$, the Critic network $Q(s,a|\theta^{Q})$, the Target Actor network $\mu'(s|\theta^{\mu'})$ and the Target Critic network $Q'(s,a|\theta^{Q'})$; together with the update process of the DDPG algorithm, the update method of the target networks, and the purpose of introducing the target networks;
The DDPG training process includes:
Initialize the critic network $Q(s,a|\theta^{Q})$ and the actor network $\mu(s|\theta^{\mu})$ with parameters $\theta^{Q}$ and $\theta^{\mu}$;
Initialize the corresponding target network parameters, $\theta^{Q'} \leftarrow \theta^{Q}$, $\theta^{\mu'} \leftarrow \theta^{\mu}$;
Initialize the experience replay buffer D, set its capacity to K, and set the number of training rounds to M;
Input the current state $s_t$ to the current network and select an action according to the current network's policy and the noise $\mathcal{N}_t$, $a_t = \mu(s_t|\theta^{\mu}) + \mathcal{N}_t$; after executing the action $a_t$, obtain the reward $r_t$, enter the next state $s_{t+1}$, and store the transition $(s_t, a_t, r_t, s_{t+1})$ in the experience replay buffer D;
Take a random batch of $(s_i, a_i, r_i, s_{i+1})$ from D and compute the target value $y_i = r_i + \gamma Q'(s_{i+1}, \mu'(s_{i+1}|\theta^{\mu'})|\theta^{Q'})$; where $Q'$ is the value function at the next moment, $\mu'$ is the policy function at the next moment, and $\theta^{\mu'}$, $\theta^{Q'}$ are the network weight parameters at the next moment;
Update the critic parameters, with loss: $L = \frac{1}{N}\sum_i \left(y_i - Q(s_i, a_i|\theta^{Q})\right)^2$;
Update the actor network: $\nabla_{\theta^{\mu}} J \approx \frac{1}{N}\sum_i \nabla_a Q(s, a|\theta^{Q})\big|_{s=s_i,\,a=\mu(s_i)} \nabla_{\theta^{\mu}} \mu(s|\theta^{\mu})\big|_{s=s_i}$;
Soft update the target networks: $\theta^{Q'} \leftarrow \tau\theta^{Q} + (1-\tau)\theta^{Q'}$, $\theta^{\mu'} \leftarrow \tau\theta^{\mu} + (1-\tau)\theta^{\mu'}$.
7. The reinforcement learning energy management method for fuel cell vehicles considering operating condition prediction according to claim 1, characterized in that, in step S03, a DDPG deep reinforcement learning energy management strategy integrating operating condition identification and vehicle speed prediction is used;
Set the state set: it consists of the power required by the vehicle, the power battery state, and the fuel cell life FOH;
Set the action set: it consists of the fuel cell power;
Set the reward function R: its terms are the fuel cell hydrogen consumption rate, the power battery output power, the lower heating value of hydrogen LHV, the fuel cell life degradation value together with the weight factor that accounts for fuel cell life degradation, and the equivalent factor adjustment coefficient λ; λ is calculated from the p-step predicted vehicle speed sequence output by the speed prediction part, using a coefficient k, the speed standard deviation of the predicted sequence, and the average speed of the predicted sequence; during training, the optimal energy management strategy is reached by selecting an appropriate value of k; after training is completed, during actual operation λ is adjusted in real time according to the predicted vehicle speed sequence received;
In implementing DDPG-based energy management, the environment first inputs the state into the agent's Actor. The agent selects the corresponding action, obtains the real-time reward, and transitions to the next state S'. The experience sample is stored in the sample pool; once the number of samples in the pool reaches the preset number, the earliest experience is removed. The DDPG algorithm repeatedly has the agent execute actions, collects data from its interaction with the environment, including states and rewards, and updates the network parameters, so that the agent gradually learns the optimal action strategy and achieves optimal energy efficiency.
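The bookkeeping steps of the DDPG training process in claim 6 (a replay buffer of capacity K that discards the earliest experience, the target value, and the soft update of the target networks) can be sketched in plain Python as follows. This is a hedged illustration only: the class and function names, the discount factor `gamma`, the soft-update rate `tau`, and the representation of network weights as flat lists of floats are assumptions, and the neural networks themselves are not modelled.

```python
import random
from collections import deque
from typing import Deque, List, Tuple

# One experience sample: (state, action, reward, next_state).
Transition = Tuple[List[float], float, float, List[float]]


class ReplayBuffer:
    """Experience replay with fixed capacity K."""

    def __init__(self, capacity: int) -> None:
        # A deque with maxlen drops the oldest item once it is full, which
        # matches "the earliest experience is removed" in the text.
        self.data: Deque[Transition] = deque(maxlen=capacity)

    def store(self, s: List[float], a: float, r: float,
              s_next: List[float]) -> None:
        self.data.append((s, a, r, s_next))

    def sample(self, batch_size: int, rng: random.Random) -> List[Transition]:
        """Draw a random batch without replacement."""
        k = min(batch_size, len(self.data))
        return rng.sample(list(self.data), k)


def td_target(reward: float, next_q: float, gamma: float) -> float:
    """Target value y = r + gamma * Q'(s', mu'(s')).

    next_q stands for the target critic's output, supplied by the caller.
    """
    return reward + gamma * next_q


def soft_update(target: List[float], online: List[float],
                tau: float) -> List[float]:
    """Return tau * online + (1 - tau) * target, element by element."""
    return [tau * w + (1.0 - tau) * w_t for w, w_t in zip(online, target)]


# Small demonstration with made-up numbers.
buffer = ReplayBuffer(capacity=3)
for i in range(5):
    buffer.store([float(i)], 0.0, float(i), [float(i + 1)])

batch = buffer.sample(2, random.Random(0))
y = td_target(reward=1.0, next_q=2.0, gamma=0.9)
new_target = soft_update(target=[0.0, 0.0], online=[1.0, 2.0], tau=0.1)
```

With a small `tau` the target weights trail the online weights slowly, which is the usual reason for keeping separate target networks: the target value changes gradually while the critic is being fitted to it.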
CN202411462619.4A 2024-10-19 2024-10-19 A reinforcement learning energy management method for fuel cell vehicles considering operating condition prediction Active CN118981620B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202411462619.4A CN118981620B (en) 2024-10-19 2024-10-19 A reinforcement learning energy management method for fuel cell vehicles considering operating condition prediction

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202411462619.4A CN118981620B (en) 2024-10-19 2024-10-19 A reinforcement learning energy management method for fuel cell vehicles considering operating condition prediction

Publications (2)

Publication Number Publication Date
CN118981620A true CN118981620A (en) 2024-11-19
CN118981620B CN118981620B (en) 2025-02-25

Family

ID=93453733

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202411462619.4A Active CN118981620B (en) 2024-10-19 2024-10-19 A reinforcement learning energy management method for fuel cell vehicles considering operating condition prediction

Country Status (1)

Country Link
CN (1) CN118981620B (en)

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN119502719A (en) * 2025-01-13 2025-02-25 南京恒天领锐汽车有限公司 A driving and braking control method for a pure electric commercial vehicle driven by dual motors on front and rear axles
CN119653617A (en) * 2025-02-18 2025-03-18 深圳市盛鸿运科技有限公司 A method and device for preparing a new energy vehicle power circuit board
CN120450386A (en) * 2025-07-09 2025-08-08 江西五十铃汽车有限公司 A new energy vehicle energy control method and system

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20070112475A1 (en) * 2005-11-17 2007-05-17 Motility Systems, Inc. Power management systems and devices
WO2020165509A1 (en) * 2019-02-15 2020-08-20 Hutchinson Electric energy management system
KR102327413B1 (en) * 2021-02-25 2021-11-16 국민대학교산학협력단 Power managing apparatus and method for the same
CN117922382A (en) * 2023-06-05 2024-04-26 吉林大学 Fuel cell automobile energy management method based on map path planning
CN118386952A (en) * 2024-06-26 2024-07-26 合肥工业大学 A fuel cell vehicle energy management method and system based on operating condition identification

Also Published As

Publication number Publication date
CN118981620B (en) 2025-02-25

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant