CN116306903A - A Robust Adversarial Training Framework for Multi-Agent Reinforcement Learning Energy Systems - Google Patents
- Publication number
- CN116306903A (application CN202211516697.9A)
- Authority
- CN
- China
- Prior art keywords
- agent
- strategy
- attack
- training
- optimal
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Classifications
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/08—Learning methods
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N20/00—Machine learning
-
- Y—GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
- Y04—INFORMATION OR COMMUNICATION TECHNOLOGIES HAVING AN IMPACT ON OTHER TECHNOLOGY AREAS
- Y04S—SYSTEMS INTEGRATING TECHNOLOGIES RELATED TO POWER NETWORK OPERATION, COMMUNICATION OR INFORMATION TECHNOLOGIES FOR IMPROVING THE ELECTRICAL POWER GENERATION, TRANSMISSION, DISTRIBUTION, MANAGEMENT OR USAGE, i.e. SMART GRIDS
- Y04S10/00—Systems supporting electrical power generation, transmission or distribution
- Y04S10/50—Systems or methods supporting the power network operation or management, involving a certain degree of interaction with the load-side end user applications
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Software Systems (AREA)
- Physics & Mathematics (AREA)
- Mathematical Physics (AREA)
- Artificial Intelligence (AREA)
- Data Mining & Analysis (AREA)
- Evolutionary Computation (AREA)
- Computing Systems (AREA)
- General Engineering & Computer Science (AREA)
- General Physics & Mathematics (AREA)
- Medical Informatics (AREA)
- Computer Vision & Pattern Recognition (AREA)
- Health & Medical Sciences (AREA)
- Life Sciences & Earth Sciences (AREA)
- Biomedical Technology (AREA)
- Biophysics (AREA)
- Computational Linguistics (AREA)
- General Health & Medical Sciences (AREA)
- Molecular Biology (AREA)
- Management, Administration, Business Operations System, And Electronic Commerce (AREA)
Abstract
The invention relates to a robust adversarial training framework for multi-agent reinforcement learning energy systems, comprising the following components: constructing an adversarial agent to generate adversarial attacks and modeling the system as an adversarial partially observable stochastic game; fixing the pre-trained victim multi-agent policy and training an optimal deterministic adversarial policy to generate bounded perturbations; fixing the optimal adversarial attack policy and improving the robustness of the victim policy under the optimal attacker through adversarial training. The beneficial effects of the invention are as follows: the invention models the adversarial attack as an attacking opponent based on single-agent reinforcement learning, and learns the strongest attack policy subject to attack constraints. Mathematically, the problem is formulated as an adversarial Markov game, and the performance of the integrated energy management system based on multi-agent reinforcement learning is improved through robust adversarial training.
Description
Technical Field
The invention relates to the field of power system security defense, and in particular to a robust adversarial training framework for multi-agent reinforcement learning energy systems.
Background
With socioeconomic development and growing energy demand, power systems are undergoing a fundamental transition in planning and operation from fossil fuels to clean energy. Against the background of the rapidly developing energy Internet, integrated energy systems that couple and coordinate multiple energy carriers such as electricity, gas, heat and cooling can realize multi-energy complementarity, promote the absorption of renewable energy, improve energy utilization efficiency and relieve supply-demand imbalance. Compared with traditional power systems, the energy flows of an integrated energy system are more complex, and its operation and control involve more complex load demands, supply devices and operating modes. The tight coupling of energy demand, supply and storage increases the complexity of system operating modes and dynamics, aggravates uncertainty on both source and load sides, enlarges the variables and dimensionality of the system's mathematical model, and reduces the safety and stability margin, so traditional model-based integrated energy management methods struggle to meet the requirements of online assessment and real-time control. Data-driven integrated energy management methods with multi-agent reinforcement learning at their core have therefore emerged. With the integration of information and communication technologies, however, the security and vulnerability problems of integrated energy management systems based on multi-agent reinforcement learning have become more prominent. The communication network of the integrated energy management system, including supervisory control and data acquisition (SCADA) networks and devices such as smart meters, is vulnerable to malicious cyber attacks.
Disclosure of Invention
The invention aims to overcome the defects of the prior art and provides a robust adversarial training framework for multi-agent reinforcement learning energy systems. The invention enhances the resistance of the integrated energy management system based on multi-agent reinforcement learning to adversarial attacks through robust adversarial training. First, an opponent agent is constructed which formulates adversarial attacks to cause the worst performance of the control system, and the system is modeled as an adversarial partially observable stochastic game; the opponent agent is then trained to learn an optimal deterministic adversarial policy that generates bounded perturbations; finally, robust adversarial training is applied to the compromised multi-agent reinforcement learning integrated energy management system to enhance the robustness of the model.
In a first aspect, a robust adversarial training framework for a multi-agent reinforcement learning energy system is provided, comprising:
Step 1, constructing an adversarial agent to generate adversarial attacks, and modeling the system as an adversarial partially observable stochastic game;
Step 2, fixing the pre-trained victim multi-agent policy, and training an optimal deterministic adversarial policy to generate bounded perturbations;
Step 3, fixing the optimal adversarial attack policy, and improving the robustness of the victim policy under the optimal attacker through adversarial training.
Preferably, step 1 includes:
Step 1.1, expressing the integrated energy management system based on multi-agent reinforcement learning as a partially observable stochastic game, in which each agent controls one building and the cumulative reward of the whole team is maximized by optimizing the policies of all agents:

⟨N, S, {A_i}_{i∈N}, P, {R_i}_{i∈N}, γ, {O_i}_{i∈N}, Z⟩

where N is the number of agents; S is the environment state space; A_i is the action space of the i-th agent, and {A_i}_{i∈N} is the joint action space, defined as A = A_1 × … × A_N; P: S × A → Δ(S) is the probability of transitioning from state s_t to the next state s_{t+1} given the joint action a_t ∈ A at any time t; R_i: S × A × S → ℝ is the immediate reward fed back to the i-th agent for the transition from (s_t, a_t) to the next state s_{t+1}; γ is the discount factor; O_i is the observation space of the i-th agent, and the joint observation space {O_i}_{i∈N} is defined as O = O_1 × … × O_N; Z: S × A → Δ(O) is the probability of the joint observation o_t ∈ O under any action a_t and state s_t.

At time t, each agent i selects an action a_t^i = π_{θ_i}(o_t^i) from its policy π_{θ_i} based on its observation o_t^i; the environment then transitions to the next state according to the transition probability, s_{t+1} ~ P(·|s_t, a_t); each agent i receives a reward r_t^i and a new local observation o_{t+1}^i.
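The agent-environment loop of step 1.1 can be sketched as follows. This is a toy stand-in, not part of the invention: the environment dynamics, reward shapes, dimensions and the tanh policies are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
N, obs_dim, act_dim = 3, 4, 2  # three building agents (toy sizes)

# per-agent policy parameters theta_i (random for illustration)
thetas = [rng.normal(size=(act_dim, obs_dim)) for _ in range(N)]

def policy(theta, obs):
    # deterministic policy a_t^i = pi_{theta_i}(o_t^i)
    return np.tanh(theta @ obs)

def env_step(state, actions):
    # toy stand-in for s_{t+1} ~ P(.|s_t, a_t) and local observations from Z
    next_state = 0.9 * state + 0.1 * float(np.concatenate(actions).sum())
    obs = [next_state + rng.normal(scale=0.01, size=obs_dim) for _ in range(N)]
    rewards = [-float(np.sum(a ** 2)) for a in actions]  # per-agent reward R_i
    return next_state, obs, rewards

state = 0.0
obs = [np.zeros(obs_dim) for _ in range(N)]
gamma, team_return = 0.99, 0.0
for t in range(10):
    actions = [policy(th, o) for th, o in zip(thetas, obs)]
    state, obs, rewards = env_step(state, actions)
    team_return += gamma ** t * sum(rewards)  # cumulative discounted team reward
print(len(actions), team_return < 0.0)
```

Each iteration performs one round of the game: all agents act on local observations only, then the environment produces the next state, fresh observations, and per-agent rewards.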
Step 1.2, introducing an opponent agent into the integrated energy management system, which seeks the strongest adversarial attack causing the worst model performance, and modeling the system as an adversarial partially observable stochastic game:

⟨N, S, A_adv, {A_i}_{i∈N}, P, {R_i}_{i∈N}, R_adv, γ, {O_i}_{i∈N}, Z⟩

where N is the number of victim agents; S is the environment state space; A_adv and R_adv are the attacker's action space and reward function, respectively; A_i is the action space of the i-th victim agent, and {A_i}_{i∈N} is the joint action space, defined as A = A_1 × … × A_N; P: S × A_adv × A → Δ(S) is the probability of transitioning from state s_t to the next state s_{t+1} given the joint action a_t ∈ A and the attacker action in A_adv at any time t; R_i: S × A × S → ℝ is the immediate reward fed back to the i-th agent for the transition from (s_t, a_t) to the next state s_{t+1}; γ is the discount factor; O_i is the observation space of the i-th agent, and the joint observation space {O_i}_{i∈N} is defined as O = O_1 × … × O_N; Z: S × A → Δ(O) is the probability of the joint observation o_t ∈ O under any action a_t and state s_t.
Preferably, step 2 includes:
Step 2.1, fixing the policy parameters θ = {θ_1, …, θ_N} of the pre-trained victim multi-agent system, where θ_i denotes the model parameters of each agent's policy, and training an adversarial agent policy u_φ, where φ is the policy parameter of the attacking agent, to simulate an adversarial attack threatening one of the agents; the attack it generates is:

δ_t = u_φ(o_t^j), δ_t ∈ B(o_t^j)

where δ_t is the attack vector generated for a particular agent's observation, o_t^j is the observation of the attacked agent j, and B(o_t^j) is the boundary constraint on the perturbation; the input of the compromised agent j is then expressed as:

õ_t^j = o_t^j + δ_t

The victim policy makes decisions based on the perturbed observation:

a_t = π_θ(õ_t)

where a_t is the action taken by the attacked multi-agent integrated energy management system;
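As an illustration of step 2.1, the bounded-perturbation mechanism can be sketched as follows. The tanh attack policy, the ε-box form of B(o_j), and all dimensions are assumptions for illustration; the patent only requires that δ_t stay within the boundary constraint.

```python
import numpy as np

def adversary(phi, obs_j, epsilon=0.05):
    """Attack policy u_phi: emits a perturbation delta_t for the targeted
    agent's observation, projected onto an assumed box instance of B(o_j),
    {delta : |delta| <= epsilon}."""
    raw = np.tanh(phi @ obs_j)              # unconstrained attack direction
    return np.clip(raw, -epsilon, epsilon)  # enforce delta_t in B(o_j)

rng = np.random.default_rng(1)
phi = rng.normal(size=(4, 4))   # attacker policy parameters (illustrative)
o_j = rng.normal(size=4)        # clean observation of victim agent j
delta = adversary(phi, o_j)
o_tilde = o_j + delta           # perturbed input of the compromised agent j
print(bool(np.all(np.abs(delta) <= 0.05)))
```

Only agent j's observation is perturbed; the other agents' inputs and the environment state itself are untouched, which matches the observation-attack threat model described above.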
Step 2.2, fixing the victim multi-agent system policy π_θ and defining the attacker's reward function as R_adv = −∑_i R_i; its objective function is then:

max_φ −J(θ, φ)

where J(θ, φ) = ∑_i R_i is the cumulative team reward. The attacking agent is trained through interaction with the multi-agent integrated energy management system to obtain the optimal attack policy u_{φ*}.
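The interactive attacker training of step 2.2 can be sketched as follows. Random search stands in for the single-agent RL algorithm, which the text does not specify, and the quadratic team return is a toy stand-in for rollouts against the fixed victim policy.

```python
import numpy as np

rng = np.random.default_rng(2)

def team_return(phi):
    """Toy stand-in for J(theta, phi) = sum_i R_i with the victim theta fixed;
    the strongest attack (phi near 1) drives the team return to its minimum."""
    return float(np.sum((phi - 1.0) ** 2))

phi = np.zeros(3)  # attacker policy parameters, initially weak
for _ in range(300):
    cand = phi + rng.normal(scale=0.1, size=3)
    # attacker reward is R_adv = -sum_i R_i, so accept candidates lowering J
    if team_return(cand) < team_return(phi):
        phi = cand
print(team_return(phi) < team_return(np.zeros(3)))
```

The accept-if-worse-for-the-team rule is exactly the sign flip R_adv = −∑R_i: maximizing the attacker's return is minimizing the victims' cumulative reward.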
Preferably, in step 3, the optimal attacker policy u_{φ*} trained in step 2.2 is fixed, where φ* is the parameter of the optimal attack policy; attack vectors are generated through its interaction with the environment, and the robustness of the victim policy under the optimal attacker is improved through adversarial training, with objective function:

max_θ J(θ, φ*)

where J(θ, φ*) = ∑_i R_i.
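The robust-training objective max_θ J(θ, φ*) can likewise be illustrated. The scalar θ, the concave toy return, and hill-climbing are all assumptions; the actual framework trains the multi-agent policies by reinforcement learning under the fixed attacker.

```python
import numpy as np

rng = np.random.default_rng(3)
PHI_STAR = 0.7  # fixed optimal attacker parameter (illustrative scalar)

def J(theta):
    """Toy team return under the fixed optimal attacker phi*; concave in
    theta, so robust training max_theta J(theta, phi*) has a clear optimum."""
    return -(theta - PHI_STAR) ** 2

theta = 0.0
for _ in range(200):               # hill-climb stands in for the MARL update
    cand = theta + rng.normal(scale=0.05)
    if J(cand) > J(theta):         # victim maximizes J(theta, phi*)
        theta = cand
print(J(theta) > J(0.0))
```

Together with the previous sketch this forms the two-phase minimax scheme: first min over φ with θ fixed, then max over θ with φ* fixed.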
In a second aspect, a robust adversarial training apparatus for a multi-agent reinforcement learning energy system is provided, configured to execute the robust adversarial training framework of the first aspect, comprising:
a construction module, for constructing an adversarial agent to generate adversarial attacks and modeling the system as an adversarial partially observable stochastic game;
a first fixing module, for fixing the pre-trained victim multi-agent policy and training an optimal deterministic adversarial policy to generate bounded perturbations;
a second fixing module, for fixing the optimal adversarial attack policy and improving the robustness of the victim policy under the optimal attacker through adversarial training.
The beneficial effects of the invention are as follows: the invention designs a robust adversarial training framework for multi-agent reinforcement learning energy systems to cope with potential adversarial attacks. The adversarial attack is modeled as an attacking opponent based on single-agent reinforcement learning, which learns the strongest attack policy subject to attack constraints. Mathematically, the problem is formulated as an adversarial Markov game, and the performance of the integrated energy management system based on multi-agent reinforcement learning is improved through robust adversarial training.
Drawings
FIG. 1 is a flow chart of a robust countermeasure training framework for a multi-agent reinforcement learning energy system;
fig. 2 is a schematic structural diagram of a robust countermeasure training framework for a multi-agent reinforcement learning energy system.
Detailed Description
The invention is further described below with reference to examples. The following examples are presented only to aid in the understanding of the invention. It should be noted that it will be apparent to those skilled in the art that modifications can be made to the present invention without departing from the principles of the invention, and such modifications and adaptations are intended to be within the scope of the invention as defined in the following claims.
In order to ensure the stable, reliable and efficient operation of the integrated energy management system based on multi-agent reinforcement learning and to improve its robustness against malicious cyber attacks, the invention provides a robust adversarial training framework for multi-agent reinforcement learning energy systems, which enhances resilience through adversarial training and is of great significance for the stable and safe operation of community integrated energy management systems.
In the following, an experiment on a community-level integrated energy management system comprising nine buildings is taken as an example to describe how robustness enhancement of the multi-agent reinforcement learning integrated energy management system is implemented.
As shown in FIG. 1, the robust adversarial training framework for a multi-agent reinforcement learning energy system of the invention comprises the following steps:
(1) The integrated energy management system based on multi-agent reinforcement learning is expressed as a partially observable stochastic game, in which each agent controls one building and the cumulative reward of the whole team is maximized by optimizing the policies of all agents:

⟨N, S, {A_i}_{i∈N}, P, {R_i}_{i∈N}, γ, {O_i}_{i∈N}, Z⟩

where N is the number of agents; S is the environment state space; A_i is the action space of the i-th agent, and {A_i}_{i∈N} is the joint action space, defined as A = A_1 × … × A_N; P: S × A → Δ(S) is the probability of transitioning from state s_t to the next state s_{t+1} given the joint action a_t ∈ A at any time t; R_i: S × A × S → ℝ is the immediate reward fed back to the i-th agent for the transition from (s_t, a_t) to the next state s_{t+1}; γ is the discount factor; O_i is the observation space of the i-th agent, and the joint observation space {O_i}_{i∈N} is defined as O = O_1 × … × O_N; Z: S × A → Δ(O) is the probability of the joint observation o_t ∈ O under any action a_t and state s_t. At time t, each agent i selects an action a_t^i = π_{θ_i}(o_t^i) from its policy π_{θ_i} based on its observation o_t^i; the environment then transitions to the next state according to the transition probability P; each agent i receives a reward r_t^i and a new local observation o_{t+1}^i. Iterating this process yields, for each agent i, a trajectory of observations, actions and rewards: τ_i = (o_0^i, a_0^i, r_0^i, o_1^i, a_1^i, r_1^i, …). The goal of agent i is to obtain a policy π_i that maximizes its cumulative discounted return ∑_t γ^t r_t^i, given the policies π_{−i} of the other agents, where −i denotes all agents in the set N except agent i. In this cooperative setting, the integrated energy management system based on multi-agent reinforcement learning aims to optimize the agents' policy parameters θ = {θ_1, θ_2, …, θ_N} so as to maximize the cumulative team reward:

J(θ) = ∑_i R_i
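The cumulative team reward can be computed from per-agent reward trajectories as in this short sketch; the discounted-sum form follows the definitions above, and the sample numbers are arbitrary.

```python
def discounted_team_return(rewards_per_agent, gamma=0.99):
    """J = sum_t gamma^t * sum_i r_t^i over per-agent reward trajectories."""
    horizon = len(rewards_per_agent[0])
    return sum(gamma ** t * sum(traj[t] for traj in rewards_per_agent)
               for t in range(horizon))

# two agents over three time steps
print(discounted_team_return([[1.0, 1.0, 1.0], [0.0, 1.0, 0.0]], gamma=0.5))
# -> 1*(1+0) + 0.5*(1+1) + 0.25*(1+0) = 2.25
```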
(2) As shown in FIG. 2, an opponent agent is introduced into the integrated energy management system based on multi-agent reinforcement learning; it seeks the strongest adversarial attack causing the worst model performance, and the system is modeled as an adversarial partially observable stochastic game:

⟨N, S, A_adv, {A_i}_{i∈N}, P, {R_i}_{i∈N}, R_adv, γ, {O_i}_{i∈N}, Z⟩

where N is the number of victim agents; S is the environment state space; A_adv and R_adv are the attacker's action space and reward function, respectively; A_i is the action space of the i-th victim agent, and {A_i}_{i∈N} is the joint action space, defined as A = A_1 × … × A_N; P: S × A_adv × A → Δ(S) is the probability of transitioning from state s_t to the next state s_{t+1} given the joint action a_t ∈ A and the attacker action in A_adv at any time t; R_i: S × A × S → ℝ is the immediate reward fed back to the i-th agent for the transition from (s_t, a_t) to the next state s_{t+1}; γ is the discount factor; O_i is the observation space of the i-th agent, and the joint observation space {O_i}_{i∈N} is defined as O = O_1 × … × O_N; Z: S × A → Δ(O) is the probability of the joint observation o_t ∈ O under any action a_t and state s_t. Note that N, S, {A_i}_{i∈N}, γ, {O_i}_{i∈N} and Z are consistent with the definitions of the partially observable stochastic game above, but P and {R_i}_{i∈N} are influenced by A_adv.
(3) Fix the pre-trained victim multi-agent system policy parameters θ = {θ_1, …, θ_N} (θ_i denotes the model parameters of each agent's policy) and train an adversarial agent policy u_φ (φ is the policy parameter of the attacking agent) to simulate an adversarial attack threatening one of the agents, which generates the attack:

δ_t = u_φ(o_t^j), δ_t ∈ B(o_t^j)

where δ_t is the attack vector generated for a particular agent's observation, o_t^j is the observation of the attacked agent j, and B(o_t^j) is the boundary constraint on the perturbation. The input of the compromised agent j is then:

õ_t^j = o_t^j + δ_t

The victim policy makes decisions based on the perturbed observation:

a_t = π_θ(õ_t)

where a_t is the action taken by the attacked multi-agent reinforcement learning integrated energy management system. If the adversarial perturbation stays within the physical characteristics and amplitude ranges (for example, inflexible energy demand that increases steadily, and energy storage within capacity), the defense mechanism cannot detect it. The adversarial perturbation is therefore restricted to B(o_t^j), and in this way the vulnerabilities of the integrated energy management system based on multi-agent reinforcement learning are exposed.
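The stealthiness condition described above can be illustrated with a simple plausibility check. The capacity bound and tolerance are hypothetical values; the patent does not specify a concrete detector, only that perturbations within B(o_j) and physical ranges evade detection.

```python
import numpy as np

def within_constraints(obs, obs_tilde, capacity=1.0, tol=0.05):
    """Plausibility check a defender might run (illustrative assumption):
    the perturbed observation must stay inside the amplitude range
    [0, capacity] and deviate from the clean signal by at most tol."""
    obs, obs_tilde = np.asarray(obs), np.asarray(obs_tilde)
    in_range = np.all((obs_tilde >= 0.0) & (obs_tilde <= capacity))
    small = np.all(np.abs(obs_tilde - obs) <= tol)
    return bool(in_range and small)

clean = np.array([0.2, 0.4, 0.6])
stealthy = clean + 0.03   # inside the assumed B(o_j): passes the check
blatant = clean + 0.30    # amplitude violation: would be detected
print(within_constraints(clean, stealthy), within_constraints(clean, blatant))
```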
(4) Fix the victim multi-agent system policy π_θ and define the attacker's reward function as R_adv = −∑_i R_i; its objective function is then:

max_φ −J(θ, φ)

where J(θ, φ) = ∑_i R_i. The attacking agent is trained through interaction with the multi-agent integrated energy management system to obtain the optimal attack policy u_{φ*}, where φ* is the parameter of the optimal attack policy. Compared with a random attack strategy that generates random noise, its attack effect on the multi-agent integrated energy management system is shown below:
TABLE 1
The cumulative ramping rate, average daily peak and maximum peak are metrics related to the demand load profile of the multi-agent integrated energy management system. As the table shows, the optimal attack increases the model's cumulative ramping rate, average daily peak and maximum peak by 38.61%, 8.77% and 16.42% respectively, making the load demand curve of the multi-agent integrated energy management system more oscillatory; its attack effect is stronger than that of the random attack, fully exposing the vulnerability of the integrated energy management system.
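The three load-profile metrics can be computed as in the following sketch. These are common smart-grid definitions assumed here; the patent does not give explicit formulas.

```python
import numpy as np

def load_metrics(load, steps_per_day=24):
    """Metrics from Table 1 on a demand profile (assumed definitions):
    cumulative ramping  sum_t |d_{t+1} - d_t|,
    average daily peak  mean over days of the within-day maximum,
    maximum peak        maximum over the whole horizon."""
    load = np.asarray(load, dtype=float)
    ramping = float(np.sum(np.abs(np.diff(load))))
    days = load.reshape(-1, steps_per_day)
    return ramping, float(days.max(axis=1).mean()), float(load.max())

# two toy "days" of 4 steps each
ramp, avg_peak, max_peak = load_metrics([1, 3, 2, 2, 1, 2, 4, 1],
                                        steps_per_day=4)
print(ramp, avg_peak, max_peak)
```

A more oscillatory curve raises the ramping term, while higher peaks raise the two peak metrics, which is why the attack's success is reported through increases in all three.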
(5) Fix the optimal attacker policy u_{φ*} obtained from training and use it to generate attack vectors against a given victim agent; the robustness of the multi-agent integrated energy management system under the optimal attacker is improved through adversarial training, with objective function:

max_θ J(θ, φ*)

where J(θ, φ*) = ∑_i R_i.
For comparison, a random attack strategy producing random noise is also used for training. The performance under adversarial attack of the multi-agent integrated energy management system with the different training modes is shown below:
TABLE 2
As the table shows, compared with no adversarial training, adversarial training against the optimal attack reduces the model's cumulative ramping rate, average daily peak and maximum peak by 13.24%, 4.78% and 6.96% respectively, flattening the load demand curve of the multi-agent integrated energy management system; good performance is maintained even under attack, improving the robustness of the integrated energy management system against adversarial attacks. In contrast, the model trained against random attacks does not maintain good performance.
In summary, the invention introduces an attacking opponent based on single-agent reinforcement learning, which interacts with the multi-agent reinforcement learning integrated energy management system to generate the strongest attack, realized by perturbing the observation of a given victim agent; the trained optimal adversarial attack policy is then fixed, and adversarial training is performed on the victim multi-agent reinforcement learning integrated energy management system; by learning from adversarial experience, the resilience of the integrated energy management system against adversarial attacks is enhanced, producing a robust control policy.
Claims (5)
1. A robust adversarial training framework for a multi-agent reinforcement learning energy system, comprising:
step 1, constructing an adversarial agent to generate adversarial attacks, and modeling the system as an adversarial partially observable stochastic game;
step 2, fixing the pre-trained victim multi-agent policy, and training an optimal deterministic adversarial policy to generate bounded perturbations;
step 3, fixing the optimal adversarial attack policy, and improving the robustness of the victim policy under the optimal attacker through adversarial training.
2. The robust adversarial training framework for a multi-agent reinforcement learning energy system according to claim 1, wherein step 1 comprises:
step 1.1, expressing the integrated energy management system based on multi-agent reinforcement learning as a partially observable stochastic game, in which each agent controls one building and the cumulative reward of the whole team is maximized by optimizing the policies of all agents:
⟨N, S, {A_i}_{i∈N}, P, {R_i}_{i∈N}, γ, {O_i}_{i∈N}, Z⟩
where N is the number of agents; S is the environment state space; A_i is the action space of the i-th agent, and {A_i}_{i∈N} is the joint action space, defined as A = A_1 × … × A_N; P: S × A → Δ(S) is the probability of transitioning from state s_t to the next state s_{t+1} given the joint action a_t ∈ A at any time t; R_i: S × A × S → ℝ is the immediate reward fed back to the i-th agent for the transition from (s_t, a_t) to the next state s_{t+1}; γ is the discount factor; O_i is the observation space of the i-th agent, and the joint observation space {O_i}_{i∈N} is defined as O = O_1 × … × O_N; Z: S × A → Δ(O) is the probability of the joint observation o_t ∈ O under any action a_t and state s_t;
at time t, each agent i selects an action a_t^i = π_{θ_i}(o_t^i) from its policy π_{θ_i} based on its observation o_t^i; the environment then transitions to the next state according to the transition probability, s_{t+1} ~ P(·|s_t, a_t); each agent i receives a reward r_t^i and a new local observation o_{t+1}^i;
step 1.2, introducing an opponent agent into the integrated energy management system, which seeks the strongest adversarial attack causing the worst model performance, and modeling the system as an adversarial partially observable stochastic game:
⟨N, S, A_adv, {A_i}_{i∈N}, P, {R_i}_{i∈N}, R_adv, γ, {O_i}_{i∈N}, Z⟩
where N is the number of victim agents; S is the environment state space; A_adv and R_adv are the attacker's action space and reward function, respectively; A_i is the action space of the i-th victim agent, and {A_i}_{i∈N} is the joint action space, defined as A = A_1 × … × A_N; P: S × A_adv × A → Δ(S) is the probability of transitioning from state s_t to the next state s_{t+1} given the joint action a_t ∈ A and the attacker action in A_adv at any time t; R_i: S × A × S → ℝ is the immediate reward fed back to the i-th agent for the transition from (s_t, a_t) to the next state s_{t+1}; γ is the discount factor; O_i is the observation space of the i-th agent, and the joint observation space {O_i}_{i∈N} is defined as O = O_1 × … × O_N; Z: S × A → Δ(O) is the probability of the joint observation o_t ∈ O under any action a_t and state s_t.
3. The robust adversarial training framework for a multi-agent reinforcement learning energy system according to claim 2, wherein step 2 comprises:
step 2.1, fixing the policy parameters θ = {θ_1, …, θ_N} of the pre-trained victim multi-agent system, where θ_i denotes the model parameters of each agent's policy, and training an adversarial agent policy u_φ, where φ is the policy parameter of the attacking agent, to simulate an adversarial attack threatening one of the agents, the generated attack being:
δ_t = u_φ(o_t^j), δ_t ∈ B(o_t^j)
where δ_t is the attack vector generated for a particular agent's observation, o_t^j is the observation of the attacked agent j, and B(o_t^j) is the boundary constraint on the perturbation; the input of the compromised agent j is then expressed as:
õ_t^j = o_t^j + δ_t
the victim policy makes decisions based on the perturbed observation:
a_t = π_θ(õ_t)
where a_t is the action taken by the attacked multi-agent integrated energy management system;
step 2.2, fixing the victim multi-agent system policy π_θ and defining the attacker's reward function as R_adv = −∑_i R_i, its objective function then being:
max_φ −J(θ, φ), where J(θ, φ) = ∑_i R_i.
4. The robust adversarial training framework for a multi-agent reinforcement learning energy system according to claim 3, wherein in step 3, the optimal attacker policy u_{φ*} trained in step 2.2 is fixed, where φ* is the parameter of the optimal attack policy; attack vectors are generated through its interaction with the environment, and the robustness of the victim policy under the optimal attacker is improved through adversarial training, with objective function:
max_θ J(θ, φ*)
where J(θ, φ*) = ∑_i R_i.
5. A robust adversarial training apparatus for a multi-agent reinforcement learning energy system, configured to execute the robust adversarial training framework for a multi-agent reinforcement learning energy system of claim 1, comprising:
a construction module, for constructing an adversarial agent to generate adversarial attacks and modeling the system as an adversarial partially observable stochastic game;
a first fixing module, for fixing the pre-trained victim multi-agent policy and training an optimal deterministic adversarial policy to generate bounded perturbations;
a second fixing module, for fixing the optimal adversarial attack policy and improving the robustness of the victim policy under the optimal attacker through adversarial training.
Priority Applications (1)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| CN202211516697.9A CN116306903B (en) | 2022-11-30 | 2022-11-30 | Robust countermeasure training frame for multi-agent reinforcement learning energy system |
Applications Claiming Priority (1)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| CN202211516697.9A CN116306903B (en) | 2022-11-30 | 2022-11-30 | Robust countermeasure training frame for multi-agent reinforcement learning energy system |
Publications (2)
| Publication Number | Publication Date |
|---|---|
| CN116306903A true CN116306903A (en) | 2023-06-23 |
| CN116306903B CN116306903B (en) | 2025-11-28 |
Family
ID=86785697
Family Applications (1)
| Application Number | Title | Priority Date | Filing Date |
|---|---|---|---|
| CN202211516697.9A Active CN116306903B (en) | 2022-11-30 | 2022-11-30 | Robust countermeasure training frame for multi-agent reinforcement learning energy system |
Country Status (1)
| Country | Link |
|---|---|
| CN (1) | CN116306903B (en) |
Cited By (2)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| CN118485282A (en) * | 2024-07-15 | 2024-08-13 | 华北电力大学 | Electric vehicle charging scheduling method and system based on robust reinforcement learning |
| CN119151235A (en) * | 2024-11-11 | 2024-12-17 | 四川大学 | Source-charge double-side energy storage collaborative scheduling method based on multiple agents |
Citations (19)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| US20070087756A1 (en) * | 2005-10-04 | 2007-04-19 | Hoffberg Steven M | Multifactorial optimization system and method |
| WO2013176784A1 (en) * | 2012-05-24 | 2013-11-28 | University Of Southern California | Optimal strategies in security games |
| WO2016065055A1 (en) * | 2014-10-21 | 2016-04-28 | Ask Y, Llc | Platooning control via accurate synchronization |
| CN107888412A (en) * | 2016-11-08 | 2018-04-06 | 清华大学 | Multi-agent network finite time contains control method and device |
| CN108377238A (en) * | 2018-02-01 | 2018-08-07 | 国网江苏省电力有限公司苏州供电分公司 | Information network security of power system policy learning device and method based on Attack Defence |
| CN111461226A (en) * | 2020-04-01 | 2020-07-28 | 深圳前海微众银行股份有限公司 | Adversarial sample generation method, device, terminal and readable storage medium |
| CN112052456A (en) * | 2020-08-31 | 2020-12-08 | 浙江工业大学 | Deep reinforcement learning strategy optimization defense method based on multiple intelligent agents |
| WO2021068638A1 (en) * | 2019-10-12 | 2021-04-15 | 中国海洋大学 | Interactive intenstive learning method that combines tamer framework and facial expression feedback |
| US20210166123A1 (en) * | 2019-11-29 | 2021-06-03 | NavInfo Europe B.V. | Method for training a robust deep neural network model |
| CN113031554A (en) * | 2021-03-12 | 2021-06-25 | 西北工业大学 | A fixed-time tracking consistency control method for second-order multi-agent systems |
| CN113282100A (en) * | 2021-04-28 | 2021-08-20 | 南京大学 | Unmanned aerial vehicle confrontation game training control method based on reinforcement learning |
| NL2025214B1 (en) * | 2019-11-29 | 2021-08-31 | Navinfo Europe B V | A method for training a robust deep neural network model |
| CN113485313A (en) * | 2021-06-25 | 2021-10-08 | 杭州玳数科技有限公司 | Anti-interference method and device for automatic driving vehicle |
| CN113822318A (en) * | 2021-06-29 | 2021-12-21 | 腾讯科技(深圳)有限公司 | Adversarial training method, device, computer equipment and storage medium of neural network |
| CN114358141A (en) * | 2021-12-14 | 2022-04-15 | 中国运载火箭技术研究院 | A multi-agent reinforcement learning method for multi-combat unit collaborative decision-making |
| CN114638339A (en) * | 2022-03-10 | 2022-06-17 | 中国人民解放军空军工程大学 | Intelligent agent task allocation method based on deep reinforcement learning |
| CN114925850A (en) * | 2022-05-11 | 2022-08-19 | 华东师范大学 | Deep reinforcement learning adversarial defense method against reward perturbation |
| CN115291625A (en) * | 2022-07-15 | 2022-11-04 | 同济大学 | Multi-unmanned aerial vehicle air combat decision method based on multi-agent hierarchical reinforcement learning |
| CN115392432A (en) * | 2022-07-21 | 2022-11-25 | 华东师范大学 | Scalable multi-agent reinforcement learning method for cooperative environments |
- 2022-11-30 CN CN202211516697.9A patent/CN116306903B/en active Active
Non-Patent Citations (7)
| Title |
|---|
| Chaowei Xiao et al.: "Characterizing Attacks on Deep Reinforcement Learning", ICLR 2019, 31 December 2019 (2019-12-31), pages 1-20 * |
| Lerrel Pinto et al.: "Robust Adversarial Reinforcement Learning", arXiv, 8 March 2017 (2017-03-08), pages 1-10 * |
| Neshat Elhami Fard et al.: "Adversarial Attacks on Heterogeneous Multi-Agent Deep Reinforcement Learning System with Time-Delayed Data Transmission", Journal of Sensor and Actuator Networks, vol. 11, no. 3, 9 August 2022 (2022-08-09), pages 1-25 * |
| Xinlei Pan et al.: "Characterizing Attacks on Deep Reinforcement Learning", AAMAS '22: Proceedings of the 21st International Conference on Autonomous Agents and Multiagent Systems, 9 May 2022 (2022-05-09), pages 1010-1018 * |
| Jing Dongsheng et al.: "A Defense Strategy Learning Algorithm for Power Information Networks Based on Q-Learning with Optimal Initial Values", Computer and Modernization, no. 11, 15 November 2018 (2018-11-15), pages 18-22 * |
| Lin Tong et al.: "Research on Automatic Detection of Fault Information in Microcomputer-Based Relay Protection Systems", Electronic Design Engineering, vol. 28, no. 16, 18 August 2020 (2020-08-18), pages 87-91 * |
| Wang Sainan: "Research on Robust Deep Learning and Its Applications in Information Security", Intelligent Computer and Applications, vol. 9, no. 6, 1 November 2019 (2019-11-01), pages 111-117 * |
Cited By (2)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| CN118485282A (en) * | 2024-07-15 | 2024-08-13 | 华北电力大学 | Electric vehicle charging scheduling method and system based on robust reinforcement learning |
| CN119151235A (en) * | 2024-11-11 | 2024-12-17 | 四川大学 | Source-charge double-side energy storage collaborative scheduling method based on multiple agents |
Also Published As
| Publication number | Publication date |
|---|---|
| CN116306903B (en) | 2025-11-28 |
Similar Documents
| Publication | Publication Date | Title |
|---|---|---|
| He et al. | Three-stage Stackelberg game enabled clustered federated learning in heterogeneous UAV swarms | |
| Zhao et al. | Modified cuckoo search algorithm to solve economic power dispatch optimization problems | |
| Cai et al. | Chaotic ant swarm optimization to economic dispatch | |
| Feng et al. | Robust federated deep reinforcement learning for optimal control in multiple virtual power plants with electric vehicles | |
| CN112862281A (en) | Method, device, medium and electronic equipment for constructing scheduling model of comprehensive energy system | |
| CN116306903B (en) | A robust adversarial training framework for multi-agent reinforcement learning energy systems | |
| CN101908172B (en) | Power market hybrid simulation method adopting multiple intelligent-agent algorithms | |
| CN111275174A (en) | A Game-Oriented Radar Countermeasure Strategy Generation Method | |
| CN106712075A (en) | Peaking strategy optimization method considering safety constraints of wind power integration system | |
| Yang et al. | Distributed optimal dispatch of virtual power plant based on ELM transformation | |
| CN115293052A (en) | Power system active power flow online optimization control method, storage medium and device | |
| CN117441168A (en) | Methods and apparatus for adversarial attacks in deep reinforcement learning | |
| Niknam et al. | New self‐adaptive bat‐inspired algorithm for unit commitment problem | |
| Zhang et al. | An improved symbiosis particle swarm optimization for solving economic load dispatch problem | |
| CN113837654B (en) | Multi-objective-oriented smart grid hierarchical scheduling method | |
| Hassan et al. | Optimal power flow analysis considering renewable energy resources uncertainty based on an improved wild horse optimizer | |
| Ahmadian et al. | Price restricted optimal bidding model using derated sensitivity factors by considering risk concept | |
| CN120124859A (en) | A dynamic game comprehensive evaluation method for system combat capability based on deep learning | |
| CN117190405A (en) | An energy-saving optimization control method for dehumidification unit system based on reinforcement learning | |
| Chen et al. | A multi-factor evolutionary algorithm for solving the multi-tasking robust optimization problem on networked systems | |
| Zhi‐gang et al. | Robust DED based on bad scenario set considering wind, EV and battery switching station | |
| Lakshminarasimman et al. | Water wave optimization algorithm for solving multi-area economic dispatch problem | |
| Zheng et al. | A hybrid invasive weed optimization algorithm for the economic load dispatch problem in power systems | |
| Jing et al. | An open-ended learning framework for opponent modeling | |
| Yadav | Hybridization of particle swarm optimization with differential evolution for solving combined economic emission dispatch model for smart grid |
Legal Events
| Date | Code | Title | Description |
|---|---|---|---|
| | PB01 | Publication | |
| | SE01 | Entry into force of request for substantive examination | |
| | GR01 | Patent grant | |