CN116578113A - A method and system for aircraft intelligent cooperative confrontation decision-making based on reinforcement learning

Info

Publication number: CN116578113A
Application number: CN202310536062.3A
Authority: CN
Inventors: 黄操; 季玉龙; 周文涛; 王一; 王进林; 朱珑涛; 何杨
Original assignee: Sichuan University
Current assignee: Sichuan University
Priority date: 2023-05-12
Filing date: 2023-05-12
Publication date: 2023-08-11

Abstract

The invention discloses a reinforcement-learning-based decision-making method and system for intelligent cooperative confrontation of aircraft. The method comprises: observation design, in which flight dynamics, weapons, radar, and the like are modeled in simulation; action space design, in which the action space of each aircraft comprises the target aircraft number and the command values issued in four heading dimensions, the command action values including angle of attack, roll angle, and throttle; reward function design, comprising a survival reward, a distance reward/penalty term, and a radar lock term; and reinforcement learning environment design, in which a training mode and an application mode are used to dynamically control the aircraft and its opponents and to implement the data interfaces for state, action, and reward values. The invention provides a customized decision-making system for intelligent cooperative confrontation of aircraft with a reasonable objective function. After a certain amount of training it shows a significant effect, ensures the effectiveness of the model and algorithm, and can be used to formulate suitable confrontation strategies for aircraft.

Description

A method and system for aircraft intelligent cooperative confrontation decision-making based on reinforcement learning

Technical Field

The invention relates to the technical field of intelligent cooperative confrontation of aircraft, and in particular to a reinforcement-learning-based decision-making method and system for intelligent cooperative confrontation of aircraft.

Background Art

With the continuous advancement of military technology, low-altitude defense systems have gradually evolved toward integrated defense across space, air, sea, undersea, and land. This means that future air combat will no longer be limited to single-aircraft engagements and will gradually become a confrontation between systems and between systems of systems. Against this background, the concepts and tactics of multi-aircraft cooperative operations continue to develop, and cooperative task allocation has become a topic of growing research interest. Task allocation means establishing, on the basis of task requirements, the battlefield environment, target configuration, and similar information, an allocation scheme that maximizes overall combat effectiveness while satisfying given constraints. A suitable task allocation scheme plays a key role in multi-aircraft cooperative operations.

The goal is customized confrontation between aircraft clusters of different sizes, with up to four aircraft (inclusive) per side and with aircraft types that can be specified individually. For a distributed real-time simulation system to achieve a realistic result, the system must not only solve its various data models in real time but also pass data between subsystems over a deterministic network with very low latency; only then can the subsystems work in a coordinated manner. Traditionally, distributed simulation has used a "high-speed Ethernet + host and slave computers" solution to meet these two requirements. Because of the limitations of the TCP/IP protocol, however, conventional Ethernet cannot satisfy the need for real-time, deterministic data transfer between real-time simulation subsystems. Measures such as raising network speed or reducing network load can lower the latency, but they do not fundamentally remove Ethernet's inherent lack of real-time and deterministic behavior, and they add cost.

JSBSim is a general-purpose flight dynamics model developed abroad. It can simulate many aircraft types, has good real-time performance, and meets this project's requirements for a flight dynamics model. JSBSim is an open-source, cross-platform, six-degree-of-freedom nonlinear flight dynamics model. It is written in object-oriented C++ and supports dynamics modeling of different types of aeronautical and aerospace vehicles. The dynamic characteristics of a vehicle are described in Extensible Markup Language (XML), so a user can build and simulate a six-degree-of-freedom vehicle model without compiling or linking code.

The MAPPO algorithm is a variant of PPO applied to multi-agent tasks and likewise uses an actor-critic architecture. The difference lies in the actor part: to further reduce the variance of the advantage function, generalized advantage estimation (GAE) is used in its place. GAE estimates the advantage function in a TD-like manner and balances variance against bias, so that the variance of the estimate is significantly reduced at the cost of some bias.
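As an illustration of the TD-like estimate described above, the following is a minimal sketch of generalized advantage estimation for a single trajectory. The function name, the default values of `gamma` and `lam`, and the omission of episode-termination flags are assumptions made for this example; the source does not specify them.

```python
from typing import List, Sequence


def gae_advantages(
    rewards: Sequence[float],
    values: Sequence[float],
    last_value: float,
    gamma: float = 0.99,  # discount factor (assumed default)
    lam: float = 0.95,    # GAE bias/variance trade-off parameter (assumed default)
) -> List[float]:
    """Compute GAE advantages for one trajectory without termination flags.

    values[t] is the critic's estimate for the state at step t, and
    last_value is the estimate for the state reached after the final step.
    """
    advantages = [0.0] * len(rewards)
    running = 0.0
    next_value = last_value
    # Walk backwards so each step can reuse the accumulated future advantage.
    for t in reversed(range(len(rewards))):
        # One-step TD residual.
        delta = rewards[t] + gamma * next_value - values[t]
        # Exponentially weighted sum of TD residuals.
        running = delta + gamma * lam * running
        advantages[t] = running
        next_value = values[t]
    return advantages
```

With `lam = 0` the estimate reduces to the one-step TD residual (low variance, more bias); with `lam = 1` it becomes the discounted return minus the value baseline (less bias, more variance).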

Summary of the Invention

To solve the above problems in the prior art, the invention provides a reinforcement-learning-based decision-making method and system for intelligent cooperative confrontation of aircraft, which can ensure the effectiveness of the model and algorithm and can be used by aircraft to formulate suitable confrontation strategies. The specific scheme is as follows:

A reinforcement-learning-based decision-making method for intelligent cooperative confrontation of aircraft, comprising the following steps:

Step 1: Observation design. Different aircraft types, together with their weapons and radar, are modeled in simulation on the basis of flight dynamics. The information obtained by an aircraft includes its own position, its position relative to enemy aircraft, its own speed, and the speed difference between itself and enemy aircraft;

Step 2: Action space design. The action space of each aircraft is designed to include the target aircraft number and the command values issued in four heading dimensions; the command action values include angle of attack, roll angle, and throttle. The action is written in the following form:

$$a = [\mathrm{target},\ x_t,\ y_t,\ z_t,\ v_t]$$

where target denotes the number of the target aircraft selected by this aircraft, and $x_t$, $y_t$, $z_t$, $v_t$ denote the command values issued by the agent in the four track dimensions;

Step 3: Reward function design. A survival reward, a distance reward/penalty term, and a radar lock term are designed, and the reward function is written in the following form:

$$r = \omega_\alpha r_s + \omega_\beta r_d + \omega_\gamma r_r$$

where $r_s$ is the survival reward, $r_d$ is the distance reward/penalty term, $r_r$ is the radar lock term, and $\omega_\alpha$, $\omega_\beta$, and $\omega_\gamma$ are the weights of the respective parts;

Step 4: Reinforcement learning environment design. A training mode and an application mode are used to dynamically control the aircraft and its opponents, and the data interfaces for state, action, and reward values are implemented;

A training episode consists of training steps, and each episode contains a finite number of steps;

The agent takes the state information as the input of a deep neural network and generates an action after computation;

After format conversion, the action becomes a command executable by the aircraft and is sent to the environment.

Further, simulation modeling of the radar is used to simulate air interception and air combat within the air-to-air function, and specifically includes:

Step 1.1: Radar data processing modeling

Step a: Preprocess the target detection plot information;

Step b: Enter the track management module. If the plot is judged to be a genuine new target, a new track is opened; if the plot can be associated with existing track information, it becomes a stable moving plot;

Step c: Convert the range, azimuth, and elevation of the candidate target in the spherical coordinate system back to three-axis position coordinates in the Cartesian coordinate system so that filtering and prediction can be performed, and send the filtering result to the output interface;

Step d: If the target is lost for a period of time, the track is judged to have ended and the interface is cleared;
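The coordinate conversion in step c and a possible filtering step can be sketched as follows. The axis convention (azimuth measured in the horizontal plane from the x axis, elevation measured up from that plane) is an assumption, and the alpha-beta filter is only a stand-in: the source says that filtering and prediction are performed but does not name the filter. All function names and gain values are hypothetical.

```python
import math
from typing import Tuple

Vec3 = Tuple[float, float, float]


def spherical_to_cartesian(rng: float, azimuth: float, elevation: float) -> Vec3:
    """Convert (range, azimuth, elevation) in metres/radians to (x, y, z).

    Assumed convention: azimuth is measured in the horizontal plane from the
    x axis, elevation is measured upward from the horizontal plane.
    """
    horizontal = rng * math.cos(elevation)
    return (
        horizontal * math.cos(azimuth),
        horizontal * math.sin(azimuth),
        rng * math.sin(elevation),
    )


def alpha_beta_update(
    pos: Vec3, vel: Vec3, measured: Vec3, dt: float,
    alpha: float = 0.5, beta: float = 0.1,  # illustrative gains, not from the source
) -> Tuple[Vec3, Vec3]:
    """One predict-and-correct cycle of an alpha-beta filter (stand-in filter)."""
    # Predict the position with a constant-velocity model.
    predicted = tuple(p + v * dt for p, v in zip(pos, vel))
    # Innovation: measurement minus prediction.
    residual = tuple(m - q for m, q in zip(measured, predicted))
    # Correct position and velocity with fixed gains.
    new_pos = tuple(q + alpha * r for q, r in zip(predicted, residual))
    new_vel = tuple(v + (beta / dt) * r for v, r in zip(vel, residual))
    return new_pos, new_vel
```

In use, each radar plot would first be converted with `spherical_to_cartesian` and then passed as `measured` to the filter, whose output corresponds to the filtering result sent to the output interface.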

Step 1.2: Data preprocessing

The transmitter module, receiver module, and target processing module of the overall radar system design are merged and handled in simplified form. For beam transmission and reception, the theoretical maximum detectable range is determined from the radar's scan range, transmit power, and target distance together with the target's radar cross-section, under the conditions of no electronic jamming, clear weather, and a target located in a clutter-free region.
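The source does not state which formula is used for the theoretical maximum detectable range. As a hedged sketch, the code below assumes the classical monostatic radar range equation, R_max = [P_t G^2 λ^2 σ / ((4π)^3 S_min)]^(1/4), and a symmetric scan sector. The antenna gain, wavelength, minimum detectable signal power, and all function and parameter names are assumptions introduced for illustration.

```python
import math


def max_detection_range(
    p_t: float,         # transmit power in watts
    gain: float,        # antenna gain, linear (assumed equal for transmit and receive)
    wavelength: float,  # wavelength in metres
    rcs: float,         # target radar cross-section in square metres
    s_min: float,       # minimum detectable signal power in watts
) -> float:
    """Free-space monostatic radar range equation (no jamming, no clutter)."""
    numerator = p_t * gain ** 2 * wavelength ** 2 * rcs
    denominator = (4.0 * math.pi) ** 3 * s_min
    return (numerator / denominator) ** 0.25


def is_detectable(
    distance: float, azimuth: float, elevation: float,
    r_max: float, az_limit: float, el_limit: float,
) -> bool:
    """A target is detectable if it is within range and inside the scan sector."""
    return (
        distance <= r_max
        and abs(azimuth) <= az_limit
        and abs(elevation) <= el_limit
    )
```

Because the range depends on the fourth root of the radar cross-section, a target with sixteen times the cross-section is detectable at only twice the distance under these assumptions.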

Still further, the track association performed by the track management module is specifically as follows:

It is determined whether the currently detected target plot falls within a set range centered on the point that the existing track's previous update predicted for the current moment:

1) If the target detection module detects a target plot and the plot fails to associate with any established track, it is regarded as a new target; if the radar then succeeds in associating plots with this target on two consecutive occasions, track initiation begins;

2) If the target detection module detects target information, the candidate target information fails to associate with the established tracks, and over the following period the plots detected by the airborne radar are still not successfully associated with the previously established track, it is judged to be a false alarm and the track is terminated;

3) If the target detection module detects no target plot, track association fails and a small-search event is scheduled for the next moment; if the radar performs the small search and still does not detect the lost target, the track is terminated;

4) If the target detection module detects a target plot and the plot is successfully associated with an established track in all three dimensions of range, azimuth, and elevation, the plot is judged to be a new observation of that track, and the track is maintained.
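The four rules above can be sketched as a small per-track state machine. This is a simplified illustration under stated assumptions: the gate is a fixed half-width per dimension, a plot outside the gate is treated the same as a missing plot, the latest plot is used directly as the next prediction (standing in for the filter), and the number of misses allowed before a tentative track is declared a false alarm (`max_tentative_misses`) is a hypothetical parameter. None of these names come from the source.

```python
from dataclasses import dataclass
from typing import Optional, Tuple

Plot = Tuple[float, float, float]  # (range, azimuth, elevation)


def in_gate(plot: Plot, predicted: Plot, gate: Plot) -> bool:
    """True if the plot lies within the gate in range, azimuth, and elevation."""
    return all(abs(p - q) <= g for p, q, g in zip(plot, predicted, gate))


@dataclass
class Track:
    predicted: Plot
    status: str = "tentative"    # "tentative", "confirmed", or "terminated"
    hits: int = 0                # consecutive successful associations
    misses: int = 0              # consecutive failed associations
    search_pending: bool = False # a small search has been scheduled


def update_track(track: Track, plot: Optional[Plot], gate: Plot,
                 max_tentative_misses: int = 3) -> str:
    """Apply one scan's result to a track and return its new status."""
    if track.status == "terminated":
        return track.status
    if plot is not None and in_gate(plot, track.predicted, gate):
        track.hits += 1
        track.misses = 0
        track.search_pending = False
        track.predicted = plot  # stand-in for the filter's next prediction
        if track.status == "tentative" and track.hits >= 2:
            track.status = "confirmed"   # rule 1: two consecutive associations
        return track.status              # rule 4: track maintenance
    track.hits = 0
    track.misses += 1
    if track.status == "tentative":
        if track.misses >= max_tentative_misses:
            track.status = "terminated"  # rule 2: false alarm
    elif track.search_pending:
        track.status = "terminated"      # rule 3: small search also failed
    else:
        track.search_pending = True      # rule 3: schedule a small search
    return track.status
```

A plot that associates with no existing track would be used to create a new `Track` in the tentative state, which is then confirmed or discarded by later calls to `update_track`.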

A reinforcement-learning-based decision-making system for intelligent cooperative confrontation of aircraft, comprising an upper-layer architecture and a lower-layer architecture;

The upper-layer architecture comprises an exercise control simulation node, a tactical command simulation node, a battlefield environment management node, a tactical deduction node, and tactical simulators; the simulation nodes communicate and exchange data over a DIS network;

The lower-layer architecture resides within a single tactical simulator and uses a hybrid real-time communication network to connect the simulator's fire control computation module, flight control computation module, visual scene computation module, visual display module, instrument computation module, multi-function display module, and device control and acquisition module.

Further:

1) The exercise control simulation node is the management and monitoring center of the whole system. It coordinates and controls the operation of the system, monitors system status, and records and replays data for evaluation. It exchanges data with the other simulation nodes over the DIS network, including issuing commands, querying status, and transferring data;

2) The tactical command simulation node is responsible for commanding and coordinating the aircraft. It implements communication and collaboration between aircraft, ensuring teamwork and achievement of the assigned mission objectives. It receives commands from the exercise control simulation node over the DIS network, sends commands to the battlefield environment management node and the tactical deduction node, and receives data from the aircraft to update their status;

3) The battlefield environment management node builds the simulated air combat environment by establishing a JSBSim dynamics model and is responsible for managing and monitoring the whole battlefield environment. It implements environment modeling and simulation and locates and tracks aircraft positions on the battlefield. It receives commands from the exercise control simulation node over the DIS network, updates the battlefield environment information, and sends it to the tactical deduction node;

4) The tactical deduction node is responsible for tactical deduction and planning. It collects and analyzes information from the other nodes, formulates tactical strategies, and plans routes. It receives information from the battlefield environment management node and the tactical command simulation node over the DIS network, analyzes it, and produces corresponding action plans;

5) The tactical simulator is responsible for simulating aircraft behavior. It predicts aircraft behavior and performance through simulation in order to guide aircraft actions. It receives information from the battlefield environment management node and the tactical deduction node over the DIS network and simulates aircraft behavior on that basis.

Still further:

1) The fire control computation module calculates the aircraft's fire control data, including missile launch azimuth and elevation, target distance, and ballistic correction. It receives data from the flight control computation module and the visual scene computation module, computes the corresponding fire control data, and sends it to the flight control computation module;

2) The flight control computation module calculates the aircraft's flight control data, including airspeed, altitude, and attitude. It receives data from the fire control computation module, the visual scene computation module, and the instrument computation module, computes the corresponding flight control data, and sends it to the device control and acquisition module;

3) The visual scene computation module computes the scene rendering for the aircraft. It receives data from the fire control computation module, the flight control computation module, and the device control and acquisition module, computes the corresponding image data, and sends it to the visual display module;

4) The visual display module displays the visual data produced by the visual scene computation module as images. It receives data from the visual scene computation module and renders it as a visible image;

5) The instrument computation module calculates the aircraft's instrument data, including speed, altitude, and attitude. It receives data from the flight control computation module, computes the corresponding instrument data, and sends it to the multi-function display module;

6) The multi-function display module displays the instrument data produced by the instrument computation module together with other aircraft-related data, including fire control data, mission information, and battery status. It receives data from the instrument computation module and the device control and acquisition module and renders it as visual information;

7) The device control and acquisition module handles communication with, and data acquisition from, the aircraft's individual devices.

Still further, in the upper-layer and lower-layer architectures the following four technologies are combined to form a simulation architecture based on HLA and a hybrid real-time network:

1) Use of DIS distributed management, the time advancement mechanism, and load balancing control

In the upper-layer architecture, distributed management and data communication are implemented over the DIS network so that the simulation nodes can cooperate efficiently. In addition, the system uses a time advancement mechanism to keep simulation results accurate and synchronized, and uses load balancing control to maintain system stability and reliability;

2) Use of the high real-time performance and deterministic latency of the reflective memory network

In the lower-layer architecture, reflective memory network technology provides high real-time performance and deterministic latency, so that the modules can exchange data and cooperate quickly and accurately;

3) Use of the precise clock and preemptive task scheduling of RTX

In the lower-layer architecture, the precise clock and preemptive task scheduling mechanism of the RTX real-time operating system allow fine-grained control and scheduling of tasks, which keeps data exchange and cooperation between the modules efficient and accurate;

4) Use of the CAN bus data communication mechanism

In the lower-layer architecture, the CAN bus data communication mechanism provides efficient data transfer and communication between the modules, which supports system stability and reliability.

Still further, the basic characteristics of the JSBSim dynamics model include wingspan, chord length, wing area, pilot eye position, aerodynamic reference point, center-of-gravity position, moments of inertia, products of inertia, ground contact point positions of the nose and main landing gear, engine thrust line, and landing gear model.

Using a reinforcement learning approach, the invention provides a customized decision-making system for intelligent cooperative confrontation of aircraft. The objective function is reasonable, the system shows a significant effect after a certain amount of training, the effectiveness of the model and algorithm can be ensured, and the system can be used by aircraft to formulate suitable confrontation strategies.

Description of Drawings

Fig. 1 shows the basic architecture of the aircraft tactical confrontation simulation system of the invention.

Detailed Description

To explain the features and technical content of the invention in more detail, the invention is described below with reference to the accompanying drawing. The features and technical content described here serve only to illustrate and explain the invention and are not intended to limit it. Those skilled in the art may modify the technical solutions of the foregoing examples according to the application, but such modifications do not cause the essence of the technical solutions to depart from the scope of the disclosed examples.

As shown in Fig. 1, the invention provides a reinforcement-learning-based decision-making system for intelligent cooperative confrontation of aircraft, comprising:

1. Upper-layer architecture: comprises the exercise control simulation node, the tactical command simulation node, the battlefield environment management node, the tactical deduction node, the tactical simulators, and so on. These simulation nodes communicate and exchange data over the DIS network.

(1) Exercise control simulation node: the management and monitoring center of the whole system. Its main role is to coordinate and control the operation of the system, monitor system status, and record and replay data for evaluation. This node exchanges data with the other simulation nodes over the DIS network, including issuing commands, querying status, and transferring data.

(2) Tactical command simulation node: responsible for commanding and coordinating the aircraft. Its main role is to implement communication and collaboration between aircraft, ensuring teamwork and achievement of the assigned mission objectives. This node receives commands from the exercise control simulation node over the DIS network, sends commands to the battlefield environment management node and the tactical deduction node, and receives data from the aircraft to update their status.

(3) Battlefield environment management node: establishes the JSBSim dynamics model to build the simulated air combat environment and is responsible for managing and monitoring the whole battlefield environment. Its main role is to implement environment modeling and simulation and to locate and track aircraft positions on the battlefield. This node receives commands from the exercise control simulation node over the DIS network, updates the battlefield environment information, and sends it to the tactical deduction node.

(4) Tactical deduction node: responsible for tactical deduction and planning. Its main role is to collect and analyze information from the other nodes, formulate tactical strategies, and plan routes. This node receives information from the battlefield environment management node and the tactical command simulation node over the DIS network, analyzes it, and produces corresponding action plans.

(5) Tactical simulator: the tactical simulator node is responsible for simulating aircraft behavior. Its main role is to predict aircraft behavior and performance through simulation so as to better guide aircraft actions. This node receives information from the battlefield environment management node and the tactical deduction node over the DIS network and simulates aircraft behavior on that basis.

2. Lower-layer architecture: resides within a single tactical simulator and uses the hybrid real-time communication network proposed herein to connect the simulator's fire control computation module, flight control computation module, visual scene computation module, visual display module, instrument computation module, and so on.

(1) Fire control computation module: calculates the aircraft's fire control data, including missile launch azimuth and elevation, target distance, and ballistic correction. This module receives data from the flight control computation module and the visual scene computation module, computes the corresponding fire control data, and sends it to the flight control computation module.

(2) Flight control computation module: calculates the aircraft's flight control data, including airspeed, altitude, and attitude. This module receives data from the fire control computation module, the visual scene computation module, and the instrument computation module, computes the corresponding flight control data, and sends it to the device control and acquisition module.

(3) Visual scene computation module: computes the scene rendering for the aircraft. This module receives data from the fire control computation module, the flight control computation module, and the device control and acquisition module, computes the corresponding image data, and sends it to the visual display module.

(4) Visual display module: displays the visual data produced by the visual scene computation module as images. This module receives data from the visual scene computation module and renders it as a visible image.

(5) Instrument computation module: calculates the aircraft's instrument data, including speed, altitude, and attitude. This module receives data from the flight control computation module, computes the corresponding instrument data, and sends it to the multi-function display module.

(6) Multi-function display module: displays the instrument data produced by the instrument computation module together with other aircraft-related data, including fire control data, mission information, and battery status. This module receives data from the instrument computation module and the device control and acquisition module and renders it as visual information.

(7) Device control and acquisition module: handles communication with, and data acquisition from, the aircraft's individual devices.

3. Construction of the simulated air combat environment: mainly uses JSBSim, an open-source, cross-platform, six-degree-of-freedom nonlinear flight dynamics model.

Establishing the JSBSim dynamics model mainly involves the following:

Basic characteristics: wingspan, chord length, wing area, pilot eye position, aerodynamic reference point, center-of-gravity position, moments of inertia, products of inertia, ground contact point positions of the nose and main landing gear, engine thrust line, and landing gear model.

Flight control scheme: because the agent directly operates the aircraft's control surface states end to end, no stability augmentation design is adopted in the invention.

The reinforcement-learning-based decision-making method for intelligent cooperative confrontation of aircraft according to the present invention is as follows:

Step 2: Action space design: design the action space of each aircraft, comprising the target aircraft number and the command values issued along four headings; the command action values include angle of attack, roll angle, and throttle amount.

It can be written in the following form:

a = [target, x_t, y_t, z_t, v_t]

where target denotes the number of the target aircraft selected by this aircraft, and x_t, y_t, z_t, and v_t denote the command values issued by the agent in the four track dimensions.
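As a minimal sketch of how such a flat action vector might be unpacked, the following Python snippet splits a = [target, x_t, y_t, z_t, v_t] into named fields. The names `TrackCommand`, `decode_action`, and `num_targets` are hypothetical, and the rounding and clamping of the target number is an assumption; the patent does not state how the network output is turned into an integer index.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass(frozen=True)
class TrackCommand:
    """Structured view of a = [target, x_t, y_t, z_t, v_t]."""
    target: int   # number of the selected target aircraft
    x_t: float    # command values in the four track dimensions
    y_t: float
    z_t: float
    v_t: float


def decode_action(a: Sequence[float], num_targets: int) -> TrackCommand:
    """Split the flat action vector into its five named components."""
    if len(a) != 5:
        raise ValueError("expected [target, x_t, y_t, z_t, v_t]")
    # Assumption: the network emits the target number as a float, so it is
    # rounded and clamped to the valid range 0 .. num_targets - 1.
    target = min(max(int(round(a[0])), 0), num_targets - 1)
    return TrackCommand(target, float(a[1]), float(a[2]), float(a[3]), float(a[4]))
```

Keeping the decoded command immutable makes it safe to pass the same object to both the logging code and the format-conversion step that produces aircraft commands.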

Step 3: Reward function design: design a survival reward, a distance reward/penalty term, and a radar lock term.

The reward function can be written in the following form:

r = ω_α r_s + ω_β r_d + ω_γ r_r

where r_s is the survival reward term, r_d is the distance reward/penalty term, and r_r is the radar lock term; ω_α, ω_β, and ω_γ are the weights of the respective terms.
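The weighted sum above can be sketched in a few lines of Python. The function name `total_reward` and the default weight values are placeholders chosen for illustration only; the patent does not give numerical weights or the internal form of r_s, r_d, and r_r.

```python
def total_reward(r_s: float, r_d: float, r_r: float,
                 w_alpha: float = 1.0, w_beta: float = 0.5,
                 w_gamma: float = 0.2) -> float:
    """Combine the three reward terms as r = w_alpha*r_s + w_beta*r_d + w_gamma*r_r.

    r_s: survival reward, r_d: distance reward/penalty, r_r: radar lock term.
    The default weights are illustrative placeholders, not values from the patent.
    """
    return w_alpha * r_s + w_beta * r_d + w_gamma * r_r
```

For example, a step in which the aircraft survives (r_s = 1), is penalized for distance (r_d = -2), and holds a radar lock (r_r = 1) yields 1.0 - 1.0 + 0.2 with these placeholder weights.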

Step 4: Reinforcement learning environment design: use a training mode and an application mode to dynamically control the aircraft and the opposing side, and implement the data interface for state, action, and reward values.

A training episode consists of training steps, and each episode contains a finite number of steps. The agent takes the state information as the input of a deep neural network and generates an action after computation; after format conversion, the action becomes a command executable by the aircraft and is sent to the environment.

Because the agent must interact with the JSBSim simulation environment, the simulation is not run continuously; instead, after each decision made by the agent, a function is called to advance the simulation by one step.
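The stepped interaction described in Step 4 can be illustrated with a small Python sketch. It deliberately uses a stand-in simulator rather than JSBSim itself: `StubSim`, `SteppedEnv`, `run_episode`, the one-dimensional state, and the constant reward are all hypothetical and serve only to show that the simulation advances exactly once per agent decision and that an episode has a finite number of steps.

```python
class StubSim:
    """Stand-in for the flight dynamics simulation (hypothetical, not JSBSim)."""

    def __init__(self, dt=0.1):
        self.dt = dt
        self.time = 0.0
        self.altitude = 5000.0

    def advance(self, climb_rate):
        # One discrete simulation step; nothing runs between calls.
        self.altitude += climb_rate * self.dt
        self.time += self.dt


class SteppedEnv:
    """Exposes the state, action, and reward interface around the simulation."""

    def __init__(self, max_steps=50):
        self.max_steps = max_steps

    def reset(self):
        self.sim = StubSim()
        self.steps = 0
        return [self.sim.altitude]

    def step(self, action):
        command = float(action[0])   # format conversion: action -> command
        self.sim.advance(command)    # the simulation advances only here
        self.steps += 1
        reward = 1.0                 # placeholder reward value
        done = self.steps >= self.max_steps
        return [self.sim.altitude], reward, done


def run_episode(env, policy):
    """Run one finite episode: state -> action -> one simulation step."""
    state, done, total = env.reset(), False, 0.0
    while not done:
        action = policy(state)
        state, reward, done = env.step(action)
        total += reward
    return total
```

In a real setup the `policy` callable would be the deep neural network, and `advance` would wrap whatever single-step call the flight dynamics engine provides.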

The radar system is simulated at the functional level, and the modules of the radar system are designed accordingly. For the typical air-to-air modes of an airborne phased-array radar, a functional-level system simulation is carried out to provide the air combat intelligence in this project with fast and accurate battlefield situation awareness.

Airborne phased-array radar modeling:

(1) Modeling of the main operating modes: the system mainly simulates air interception (AIC) and air combat maneuvering (ACM) among the air-to-air functions.

(2) Radar data processing modeling:

First, the target detection plot information is preprocessed.

Next, the track management module is entered. If the plot is judged to be a real new target, a new track is opened; if the plot can be associated with existing track information, it becomes a stable moving plot.

Then the range, azimuth, and elevation angle of the candidate target in the spherical coordinate system are converted to three-axis position coordinates in the Cartesian coordinate system for filtering and prediction, and the filtering result is sent to the output interface.

If the target is lost for a period of time, the track is judged to have ended and the interface is cleared.
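The spherical-to-Cartesian conversion used before filtering can be sketched as follows. The axis convention is an assumption, since the patent does not specify one: azimuth is measured in the horizontal plane from the x axis toward the y axis, elevation is measured upward from the horizontal plane, and both angles are in radians. The function name `spherical_to_cartesian` is hypothetical.

```python
import math


def spherical_to_cartesian(rng, azimuth, elevation):
    """Convert (range, azimuth, elevation) to (x, y, z).

    Assumed convention: azimuth from the x axis toward the y axis in the
    horizontal plane, elevation upward from that plane, angles in radians.
    """
    horizontal = rng * math.cos(elevation)   # projection onto the x-y plane
    x = horizontal * math.cos(azimuth)
    y = horizontal * math.sin(azimuth)
    z = rng * math.sin(elevation)
    return x, y, z
```

Filtering in Cartesian coordinates is convenient because straight-line target motion is linear there, whereas the same motion is nonlinear in range and angle.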

(3) Data preprocessing:

In this system, to improve the real-time performance of radar processing, the transmitter module, receiver module, and target processing module of the overall radar system design are merged and simplified.

For beam transmission and reception, the scanning range of the radar system, the transmit power, the target range, and the target radar cross section are used to determine the theoretical maximum detectable range under the conditions of no electronic jamming, clear weather, and the target lying in a clutter-free region.
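The patent does not state which formula is used for the theoretical maximum detectable range. One plausible choice, shown here purely as an assumption, is the standard free-space monostatic radar range equation R_max = [P_t G^2 λ^2 σ / ((4π)^3 S_min)]^(1/4). The antenna gain, wavelength, and minimum detectable power are parameters introduced for this sketch and are not named in the source; the function names are hypothetical.

```python
import math


def max_detection_range(p_t, gain, wavelength, rcs, s_min):
    """Free-space monostatic radar range equation, solved for R_max in meters.

    p_t: peak transmit power (W); gain: antenna gain (linear, not dB);
    wavelength: carrier wavelength (m); rcs: target radar cross section (m^2);
    s_min: minimum detectable received power (W).
    """
    numerator = p_t * gain ** 2 * wavelength ** 2 * rcs
    denominator = (4.0 * math.pi) ** 3 * s_min
    return (numerator / denominator) ** 0.25


def is_detectable(target_range, r_max):
    """Functional-level detection test: the target is seen if within R_max."""
    return target_range <= r_max
```

Because the range depends on the fourth root of the radar cross section, a target with sixteen times the cross section is detectable at only twice the distance.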

Track association: determine whether the currently detected target plot falls within a set range centered on the position that an existing track predicted for the current moment at its last update. When the target aircraft performs a large maneuver, a larger threshold is selected so that the track continues to be updated correctly:

1) If the target detection module detects a target plot and the plot fails to associate with any established track, it is regarded as a new target; if the radar then associates this plot successfully on two consecutive occasions, track initiation is entered.

2) If the target detection module detects target information, the candidate target fails to associate with the established tracks, and during a subsequent period the target plots detected by the airborne radar are not successfully associated with the previously established track, it is judged a false alarm and the track is terminated.

3) If the target detection module does not detect a target plot, track association necessarily fails. When the radar is executing a tracking event, track maintenance would normally be required; in this case the plot used for track maintenance is set equal to the prediction made for this target at the previous moment, and a small-search event is scheduled for the next moment. Because no beam modeling is performed, the small search is equivalent to enlarging the threshold by a certain amount. If the small search does not detect the lost target, the track is terminated.

4) If the target detection module detects a target plot and the plot is successfully associated with an established track in all three dimensions of range, azimuth, and elevation, the plot is judged to be a new observation of that track, that is, the track is maintained.
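The four cases can be sketched for a single existing track as below. This is a simplified illustration under stated assumptions, not the patented logic: the gate is a rectangular threshold in range, azimuth, and elevation; the small search is modelled by multiplying the gate by a hypothetical factor `widen`; the track is terminated after more than `max_misses` consecutive missed detections; and the two-consecutive-association rule for track initiation is omitted. All names (`Track`, `in_gate`, `update_track`) are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional, Tuple

Plot = Tuple[float, float, float]  # (range, azimuth, elevation)


def in_gate(plot: Plot, predicted: Plot, gate: Plot) -> bool:
    """True if the plot lies within the gate around the prediction in all three dimensions."""
    return all(abs(p - q) <= g for p, q, g in zip(plot, predicted, gate))


@dataclass
class Track:
    predicted: Plot
    misses: int = 0
    alive: bool = True


def update_track(track: Track, plot: Optional[Plot], gate: Plot,
                 widen: float = 2.0, max_misses: int = 1) -> str:
    """Return 'maintain', 'coast', 'terminate', or 'new_candidate'."""
    # After a missed detection, the small search is modelled by a wider gate.
    g = tuple(x * widen for x in gate) if track.misses > 0 else gate
    if plot is not None and in_gate(plot, track.predicted, g):
        track.predicted = plot   # case 4: the plot is the new observation
        track.misses = 0
        return "maintain"
    if plot is not None:
        return "new_candidate"   # cases 1 and 2: plot does not belong to this track
    # Case 3: no detection; keep the previous prediction and schedule a small search.
    track.misses += 1
    if track.misses > max_misses:
        track.alive = False
        return "terminate"
    return "coast"
```

A plot returned as `new_candidate` would be handed to separate initiation logic, which either confirms it as a new track or discards it as a false alarm.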

Case analysis of intelligent decision-making for swarm air combat at the track control level:

Observation space setting: according to the six-degree-of-freedom equations of motion, the pitch, yaw, and roll angles reflect the attitude of the aircraft relative to the ground coordinate system; [x, y, z] denotes the three-dimensional coordinates of the aircraft in the ground reference frame; and [vx, vy, vz] denotes the components of the aircraft's velocity in the three directions.
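A per-aircraft observation of this kind could be held in a small record such as the one below. The class name `Observation`, the field names for the three attitude angles, and the field ordering are assumptions for illustration; the source does not fix a layout for the observation vector.

```python
import math
from typing import List, NamedTuple


class Observation(NamedTuple):
    """Attitude, position, and velocity components of one aircraft."""
    pitch: float
    yaw: float
    roll: float
    x: float
    y: float
    z: float
    vx: float
    vy: float
    vz: float

    def speed(self) -> float:
        """Magnitude of the velocity reconstructed from its three components."""
        return math.sqrt(self.vx ** 2 + self.vy ** 2 + self.vz ** 2)

    def to_vector(self) -> List[float]:
        """Flatten to a list suitable as neural network input."""
        return list(self)
```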

Action space setting: based on the characteristics of the track control task and the design of the simulation environment, the north-east-up coordinates of the target point and the target speed are selected as the decision dimensions for continuous-space decision control. To narrow the search space, the decided position is confined to a certain range around the target aircraft, and the number of the target aircraft is also generated by the network. The decision result is expressed as a one-dimensional vector, a = [target, x_t, y_t, z_t, v_t].
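One way to confine the decided position to a range around the target aircraft is to let the network output normalized values and scale them afterwards, as in the sketch below. The normalization to [-1, 1], the box-shaped region, and the values of `max_offset`, `v_min`, and `v_max` are all assumptions made for illustration and do not come from the patent.

```python
def scale_action(raw, target_pos, max_offset=2000.0, v_min=100.0, v_max=400.0):
    """Map normalized outputs in [-1, 1] to (x_t, y_t, z_t, v_t).

    raw: four normalized network outputs (dx, dy, dz, dv).
    target_pos: (north, east, up) position of the selected target aircraft.
    The waypoint is kept inside a box of half-width max_offset around the
    target, and the speed command inside [v_min, v_max].
    """
    def clip(u):
        return max(-1.0, min(1.0, u))

    dx, dy, dz, dv = (clip(u) for u in raw)
    x_t = target_pos[0] + dx * max_offset
    y_t = target_pos[1] + dy * max_offset
    z_t = target_pos[2] + dz * max_offset
    v_t = v_min + 0.5 * (dv + 1.0) * (v_max - v_min)
    return x_t, y_t, z_t, v_t
```

Clipping before scaling guarantees that an out-of-range network output can never place the waypoint outside the intended region.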

The following four technologies are integrated to form a simulation architecture based on HLA and a hybrid real-time network:

In the upper-level architecture of the system, distributed management and data communication are implemented over the DIS network, so that the simulation nodes can work together efficiently. The system also uses a time advancement mechanism to ensure the accuracy and synchronization of the simulation results, and load balancing control technology to ensure the stability and reliability of the system.

In the lower-level architecture of the system, reflective memory network technology provides high real-time performance and deterministic latency, enabling fast and accurate data exchange and cooperation between modules.

In the lower-level architecture of the system, the precise clock and preemptive task scheduling mechanism of the RTX real-time operating system allow fine-grained control and scheduling of tasks, ensuring efficient and accurate data exchange and cooperation between modules.

Claims

1. A method for intelligent cooperative confrontation decision-making of aircraft based on reinforcement learning, characterized by comprising the following steps:

Step 1: Observation value design: perform simulation modeling of different aircraft types, their weapons, and their radars based on flight dynamics; the information obtained by the aircraft includes its own position, the relative position between itself and the enemy aircraft, its own speed, and the speed difference between itself and the enemy aircraft;

Step 2: Action space design: design the action space of each aircraft, comprising the target aircraft number and the command values issued along four headings, the command action values including angle of attack, roll angle, and throttle amount, written in the following form:

a = [target, x_t, y_t, z_t, v_t]

where target denotes the number of the target aircraft selected by the aircraft, and x_t, y_t, z_t, and v_t denote the command values issued by the agent in the four track dimensions;

Step 3: Reward function design: design a survival reward, a distance reward/penalty term, and a radar lock term, the reward function being written as follows:

r = ω_α r_s + ω_β r_d + ω_γ r_r

where r_s is the survival reward term, r_d is the distance reward/penalty term, and r_r is the radar lock term; ω_α, ω_β, and ω_γ are the weights of the respective terms;

Step 4: Reinforcement learning environment design: use a training mode and an application mode to dynamically control the aircraft and the opposing side, and implement the data interface for state, action, and reward values;

A training episode consists of training steps, and each training episode contains a finite number of training steps;

The agent takes the state information as the input of a deep neural network and generates an action after computation;

After format conversion, the action becomes a command executable by the aircraft and is sent to the environment.

2. The aircraft intelligent cooperative confrontation decision-making method based on reinforcement learning according to claim 1, wherein the simulation modeling of the radar simulates air interception and air combat maneuvering among the air-to-air functions, specifically comprising:

Step 1.1: Radar Data Processing Modeling:

Step a: Perform data preprocessing on the target detection plot information;

Step b: Enter the track management module; if the plot is judged to be a real new target, open a new track; if the plot can be associated with existing track information, it becomes a stable moving plot;

Step c: Convert the range, azimuth, and elevation angle of the candidate target in the spherical coordinate system to three-axis position coordinates in the Cartesian coordinate system, perform filtering and prediction, and send the filtering result to the output interface;

Step d: If the target is lost for a period of time, the track is judged to have ended and the interface is cleared;

Step 1.2: Data Preprocessing

Merge and simplify the transmitter module, receiver module, and target processing module of the overall radar system design; for beam transmission and reception, the scanning range of the radar system, the transmit power, the target range, and the target radar cross section are used to determine the theoretical maximum detectable range under the conditions of no electronic jamming, clear weather, and the target lying in a clutter-free region.

3. The aircraft intelligent cooperative confrontation decision-making method based on reinforcement learning according to claim 2, wherein the track management module carries out track association specifically as follows:

Determine whether the currently detected target plot falls within a set range centered on the position that an existing track predicted for the current moment at its last update:

1) If the target detection module detects a target plot and the plot fails to associate with any established track, it is regarded as a new target; if the radar then associates this plot successfully on two consecutive occasions, track initiation is entered;

2) If the target detection module detects target information, the candidate target fails to associate with the established tracks, and during a subsequent period the target plots detected by the airborne radar are not successfully associated with the previously established track, it is judged a false alarm and the track is terminated;

3) If the target detection module does not detect a target plot, track association fails at this moment and a small-search event is scheduled for the next moment; if the radar performs the small search and still does not detect the lost target, the track is terminated;

4) If the target detection module detects a target plot and the plot is successfully associated with an established track in all three dimensions of range, azimuth, and elevation, the plot is judged to be a new observation of that track, that is, the track is maintained.

4. An aircraft intelligent cooperative confrontation decision-making system based on reinforcement learning, characterized in that it includes an upper-level architecture and a lower-level architecture;

The upper-level architecture includes a guidance and control simulation node, a tactical command simulation node, a battlefield environment management node, a tactical deduction node, and tactical simulators, with data communication and interaction among the simulation nodes carried out over the DIS network;

The lower-level architecture resides within a single tactical simulator, in which a hybrid real-time communication network connects the fire control calculation module, flight control calculation module, visual scene calculation module, visual display module, instrument calculation module, multi-function display module, and equipment control and acquisition module.

5. The aircraft intelligent cooperative confrontation decision-making system based on reinforcement learning according to claim 4, characterized in that,

1) The guidance and control simulation node is the management and monitoring center of the entire system, used to coordinate and control the operation of the entire system, monitor the system status, and record and play back data for evaluation; it interacts with the other simulation nodes through the DIS network, including issuing commands, querying status, and transmitting data;

2) The tactical command simulation node is responsible for the command and coordination of the aircraft, and is used to realize communication and cooperation between aircraft so as to ensure teamwork and achieve the specified mission objectives; it receives instructions from the guidance and control simulation node through the DIS network, sends instructions to the battlefield environment management node and the tactical deduction node, and receives data from the aircraft to update its status;

3) The battlefield environment management node builds the simulated air combat environment by establishing a JSBSim dynamics model, and is responsible for managing and monitoring the entire battlefield environment, for environment modeling and simulation, and for locating and tracking the aircraft on the battlefield; it receives instructions from the guidance and control simulation node through the DIS network, updates the battlefield environment information, and sends it to the tactical deduction node;

4) The tactical deduction node is responsible for tactical deduction and planning, and is used to collect and analyze information from the other nodes, formulate tactical strategies, and plan routes; it receives information from the battlefield environment management node and the tactical command simulation node through the DIS network, analyzes it, and generates a corresponding action plan;

5) The tactical simulator is responsible for simulating the behavior of the aircraft, and is used to predict the behavior and performance of the aircraft through simulation so as to guide the actions of the aircraft; it receives information from the battlefield environment management node and the tactical deduction node through the DIS network, and simulates the behavior of the aircraft based on this information.

6. The aircraft intelligent cooperative confrontation decision-making system based on reinforcement learning according to claim 4, characterized in that,

1) The fire control calculation module is responsible for calculating the fire control data of the aircraft, including missile launch azimuth and elevation angle, target distance, and ballistic correction; this module receives data from the flight control calculation module and the visual scene calculation module, generates the corresponding fire control data through calculation, and sends it to the flight control calculation module;

2) The flight control calculation module is responsible for calculating the flight control data of the aircraft, including flight speed, altitude, and attitude; this module receives data from the fire control calculation module, the visual scene calculation module, and the instrument calculation module, generates the corresponding flight control data through calculation, and sends it to the equipment control and acquisition module;

3) The visual scene calculation module is responsible for computing the scene rendering of the aircraft; this module receives data from the fire control calculation module, the flight control calculation module, and the equipment control and acquisition module, generates the corresponding image data through calculation, and sends it to the visual display module;

4) The visual display module is responsible for displaying, as images, the visual data generated by the visual scene calculation module; this module receives data from the visual scene calculation module and renders it as a visual image;

5) The instrument calculation module is responsible for calculating the various instrument data of the aircraft, including speed, altitude, and attitude; this module receives data from the flight control calculation module, generates the corresponding instrument data through calculation, and sends it to the multi-function display module;

6) The multi-function display module is responsible for displaying the instrument data generated by the instrument calculation module, as well as other aircraft-related data, including fire control data, mission information, and battery status; this module receives data from the instrument calculation module and the equipment control and acquisition module and renders it as visual information;

7) The equipment control and acquisition module is responsible for communication with, and data acquisition from, the various devices of the aircraft.

7. The aircraft intelligent cooperative confrontation decision-making system based on reinforcement learning according to claim 4, characterized in that, in the upper-level architecture and the lower-level architecture, the following four technologies are integrated to form a simulation architecture based on HLA and a hybrid real-time network:

1) Using DIS distributed management, time advancement mechanism and load balancing control technology

In the upper-level architecture of the system, distributed management and data communication are implemented over the DIS network, so that the simulation nodes can work together efficiently; the system also uses a time advancement mechanism to ensure the accuracy and synchronization of the simulation results, and load balancing control technology to ensure the stability and reliability of the system;

2) Utilize the high real-time characteristics and deterministic delay of the reflective memory network

In the lower-level architecture of the system, reflective memory network technology provides high real-time performance and deterministic latency, enabling fast and accurate data exchange and cooperation between modules;

3) Use RTX's precise clock and preemptive task scheduling mechanism

In the lower-level architecture of the system, the precise clock and preemptive task scheduling mechanism of the RTX real-time operating system allow fine-grained control and scheduling of tasks, ensuring efficient and accurate data exchange and cooperation between modules;

4) Using the data communication mechanism of CAN bus

In the underlying architecture of the system, the CAN bus data communication mechanism is used to realize efficient data transmission and communication between modules, thus ensuring the stability and reliability of the system.

8. The aircraft intelligent cooperative confrontation decision-making system based on reinforcement learning according to claim 5, wherein the basic features of the JSBSim dynamics model include wingspan, chord length, wing area, pilot eye position, aerodynamic reference point, center of gravity position, moments of inertia, products of inertia, ground contact point positions of the nose and main landing gear, engine thrust line, and landing gear model.