CN111857107A

CN111857107A - Auxiliary mobile robot navigation control system and method based on learning component library

Info

Publication number: CN111857107A
Application number: CN202010522452.1A
Authority: CN
Inventors: 孙长银; 何子辰; 董璐; 陈启军; 王嘉伟
Original assignee: Tongji University
Current assignee: Tongji University
Priority date: 2020-06-10
Filing date: 2020-06-10
Publication date: 2020-10-30
Anticipated expiration: 2040-06-10
Also published as: CN111857107B

Abstract

The invention discloses a system and a method for controlling navigation of an auxiliary mobile robot based on a learning component library, wherein the system comprises the learning component library, and the learning component library comprises: the system comprises an initialization component, an environment modeling component, a path planning component, a core algorithm component, a testing component, an optimization component and a visualization component. The components in the invention are mutually interacted and flexibly called, and a plurality of simulation or practically applied reinforcement learning training and visualization closed-loop learning systems which are suitable for different algorithm strategies of navigation task scenes can be quickly constructed according to the type of the mobile robot in the using process.

Description

Auxiliary mobile robot navigation control system and method based on learning component library

技术领域technical field

本发明涉及一种基于学习组件库的辅助型移动机器人导航控制系统和方法，属于机器人控制技术领域。The invention relates to an auxiliary mobile robot navigation control system and method based on a learning component library, and belongs to the technical field of robot control.

背景技术Background technique

近几年来，随着机器人技术的发展，功能辅助型移动机器人在农业、商业、物流、医疗辅助、军工等各个领域有着广泛的应用。比如，在国内新冠病毒疫情期间，辅助型移动机器人凭借其自主性，在医院、小区消毒、快递物流配送、体温检测、隔离区智能问诊等方面起到重要的作用，推动了我国的防疫抗议的进程。In recent years, with the development of robot technology, function-assisted mobile robots have been widely used in various fields such as agriculture, commerce, logistics, medical assistance, and military industry. For example, during the domestic new crown virus epidemic, auxiliary mobile robots, with their autonomy, played an important role in hospitals, community disinfection, express logistics and distribution, body temperature detection, intelligent consultation in isolation areas, etc., and promoted my country's epidemic prevention protests process.

辅助型移动机器人是一个综合系统，其集成了环境感知、自主定位、路径规划、底层导航控制与执行特定辅助功能等于一体。以疫情期间执行公共场合消毒任务的移动机器人为例，其在执行消毒工作的过程中，通过自身搭载的多种外部传感器，比如单目视觉摄像头、双目视觉摄像头、激光雷达、毫米波雷达，超声波传感器等获取需要消毒的区域环境信息；接着结合自身内部传感器，如惯性传感器、GPS等估计自身在当前所处区域的全局位置与姿态信息；在上述两步的基础上，结合具体任务需求，使用路径规划算法，如人工势场法、启发式快速扩展随机树等，规划出一条从初始位置到目标位置的最优路径；最后，结合自身动力学与运动学特性、执行器特点与底盘驱动构型，通过底层导航控制器对规划轨迹进行精确的导航跟踪控制，使移动机器人按照预先规划的路径行驶。Assisted mobile robot is a comprehensive system that integrates environment perception, autonomous positioning, path planning, low-level navigation control and execution of specific auxiliary functions. Take a mobile robot that performs disinfection tasks in public places during the epidemic as an example. During the process of disinfection, it uses a variety of external sensors on its own, such as monocular vision cameras, binocular vision cameras, lidar, and millimeter-wave radar. Ultrasonic sensors, etc. obtain the environmental information of the area that needs to be sterilized; then combine their own internal sensors, such as inertial sensors, GPS, etc. to estimate their global position and attitude information in the current area; on the basis of the above two steps, combined with specific task requirements, Use path planning algorithms, such as artificial potential field method, heuristic fast expanding random tree, etc., to plan an optimal path from the initial position to the target position; finally, combine its own dynamics and kinematics characteristics, actuator characteristics and chassis drive Configuration, through the underlying navigation controller to carry out precise navigation and tracking control of the planned trajectory, so that the mobile robot travels according to the pre-planned path.

但是目前传统的导航控制方法缺少特定的辅助型移动机器人的仿真平台，配置训练过程复杂繁琐，缺乏系统性；并且现阶段每一种强化学习导航控制算法都是建立在特定的机器人以及特定的场景之上，模拟场景与实际场景的强化学习环境搭建方法不同，缺乏灵活性。However, the current traditional navigation control method lacks a specific simulation platform for auxiliary mobile robots, the configuration and training process is complicated and tedious, and lacks systematicness; and each reinforcement learning navigation control algorithm at this stage is based on a specific robot and a specific scene. On top of that, the reinforcement learning environment construction methods for simulated scenarios and actual scenarios are different and lack flexibility.

发明内容SUMMARY OF THE INVENTION

本发明针对上述传统统辅助型移动机器人导航控制方法中存在的以上技术问题，提供一种基于学习组件库的辅助型移动机器人导航控制方法，方便使用者根据自身需求进行快速的强化学习闭环控制系统搭建，且方便进行参数调试与性能优化的移动机器人导航控制方法。Aiming at the above technical problems existing in the above-mentioned traditional system-assisted mobile robot navigation control method, the present invention provides an auxiliary mobile robot navigation control method based on a learning component library, which is convenient for users to quickly strengthen the learning closed-loop control system according to their own needs. A navigation control method for mobile robots that is easy to build and facilitate parameter debugging and performance optimization.

本发明采用以下技术方案。The present invention adopts the following technical solutions.

一方面，本发明提供一种基于学习组件库的辅助型移动机器人导航控制系统，包括学习组件库，所述学习组件库包括：初始化组件、环境建模组件、路径规划组件、核心算法组件、测试组件、优化组件和可视化组件；所述初始化组件，用于完成特定移动机器人类型对应的状态空间、动作空间的初始化，以及用于设立奖励函数；所述环境建模组件，用于读取并处理移动机器人搭载的传感器数据，以及用于确定定位机器人所处的全局位置数据以及在进行仿真任务时，建立虚拟的与移动机器人交互的环境；所述路径规划组件，用于提供能够选择的路径规划算法以实现最优导航路径规划；所述核心算法组件，用于提供多种强化学习算法供选择，配合底层控制算法组件或者直接使得输出控制器指令，动作后再次通过环境建模组件获得当前信息，以完成强化学习闭环控制；所述测试组件，用于提供供选择的扰动方法，以测试利用核心算法组件确定的强化学习算法的性能；所述优化组件，用于提供供选择的优化算法对利用核心算法组件确定的强化学习算法的选定参数进行调节，以提升导航控制算法的性能；所述可视化组件，用于将核心算法组件以及测试组件的输出数值实现可视化。In one aspect, the present invention provides an auxiliary mobile robot navigation control system based on a learning component library, including a learning component library, and the learning component library includes: an initialization component, an environment modeling component, a path planning component, a core algorithm component, a test component component, optimization component and visualization component; the initialization component is used to complete the initialization of the state space and action space corresponding to a specific mobile robot type, and used to set up a reward function; the environment modeling component is used to read and process The sensor data carried by the mobile robot, as well as the global position data used to determine the location where the robot is located, and to establish a virtual environment for interacting with the mobile robot when performing a simulation task; the path planning component is used to provide selectable path planning algorithm to achieve optimal navigation path planning; the core algorithm component is used to provide a variety of reinforcement learning algorithms for selection, cooperate with the underlying control algorithm component or directly output controller commands, and obtain current information through the environment modeling component after the action. , to complete the reinforcement learning closed-loop control; the test component is used to provide a perturbation method for selection to test the performance of the reinforcement learning algorithm determined by the core algorithm component; the optimization component is used to provide an optimization algorithm for selection. The selected parameters of the reinforcement learning algorithm determined by the core algorithm component are used for adjustment to improve the performance of the navigation control algorithm; the visualization component is used to visualize the output values of the core algorithm component and the test component.

进一步地，所述核心组件库包括同策略模块、异策略模块以及综合策略模块，所述同策略模块用于封装同策略的强化学习算法，所述异策略用于封装异策略的强化学习算法；所述综合策略模块，用于封装综合策略算法，所述综合策略算法为综合同策略与异策略的数据驱动强化算法。所述综合策略算法包括：通过及时将学习的新策略反馈给移动机器人系统，收集特定系统数据来优化强化学习算法的适应能力；同时考虑系统的原始特性，将重新收集的数据与以往回放的经验数据结合，再次学习最终确定强化学习算法。Further, the core component library includes a same strategy module, a different strategy module and a comprehensive strategy module, the same strategy module is used to encapsulate the reinforcement learning algorithm of the same strategy, and the different strategy is used to encapsulate the reinforcement learning algorithm of the different strategy; The integrated strategy module is used to encapsulate an integrated strategy algorithm, and the integrated strategy algorithm is a data-driven enhancement algorithm that integrates the same strategy and different strategies. The comprehensive strategy algorithm includes: by timely feeding back the learned new strategy to the mobile robot system, and collecting specific system data to optimize the adaptive ability of the reinforcement learning algorithm; at the same time, considering the original characteristics of the system, the re-collected data is compared with the previous playback experience. The data is combined and learned again to finally determine the reinforcement learning algorithm.

进一步地，所述系统还包括：底层控制算法组件，所述底层控制算法组件能够直接用于提供作为与强化学习算法对比的基准组件，也能够与上层强化学习算法结合，搭建从状态直接到执行器指令的闭环控制强化学习系统。进一步地，所述环境建模组件包括：传感器数据处理模块、移动机器人定位模块和强化学习环境建模模块，所述传感器数据处理模块用于读取并处理移动机器人搭载的传感器数据，所述移动机器人定位模块用于实时定位机器人所处的全局位置数据；所述强化学习环境建模模块用于在进行仿真任务时，建立虚拟的与移动机器人交互的环境。Further, the system further includes: a low-level control algorithm component, the low-level control algorithm component can be directly used to provide a benchmark component for comparison with the reinforcement learning algorithm, and can also be combined with the upper-level reinforcement learning algorithm to build a state from direct to execution. Closed-loop control of reinforcement learning systems for machine instructions. Further, the environment modeling component includes: a sensor data processing module, a mobile robot positioning module and a reinforcement learning environment modeling module, the sensor data processing module is used to read and process sensor data carried by the mobile robot, the mobile robot The robot positioning module is used for locating the global position data of the robot in real time; the reinforcement learning environment modeling module is used for establishing a virtual environment for interacting with the mobile robot when the simulation task is performed.

进一步地，所述优化组件提供的供选择的优化算法包括正则化算法，所述正则化算法包括L1和L2正则化算法、熵正则化算法和/或早停算法。Further, the optional optimization algorithm provided by the optimization component includes a regularization algorithm, and the regularization algorithm includes an L1 and L2 regularization algorithm, an entropy regularization algorithm and/or an early stopping algorithm.

进一步地，所述路径规划组件和核心算法组建中分别设置评价函数模块，用于提供性能评价函数实现对所述路径规划组件和核心算法组件的参数调节和算法选择的性能评价。Further, an evaluation function module is respectively set in the path planning component and the core algorithm component to provide a performance evaluation function to realize the performance evaluation of the parameter adjustment and algorithm selection of the path planning component and the core algorithm component.

第二方面，本发明提供一种基于学习组件库的辅助型移动机器人导航控制方法，所述方法基于以上技术方案所述的基于学习组件库的辅助型移动机器人导航控制系统，所述方法包括以下步骤：从预先构建的初始化组件选择与特定移动机器人类型对应的状态空间和动作空间，并设立强化学习的奖励函数完成初始化； In a second aspect, the present invention provides an auxiliary mobile robot navigation control method based on a learning component library, the method is based on the learning component library-based auxiliary mobile robot navigation control system described in the above technical solution, and the method includes the following Steps: Select the state space and action space corresponding to the specific mobile robot type from the pre-built initialization components, and set up the reinforcement learning reward function to complete the initialization;

利用预先构建的环境建模组件构建强化学习仿真环境；通过环境建模组件获取障碍物相对位置与移动机器人自身位置，利用预先构建的路径规划组件选择所需的路径规划算法，规划最优导航路径；根据路径规划结果，调节导航控制算法的奖励函数；Use the pre-built environment modeling component to build a reinforcement learning simulation environment; obtain the relative position of the obstacle and the mobile robot's own position through the environment modeling component, use the pre-built path planning component to select the required path planning algorithm, and plan the optimal navigation path ; Adjust the reward function of the navigation control algorithm according to the path planning result;

从预先构建的核心算法组件选择确定强化学习算法，联合定义的动作空间、状态空间、奖励函数与强化学习环境，选择核心算法模块，进行训练；通过底层控制模块或者直接输出控制器指令进行动作，接着再次通过环境建模组件获取障碍物相对位置与移动机器人自身位置，重复步骤完成输出控制器指令完成强化学习闭环控制；Select and determine the reinforcement learning algorithm from the pre-built core algorithm components, combine the defined action space, state space, reward function and reinforcement learning environment, select the core algorithm module for training; perform actions through the underlying control module or directly output controller instructions, Then, obtain the relative position of the obstacle and the position of the mobile robot itself through the environment modeling component again, and repeat the steps to complete the output controller command to complete the reinforcement learning closed-loop control;

从测试组件选择扰动方法，测试从核心算法组件选择确定的强化学习算法的性能；Select the perturbation method from the test component, and test the performance of the reinforcement learning algorithm selected from the core algorithm component;

从优化组件中选择确定优化算法对利用核心算法组件确定的强化学习算法的选定参数进行调节，以提升导航控制算法的性能；Select and determine the optimization algorithm from the optimization component to adjust the selected parameters of the reinforcement learning algorithm determined by the core algorithm component, so as to improve the performance of the navigation control algorithm;

利用可视化组件将核心算法组件以及测试组件的输出数值实现可视化。Use the visualization component to visualize the output values of the core algorithm component and the test component.

第三方面，本发明提供一种基于学习组件库的辅助型移动机器人导航控制方法，所述方法基于学习组件库的辅助型移动机器人导航控制系统，所述系统包括学习组件库，所述学习组件库包括：初始化组件、环境建模组件、路径规划组件、核心算法组件、测试组件、优化组件、可视化组件和底层控制算法组件；所述初始化组件，用于完成特定移动机器人类型对应的状态空间、动作空间的初始化，以及用于设立奖励函数；所述路径规划组件，用于提供能够选择的路径规划算法以实现最优导航路径规划；所述核心算法组件，用于提供多种强化学习算法供选择，使得输出控制器指令完成强化学习闭环控制；所述测试组件，用于提供供选择的扰动方法，以测试利用核心算法组件确定的强化学习算法的性能；所述优化组件，用于提供供选择的优化算法对利用核心算法组件确定的强化学习算法的选定参数进行调节，以提升导航控制算法的性能；所述可视化组件，用于将核心算法组件以及测试组件的输出数值实现可视化；所述底层控制算法组件用于提供作为与强化学习算法对比的基准组件；In a third aspect, the present invention provides a navigation control method for an assisted mobile robot based on a learning component library. The method is based on an assisted mobile robot navigation control system based on a learning component library. The system includes a learning component library, and the learning component The library includes: initialization component, environment modeling component, path planning component, core algorithm component, testing component, optimization component, visualization component and underlying control algorithm component; the initialization component is used to complete the state space corresponding to a specific mobile robot type, Initialization of the action space, and for setting up a reward function; the path planning component for providing a selectable path planning algorithm to achieve optimal navigation path planning; the core algorithm component for providing a variety of reinforcement learning algorithms for selection, so that the output controller instruction completes the reinforcement learning closed-loop control; the testing component is used to provide a perturbation method for selection to test the performance of the reinforcement learning algorithm determined by using the core algorithm component; the optimization component is used to provide The selected optimization algorithm adjusts the selected parameters of the reinforcement learning algorithm determined by the core algorithm component, so as to improve the performance of the navigation control algorithm; the visualization component is used to visualize the output values of the core algorithm component and the test component; so The underlying control algorithm components described above are used to provide benchmark components for comparison with reinforcement learning algorithms;

所述方法包括以下步骤：The method includes the following steps:

从预先构建的初始化组件选择与特定移动机器人类型对应的状态空间和动作空间，并设立强化学习的奖励函数完成初始化；Select the state space and action space corresponding to the specific mobile robot type from the pre-built initialization components, and set up the reinforcement learning reward function to complete the initialization;

调用环境建模组件获得移动机器人搭载的传感器数据和移动机器人所处的全局位置数据；Call the environment modeling component to obtain the sensor data carried by the mobile robot and the global position data of the mobile robot;

结合定义的动作空间、状态空间、奖励函数与移动机器人搭载的传感器数据和移动机器人所处的全局位置数据，从预先构建的核心算法组件选择确定强化学习算法，配合底层控制组件或者直接使得输出控制器指令，动作后再次通过环境建模组件各个传感器数值，重复上述过程完成强化学习闭环控制；Combine the defined action space, state space, reward function with the sensor data carried by the mobile robot and the global position data of the mobile robot, select and determine the reinforcement learning algorithm from the pre-built core algorithm components, cooperate with the underlying control components or directly make the output control After the action, each sensor value of the environment modeling component is used again, and the above process is repeated to complete the reinforcement learning closed-loop control;

从测试组件选择扰动方法，利用测试组件进行算法评估与测试，并实时反馈传感器处理模块输出状态观测值，判断是否达到控制要求；Select the disturbance method from the test component, use the test component to evaluate and test the algorithm, and feedback the output state observation value of the sensor processing module in real time to judge whether the control requirements are met;

从优化组件中选择确定优化算法对利用核心算法组件确定的强化学习算法的选定参数进行调节，直到移动机器人在导航控制任务中获得预定的执行效果；利用可视化组件将核心算法组件以及测试组件的输出数值实现可视化。Select and determine the optimization algorithm from the optimization component to adjust the selected parameters of the reinforcement learning algorithm determined by the core algorithm component until the mobile robot obtains the predetermined execution effect in the navigation control task; use the visualization component to adjust the core algorithm component and the test component. The output value is visualized.

进一步的，所述路径规划组件和核心算法组建中分别设置性能评价函数模块，所述方法还包括，利用评价函数模块确定性能评价函数，对所述路径规划组件和核心算法组件的参数调节和算法选择的性能评价，利用可视化组件对性能评价函数的评价结果进行可视化。Further, a performance evaluation function module is respectively set in the path planning component and the core algorithm building, and the method further includes: using the evaluation function module to determine a performance evaluation function, and adjusting parameters and algorithms of the path planning component and the core algorithm component. The selected performance evaluation uses the visualization component to visualize the evaluation results of the performance evaluation function.

本发明所取得的有益技术效果：本发明中各个组件之间相互交互，灵活调用，使用过程中可以根据移动机器人类型，快速构建多种适应其导航任务场景的仿真或者实际应用的强化学习训练与可视化闭环学习系统；通过测试组件，可以对配置算法的稳定性、鲁棒性、泛化能力进行测试，若需优化算法，学习组件库中的优化组件，可以方便快捷得进行参数优化与正则化操作，避免算法过拟合，提高算法性能；同时，若要更改移动机器人传感器配置、或者驱动构型，无需重新搭建整个导航控制算法工作流，直接替换相应组件模块即可，具有良好的灵活性与通用性。The beneficial technical effect obtained by the present invention: in the present invention, each component interacts with each other, and can be called flexibly. During the use process, a variety of simulation or practical application of reinforcement learning training and training can be quickly constructed according to the type of the mobile robot. Visual closed-loop learning system; by testing components, you can test the stability, robustness and generalization ability of the configuration algorithm. If you need to optimize the algorithm, you can easily and quickly perform parameter optimization and regularization by learning the optimization components in the component library operation, avoid algorithm overfitting and improve algorithm performance; at the same time, if you want to change the sensor configuration or drive configuration of the mobile robot, you don’t need to rebuild the entire navigation control algorithm workflow, you can directly replace the corresponding component modules, which has good flexibility with versatility.

本发明所述基于学习组件的辅助型移动过机器人导航与控制方法，既可以通过组件之间完整的工作流应用于实际的移动机器人控制，也可以使用仿真环境对移动机器人的导航控制算法效果进行仿真测试；具有良好的灵活性和通用性，可以方便地应用于搭载多种传感器方案与驱动构型的辅助型移动机器人的导航控制任务之中。在实际应用过程中，即可以通过传统控制算法控制移动机器人进行导航任务，也可以通过快速搭建强化学习环境进行基于强化学习的导航控制，同时，该方法不仅可以利用于实际工作的机器人导航控制之中，也可以构建仿真环境进行算法性能研究。使用时可以模块化更改路径规划算法、奖励函数构建方法、核心学习算法模块等，并且可以方便监测各个算法的各种评价指标，设立作为对比的基准算法。The assisted mobile robot navigation and control method based on the learning component of the present invention can not only be applied to the actual mobile robot control through the complete workflow between components, but also can use the simulation environment to carry out the navigation control algorithm effect of the mobile robot. Simulation test; has good flexibility and versatility, and can be easily applied to the navigation control tasks of auxiliary mobile robots equipped with various sensor schemes and driving configurations. In the actual application process, the mobile robot can be controlled by traditional control algorithms to perform navigation tasks, and the reinforcement learning-based navigation control can also be performed by quickly building a reinforcement learning environment. In the simulation environment, you can also build a simulation environment to study the performance of the algorithm. During use, the path planning algorithm, the reward function construction method, the core learning algorithm module, etc. can be changed modularly, and various evaluation indicators of each algorithm can be easily monitored, and a benchmark algorithm for comparison can be established.

另一方面，该导航控制学习组件库中提供了包含主流控制算法的底层控制算法组件，方便对学习算法进行性能对比验证。On the other hand, the navigation control learning component library provides low-level control algorithm components including mainstream control algorithms to facilitate performance comparison and verification of the learning algorithms.

附图说明Description of drawings

图1是本发明实施例给出的一种基于学习组件的辅助型移动机器人导航控制系统的总体架构；Fig. 1 is the overall structure of a kind of auxiliary mobile robot navigation control system based on learning component provided by the embodiment of the present invention;

图2是本发明实施例给出的一种基于学习组件的辅助型移动机器人导航控制方法的第一种构建方法；2 is a first construction method of a learning component-based assisted mobile robot navigation control method provided in an embodiment of the present invention;

图 3 是本发明实施例给出的一种基于学习组件的辅助型移动机器人导航控制方法的第二种构建方法；3 is a second construction method of a learning component-based assisted mobile robot navigation control method provided by an embodiment of the present invention;

图4是本发明实施例中综合策略算法架构图。FIG. 4 is an architecture diagram of a comprehensive strategy algorithm in an embodiment of the present invention.

具体实施方式Detailed ways

以下结合附图和具体实施例对本发明做进一步说明。The present invention will be further described below with reference to the accompanying drawings and specific embodiments.

实施例一、一种基于学习组件库的辅助型移动机器人导航控制系统，包括预先建立的用于辅助型移动机器人的导航控制的学习组件库，所述学习组件库包括：初始化组件、环境建模组件、路径规划组件、核心算法组件、测试组件、优化组件和可视化组件；Embodiment 1. An auxiliary mobile robot navigation control system based on a learning component library, including a pre-established learning component library for navigation control of an auxiliary mobile robot, the learning component library includes: an initialization component, an environment modeling components, path planning components, core algorithm components, testing components, optimization components and visualization components;

所述初始化组件，用于完成特定移动机器人类型对应的状态空间、动作空间的初始化，以及用于设立奖励函数；The initialization component is used to complete the initialization of the state space and action space corresponding to the specific mobile robot type, and to set up a reward function;

所述环境建模组件，用于读取并处理移动机器人搭载的传感器数据，以及用于确定定位机器人所处的全局位置数据以及在进行仿真任务时，建立虚拟的与移动机器人交互的环境；The environment modeling component is used to read and process sensor data carried by the mobile robot, and to determine the global position data where the positioning robot is located and to establish a virtual environment for interacting with the mobile robot when performing a simulation task;

所述路径规划组件，用于提供能够选择的路径规划算法以实现最优导航路径规划；所述核心算法组件，用于提供多种强化学习算法供选择，使得输出控制器指令完成强化学习闭环控制；所述优化组件用于提供正则化方法以使得实现强化学习算法的优化；所述测试组件，用于提供供选择的扰动方法，以测试利用核心算法组件确定的强化学习算法的性能；所述优化组件，用于提供供选择的正则化算法对利用核心算法组件确定的强化学习算法的选定参数进行调节，以提升导航控制算法的性能；所述可视化组件，用于将核心算法组件以及测试组件的输出数值实现可视化。The path planning component is used to provide a selectable path planning algorithm to realize optimal navigation path planning; the core algorithm component is used to provide a variety of reinforcement learning algorithms for selection, so that the output controller instruction can complete the reinforcement learning closed-loop control The optimization component is used to provide a regularization method so as to realize the optimization of the reinforcement learning algorithm; the test component is used to provide a perturbation method for selection to test the performance of the reinforcement learning algorithm determined by the core algorithm component; the The optimization component is used to provide a regularization algorithm for selection to adjust the selected parameters of the reinforcement learning algorithm determined by the core algorithm component, so as to improve the performance of the navigation control algorithm; the visualization component is used to adjust the core algorithm component and the test The output value of the component is visualized.

用于导航控制的学习组件库，是一种计算机数据库，是用于移动机器人导航控制的标准化计算机软件模块。导航控制学习组件库，根据输入信息，调用预先封装好的算法与模块，最后得到各个组件所返回结果。本发明提供的学习组件库可以直接应用于实际的移动机器人的导航控制，也可以将核心强化学习算法作为控制闭环中的一个上层控制环节，用于学习复杂的移动机器人行为，输出底层控制器的参考量。Learning Component Library for Navigation Control, a computer database, is a standardized computer software module for mobile robot navigation control. The navigation control learning component library calls the pre-packaged algorithms and modules according to the input information, and finally obtains the results returned by each component. The learning component library provided by the present invention can be directly applied to the navigation control of the actual mobile robot, and the core reinforcement learning algorithm can also be used as an upper-level control link in the control closed loop to learn the complex mobile robot behavior and output the underlying controller's behavior. reference volume.

在移动机器人跟踪规划路径的导航控制问题中，本实施例组件主要有以下几类：In the navigation control problem of the mobile robot tracking the planned path, the components in this embodiment mainly fall into the following categories:

包含不同驱动构型的移动机器人的动作空间、状态空间、奖励函数设计的初始化组件；包含搭建环境模型所需各个模块的环境建模组件；包含不同路径规划算法的路径规划组件；包含不同的同策略、异策略、综合策略算法的核心算法组件；包含提高控制算法鲁棒性、泛化性，避免过拟合的正则化模块的优化组件；包含用于测试算法性能的测试组件。包含实现各种性能参数可视化的可视化组件。It includes the initialization components for the action space, state space, and reward function design of mobile robots with different driving configurations; the environment modeling component includes various modules required to build the environment model; the path planning component includes different path planning algorithms; The core algorithm components of strategy, heterogeneous strategy, and comprehensive strategy algorithms; include optimization components of regularization modules that improve the robustness and generalization of control algorithms and avoid overfitting; include test components for testing algorithm performance. Contains visualization components to visualize various performance parameters.

例如，路径规划组件中，输入上一环境建模组件输出的环境信息，机器人自身位置与目标点位置，选择需要的路径规划算法后，可以得到规划的路径。For example, in the path planning component, input the environmental information output by the previous environment modeling component, the position of the robot itself and the target point, and after selecting the required path planning algorithm, the planned path can be obtained.

又例如，在核心算法组件中，通过输入所选算法类型，调用该算法模块进行训练，实时返回训练过程中的性能评价参数，用于监测训练过程中的算法表现。For another example, in the core algorithm component, by inputting the selected algorithm type, the algorithm module is called for training, and the performance evaluation parameters during the training process are returned in real time to monitor the algorithm performance during the training process.

在具体实施例中，可选地，所述初始化组件，主要包括状态空间设计模块、动作空间设计模块、奖励函数设计模块。所述环境建模组件，主要包括视觉传感器处理模块、激光雷达传感器处理模块、机器人定位传感器处理模块、强化学习环境建模模块。所述路径规划组件，包括启发式路径规划模块、人工势场路径规划模块、机器学习路径规划模块等。所述核心算法组件，包括同策略算法模块、异策略算法模块、综合策略算法模块。所述优化组件，包括超参数优化模块、正则化模块等，其中正则化模块中封装来常用的正则化算法，如L1/L2正则化算法、熵正则化算法、早停算法等，可根据需要进行添加，用于提高强化学习算法的泛化性能。所述扰动组件，包括动态障碍物扰动模块、风力扰动模块、水流扰动模块等。所述可视化组件，包括学习曲线可视化模块、导航控制误差可视化模块、执行器数值可视化模块等。In a specific embodiment, optionally, the initialization component mainly includes a state space design module, an action space design module, and a reward function design module. The environment modeling component mainly includes a visual sensor processing module, a lidar sensor processing module, a robot positioning sensor processing module, and a reinforcement learning environment modeling module. The path planning component includes a heuristic path planning module, an artificial potential field path planning module, a machine learning path planning module, and the like. The core algorithm component includes a same-policy algorithm module, a different-strategy algorithm module, and a comprehensive strategy algorithm module. The optimization component includes a hyperparameter optimization module, a regularization module, etc., wherein the regularization module is encapsulated with commonly used regularization algorithms, such as L1/L2 regularization algorithm, entropy regularization algorithm, early stop algorithm, etc. Added to improve the generalization performance of reinforcement learning algorithms. The disturbance component includes a dynamic obstacle disturbance module, a wind disturbance module, a water flow disturbance module, and the like. The visualization component includes a learning curve visualization module, a navigation control error visualization module, an actuator numerical visualization module, and the like.

本发明所提出的学习组件库中的各类组件，是用于辅助型移动机器人导航控制的学习算法的标准化计算机软件模块。各个组件根据输入信息，调用预先封装好的算法与模块，最后得到各个组件所返回结果。本领域的技术人员可以基于本发明所提供的系统架构，根据实际应用的需求，利用现有技术实现各组件的构建和组件之间的调用，即自行封装智能算法并转换成标准模块，加入到包含对应功能的组件中。本发明中学习组件库中的各个组件可以相互交互，相互调用。The various components in the learning component library proposed by the present invention are standardized computer software modules used for the learning algorithm of the assisted mobile robot navigation control. Each component calls the pre-packaged algorithms and modules according to the input information, and finally obtains the results returned by each component. Those skilled in the art can, based on the system architecture provided by the present invention and according to the requirements of practical applications, use the existing technology to realize the construction of each component and the invocation between components, that is, to encapsulate the intelligent algorithm and convert it into a standard module, and add it to the In the component that contains the corresponding function. The various components in the learning component library in the present invention can interact with each other and call each other.

本实施例中所述导航控制学习组件库，既可以通过组件之间完整的工作流应用于实际的移动机器人控制，也可以使用仿真环境对移动机器人的导航控制算法效果进行仿真测试。在对移动机器人进行仿真控制时，可以通过环境建模组件中的强化学习环境建模模块来进行虚拟环境的搭建，使用者可以直接测试核心算法库总的算法性能，也可以将自己设计的强化学习算法进行对接，进而快速完成算法训练环境的搭建。同时，避免来直接在实际机器人上训练带来的一些时间、硬件成本损耗。The navigation control learning component library in this embodiment can be applied to the actual mobile robot control through the complete workflow between the components, and can also use the simulation environment to simulate the effect of the navigation control algorithm of the mobile robot. When the mobile robot is simulated and controlled, the virtual environment can be built through the reinforcement learning environment modeling module in the environment modeling component. The user can directly test the overall algorithm performance of the core algorithm library, and can also strengthen the design by himself. The learning algorithm is connected, and then the construction of the algorithm training environment is quickly completed. At the same time, it avoids some time and hardware cost losses caused by training directly on the actual robot.

若使用场景相对单纯，可以直接应用于实际的移动机器人控制，一方面，通过各传感器的观测值，策略网络输出执行器指令，同时为了防止执行器损坏，对各个执行器指令的阈值进行来限定，保证实际运行时的安全性。另一方面，也可以基于该导航控制学习组件库，将核心强化学习算法作为控制闭环中的一个上层控制环节，用于学习复杂的移动机器人行为，输出底层控制器的参考量，这样通过结合强化学习于传统闭环控制器的优势，更加确保来最终算法性能。If the usage scenario is relatively simple, it can be directly applied to the actual mobile robot control. On the one hand, through the observation values of each sensor, the strategy network outputs the actuator commands, and at the same time, in order to prevent damage to the actuators, the thresholds of each actuator command are limited. , to ensure actual runtime security. On the other hand, based on the navigation control learning component library, the core reinforcement learning algorithm can also be used as an upper control link in the control closed loop to learn complex mobile robot behavior and output the reference quantity of the underlying controller. Learning from the advantages of traditional closed-loop controllers ensures the final algorithm performance.

优选地，所述导航控制学习组件库，各个组件参数调节与算法选择可由各个组件的性能评价函数来决定，各个性能评价函数均可以通过可视化模块进行可视化，方便监测与评估算法表现。Preferably, in the navigation control learning component library, the parameter adjustment and algorithm selection of each component can be determined by the performance evaluation function of each component, and each performance evaluation function can be visualized through a visualization module to facilitate monitoring and evaluation of algorithm performance.

在路径规划组件中，依据时间指标、能量消耗指标来评价最终规划路径结果。In the route planning component, the final planned route result is evaluated according to the time index and the energy consumption index.

可选地，在核心算法组件中，由最终学习曲线、跟踪精度、执行器数值变化曲线来评价最终导航控制效果。Optionally, in the core algorithm component, the final navigation control effect is evaluated by the final learning curve, the tracking accuracy, and the value change curve of the actuator.

优选地，组件中，各个模块的使用与否，由任务需求决定；可以根据不同的任务情况，灵活地增加、删除、替换各个模块。比如，希望比较深度确定性策略梯度算法与综合策略算法的导航控制效果，只需在核心算法组件中，进行替换，根据最终的评价指标进行比较即可。Preferably, in the component, whether each module is used or not is determined by task requirements; each module can be flexibly added, deleted and replaced according to different task conditions. For example, if you want to compare the navigation control effect of the deep deterministic strategy gradient algorithm and the comprehensive strategy algorithm, you only need to replace the core algorithm components and compare them according to the final evaluation index.

实施例二、在实施例一的基础上，本实施例提供了一种基于学习组件库的辅助型移动机器人导航控制系统，所述系统还包括：底层控制算法组件，所述底层控制算法组件用于提供作为与强化学习算法对比的基准组件。所述底层控制算法组件，是一个可选组件，包括一些常用的控制算法模块，该组件主要是作为对比基准或者配合和核心算法组件构建导航控制闭环系统。所述底层控制算法组件一方面可直接用于提供作为与强化学习算法对比的基准组件，另一方面可与上层强化学习算法结合，搭建从状态直接到执行器指令的闭环控制强化学习系统。将强化学习输出（x,y,psi..)上层指令作为控制器的输入，控制器通过调用底层控制算法组件输出执行器指令进行跟踪。这样的分层架构，可以有效减小强化学习的数据纬度，提高效率。Embodiment 2 On the basis of Embodiment 1, this embodiment provides an auxiliary mobile robot navigation control system based on a learning component library. The system further includes: a low-level control algorithm component, and the low-level control algorithm component uses To provide benchmark components against which reinforcement learning algorithms are compared. The underlying control algorithm component is an optional component, including some commonly used control algorithm modules. This component is mainly used as a comparison benchmark or cooperates with the core algorithm component to construct a navigation control closed-loop system. On the one hand, the underlying control algorithm component can be directly used as a benchmark component for comparison with the reinforcement learning algorithm, and on the other hand, it can be combined with the upper-level reinforcement learning algorithm to build a closed-loop control reinforcement learning system from the state directly to the actuator instruction. The reinforcement learning output (x, y, psi..) upper layer instruction is used as the input of the controller, and the controller outputs the actuator instruction by calling the underlying control algorithm component for tracking. Such a layered architecture can effectively reduce the data latitude of reinforcement learning and improve efficiency.

实施例三、在实施例二的基础上，本实施例提供了一种基于学习组件库的辅助型移动机器人导航控制系统（如图3所示），所述核心组件库包括同策略模块、异策略模块以及综合策略模块，所述同策略模块用于封装同策略的强化学习算法，所述异策略用于封装异策略的强化学习算法；所述综合策略模块，用于封装综合策略算法，所述综合策略算法为综合同策略与异策略的数据驱动强化算法。Embodiment 3 On the basis of Embodiment 2, this embodiment provides an auxiliary mobile robot navigation control system based on a learning component library (as shown in FIG. 3 ). The core component library includes a same strategy module, a different A strategy module and an integrated strategy module, the same strategy module is used to encapsulate the reinforcement learning algorithm of the same strategy, and the different strategy is used to encapsulate the reinforcement learning algorithm of different strategies; the integrated strategy module is used to encapsulate the integrated strategy algorithm, so The comprehensive strategy algorithm described above is a data-driven reinforcement algorithm that integrates the same strategy and different strategies.

传统的导航控制方法，通过传感器读取的数据，先要进行特征提取，融合，进行状态估计，然后进行底层的执行器控制或者上层的任务控制。常用的如反馈线性化控制、线性二次型控制、模型预测控制、反步控制等，这些方法存在一些局限性限制了移动机器人在复杂场景下的应用。比如，对于运动模型的线性化难以精准描述复杂系统的动态；并且，一些非线性控制方法，依赖于精准的被控对象的数学物理模型，这往往需要大量的先验知识与专家经验，控制器设计过程繁琐费时。随着人工智能技术的飞速发展，深度强化学习在智能体控制领域有着广泛的应用，基于深度强化学习的控制，直接避免来繁琐的数据处理过程，根据传感器的观测值，通过策略网络，直接输出需要执行的动作。但是，传统深度强化学习算法在机器人控制方面仍存在如下问题：1. 同策略的主流强化学习算法对环境变化具有较强的适应能力，但是由于其依赖于大量的实时数据，需要耗费巨大的计算资源，收敛速度较慢；2. 异策略的强化学习算法虽然具有较好的计算效率，但是其由于其重复采样原始状态序列数据，对环境变化适应能力不强；3. 倾向于对特定任务过拟合，算法泛化能力弱。In the traditional navigation control method, the data read by the sensor must first perform feature extraction, fusion, and state estimation, and then perform the underlying actuator control or the upper-level mission control. Commonly used are feedback linear control, linear quadratic control, model predictive control, backstepping control, etc. These methods have some limitations that limit the application of mobile robots in complex scenarios. For example, the linearization of the motion model is difficult to accurately describe the dynamics of complex systems; and some nonlinear control methods rely on accurate mathematical and physical models of the controlled object, which often require a lot of prior knowledge and expert experience. The design process is cumbersome and time-consuming. With the rapid development of artificial intelligence technology, deep reinforcement learning has a wide range of applications in the field of intelligent body control. The control based on deep reinforcement learning directly avoids the tedious data processing process, and directly outputs the output according to the observation value of the sensor and through the policy network. Action to be performed. However, traditional deep reinforcement learning algorithms still have the following problems in robot control: 1. The mainstream reinforcement learning algorithms with the same strategy have strong adaptability to environmental changes, but because they rely on a large amount of real-time data, they require huge computational costs resources, the convergence speed is slow; 2. Although the reinforcement learning algorithm of different strategies has good computational efficiency, it has poor adaptability to environmental changes due to its repeated sampling of the original state sequence data; 3. It tends to overuse specific tasks. Fitting, the generalization ability of the algorithm is weak.

本实施例通过提供综合策略算法，实现一种综合同策略与异策略算法优势且能够提高强化学习控制算法的泛化能力。所述综合策略算法包括：通过及时将学习的新策略反馈给移动机器人系统，收集特定系统数据来优化强化学习算法的适应能力；同时考虑系统的原始特性，将重新收集的数据与以往回放的经验数据结合，再次学习最终确定强化学习算法。具体说明如下：By providing a comprehensive strategy algorithm, this embodiment realizes the advantages of a comprehensive same strategy and a different strategy algorithm and can improve the generalization ability of the reinforcement learning control algorithm. The comprehensive strategy algorithm includes: by timely feeding back the learned new strategy to the mobile robot system, and collecting specific system data to optimize the adaptive ability of the reinforcement learning algorithm; at the same time, considering the original characteristics of the system, the re-collected data is compared with the previous playback experience. The data is combined and learned again to finally determine the reinforcement learning algorithm. The specific instructions are as follows:

本实施例中综合策略算法架构图如图4所述。主流的强化学习算法都是基于同策略或者异策略，这两种方法都存在上述问题，本实施例提出的综合策略算法集合来以上两种的优势，具体说明如下：The architecture diagram of the comprehensive strategy algorithm in this embodiment is shown in FIG. 4 . The mainstream reinforcement learning algorithms are based on the same strategy or different strategies. Both of these two methods have the above problems. The comprehensive strategy algorithm proposed in this embodiment combines the advantages of the above two types, and the details are as follows:

比如最典型的异策略强化学习算法，Q-Learning，其动作状态Q值的更新过程如下：For example, the most typical heterogeneous strategy reinforcement learning algorithm, Q-Learning, the update process of its action state Q value is as follows:

其中，R(s)是奖励函数，(s’,a’)是最优状态动作对；该算法在计算下一状态的预期收益时一直使用最优的Q值，选择最优动作，但是当前的策略并不一定能选择到最优的动作，所以其并不关心采取的策略是什么。生成样本的策略与学习时的策略不同，又叫异策略机制。异策略的优势在于可以得到全局最优，通用行强，但是训练过程曲折，收敛速度慢。Among them, R(s) is the reward function, and (s', a') is the optimal state-action pair; the algorithm always uses the optimal Q value when calculating the expected return of the next state, and selects the optimal action, but the current The strategy does not necessarily choose the optimal action, so it does not care what the strategy is. The strategy of generating samples is different from the strategy of learning, which is also called different strategy mechanism. The advantage of different strategies is that the global optimum can be obtained, and the general behavior is strong, but the training process is tortuous and the convergence speed is slow.

再比如最典型的同策略强化学习算法，SARSA算法，其Q值的更新过程如下For another example, the most typical same-strategy reinforcement learning algorithm, the SARSA algorithm, the update process of its Q value is as follows

可以发现，在典型同策略强化学习算法，网络参数更新时使用的策略与生成样本的策略是相同的，该种策略算法比较直接，具有很快的计算速度，但是由于其只是利用了目前已知的最优选择，有可能学不到最优解，陷入局部最优。It can be found that in the typical same-strategy reinforcement learning algorithm, the strategy used when the network parameters are updated is the same as the strategy for generating samples. This strategy algorithm is relatively straightforward and has a fast calculation speed, but because it only uses the currently known The optimal choice of , may not learn the optimal solution and fall into a local optimum.

本实施例中综合策略模块所封装的综合策略算法，建立在上述两种策略算法的基础之上，其主要流程如下：The comprehensive strategy algorithm encapsulated by the comprehensive strategy module in this embodiment is established on the basis of the above two strategy algorithms, and its main process is as follows:

S41进行状态、动作等初始化操作；S42执行初始动作，与DQN等异策略算法类似，进行经验池填充；S41 performs initialization operations such as states and actions; S42 performs initial actions, which are similar to different strategy algorithms such as DQN, and perform experience pool filling;

S43强化学习算法主题部分，进行策略评估与策略优化，并通过S44判断是否达到收敛，若没有，此时，进行S45，综合策略算法的不同在除了正常执行动作获得奖励后进行经验池填充以外，会从上一状态序列中抽取特定的数据填充进经验池，组成新的采样数据，接着再次重复以上步骤，直到收敛，这样不仅减少了数据之间的相关程度，而且又利用了之前的有用的数据，结合了异策略与同策略的算法优势，因此提高了收敛性能。S43 strengthens the learning algorithm subject part, carries out strategy evaluation and strategy optimization, and judges whether the convergence is achieved through S44, if not, at this time, go to S45. The difference of the comprehensive strategy algorithm is that the experience pool is filled after the normal execution of the action to obtain the reward. It will extract specific data from the previous state sequence and fill it into the experience pool to form new sampled data, and then repeat the above steps again until convergence, which not only reduces the degree of correlation between the data, but also utilizes the previous useful information. data, which combines the algorithm advantages of different strategies and same strategies, thus improving the convergence performance.

通过本实施例提供的核心算法组件提供的综合策略模块，通过综合策略机制，相比于异策略算法，对环境的变换具有较强的适应能力；另一方面，相比于同策略算法，具有更好的计算效率与收敛表现。Through the comprehensive strategy module provided by the core algorithm component provided in this embodiment, through the comprehensive strategy mechanism, compared with the different strategy algorithm, it has stronger adaptability to the transformation of the environment; on the other hand, compared with the same strategy algorithm, it has Better computational efficiency and convergence performance.

本实施例综合策略模块集成了一种综合策略的数据驱动强化学习算法，使得本方法不仅可以应用于一般的辅助型移动机器人应用场景，也可以用于具有强非线性且环境变化的复杂场景。同时，通过测试组件，可以方便的对算法的稳定性、鲁棒性、泛化能力进行测试，而且，优化组件的联动，可以方便的进行算法调节于优化。The comprehensive strategy module of this embodiment integrates a data-driven reinforcement learning algorithm for a comprehensive strategy, so that the method can be applied not only to general auxiliary mobile robot application scenarios, but also to complex scenarios with strong nonlinearity and environmental changes. At the same time, by testing the components, the stability, robustness and generalization ability of the algorithm can be easily tested, and the linkage of the optimized components can be easily adjusted and optimized.

结合附图阅读本发明实施方式的详细描述后，本发明的其他特点和优势将变得更加清楚。Other features and advantages of the present invention will become more apparent upon reading the detailed description of the embodiments of the present invention in conjunction with the accompanying drawings.

图1可以看出，基于学习组件库的辅助型移动机器人导航控制系统总共有八大组件，S11为初始化组件，其中包括S111状态空间设计模块、S112行为空间设计模块、S113奖励函数设计模块；状态空间设计和行为空间设计是强化学习工作流中的第一步，根据任务需求，可以设计为离散空间或者连续空间；奖励函数设计需要结合具体的路径规划路线进行，其形式主要有末状态奖励、单步奖励、连续奖励、非线性奖励等。As can be seen from Figure 1, the assisted mobile robot navigation control system based on the learning component library has a total of eight components. S11 is the initialization component, including the S111 state space design module, the S112 behavior space design module, and the S113 reward function design module; state space Design and behavior space design are the first steps in the reinforcement learning workflow. According to task requirements, it can be designed as discrete space or continuous space; reward function design needs to be combined with specific path planning routes, and its forms mainly include end-state rewards, single Step reward, continuous reward, nonlinear reward, etc.

S12为环境建模组件，其中包括S121传感器数据处理模块、S122移动机器人定位模块、S123强化学习环境建模模块；其中传感器数据处理模块用于读取并处理移动机器人搭载的传感器数据，比如对于视觉传感器，可以通过降噪、去雾等处理算法增强观测信息；移动机器人定位模块用于实时定位机器人所处的全局位置；强化学习环境建模模块用于在进行仿真任务时，建立虚拟的与智能体交互的环境。S12 is the environment modeling component, including S121 sensor data processing module, S122 mobile robot positioning module, S123 reinforcement learning environment modeling module; the sensor data processing module is used to read and process the sensor data carried by the mobile robot, such as for vision The sensor can enhance the observation information through processing algorithms such as noise reduction and dehazing; the mobile robot positioning module is used to locate the global position of the robot in real time; the reinforcement learning environment modeling module is used to establish virtual and intelligent body interaction environment.

S13为路径规划组件，目的在于利用环境信息，根据目标任务需求，实时规划出一条最优的移动机器人运动路径；其主要包括了一些常用的路径规划算法模块，如S131的启发式路径规划模块、S132人工势场路径规划模块、S133机器学习路径规划模块等。S13 is the path planning component, which aims to use the environmental information to plan an optimal mobile robot motion path in real time according to the target task requirements; it mainly includes some commonly used path planning algorithm modules, such as the heuristic path planning module of S131, S132 Artificial Potential Field Path Planning Module, S133 Machine Learning Path Planning Module, etc.

S14为核心算法组件，主要有三大模块，S141同策略算法模块、S142异策略算法模块、S143综合策略算法模块; 该组件包括了各种强化学习算法的封装，并集成了一种结合同策略与异策略优势的数据驱动的综合策略强化学习算法用于处理非线性强，对环境变化具有较强适应能力的任务场景。S14 is the core algorithm component, and there are three main modules, S141 same-strategy algorithm module, S142 different-strategy algorithm module, S143 comprehensive strategy algorithm module; this component includes the packaging of various reinforcement learning algorithms, and integrates a combination of same-strategy and The data-driven comprehensive strategy reinforcement learning algorithm with different strategy advantages is used to deal with task scenarios with strong nonlinearity and strong adaptability to environmental changes.

S15为可视化组件，其中包括S151路径规划可视化模块、S152学习曲线可视化模块、S153导航控制误差可视化模块、S154执行器数值可视化模块。S15 is a visualization component, including S151 path planning visualization module, S152 learning curve visualization module, S153 navigation control error visualization module, and S154 actuator numerical visualization module.

S16为优化组件，主要用于优化算法的稳定性、鲁棒性，提高泛化能力；其中包括了如S161参数优化模块、S162正则化模块等。S16 is an optimization component, which is mainly used to optimize the stability and robustness of the algorithm and improve the generalization ability; it includes S161 parameter optimization module, S162 regularization module, etc.

S17为测试组件，该组件通过给环境增加扰动，测试算法性能表现，如S171动态障碍物模块、S172风力扰动模块、S173水流扰动模块等。S17 is a test component, which tests the performance of the algorithm by adding disturbance to the environment, such as the S171 dynamic obstacle module, the S172 wind disturbance module, and the S173 water flow disturbance module.

S18为底层控制算法组件，该组件是可以作为与强化学习算法对比的基准组件，可以直接用来实际的移动机器人控制，也可以用于与强化学习算法组件结合提升实际移动机器人的算法性能；主要有S181线性二次型优化控制模块、S182模型预测控制模块、S183反馈线性化控制模块。S18 is the underlying control algorithm component. This component is a benchmark component that can be compared with reinforcement learning algorithms. It can be used directly for actual mobile robot control, or it can be used in combination with reinforcement learning algorithm components to improve the algorithm performance of actual mobile robots; mainly There are S181 linear quadratic optimization control module, S182 model prediction control module, and S183 feedback linearization control module.

实施例四、一种基于学习组件库的辅助型移动机器人导航控制方法，本方法基于实施三提供的基于学习组件库的辅助型移动机器人导航控制系统，所述控制方法包括：建立用于辅助型移动机器人的导航控制学习组件库；根据不同的辅助型移动机器人驱动构型、传感器方案等特点，从初始化组件中选择对应不同移动机器人类型的状态空间与动作空间；依据真实的使用场景需求，构建仿真环境，并选择所需的路径规划算法，规划最优导航路径；根据实际任务特点，设立奖励函数；并从算法组件中选择同策略、异策略或者结合两者优势的综合策略算法中的一种作为学习算法，对算法超参数进行配置；根据使用场景扰动的情况，从优化组件中选择所需的正则化方法；查看训练效果，并且可以依据使用场景的需求自行加入扰动组件，来测试所选择算法的稳定性、鲁棒性与泛化能力；根据控制要求，可以使用优化组件来对主要组件参数进行调节，以提升导航控制算法表现。Embodiment 4. An auxiliary mobile robot navigation control method based on a learning component library. The method is based on the learning component library-based auxiliary mobile robot navigation control system provided in Implementation 3. The control method includes: establishing an auxiliary mobile robot. The navigation control learning component library of mobile robots; according to the characteristics of different auxiliary mobile robot drive configurations, sensor schemes, etc., select the state space and action space corresponding to different mobile robot types from the initialization components; according to the needs of real use scenarios, build Simulate the environment, and select the required path planning algorithm to plan the optimal navigation path; according to the characteristics of the actual task, set up the reward function; and select one of the same strategy, different strategy or comprehensive strategy algorithm combining the advantages of both from the algorithm components. As a learning algorithm, configure the hyperparameters of the algorithm; select the required regularization method from the optimization component according to the disturbance of the usage scenario; check the training effect, and add the disturbance component according to the needs of the usage scenario to test all the Select the stability, robustness and generalization ability of the algorithm; according to the control requirements, the optimization components can be used to adjust the parameters of the main components to improve the performance of the navigation control algorithm.

实施例五、一种基于学习组件库的辅助型移动机器人导航控制方法,基于实施例三提供的基于学习组件库的辅助型移动机器人导航控制系统，本实施例提供的方法直接基于学习组件库的核心算法组件，直接输出控制器指令。下面根据说明书附图2进行介绍。本实例将以一台配备视觉传感器、激光雷达传感器、定位传感器的消毒移动机器人在室内公共场合自主向目标位置移动并进行消毒为例进行说明。Embodiment 5. An auxiliary mobile robot navigation control method based on a learning component library, based on the auxiliary mobile robot navigation control system based on the learning component library provided in the third embodiment, and the method provided in this embodiment is directly based on the learning component library. The core algorithm component, which directly outputs controller instructions. The following will be introduced according to Figure 2 of the specification. This example will take a disinfection mobile robot equipped with vision sensors, lidar sensors, and positioning sensors to autonomously move to the target position and disinfect in indoor public places as an example.

如图2所示，本发明给出的直接通过学习组件库的核心算法组件，输出控制器指令，完成强化学习闭环控制的构建步骤：As shown in Figure 2, the present invention directly passes through the core algorithm component of the learning component library, outputs the controller instruction, and completes the construction steps of the reinforcement learning closed-loop control:

步骤S21，根据该消毒移动机器人底盘构型与驱动方式，结合运动模型，利用预先构建的环境建模组件初始化该机器人的状态空间与动作空间；Step S21, according to the chassis configuration and driving mode of the disinfection mobile robot, combined with the motion model, initialize the state space and action space of the robot by using a pre-built environment modeling component;

步骤S22，分为两种情况，一种是进行该消毒机器人的导航控制算法仿真研究，此时只需根据其运动模型建立强化学习环境，即可进入下一步；Step S22 is divided into two situations, one is to carry out the simulation research of the navigation control algorithm of the disinfection robot, at this time, only need to establish a reinforcement learning environment according to its motion model, and then the next step can be entered;

若应用于实际控制场景，则需要调用传感器数据处理模块与定位模块，获得环境信息与消毒移动机器人状态观测值，同时获得定位信号，更新状态信息；If it is applied to the actual control scene, the sensor data processing module and the positioning module need to be called to obtain the environmental information and the observation value of the disinfection mobile robot state, and at the same time obtain the positioning signal and update the status information;

步骤S23，依据环境信息，通过环境建模组件获得障碍物相对位置与消毒移动机器人自身位置，利用预先构建的路径规划组件调用路径规划组件获得最优路径；Step S23, according to the environment information, obtain the relative position of the obstacle and the position of the disinfection mobile robot itself through the environment modeling component, and use the pre-built path planning component to call the path planning component to obtain the optimal path;

步骤24-A根据路径规划结果，调节根据环境建模组件设立的导航控制算法的奖励函数；Step 24-A adjusts the reward function of the navigation control algorithm established according to the environment modeling component according to the path planning result;

步骤24-B，联合定义的动作空间、状态空间、奖励函数与强化学习环境，从预先构建的核心算法组选择核心算法模块选择确定强化学习算法，进行训练，通过底层控制模块或者直接输出控制器指令完成强化学习闭环控制；Step 24-B, jointly define the action space, state space, reward function and reinforcement learning environment, select the core algorithm module from the pre-built core algorithm group, select and determine the reinforcement learning algorithm, conduct training, and output the controller through the underlying control module or directly Instruction completion reinforcement learning closed-loop control;

步骤S25，为可选步骤，利用底层控制算法组件选择用于最终比较的主流控制算法基准；Step S25 is an optional step, using the underlying control algorithm component to select the mainstream control algorithm benchmark for final comparison;

步骤S26对S24进行测试，评估，比如在仿真环境中，可以从测试组件动态障碍模块，来测试消毒机器人遇到行人时的行为等；Step S26 tests and evaluates S24, for example, in a simulation environment, the behavior of the disinfection robot when it encounters a pedestrian can be tested from the test component dynamic obstacle module;

S27验证导航控制效果是否达到要求，否则可以通过优化组件对利用核心算法组件确定的强化学习算法进行算法优化，提高任务执行效果。S27 verifies whether the navigation control effect meets the requirements, otherwise, the optimization component can be used to optimize the reinforcement learning algorithm determined by the core algorithm component to improve the task execution effect.

在同一个任务中，反复执行S24-S28，直到消毒移动机器人在导航控制任务中获得理想的执行效果。利用可视化组件将核心算法组件以及测试组件的输出数值实现可视化，以实时监测学习训练过程。In the same task, S24-S28 are repeatedly executed until the disinfection mobile robot obtains the ideal execution effect in the navigation control task. Use the visualization component to visualize the output values of the core algorithm component and the test component to monitor the learning and training process in real time.

实施例六、一种基于学习组件库的辅助型移动机器人导航控制方法,基于实施例三提供的基于学习组件库的辅助型移动机器人导航控制系统，如图3所示，本发明给出学习组件库的核心算法与传统控制方法结合，搭建闭环控制学习组件系统的构建步骤：Embodiment 6. An auxiliary mobile robot navigation control method based on a learning component library, based on the auxiliary mobile robot navigation control system based on the learning component library provided in the third embodiment, as shown in FIG. 3, the present invention provides a learning component. The core algorithm of the library is combined with the traditional control method to build a closed-loop control learning component system construction steps:

步骤S31，利用预先构建的初始化组件初始化该机器人的状态空间与动作空间；Step S31, initialize the state space and action space of the robot by using a pre-built initialization component;

步骤S32，调用环境建模组件获得移动机器人搭载的传感器数据和移动机器人所处的全局位置数据（可选的，环境建模组件包括传感器数据处理模块与移动机器人定位模块，通过传感器数据处理模块获得移动机器人搭载的传感器数据，通过移动机器人定位模块获得移动机器人所处的全局位置数据）；Step S32, call the environment modeling component to obtain the sensor data carried by the mobile robot and the global position data where the mobile robot is located (optionally, the environment modeling component includes a sensor data processing module and a mobile robot positioning module, which are obtained through the sensor data processing module. The sensor data carried by the mobile robot, the global position data of the mobile robot is obtained through the mobile robot positioning module);

步骤S33，结合任务需求，利用环境建模组件设计奖励函数，通过核心算法组件确定核心算法，通过输入观测值，集中于学习移动机器人的运动策略，作为传统控制器的参考输入；Step S33, combining the task requirements, using the environment modeling component to design the reward function, determining the core algorithm through the core algorithm component, and focusing on learning the motion strategy of the mobile robot by inputting the observed value, as the reference input of the traditional controller;

步骤S34与步骤S35，加入传统控制器，构建导航控制闭环，联合底层控制算法组件或者直接使得输出控制器指令，执行指令后，再次通过环境建模模块获得当前移动机器人信息，重复以上步骤，完成强化学习闭环控制；Steps S34 and S35, adding a traditional controller, constructing a closed loop of navigation control, combining the underlying control algorithm component or directly making the controller command output, after executing the command, obtain the current mobile robot information through the environment modeling module again, repeat the above steps to complete Reinforcement learning closed-loop control;

同时利用测试组件进行算法评估与测试，并实时反馈传感器处理模块输出状态观测值；步骤S36判断是否达到控制要求；At the same time, the test component is used to evaluate and test the algorithm, and the sensor processing module output state observation value is fed back in real time; step S36 judges whether the control requirement is met;

步骤S37，在上一步的基础上，若需继续优化，则调用优化组件进行参数优化与正则化进行算法优化，提高任务执行效果。Step S37: On the basis of the previous step, if the optimization needs to be continued, the optimization component is called to perform parameter optimization and regularization to perform algorithm optimization, so as to improve the task execution effect.

同样，反复执行S33-S38，直到消毒移动机器人在导航控制任务中获得理想的执行效果。利用可视化组件将核心算法组件以及测试组件的输出数值实现可视化。Likewise, S33-S38 are repeatedly executed until the sterilizing mobile robot obtains the ideal execution effect in the navigation control task. Use the visualization component to visualize the output values of the core algorithm component and the test component.

联合底层控制算法组件的方法为：将强化学习输出（x,y,psi..)上层指令作为控制器的输入，控制器通过调用底层控制算法组件输出执行器指令进行跟踪。这样的分层架构，可以有效减小强化学习的数据纬度，提高效率。其中底层控制算法组件封装的算法可包括LQR、PID、MPC或Backstepping。The method of combining the underlying control algorithm components is to use the reinforcement learning output (x, y, psi..) upper layer instructions as the input of the controller, and the controller can track by calling the underlying control algorithm components to output the actuator instructions. Such a layered architecture can effectively reduce the data latitude of reinforcement learning and improve efficiency. The algorithm encapsulated by the underlying control algorithm component may include LQR, PID, MPC or Backstepping.

此外，需要额外说明的是，第二种构建方法相较于第一种方法，图2中底层控制组件只是用作比较性能的环节，图3中底层控制算法组件是作为整个学习闭环中的一环，用于输出底层控制指令，这样的好处是可以结合传统控制与强化学习的优势，降低了状态空间与动作空间的维度，使强化学习集中于学习复杂行为策略，结合传统主流控制算法的优势，提高了算法的稳定性与算法性能，但是另一方面，引入传统控制算法，增加了总体的算法复杂度。特别地，利用控制学习组件库中的组件，可以按照任务需求，构建除了上述两种以外的多种闭环学习控制系统。In addition, it should be noted that, compared with the first method, the underlying control component in Figure 2 is only used as a link for comparing performance, and the underlying control algorithm component in Figure 3 is used as a part of the entire learning closed loop. The loop is used to output the underlying control instructions. The advantage of this is that it can combine the advantages of traditional control and reinforcement learning, reduce the dimensions of state space and action space, and make reinforcement learning focus on learning complex behavior strategies. Combine the advantages of traditional mainstream control algorithms , which improves the stability and performance of the algorithm, but on the other hand, the introduction of traditional control algorithms increases the overall algorithm complexity. In particular, by using the components in the control learning component library, various closed-loop learning control systems other than the above two can be constructed according to the task requirements.

本领域内的技术人员应明白，本申请的实施例可提供为方法、系统、或计算机程序产品。因此，本申请可采用完全硬件实施例、完全软件实施例、或结合软件和硬件方面的实施例的形式。而且，本申请可采用在一个或多个其中包含有计算机可用程序代码的计算机可用存储介质（包括但不限于磁盘存储器、CD-ROM、光学存储器等）上实施的计算机程序产品的形式。As will be appreciated by those skilled in the art, the embodiments of the present application may be provided as a method, a system, or a computer program product. Accordingly, the present application may take the form of an entirely hardware embodiment, an entirely software embodiment, or an embodiment combining software and hardware aspects. Furthermore, the present application may take the form of a computer program product embodied on one or more computer-usable storage media (including, but not limited to, disk storage, CD-ROM, optical storage, etc.) having computer-usable program code embodied therein.

本申请是参照根据本申请实施例的方法、设备（系统）、和计算机程序产品的流程图和／或方框图来描述的。应理解可由计算机程序指令实现流程图和／或方框图中的每一流程和／或方框、以及流程图和／或方框图中的流程和／或方框的结合。可提供这些计算机程序指令到通用计算机、专用计算机、嵌入式处理机或其他可编程数据处理设备的处理器以产生一个机器，使得通过计算机或其他可编程数据处理设备的处理器执行的指令产生用于实现在流程图一个流程或多个流程和／或方框图一个方框或多个方框中指定的功能的装置。The present application is described with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems), and computer program products according to embodiments of the present application. It will be understood that each flow and/or block in the flowcharts and/or block diagrams, and combinations of flows and/or blocks in the flowcharts and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to the processor of a general purpose computer, special purpose computer, embedded processor or other programmable data processing device to produce a machine such that the instructions executed by the processor of the computer or other programmable data processing device produce Means for implementing the functions specified in one or more of the flowcharts and/or one or more blocks of the block diagrams.

这些计算机程序指令也可存储在能引导计算机或其他可编程数据处理设备以特定方式工作的计算机可读存储器中，使得存储在该计算机可读存储器中的指令产生包括指令装置的制造品，该指令装置实现在流程图一个流程或多个流程和／或方框图一个方框或多个方框中指定的功能。These computer program instructions may also be stored in a computer-readable memory capable of directing a computer or other programmable data processing apparatus to function in a particular manner, such that the instructions stored in the computer-readable memory result in an article of manufacture comprising instruction means, the instructions An apparatus implements the functions specified in a flow or flows of the flowcharts and/or a block or blocks of the block diagrams.

这些计算机程序指令也可装载到计算机或其他可编程数据处理设备上，使得在计算机或其他可编程设备上执行一系列操作步骤以产生计算机实现的处理，从而在计算机或其他可编程设备上执行的指令提供用于实现在流程图一个流程或多个流程和／或方框图一个方框或多个方框中指定的功能的步骤。These computer program instructions can also be loaded on a computer or other programmable data processing device to cause a series of operational steps to be performed on the computer or other programmable device to produce a computer-implemented process such that The instructions provide steps for implementing the functions specified in one or more of the flowcharts and/or one or more blocks of the block diagrams.

以上结合附图对本发明的实施例进行了描述，但是本发明并不局限于上述的具体实施方式，上述的具体实施方式仅仅是示意性的，而不是限制性的，本领域的普通技术人员在本发明的启示下，在不脱离本发明宗旨和权利要求所保护的范围情况下，还可做出很多形式，这些均属于本发明的保护之内。The embodiments of the present invention have been described above with reference to the accompanying drawings, but the present invention is not limited to the above-mentioned specific embodiments. The above-mentioned specific embodiments are only illustrative rather than restrictive. Under the inspiration of the present invention, without departing from the scope of protection of the present invention and the claims, many forms can be made, which all belong to the protection of the present invention.

Claims

1. An auxiliary type mobile robot navigation control system based on a learning component library is characterized by comprising the learning component library, wherein the learning component library comprises: the system comprises an initialization component, an environment modeling component, a path planning component, a core algorithm component, a testing component, an optimization component and a visualization component; the initialization component is used for completing initialization of a state space and an action space corresponding to a specific mobile robot type and setting up a reward function; the environment modeling component is used for reading and processing sensor data carried by the mobile robot, determining global position data of the positioning robot and establishing a virtual environment interacting with the mobile robot when a simulation task is performed; the path planning component is used for providing a selectable path planning algorithm to realize an optimal navigation path; the core algorithm component is used for providing a reinforcement learning algorithm with a plurality of strategies, so that the instruction of the output controller is used for finishing the reinforcement learning closed-loop control; the testing component is used for providing a disturbance method in a simulation environment for selection so as to test the performance of the reinforcement learning algorithm determined by the core algorithm component; the optimization component is used for providing a selected optimization algorithm to adjust the selected parameters of the reinforcement learning algorithm determined by the core algorithm component so as to improve the performance of the navigation control algorithm; the visualization component is used for visualizing the output values of the core algorithm component and the test component during simulation or actual learning tasks.

2. The system of claim 1, wherein the core component library comprises a same strategy module, a different strategy module and a comprehensive strategy module, the same strategy module is used for encapsulating reinforcement learning algorithms of the same strategy, and the different strategy module is used for encapsulating reinforcement learning algorithms of the different strategies; the comprehensive strategy module is used for packaging a comprehensive strategy algorithm, and the comprehensive strategy algorithm is a data-driven strengthening algorithm for synthesizing the same strategy and the different strategies.

3. The system of claim 2, wherein the integrated strategy algorithm comprises: the new learning strategy is fed back to the mobile robot system, and specific system data is collected to optimize the adaptability of the reinforcement learning algorithm; and combining the newly collected data with the experience data played back in the past, and learning again to finally determine the reinforcement learning algorithm.

4. The system of claim 1, wherein the system further comprises: the bottom layer control algorithm component can be directly used for providing a reference component for comparison with the reinforcement learning algorithm, and can also be combined with the upper layer reinforcement learning algorithm to build a closed-loop control reinforcement learning system from a state directly to an actuator instruction.

5. The system of claim 1, wherein the environment modeling component comprises: the system comprises a sensor data processing module, a mobile robot positioning module and a reinforcement learning environment modeling module, wherein the sensor data processing module is used for reading and processing sensor data carried by a mobile robot, and the mobile robot positioning module is used for positioning global position data of the robot in real time; the reinforcement learning environment modeling module is used for establishing a virtual environment interacting with the mobile robot when a simulation task is carried out.

6. A learning component library-based auxiliary mobile robot navigation control system as claimed in claim 1 wherein the optimization component provides alternative optimization algorithms including regularization algorithms L1 and L2, entropy regularization algorithms and/or early stop algorithms.

7. The navigation control system of an auxiliary mobile robot based on a learning component library as claimed in claim 1, wherein the path planning component and the core algorithm component are respectively provided with a performance evaluation function module for providing a performance evaluation function to realize performance evaluation of parameter adjustment and algorithm selection of the path planning component and the core algorithm component.

8. An auxiliary mobile robot navigation control method based on a learning component library is characterized in that the method is based on an auxiliary mobile robot navigation control system of the learning component library; the system includes a learning component library, the learning component library including: the system comprises an initialization component, an environment modeling component, a path planning component, a core algorithm component, a testing component, an optimization component, a visualization component and a bottom control algorithm component; the initialization component is used for completing initialization of a state space and an action space corresponding to a specific mobile robot type and setting up a reward function; the environment modeling component is used for reading and processing sensor data carried by the mobile robot, determining global position data of the positioning robot and establishing a virtual environment interacting with the mobile robot when a simulation task is performed; the path planning component is used for providing a selectable path planning algorithm to realize optimal navigation path planning; the core algorithm component is used for providing a plurality of reinforcement learning algorithms for selection, so that the instruction of the output controller is used for finishing reinforcement learning closed-loop control; the test component is used for providing a selected perturbation method for performance test under different working conditions so as to test the performance of the reinforcement learning algorithm determined by the core algorithm component; the optimization component is used for providing a selected optimization algorithm to adjust the selected parameters of the reinforcement learning algorithm determined by the core algorithm component so as to improve the performance of the navigation control algorithm; the visualization component is used for visualizing the output numerical values of the core algorithm component and the test component; the underlying control algorithm component is used for providing a benchmark component which is compared with a reinforcement learning algorithm;

The method comprises the following steps:

selecting a state space and an action space corresponding to a specific mobile robot type from a pre-constructed initialization component, and setting a reinforcement learning reward function to complete initialization;

constructing a reinforcement learning simulation environment by utilizing a pre-constructed environment modeling component; acquiring the relative position of the barrier and the position of the mobile robot through an environment modeling component, and selecting a required path planning algorithm by utilizing a pre-constructed path planning component to plan an optimal navigation path; adjusting a reward function of a navigation control algorithm according to a path planning result;

selecting and determining a reinforcement learning algorithm from a pre-constructed core algorithm component, combining a defined action space, a state space, a reward function and a reinforcement learning simulation environment, and selecting a core algorithm module for training; the bottom layer control module or the controller instruction is directly output to act, then the relative position of the barrier and the position of the mobile robot are obtained through the environment modeling component again, and the steps are repeated to complete the reinforcement learning closed-loop control;

selecting a perturbation method from the test component, and testing the performance of the reinforcement learning algorithm selected and determined from the core algorithm component;

Selecting and determining an optimization algorithm from the optimization components to adjust selected parameters of the reinforcement learning algorithm determined by the core algorithm component so as to improve the performance of the navigation control algorithm;

and the visualization component is used for realizing visualization of the output values of the core algorithm component and the test component so as to monitor the learning training process in real time.

9. A method for controlling navigation of an auxiliary mobile robot based on a learning component library, wherein the method is based on a system for controlling navigation of an auxiliary mobile robot based on a learning component library, the system comprises a learning component library, and the learning component library comprises: the system comprises an initialization component, an environment modeling component, a path planning component, a core algorithm component, a testing component, an optimization component, a visualization component and a bottom control algorithm component; the initialization component is used for completing initialization of a state space and an action space corresponding to a specific mobile robot type and setting up a reward function; the environment modeling component is used for reading and processing sensor data carried by the mobile robot, determining global position data of the positioning robot and establishing a virtual environment interacting with the mobile robot when a simulation task is performed; the path planning component is used for providing a selectable path planning algorithm to realize optimal navigation path planning; the core algorithm component is used for providing a plurality of reinforcement learning algorithms for selection, so that the instruction of the output controller is used for finishing reinforcement learning closed-loop control; the optimization component is configured to provide an optimization method to enable optimization of a reinforcement learning algorithm; the test component is used for providing a selected perturbation method for performance test under different working conditions so as to test the performance of the reinforcement learning algorithm determined by the core algorithm component; the optimization component is used for providing a selected optimization algorithm to adjust the selected parameters of the reinforcement learning algorithm determined by the core algorithm component so as to improve the performance of the navigation control algorithm; the visualization component is used for visualizing the output numerical values of the core algorithm component and the test component; the underlying control algorithm component is used to provide a baseline component as a comparison to a reinforcement learning algorithm

The method comprises the following steps:

calling an environment modeling component to obtain sensor data carried by the mobile robot and global position data where the mobile robot is located;

combining the defined action space, state space, reward function, sensor data carried by the mobile robot and global position data of the mobile robot, selecting and determining a reinforcement learning algorithm from a pre-constructed core algorithm component, combining a bottom control algorithm component or directly outputting a controller instruction, obtaining current mobile robot information through an environment modeling module again after executing the instruction, and repeating the steps to complete reinforcement learning closed-loop control;

selecting a disturbance method from the test assembly, performing algorithm evaluation and test by using the test assembly, feeding back an output state observation value of the sensor processing module in real time, and judging whether the control requirement is met;

selecting and determining an optimization algorithm from the optimization component, and adjusting the selected parameters of the reinforcement learning algorithm determined by the core algorithm component until the mobile robot obtains a preset execution effect in the navigation control task;

And visualizing the output numerical values of the core algorithm component and the test component by utilizing the visualization component.

10. The navigation control method for the auxiliary mobile robot based on the learning component library of claim 9, wherein performance evaluation function modules are respectively arranged in the path planning component and the core algorithm component, the method further comprises the steps of determining a performance evaluation function by using the performance evaluation function modules, performing performance evaluation on parameter adjustment and algorithm selection of the path planning component and the core algorithm component, and visualizing the evaluation result of the performance evaluation function by using the visualization component.