TWI876514B - Curriculum reinforcement learning unmanned vehicle autonomous navigation method and system and heterogeneous robot

Info

Publication number: TWI876514B
Application number: TW112133649A
Authority: TW
Inventors: 王學誠; 黃柏叡; 鄧奕辰; 柯于婷
Original assignee: 國立陽明交通大學
Priority date: 2023-09-05
Filing date: 2023-09-05
Publication date: 2025-03-11
Also published as: TW202511881A

Abstract

The invention discloses a curriculum reinforcement learning unmanned vehicle autonomous navigation method applied to a heterogeneous robot. The method includes: (a) performing unmanned vehicle autonomous navigation training on the heterogeneous robot through a curriculum reinforcement learning method, so that the heterogeneous robot learns a real-time obstacle avoidance function; and (b) controlling the heterogeneous robot that has completed the unmanned vehicle autonomous navigation training and learned the real-time obstacle avoidance function to perform unmanned vehicle autonomous navigation tasks in a virtual environment and a real environment respectively.

Description

Curriculum reinforcement learning unmanned vehicle autonomous navigation method and system and heterogeneous robot

The present invention relates to autonomous navigation, and more particularly to a curriculum reinforcement learning unmanned vehicle autonomous navigation method and system and a heterogeneous robot.

Generally speaking, robot navigation requires the ability to avoid obstacles and to plan a path to a target. However, conventional robot navigation methods perform poorly when applied to heterogeneous robots with different sensor modalities or vehicle dynamics.

Deep reinforcement learning has been widely applied to address this problem. However, when the exploration space is more complex, training a deep reinforcement learning model may converge very slowly, which leaves room for improvement.

In view of this, the present invention proposes a curriculum reinforcement learning unmanned vehicle autonomous navigation method and system and a heterogeneous robot to effectively solve the above-mentioned problems of the prior art.

A preferred embodiment of the present invention is a curriculum reinforcement learning unmanned vehicle autonomous navigation method. In this embodiment, the method is applied to a heterogeneous robot and includes the following steps: (a) performing unmanned vehicle autonomous navigation training on the heterogeneous robot through a curriculum reinforcement learning method, so that the heterogeneous robot learns a real-time obstacle avoidance function; and (b) controlling the heterogeneous robot that has completed the unmanned vehicle autonomous navigation training and learned the real-time obstacle avoidance function to perform unmanned vehicle autonomous navigation tasks in a virtual environment and a real environment respectively.

In one embodiment, in step (b), the unmanned vehicle autonomous navigation tasks performed by the heterogeneous robot include: transferring from land unmanned vehicle autonomous navigation to water-surface unmanned vehicle autonomous navigation, and improving obstacle avoidance and wind and wave resistance during water-surface unmanned vehicle autonomous navigation.

In one embodiment, the heterogeneous robot has different sensor modalities and vehicle dynamics.

In one embodiment, the curriculum reinforcement learning method in step (a) solves the complex autonomous control problem of the heterogeneous robot in a dense obstacle environment by using reinforcement learning policies trained with different fields of view through a hierarchically integrated dynamics and perception model and a sample-efficient deep reinforcement learning (DRL) model.

In one embodiment, in step (a), the curriculum reinforcement learning method increases the difficulty of the unmanned vehicle autonomous navigation training step by step, in multiple stages, as the heterogeneous robot's learning progresses.

In one embodiment, the training difficulty increases step by step from basic navigation tasks of obstacle avoidance and target navigation to advanced navigation tasks involving sensor modalities and vehicle dynamics.

In one embodiment, in step (b), the virtual environment includes a land environment and a water environment.

In one embodiment, in step (b), the real environment includes a land environment and a water environment.

Another preferred embodiment of the present invention is a curriculum reinforcement learning unmanned vehicle autonomous navigation system. In this embodiment, the system includes: a heterogeneous robot; a training device, configured to connect to the heterogeneous robot and perform unmanned vehicle autonomous navigation training on the heterogeneous robot through a curriculum reinforcement learning method, so that the heterogeneous robot learns a real-time obstacle avoidance function; and a control device, configured to connect to the heterogeneous robot that has completed the unmanned vehicle autonomous navigation training and learned the real-time obstacle avoidance function, and to control the heterogeneous robot to perform unmanned vehicle autonomous navigation tasks in a virtual environment and a real environment respectively.

In one embodiment, the unmanned vehicle autonomous navigation tasks that the control device controls the heterogeneous robot to perform include: transferring from land unmanned vehicle autonomous navigation to water-surface unmanned vehicle autonomous navigation, and improving obstacle avoidance and wind and wave resistance during water-surface unmanned vehicle autonomous navigation.

In one embodiment, the heterogeneous robot has different sensor modalities and vehicle dynamics.

In one embodiment, the curriculum reinforcement learning method adopted by the training device solves the complex autonomous control problem of the heterogeneous robot in a dense obstacle environment by using reinforcement learning policies trained with different fields of view through a hierarchically integrated dynamics and perception model and a sample-efficient deep reinforcement learning model.

In one embodiment, the curriculum reinforcement learning method adopted by the training device increases the difficulty of the unmanned vehicle autonomous navigation training step by step, in multiple stages, as the heterogeneous robot's learning progresses.

In one embodiment, the virtual environment includes a land environment and a water environment.

In one embodiment, the real environment includes a land environment and a water environment.

Another preferred embodiment of the present invention is a heterogeneous robot. In this embodiment, the heterogeneous robot is applied to a curriculum reinforcement learning unmanned vehicle autonomous navigation system, which further includes a training device and a control device. The heterogeneous robot includes: a connection module, configured to connect to the training device and the control device respectively, so as to receive a training command from the training device and a control command from the control device; a training module, coupled to the connection module and configured to perform unmanned vehicle autonomous navigation training according to the training command and in accordance with a curriculum reinforcement learning method, so that the heterogeneous robot learns a real-time obstacle avoidance function, and to issue a training completion notification; and a control module, coupled to the connection module and the training module. When the control module receives the training completion notification, the control module controls the heterogeneous robot, according to the control command, to perform unmanned vehicle autonomous navigation tasks in a virtual environment and a real environment respectively.

In one embodiment, the unmanned vehicle autonomous navigation tasks that the control module controls the heterogeneous robot to perform include: transferring from land unmanned vehicle autonomous navigation to water-surface unmanned vehicle autonomous navigation, and improving obstacle avoidance and wind and wave resistance during water-surface unmanned vehicle autonomous navigation.

In one embodiment, the curriculum reinforcement learning method solves the complex autonomous control problem of the heterogeneous robot in a dense obstacle environment by using reinforcement learning policies trained with different fields of view through a hierarchically integrated dynamics and perception model and a sample-efficient deep reinforcement learning model.

In one embodiment, the curriculum reinforcement learning method increases the difficulty of the unmanned vehicle autonomous navigation training step by step, in multiple stages, as the heterogeneous robot's learning progresses.

A first preferred embodiment of the present invention is a curriculum reinforcement learning unmanned vehicle autonomous navigation method. In this embodiment, the method is applied to a heterogeneous robot, and the heterogeneous robot has different sensor modalities and vehicle dynamics, but the invention is not limited thereto.

It should be noted that the curriculum reinforcement learning unmanned vehicle autonomous navigation method disclosed in the present invention adopts a curriculum reinforcement learning method together with related designs such as wind and wave resistance during autonomous navigation on water, so as to improve the effectiveness and speed of unmanned vehicle autonomous target navigation training for the heterogeneous robot on the ground and on the water surface, and thereby to avoid obstacles located on the ground and on the water surface.

Please refer to FIG. 1, which is a flow chart of the curriculum reinforcement learning unmanned vehicle autonomous navigation method in this embodiment. As shown in FIG. 1, the method includes the following steps:

Step 100: performing unmanned vehicle autonomous navigation training on the heterogeneous robot through a curriculum reinforcement learning method;

Step 200: determining whether the heterogeneous robot has completed the unmanned vehicle autonomous navigation training and learned the real-time obstacle avoidance function;

Step 300: if the determination result of step 200 is yes, controlling the heterogeneous robot that has completed the unmanned vehicle autonomous navigation training and learned the real-time obstacle avoidance function to perform unmanned vehicle autonomous navigation tasks in a virtual environment and a real environment respectively; and

if the determination result of step 200 is no, executing step 100 again.
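The control flow of steps 100 to 300 can be illustrated with a minimal Python sketch. It is an assumption-laden illustration rather than the disclosed implementation: the callables `train_once`, `training_complete` and `run_tasks`, the `max_rounds` safety limit, and the toy stand-ins below are all hypothetical names introduced here, and the embodiment does not specify how the step 200 determination is made.

```python
from typing import Callable, List


def run_navigation_pipeline(
    train_once: Callable[[], None],
    training_complete: Callable[[], bool],
    run_tasks: Callable[[], List[str]],
    max_rounds: int = 100,  # hypothetical safety limit, not part of the flow chart
) -> List[str]:
    """Repeat step 100 until the step 200 check passes, then perform step 300."""
    for _ in range(max_rounds):
        train_once()               # step 100: one round of curriculum RL training
        if training_complete():    # step 200: training done, obstacle avoidance learned?
            return run_tasks()     # step 300: tasks in virtual and real environments
        # determination was "no": fall through and execute step 100 again
    raise RuntimeError("training did not complete within max_rounds")


# Toy stand-ins: a counter that passes the check after three training rounds.
state = {"skill": 0}


def toy_train() -> None:
    state["skill"] += 1


def toy_check() -> bool:
    return state["skill"] >= 3


def toy_tasks() -> List[str]:
    return ["virtual", "real"]


result = run_navigation_pipeline(toy_train, toy_check, toy_tasks)
print(result, state)
```

In a real system the three callables would wrap the trainer, an evaluation routine and the task executor; the loop itself only encodes the branch back to step 100.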

In practical applications, the curriculum reinforcement learning method in step 100 can solve the complex autonomous control problem of the heterogeneous robot in a dense obstacle environment by using reinforcement learning policies trained with different fields of view through a hierarchically integrated dynamics and perception model and a sample-efficient deep reinforcement learning (DRL) model, but is not limited thereto.
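The embodiment does not detail the internal structure of the hierarchically integrated dynamics and perception model. One possible reading, sketched below purely as an assumption, is a three-layer split: perception adapters convert each sensor modality into a shared observation, a high-level policy outputs a vehicle-agnostic command, and dynamics adapters convert that command into vehicle-specific actuation. All function names, the four-sector observation, the numeric limits and the rule-based placeholder policy (standing in for a learned DRL policy) are hypothetical.

```python
import math
from typing import List, Sequence, Tuple

N_SECTORS = 4  # shared observation: sectors ordered from left to right


def scan_to_sectors(ranges: Sequence[float], max_range: float = 10.0) -> List[float]:
    """Perception adapter for a dense range scan: nearest return per sector, in [0, 1]."""
    size = len(ranges) // N_SECTORS
    return [
        min(min(ranges[i * size:(i + 1) * size]), max_range) / max_range
        for i in range(N_SECTORS)
    ]


def detections_to_sectors(
    detections: Sequence[Tuple[float, float]],
    fov: float = math.pi,
    max_range: float = 10.0,
) -> List[float]:
    """Perception adapter for sparse (bearing, range) detections; positive bearing = left."""
    sectors = [1.0] * N_SECTORS  # a sector with no detection counts as free
    for bearing, rng in detections:
        idx = min(int((fov / 2 - bearing) / fov * N_SECTORS), N_SECTORS - 1)
        sectors[idx] = min(sectors[idx], min(rng, max_range) / max_range)
    return sectors


def high_level_policy(sectors: Sequence[float]) -> Tuple[float, float]:
    """Placeholder for the learned policy: steer toward the most open sector."""
    best = max(range(N_SECTORS), key=lambda i: sectors[i])
    turn = 1.0 - 2.0 * best / (N_SECTORS - 1)  # +1 = hard left, -1 = hard right
    speed = min(sectors)                       # slow down when anything is close
    return speed, turn


def ground_vehicle_command(speed: float, turn: float) -> Tuple[float, float]:
    """Dynamics adapter: (linear m/s, angular rad/s) with assumed limits 1.0 and 1.5."""
    return speed * 1.0, turn * 1.5


def surface_vessel_command(speed: float, turn: float) -> Tuple[float, float]:
    """Dynamics adapter: (left, right) thruster commands clipped to [-1, 1]."""
    def clip(x: float) -> float:
        return max(-1.0, min(1.0, x))
    return clip(speed - 0.5 * turn), clip(speed + 0.5 * turn)


scan = [2.0, 3.0, 8.0, 9.0, 10.0, 12.0, 4.0, 5.0]
speed, turn = high_level_policy(scan_to_sectors(scan))
print(ground_vehicle_command(speed, turn))
print(surface_vessel_command(speed, turn))
```

Under this reading, only the adapters change between robots, while the middle layer is shared. The `fov` parameter marks where policies trained with different fields of view would differ.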

It should be noted that, in step 100, the curriculum reinforcement learning method can increase the difficulty of the unmanned vehicle autonomous navigation training step by step, in multiple stages, as the heterogeneous robot's learning progresses. For example, the training difficulty can increase step by step from basic navigation tasks of obstacle avoidance and target navigation to advanced navigation tasks involving sensor modalities and vehicle dynamics, but is not limited thereto.
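A multi-stage curriculum of this kind can be sketched as a small scheduler that promotes the learner to the next stage once a performance measure reaches a threshold. The sketch is illustrative only: the `Stage` and `Curriculum` classes, the use of a success rate as the promotion criterion, the threshold values, and the split into exactly four stages in this order are assumptions that do not appear in the disclosure.

```python
from dataclasses import dataclass
from typing import List, Sequence


@dataclass(frozen=True)
class Stage:
    name: str
    promote_at: float  # hypothetical success-rate threshold for leaving this stage


class Curriculum:
    """Tracks the current stage and advances one stage at a time."""

    def __init__(self, stages: Sequence[Stage]) -> None:
        if not stages:
            raise ValueError("need at least one stage")
        self.stages: List[Stage] = list(stages)
        self.index = 0
        self.finished = False

    @property
    def current(self) -> Stage:
        return self.stages[self.index]

    def report(self, success_rate: float) -> None:
        """Advance when the measured success rate reaches the current threshold."""
        if self.finished or success_rate < self.current.promote_at:
            return
        if self.index == len(self.stages) - 1:
            self.finished = True   # last stage passed: training is complete
        else:
            self.index += 1        # move on to the next, harder stage


curriculum = Curriculum([
    Stage("obstacle avoidance", 0.8),   # basic navigation task
    Stage("target navigation", 0.8),    # basic navigation task
    Stage("sensor modality", 0.7),      # advanced navigation task
    Stage("vehicle dynamics", 0.7),     # advanced navigation task
])

for rate in [0.5, 0.85, 0.9, 0.6, 0.75, 0.75]:
    curriculum.report(rate)

print(curriculum.current.name, curriculum.finished)
```

The `finished` flag is one natural input to the step 200 determination, although the embodiment does not state how that determination is made.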

In practical applications, in step 300, the virtual environment may include a land environment and a water environment, and the real environment may include a land environment and a water environment, but they are not limited thereto. The unmanned vehicle autonomous navigation tasks performed by the heterogeneous robot may include: transferring from land unmanned vehicle autonomous navigation to water-surface unmanned vehicle autonomous navigation, and improving obstacle avoidance and wind and wave resistance during water-surface unmanned vehicle autonomous navigation, but are not limited thereto.

A second preferred embodiment of the present invention is a curriculum reinforcement learning unmanned vehicle autonomous navigation system. In this embodiment, the system is applied to a heterogeneous robot, and the heterogeneous robot has different sensor modalities and vehicle dynamics, but the invention is not limited thereto.

It should be noted that the curriculum reinforcement learning unmanned vehicle autonomous navigation system disclosed in the present invention adopts a curriculum reinforcement learning method together with related designs such as wind and wave resistance during autonomous navigation on water, so as to control the heterogeneous robot to perform unmanned vehicle autonomous target navigation on the ground and on the water surface while effectively avoiding obstacles located on the ground and on the water surface.

Please refer to FIG. 2, which is a schematic diagram of the curriculum reinforcement learning unmanned vehicle autonomous navigation system in this embodiment. As shown in FIG. 2, the curriculum reinforcement learning unmanned vehicle autonomous navigation system CRS includes a heterogeneous robot HR, a training device TD and a control device CD. When the training device TD is connected to the heterogeneous robot HR, the training device TD transmits a training command TC to the heterogeneous robot HR and performs unmanned vehicle autonomous navigation training on the heterogeneous robot HR through the curriculum reinforcement learning method, so that the heterogeneous robot HR learns the real-time obstacle avoidance function. When the heterogeneous robot HR has completed the unmanned vehicle autonomous navigation training and learned the real-time obstacle avoidance function, the control device CD connects to the heterogeneous robot HR and, according to a control command CC, controls the heterogeneous robot HR to perform unmanned vehicle autonomous navigation tasks in a virtual environment and a real environment respectively.

In practical applications, the curriculum reinforcement learning method adopted by the training device TD solves the complex autonomous control problem of the heterogeneous robot HR in a dense obstacle environment by using reinforcement learning policies trained with different fields of view through a hierarchically integrated dynamics and perception model and a sample-efficient deep reinforcement learning model, but is not limited thereto.

It should be noted that the curriculum reinforcement learning method adopted by the training device TD can increase the difficulty of the unmanned vehicle autonomous navigation training step by step, in multiple stages, as the learning of the heterogeneous robot HR progresses. For example, the training difficulty can increase step by step from basic navigation tasks of obstacle avoidance and target navigation to advanced navigation tasks involving sensor modalities and vehicle dynamics, but is not limited thereto.

In practical applications, the virtual environment may include a land environment and a water environment, and the real environment may include a land environment and a water environment, but they are not limited thereto. The unmanned vehicle autonomous navigation tasks that the control device CD controls the heterogeneous robot HR to perform may include: transferring from land unmanned vehicle autonomous navigation to water-surface unmanned vehicle autonomous navigation, and improving obstacle avoidance and wind and wave resistance during water-surface unmanned vehicle autonomous navigation, but are not limited thereto.
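The division of roles in the system CRS can be sketched as follows. This is a simplified assumption, not the disclosed implementation: the class names, the `trained` flag, the fixed number of placeholder training rounds, and the ordering of the environment list (virtual before real, land before water) are hypothetical.

```python
from itertools import product
from typing import List, Tuple


class Robot:
    """Stands in for the heterogeneous robot HR."""

    def __init__(self, name: str) -> None:
        self.name = name
        self.trained = False


class TrainingDevice:
    """Stands in for TD: sends the training command TC and oversees training."""

    def train(self, robot: Robot, curriculum_rounds: int) -> None:
        for _ in range(curriculum_rounds):
            pass  # placeholder for one round of curriculum RL training
        robot.trained = True  # assumed here: all rounds done means training complete


class ControlDevice:
    """Stands in for CD: issues control commands CC only to a trained robot."""

    DOMAINS = ("virtual", "real")
    SURFACES = ("land", "water")

    def dispatch(self, robot: Robot) -> List[Tuple[str, str]]:
        if not robot.trained:
            raise RuntimeError(f"{robot.name} has not completed training")
        # One navigation task per (environment, surface) combination.
        return list(product(self.DOMAINS, self.SURFACES))


robot = Robot("HR-1")
TrainingDevice().train(robot, curriculum_rounds=4)
tasks = ControlDevice().dispatch(robot)
print(tasks)
```

Raising an error for an untrained robot mirrors the condition that the control device acts only after training has been completed.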

A third preferred embodiment of the present invention is a heterogeneous robot. In this embodiment, the heterogeneous robot is applied to a curriculum reinforcement learning unmanned vehicle autonomous navigation system, and the heterogeneous robot has different sensor modalities and vehicle dynamics, but the invention is not limited thereto.

Please refer to FIG. 3, which is a schematic diagram of the heterogeneous robot in this embodiment applied to the curriculum reinforcement learning unmanned vehicle autonomous navigation system. As shown in FIG. 3, the curriculum reinforcement learning unmanned vehicle autonomous navigation system CRS further includes a training device TD and a control device CD. The heterogeneous robot HR applied to the system CRS includes a connection module CNM, a training module TM and a control module CTM.

When the connection module CNM is connected to the training device TD and the control device CD respectively, the connection module CNM receives the training command TC from the training device TD and the control command CC from the control device CD. The training module TM is coupled to the connection module CNM and is configured to perform unmanned vehicle autonomous navigation training on the heterogeneous robot HR according to the training command TC and in accordance with the curriculum reinforcement learning method, so that the heterogeneous robot HR learns the real-time obstacle avoidance function, and to issue a training completion notification FN.

The control module CTM is coupled to the connection module CNM and the training module TM. When the control module CTM receives the training completion notification FN issued by the training module TM, the control module CTM controls the heterogeneous robot HR, according to the control command CC, to perform unmanned vehicle autonomous navigation tasks in the virtual environment and the real environment respectively, but is not limited thereto.
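The interaction among the connection module CNM, the training module TM and the control module CTM can be sketched as three small cooperating objects, with the training completion notification FN delivered through a callback. This is a hypothetical illustration: the class and method names, the string payloads, and the callback mechanism are assumptions, since the embodiment only states that the modules are coupled.

```python
from typing import Callable, Optional


class ConnectionModule:
    """CNM: holds the latest training command TC and control command CC."""

    def __init__(self) -> None:
        self.training_command: Optional[str] = None
        self.control_command: Optional[str] = None


class TrainingModule:
    """TM: trains when a TC is present and issues FN when training is complete."""

    def __init__(self, cnm: ConnectionModule, notify: Callable[[str], None]) -> None:
        self.cnm = cnm
        self.notify = notify

    def run(self, is_trained: Callable[[], bool]) -> bool:
        if self.cnm.training_command is None:
            return False                       # nothing to do without a TC
        if is_trained():                       # placeholder completion check
            self.notify("FN")                  # training completion notification
            return True
        return False


class ControlModule:
    """CTM: executes the CC only after FN has been received."""

    def __init__(self, cnm: ConnectionModule) -> None:
        self.cnm = cnm
        self.training_done = False

    def on_notification(self, notice: str) -> None:
        if notice == "FN":
            self.training_done = True

    def execute(self) -> Optional[str]:
        if not self.training_done or self.cnm.control_command is None:
            return None
        return f"executing {self.cnm.control_command}"


cnm = ConnectionModule()
ctm = ControlModule(cnm)
tm = TrainingModule(cnm, notify=ctm.on_notification)

cnm.training_command = "TC: start curriculum training"
cnm.control_command = "CC: navigate in virtual and real environments"
before = ctm.execute()             # None: FN not yet received
tm.run(is_trained=lambda: True)    # TM finishes and issues FN
after = ctm.execute()
print(before, after)
```

Gating `execute` on the notification rather than on the mere presence of a control command reflects the stated condition that the control module acts when it receives FN.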

In practical applications, the curriculum reinforcement learning method adopted by the training module TM can solve the complex autonomous control problem of the heterogeneous robot HR in a dense obstacle environment by using reinforcement learning policies trained with different fields of view through a hierarchically integrated dynamics and perception model and a sample-efficient deep reinforcement learning model, but is not limited thereto.

It should be noted that the curriculum reinforcement learning method can increase the difficulty of the unmanned vehicle autonomous navigation training step by step, in multiple stages, as the learning of the heterogeneous robot HR progresses. For example, the training difficulty can increase step by step from basic navigation tasks of obstacle avoidance and target navigation to advanced navigation tasks involving sensor modalities and vehicle dynamics, but is not limited thereto.

In practical applications, the virtual environment may include a land environment and a water environment, and the real environment may include a land environment and a water environment, but they are not limited thereto. The unmanned vehicle autonomous navigation tasks that the control module CTM controls the heterogeneous robot HR to perform may include: transferring from land unmanned vehicle autonomous navigation to water-surface unmanned vehicle autonomous navigation, and improving obstacle avoidance and wind and wave resistance during water-surface unmanned vehicle autonomous navigation, but are not limited thereto.

Compared with the prior art, the curriculum reinforcement learning unmanned vehicle autonomous navigation method and system disclosed in the present invention adopt a curriculum reinforcement learning method together with related designs such as wind and wave resistance during autonomous navigation on water, so as to improve the effectiveness and speed of unmanned vehicle autonomous target navigation training for the heterogeneous robot in different environments (for example, on the ground or on the water surface), and therefore to effectively avoid obstacles located on the ground and on the water surface.

100: step
200: step
300: step
CRS: curriculum reinforcement learning unmanned vehicle autonomous navigation system
TD: training device
CD: control device
HR: heterogeneous robot
TC: training command
CC: control command
CNM: connection module
CTM: control module
TM: training module
FN: training completion notification

FIG. 1 is a flow chart of the curriculum reinforcement learning unmanned vehicle autonomous navigation method according to the first preferred embodiment of the present invention.

FIG. 2 is a schematic diagram of the curriculum reinforcement learning unmanned vehicle autonomous navigation system according to the second preferred embodiment of the present invention.

FIG. 3 is a schematic diagram of the heterogeneous robot according to the third preferred embodiment of the present invention.

Claims

A curriculum reinforcement learning unmanned vehicle autonomous navigation method, applied to a heterogeneous robot, comprising the following steps: (a) performing unmanned vehicle autonomous navigation training on the heterogeneous robot through a curriculum reinforcement learning method, so that the heterogeneous robot learns a real-time obstacle avoidance function; and (b) controlling the heterogeneous robot that has completed the unmanned vehicle autonomous navigation training and learned the real-time obstacle avoidance function to perform unmanned vehicle autonomous navigation tasks in a virtual environment and a real environment respectively; wherein the heterogeneous robot has different sensor modalities and vehicle dynamics, and the curriculum reinforcement learning method in step (a) solves the complex autonomous control problem of the heterogeneous robot in a dense obstacle environment by using reinforcement learning policies trained with different fields of view through a hierarchically integrated dynamics and perception model and a sample-efficient deep reinforcement learning model.

The curriculum reinforcement learning unmanned vehicle autonomous navigation method as described in claim 1, wherein in step (b), the unmanned vehicle autonomous navigation tasks performed by the heterogeneous robot include: transferring from land unmanned vehicle autonomous navigation to water-surface unmanned vehicle autonomous navigation, and improving obstacle avoidance and wind and wave resistance during water-surface unmanned vehicle autonomous navigation.

The curriculum reinforcement learning unmanned vehicle autonomous navigation method as described in claim 1, wherein in step (a), the curriculum reinforcement learning method increases the difficulty of the unmanned vehicle autonomous navigation training step by step, in multiple stages, as the heterogeneous robot's learning progresses.

The curriculum reinforcement learning unmanned vehicle autonomous navigation method as described in claim 3, wherein the training difficulty increases step by step from basic navigation tasks of obstacle avoidance and target navigation to advanced navigation tasks involving sensor modalities and vehicle dynamics.

The curriculum reinforcement learning unmanned vehicle autonomous navigation method as described in claim 1, wherein in step (b), the virtual environment includes a land environment and a water environment.

The curriculum reinforcement learning unmanned vehicle autonomous navigation method as described in claim 1, wherein in step (b), the real environment includes a land environment and a water environment.

A curriculum reinforcement learning unmanned vehicle autonomous navigation system, comprising: a heterogeneous robot; a training device, configured to connect to the heterogeneous robot and perform unmanned vehicle autonomous navigation training on the heterogeneous robot through a curriculum reinforcement learning method, so that the heterogeneous robot learns a real-time obstacle avoidance function; and a control device, configured to connect to the heterogeneous robot that has completed the unmanned vehicle autonomous navigation training and learned the real-time obstacle avoidance function, and to control the heterogeneous robot to perform unmanned vehicle autonomous navigation tasks in a virtual environment and a real environment respectively; wherein the heterogeneous robot has different sensor modalities and vehicle dynamics, and the curriculum reinforcement learning method adopted by the training device solves the complex autonomous control problem of the heterogeneous robot in a dense obstacle environment by using reinforcement learning policies trained with different fields of view through a hierarchically integrated dynamics and perception model and a sample-efficient deep reinforcement learning model.

The curriculum reinforcement learning unmanned vehicle autonomous navigation system as described in claim 7, wherein the unmanned vehicle autonomous navigation tasks that the control device controls the heterogeneous robot to perform include: transferring from land unmanned vehicle autonomous navigation to water-surface unmanned vehicle autonomous navigation, and improving obstacle avoidance and wind and wave resistance during water-surface unmanned vehicle autonomous navigation.

The curriculum reinforcement learning unmanned vehicle autonomous navigation system as described in claim 7, wherein the curriculum reinforcement learning method adopted by the training device increases the difficulty of the unmanned vehicle autonomous navigation training step by step, in multiple stages, as the heterogeneous robot's learning progresses.

The curriculum reinforcement learning unmanned vehicle autonomous navigation system as described in claim 9, wherein the training difficulty increases step by step from basic navigation tasks of obstacle avoidance and target navigation to advanced navigation tasks involving sensor modalities and vehicle dynamics.

The curriculum reinforcement learning unmanned vehicle autonomous navigation system as described in claim 7, wherein the virtual environment includes a land environment and a water environment.

The curriculum reinforcement learning unmanned vehicle autonomous navigation system as described in claim 7, wherein the real environment includes a land environment and a water environment.

A heterogeneous robot, applied to a curriculum reinforcement learning unmanned vehicle autonomous navigation system that further includes a training device and a control device, the heterogeneous robot comprising: a connection module, configured to connect to the training device and the control device respectively, so as to receive a training command from the training device and a control command from the control device; a training module, coupled to the connection module and configured to perform unmanned vehicle autonomous navigation training according to the training command and in accordance with a curriculum reinforcement learning method, so that the heterogeneous robot learns a real-time obstacle avoidance function, and to issue a training completion notification; and a control module, coupled to the connection module and the training module, wherein when the control module receives the training completion notification, the control module controls the heterogeneous robot, according to the control command, to perform unmanned vehicle autonomous navigation tasks in a virtual environment and a real environment respectively; wherein the heterogeneous robot has different sensor modalities and vehicle dynamics, and the curriculum reinforcement learning method solves the complex autonomous control problem of the heterogeneous robot in a dense obstacle environment by using reinforcement learning policies trained with different fields of view through a hierarchically integrated dynamics and perception model and a sample-efficient deep reinforcement learning model.

The heterogeneous robot as described in claim 13, wherein the unmanned vehicle autonomous navigation tasks that the control module controls the heterogeneous robot to perform include: transferring from land unmanned vehicle autonomous navigation to water-surface unmanned vehicle autonomous navigation, and improving obstacle avoidance and wind and wave resistance during water-surface unmanned vehicle autonomous navigation.

The heterogeneous robot as described in claim 13, wherein the curriculum reinforcement learning method increases the difficulty of the unmanned vehicle autonomous navigation training step by step, in multiple stages, as the heterogeneous robot's learning progresses.

The heterogeneous robot as described in claim 15, wherein the training difficulty increases step by step from basic navigation tasks of obstacle avoidance and target navigation to advanced navigation tasks involving sensor modalities and vehicle dynamics.

The heterogeneous robot as described in claim 13, wherein the virtual environment includes a land environment and a water environment.

The heterogeneous robot as described in claim 13, wherein the real environment includes a land environment and a water environment.