
CN114815828B - A robot path planning and control method combining reinforcement learning with recurrent networks - Google Patents


Info

Publication number
CN114815828B
CN114815828B (application CN202210442298.6A)
Authority
CN
China
Prior art keywords
robot
path
reinforcement learning
target point
network
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202210442298.6A
Other languages
Chinese (zh)
Other versions
CN114815828A (en)
Inventor
张隆源
李伟
候梓越
王冀
刘翼
毕一飞
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Fudan University
Original Assignee
Fudan University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Fudan University filed Critical Fudan University
Priority to CN202210442298.6A priority Critical patent/CN114815828B/en
Publication of CN114815828A publication Critical patent/CN114815828A/en
Application granted granted Critical
Publication of CN114815828B publication Critical patent/CN114815828B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • G: PHYSICS
    • G05: CONTROLLING; REGULATING
    • G05D: SYSTEMS FOR CONTROLLING OR REGULATING NON-ELECTRIC VARIABLES
    • G05D1/00: Control of position, course, altitude or attitude of land, water, air or space vehicles, e.g. using automatic pilots
    • G05D1/02: Control of position or course in two dimensions
    • G05D1/021: Control of position or course in two dimensions specially adapted to land vehicles
    • G05D1/0212: Control of position or course in two dimensions specially adapted to land vehicles with means for defining a desired trajectory
    • G05D1/0221: Control of position or course in two dimensions specially adapted to land vehicles with means for defining a desired trajectory involving a learning process
    • G05D1/0276: Control of position or course in two dimensions specially adapted to land vehicles using signals provided by a source external to the vehicle
    • G05D1/0278: Control of position or course in two dimensions specially adapted to land vehicles using signals provided by a source external to the vehicle using satellite positioning signals, e.g. GPS

Landscapes

  • Engineering & Computer Science (AREA)
  • Radar, Positioning & Navigation (AREA)
  • Remote Sensing (AREA)
  • Aviation & Aerospace Engineering (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Automation & Control Theory (AREA)
  • Control Of Position, Course, Altitude, Or Attitude Of Moving Bodies (AREA)
  • Manipulator (AREA)

Abstract


The present invention relates to a robot path planning and control method combining reinforcement learning with a recurrent network, the method comprising: constructing a recurrent network for generating a robot path, the recurrent network sequentially generating the path points in the robot path; training the recurrent network by a reinforcement learning method; performing robot path planning with the trained recurrent network; and controlling the robot to move sequentially along the planned path points. Compared with the prior art, the present invention can infer much about an unknown environment even while local information is limited, saves resources, improves efficiency, and achieves feasible path planning within the observable range, so that the target point can be found in complex scenes and the movement control of the robot is realized.

Description

Robot path planning and control method combining reinforcement learning with a recurrent network
Technical Field
The invention relates to the technical field of robot path planning and control, in particular to a robot path planning and control method combining reinforcement learning with a recurrent network.
Background
Much current research is devoted to applying reinforcement learning to robots, and it is widely used for robot movement control: in many applications, robots use end-to-end methods for motion control. End-to-end methods can achieve good results in dense-obstacle and dynamic-obstacle environments using local information, but when encountering complex static corners, walls or obstructions, they take more time to explore a viable path. In many cases, an end-to-end method can only find a locally optimal action, so the target point cannot be found in complex scenes.
Disclosure of Invention
The invention aims to overcome the defects of the prior art and provide a robot path planning and control method combining reinforcement learning with a recurrent network.
The aim of the invention can be achieved by the following technical scheme:
A robot path planning method combining reinforcement learning with a recurrent network, the method comprising:
constructing a recurrent network for generating a robot path, wherein the recurrent network sequentially generates the path points in the robot path;
training the recurrent network by a reinforcement learning method;
and performing robot path planning by using the trained recurrent network.
Preferably, the recurrent network comprises a plurality of cascaded path recurrent network models for outputting path points, and the input of each path recurrent network model comprises radar information, robot target point information and the previous path point information.
Preferably, the specific method for generating the robot path by the recurrent network comprises the following steps:
establishing a robot local polar coordinate system, in which the robot's own coordinates are Q_o(0, 0) and the path point set consisting of the path points in the robot path is Lp = {Q_1(ρ_1, α_1), Q_2(ρ_2, α_2), …, Q_n(ρ_n, α_n)}, where Q_i is the i-th path point, ρ_i and α_i are the displacement distance and rotation angle of the i-th path point Q_i relative to the (i-1)-th path point Q_{i-1}, and n is the total number of path points constituting the robot path;
acquiring the current radar information O_s, wherein O_s is held fixed until path generation completes;
determining the coordinates T(ρ_t, α_t) of the robot target point T in the robot local polar coordinate system, wherein T is held fixed until path generation completes;
for the k-th path point to be generated, inputting the radar information O_s, the robot target point T(ρ_t, α_t) and the (k-1)-th path point position information Q_{k-1}(ρ_{k-1}, α_{k-1}) into the path recurrent network model, which outputs the k-th path point position information Q_k(ρ_k, α_k), k = 1, 2, …, n.
Preferably, training the recurrent network with reinforcement learning comprises:
starting a plurality of processes, training a plurality of robots in the simulation map simultaneously, each generating robot paths based on the recurrent network;
and constructing a reward function for each robot path, and updating and optimizing the path-generating recurrent network with a reinforcement learning algorithm.
Preferably, the reward function of the robot path is expressed as r:
r = r_c + r_n + r_s
where r_c is the collision feedback between the generated robot path and obstacles, r_n is the approach-to-target feedback, and r_s is the path smoothness feedback.
Preferably, the reward function of the robot path is determined as follows:
a robot local polar coordinate system is established, in which the robot's own coordinates are Q_o(0, 0) and the path point set consisting of the path points in the robot path is Lp = {Q_1(ρ_1, α_1), Q_2(ρ_2, α_2), …, Q_n(ρ_n, α_n)}, where Q_i is the i-th path point, ρ_i and α_i are the displacement distance and rotation angle of the i-th path point Q_i relative to the (i-1)-th path point Q_{i-1}, and n is the total number of path points constituting the robot path;
the collision feedback r_c is calculated as:
r_c = +a, if the generated path avoids the obstacles and reaches the target point; r_c = −a, if the generated path collides with an obstacle; r_c = 0, otherwise,
where a is a constant;
the approach-to-target feedback r_n is calculated as:
r_n = Σ_{i=1}^{n} (d − s_i) / i
where d is the distance from the current position of the robot to the target point, and s_i is the distance from the generated i-th path point to the target point;
the path smoothness feedback r_s is calculated as:
r_s = −b Σ_{i=1}^{n} |α_i|
where b is a constant;
and r = r_c + r_n + r_s is calculated.
Preferably, the reinforcement learning algorithm comprises the PPO (proximal policy optimization) algorithm.
Preferably, performing robot path planning by using the trained recurrent network comprises:
deploying the trained path recurrent network model on the mobile robot;
determining the current position of the robot, selecting a target point for the robot, starting the robot's laser radar, and acquiring radar information;
the robot sequentially generates path points according to its current position, the radar information, the target point information and the path recurrent network model, and the generated path points are arranged in order to form the robot path point set.
A robot control method combining reinforcement learning with a recurrent network, the method comprising:
performing path planning for the robot by the above path planning method, and determining a robot path from the current position to the target point, the robot path comprising a plurality of path points;
and controlling the robot to move sequentially along the planned path points.
Preferably, after the robot has moved along the planned path points for a period of time, path planning is performed again by the above path planning method, the robot is controlled to move along the new robot path, and this process is repeated until the robot reaches the target point.
Compared with the prior art, the invention has the following advantages:
(1) Conventional path planning methods need global information in order to plan a global path. The path planning method provided by the invention combines reinforcement learning with a recurrent network and can generate a relatively optimized local path from limited information; it therefore needs no global information as traditional methods do, no path prediction, and no post-optimization of the generated path.
(2) Since path generation is a rare task in reinforcement learning, the invention develops a recurrent network model for this purpose. In most cases, the earlier part of a path influences the part generated later, and this causal relationship between preceding results and subsequent reasoning in the recurrent network matches the process of sequentially generating path points perfectly. The invention trains the recurrent network model with the proximal policy optimization (PPO) method of reinforcement learning to plan the robot's path. The planned path yields a correspondingly optimized result for the real-time environment, while the time consumption is low and few inference steps are required; much of the unknown environment can be inferred even while local information is limited, saving resources and improving efficiency.
(3) The robot control method of the invention shifts the focus of reinforcement learning application to robot path planning: reinforcement learning is first used to generate a locally feasible path based on the limited information the robot can obtain, and the robot is then controlled to move along that feasible path.
Drawings
FIG. 1 is a schematic view of the radar scan space O_s according to an embodiment of the present invention;
FIG. 2 is a schematic diagram of a target point T and a path according to an embodiment of the present invention;
FIG. 3 is a schematic diagram of the process of generating a robot path using the recurrent network in an embodiment of the present invention;
FIG. 4 is a schematic diagram of the path recurrent network model according to an embodiment of the present invention;
FIG. 5 is a flow chart of the robot path planning and control method combining reinforcement learning with a recurrent network according to an embodiment of the invention.
Detailed Description
The invention will now be described in detail with reference to the drawings and specific examples. Note that the following description of the embodiments is merely an example; the present invention is not limited to the applications and uses of these embodiments.
Examples
This embodiment provides a robot path planning method combining reinforcement learning with a recurrent network, comprising the following steps:
constructing a recurrent network for generating a robot path, the recurrent network sequentially generating the path points in the robot path;
training the recurrent network by a reinforcement learning method;
and performing robot path planning by using the trained recurrent network.
The recurrent network comprises a plurality of cascaded path recurrent network models for outputting path points, and the input of each path recurrent network model comprises radar information, robot target point information and the previous path point information.
According to the invention, reinforcement learning is combined with a recurrent network: an optimal path that the wheeled robot can safely follow is generated from environment information, and the surrounding environment is explored through limited sensor information, so that the robot can complete a moving task along the path.
The method specifically comprises the following steps:
s1, defining a path:
The S1-1, path, representation method is a set of points, this set of points is called the "path set of points". All points in a set of path points constitute a path. The set of path points is lp= { Q 1,Q2,Q3,…,Qn }, where n is the maximum number of path points, Q 1,Q2,Q3,…,Qn is the 1 st, 2 nd, 3 rd, respectively.
S1-2, taking the position of the robot as the origin of a local coordinate system of the robot, taking the facing direction of the robot as the x axis, and establishing a local polar coordinate system of the robot. The robot's own coordinates are denoted as Q o (0, 0) in the robot's local polar coordinate system.
S1-3, sequentially generating a displacement distance amount and a rotation angle size (ρ ii) of an i-th point Q i relative to an i-1-th point Q i-1 from 1-n (n is the maximum number of path points and is also the length of a path point set), i=1, 2.
S1-4, the set of path points can be expressed as: Where Q i denotes an i-th path point, ρ i、αi corresponds to a displacement distance and a rotation angle of the i-th path point Q i with respect to the i-1-th path point Q i-1, and n is a total number of path points constituting the robot path.
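The incremental polar representation above can be unrolled into Cartesian waypoints by accumulating the rotations. A minimal sketch follows; the convention that each α_i is applied to the heading before moving ρ_i along it is an assumption, since the patent only defines ρ_i and α_i as relative displacement and rotation:

```python
import math

def unroll_path(increments):
    """Convert a path point set Lp = [(rho_1, alpha_1), ...] into
    Cartesian (x, y) waypoints in the robot's local frame."""
    x, y, heading = 0.0, 0.0, 0.0   # robot starts at Q_o(0, 0) facing +x
    waypoints = []
    for rho, alpha in increments:
        heading += alpha            # rotate by alpha_i first (assumed order)
        x += rho * math.cos(heading)
        y += rho * math.sin(heading)
        waypoints.append((x, y))
    return waypoints

# two 1 m steps straight ahead, then a 1 m step after a 90-degree left turn
pts = unroll_path([(1.0, 0.0), (1.0, 0.0), (1.0, math.pi / 2)])
```

A motion controller that tracks the path only needs these accumulated poses, not the raw increments.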
S2, building the recurrent network:
S2-1: As shown in FIG. 1 and FIG. 2, the robot radar information O_s in this embodiment is obtained as follows:
the range of ±90° around the robot's facing direction is taken as the effective area scanned by the robot radar and divided into 180 parts, one distance value being recorded per 1° on average. The three most recent frames of sensor data scanned by the robot radar sensor are taken, i.e. O_s = [o_1, o_2, o_3], where each frame o_j is 180-dimensional.
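The three-frame observation can be assembled as sketched below; the per-frame layout (a plain 180-element distance array) is inferred from the 3 × 180 description and is an assumption:

```python
import numpy as np

def make_radar_obs(frames):
    """Stack the three most recent 180-dim radar frames into the
    540-dim observation O_s fed to the path recurrent network."""
    assert len(frames) == 3 and all(len(f) == 180 for f in frames)
    return np.concatenate([np.asarray(f, dtype=np.float64) for f in frames])

# three dummy frames: every ray reports an obstacle at 5 m
o_s = make_radar_obs([np.full(180, 5.0)] * 3)
```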
s2-2, acquiring robot target point information T:
T((ρtt)
S2-3, path information Lp, wherein the path consists of n points, namely:
Lp=[Q1,Q2,Q3,…,Qn]
S2-4, constructing a path circulation network model, wherein the input of the path network model comprises radar information, robot target point information and last path point information. The loop network comprises a plurality of cascaded loop networks for outputting path points, wherein the loop network is formed by cascading n path loop network models in fig. 3.
S3, generating the path Lp with the recurrent network:
As shown in FIG. 3, all path points are generated sequentially by the recurrent network, and together they form the path point set, producing the path Lp.
S3-1: The input data required by the recurrent network are:
S3-1-1: Radar information O_s at the current moment. The robot's facing direction is 0°; the range of ±90° around it, i.e. (−90°, 90°), spans 180° and is divided into 180 dimensions, each dimension representing the distance from the obstacle to the radar sensor within 1°. The closer an obstacle is to the sensor, the smaller the value in that dimension. Each full rotation of the radar sensor updates the 180-dimensional data once; after the sensor has rotated three times, the data has been updated three times, yielding 3 × 180-dimensional data (3 frames × 180 dimensions). O_s is therefore 540-dimensional. O_s is held fixed until path generation completes.
S3-1-2: Coordinates T(ρ_t, α_t) of the robot target point T in the robot local polar coordinate system. T is held fixed until path generation completes.
S3-1-3: The previous point Q_{k-1}(ρ_{k-1}, α_{k-1}). While the path is being generated, when the i-th point Q_i is produced, this input is the previous point Q_{i-1}.
S3-2: Path recurrent network model:
The structure of the path recurrent network model is shown in FIG. 4. O_s from step S3-1-1 is passed through two convolution layers to obtain 256-dimensional data; this is concatenated with the two-dimensional data T(ρ_t, α_t) and the two-dimensional data Q_{k-1}(ρ_{k-1}, α_{k-1}) from S3-1-2 and S3-1-3 into 260-dimensional data, which is then processed by the fully connected layer to output the point Q_k(ρ_k, α_k).
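The per-step model of FIG. 4 can be sketched numerically as below. The kernel sizes, strides, channel counts and random weights are illustrative assumptions; the patent only fixes the 540-dim input, the 256-dim convolutional feature, the 260-dim concatenation, and the 2-dim output Q_k(ρ_k, α_k):

```python
import numpy as np

rng = np.random.default_rng(0)

def conv1d_relu(x, w, stride):
    """Valid 1-D convolution with ReLU. x: (C_in, L), w: (C_out, C_in, K)."""
    c_out, c_in, k = w.shape
    l_out = (x.shape[1] - k) // stride + 1
    out = np.empty((c_out, l_out))
    for j in range(l_out):
        out[:, j] = np.tensordot(w, x[:, j * stride:j * stride + k],
                                 axes=([1, 2], [0, 1]))
    return np.maximum(out, 0.0)

# illustrative weights: 3 radar frames -> 16 -> 32 channels, then dense to 256
w1 = rng.normal(0, 0.1, (16, 3, 5))
w2 = rng.normal(0, 0.1, (32, 16, 3))
w_feat = rng.normal(0, 0.1, (256, 32 * 43))   # 180 -(K5,s2)-> 88 -(K3,s2)-> 43
w_out = rng.normal(0, 0.1, (2, 260))          # 256 + 2 + 2 -> (rho_k, alpha_k)

def path_step(o_s, target, q_prev):
    """One step of the path recurrent network model: O_s, T, Q_{k-1} -> Q_k."""
    x = o_s.reshape(3, 180)                        # three 180-dim radar frames
    h = conv1d_relu(conv1d_relu(x, w1, 2), w2, 2)  # (32, 43)
    feat = np.maximum(w_feat @ h.ravel(), 0.0)     # 256-dim conv feature
    z = np.concatenate([feat, target, q_prev])     # 260-dim joint input
    return w_out @ z                               # (rho_k, alpha_k)

q1 = path_step(np.full(540, 5.0), np.array([4.0, 0.3]), np.array([0.0, 0.0]))
```

With untrained weights the output is meaningless; the sketch only fixes the data flow and dimensions described in the text.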
S3-3: The output of the recurrent network:
S3-3-1: At the k-th step, the output point Q_k(ρ_k, α_k). After generating path point k, the robot updates its own state: its virtual predicted position and orientation (state information) are updated to the newly generated path point k, and the next point Q_{k+1}(ρ_{k+1}, α_{k+1}) is then estimated by combining this virtual predicted pose with the radar information O_s and the target point information T, which do not change during path generation.
S3-3-2: After all n points in the path point set have been generated, the path point set and path Lp are complete; the radar data O_s and the target point data T are then updated.
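Steps S3-1 through S3-3 amount to the following generation loop; `path_model` is a hypothetical stand-in for the trained path recurrent network model:

```python
import numpy as np

def generate_path(path_model, o_s, target, n):
    """Sequentially generate n path points. O_s and T stay fixed for the
    whole rollout; only the previous-point input Q_{k-1} changes."""
    q_prev = np.zeros(2)          # Q_o(0, 0): start from the robot itself
    lp = []
    for _ in range(n):
        q_k = path_model(o_s, target, q_prev)  # Q_k(rho_k, alpha_k)
        lp.append(q_k)
        q_prev = q_k              # virtual predicted pose moves to point k
    return lp

# hypothetical toy model: always step 1 m toward the target bearing
toy = lambda o_s, t, q_prev: np.array([1.0, t[1]])
lp = generate_path(toy, np.full(540, 5.0), np.array([4.0, 0.2]), n=5)
```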
S4, training phase:
Training the recurrent network with reinforcement learning includes:
starting a plurality of processes, training a plurality of robots in the simulation map simultaneously, each generating robot paths based on the recurrent network;
and constructing a reward function for each robot path, and updating and optimizing the path-generating recurrent network with a reinforcement learning algorithm.
The method specifically comprises the following steps:
S4-1: After a path is generated, it is judged by a reward function. The reward function of the robot path is expressed as r:
r = r_c + r_n + r_s
where r_c is the collision feedback between the generated robot path and obstacles, r_n is the approach-to-target feedback, and r_s is the path smoothness feedback.
The collision feedback r_c is determined as:
r_c = +a, if the generated path avoids the obstacles and finally reaches the target point; r_c = −a, if the generated path collides with an obstacle; r_c = 0, otherwise,
where a is a constant; in this embodiment a = 10.
Here, the collision is a collision of the generated path with an obstacle, not a collision with an obstacle during the robot's travel. If the generated path collides with an obstacle scanned in the robot local polar coordinate system, r_c = −10; if the generated path avoids the obstacles and finally reaches the target point, r_c = +10; in all other cases r_c = 0.
The approach-to-target feedback r_n is:
r_n = Σ_{i=1}^{n} (d − s_i) / i
where d is the distance from the current position of the robot to the target point and s_i is the distance from the generated i-th path point to the target point; d − s_i measures how much closer the generated i-th path point is to the target point than the robot's current position. If d − s_i > 0, the generated path point is closer to the target point than the robot is, so the feedback is positive. When path point Q_i is generated, a larger i generally allows the generated point to be closer to the target point; dividing by i in (d − s_i)/i balances the feedback over all points, so that the contribution of each point to the overall path is measured more fairly.
The smoothness feedback r_s for the generated path reflects the smoothness of the robot's path through the magnitude of the deviation of α_i from the current heading, expressed as:
r_s = −b Σ_{i=1}^{n} |α_i|
where b is a constant; in this embodiment b = 0.0005.
r = r_c + r_n + r_s is then calculated.
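The reward of S4-1 can be computed as sketched below, with a = 10 and b = 0.0005 as in this embodiment; the exact summation forms of r_n and r_s are reconstructions from the surrounding text, so treat them as assumptions:

```python
def path_reward(alphas, dists_to_goal, d, collided, reached, a=10.0, b=0.0005):
    """r = r_c + r_n + r_s for one generated path.

    alphas        -- rotation angles alpha_i of the path points (radians)
    dists_to_goal -- s_i, distance from each generated point to the target
    d             -- distance from the robot's current position to the target
    collided      -- generated path crosses a scanned obstacle
    reached       -- generated path reaches the target point
    """
    if collided:
        r_c = -a
    elif reached:
        r_c = +a
    else:
        r_c = 0.0
    # (d - s_i)/i: positive when point i is closer to the goal than the robot
    r_n = sum((d - s) / i for i, s in enumerate(dists_to_goal, start=1))
    # penalize large heading changes to keep the path smooth
    r_s = -b * sum(abs(al) for al in alphas)
    return r_c + r_n + r_s

r = path_reward([0.0, 0.2], [3.0, 2.0], d=4.0, collided=False, reached=True)
```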
S4-2: Start a plurality of processes, train a plurality of robots in the simulation map simultaneously, and update and optimize the path-generating recurrent network with the proximal policy optimization (PPO) reinforcement learning algorithm, using the paths generated by the robots and the reward function r defined in S4-1.
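The patent does not restate the PPO objective, so for reference the standard clipped surrogate that a PPO update maximizes can be sketched as below; this is textbook PPO, not material from the patent:

```python
import numpy as np

def ppo_clip_loss(ratio, advantage, eps=0.2):
    """Standard PPO clipped surrogate (to be maximized):
    L = E[min(ratio * A, clip(ratio, 1 - eps, 1 + eps) * A)],
    where ratio = pi_new(a|s) / pi_old(a|s) and A is the advantage."""
    ratio = np.asarray(ratio, dtype=np.float64)
    advantage = np.asarray(advantage, dtype=np.float64)
    unclipped = ratio * advantage
    clipped = np.clip(ratio, 1 - eps, 1 + eps) * advantage
    return float(np.mean(np.minimum(unclipped, clipped)))

# a ratio far above 1 + eps is clipped, removing the incentive to move further
loss_hi = ppo_clip_loss([2.0], [1.0])
```

Clipping keeps each update close to the data-collecting policy, which is what makes the multi-process rollouts of S4-2 reusable for several gradient steps.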
S5, execution phase:
S5-1: Deploy the trained path recurrent network model on the mobile robot.
S5-2: Select a target point for the robot, start the robot's laser radar, and acquire radar information.
S5-3: The robot generates a followable path Lp at the current moment from the radar information O_s, the target point information T and the path recurrent network model M.
Based on the above path planning method, as shown in FIG. 5, this embodiment also provides a robot control method combining reinforcement learning with a recurrent network, comprising:
performing path planning for the robot by the above path planning method, and determining the robot path from the current position to the target point, the robot path comprising a plurality of path points;
the robot controller then drives the robot, publishing commands such as linear velocity and angular velocity, and controls the robot to move sequentially along the planned path points.
After the robot has moved along the planned path points for a period of time Δt, a path Lp′ at the new position is generated from the new radar data O_s′, the target point data T′, the starting position Q_0′ and the network model M, and the motion controller issues the corresponding commands again according to the new path, until the robot reaches the target point.
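The execute-and-replan cycle can be sketched as below; `plan_path`, `follow`, `sense`, and `goal_reached` are hypothetical stand-ins for the trained planner, the velocity controller, and the radar/localization stack:

```python
def control_loop(plan_path, follow, sense, goal_reached, dt=2.0, max_cycles=100):
    """Alternate planning and execution until the target is reached:
    plan a local path Lp, follow it for dt seconds, then replan from
    the new pose with fresh radar data O_s' and target data T'."""
    for _ in range(max_cycles):
        o_s, target = sense()            # new O_s' and T' at the current pose
        if goal_reached(target):
            return True
        lp = plan_path(o_s, target)      # local path from the network model M
        follow(lp, dt)                   # publish linear/angular velocities
    return False                         # give up after max_cycles replans

# toy closed-loop check: the "robot" halves its distance to the goal each cycle
state = {"d": 8.0}
ok = control_loop(
    plan_path=lambda o, t: [(t[0], 0.0)],
    follow=lambda lp, dt: state.__setitem__("d", state["d"] / 2),
    sense=lambda: ([0.0] * 540, (state["d"], 0.0)),
    goal_reached=lambda t: t[0] < 0.1,
)
```

The fixed replan interval Δt is a design choice; replanning could equally be triggered by new obstacles appearing in the radar data.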
Conventional path planning methods need global information in order to plan a global path. The path planning method provided by the invention combines reinforcement learning with a recurrent network and can generate a relatively optimized local path from limited information; it therefore needs no global information as traditional methods do, no path prediction, and no post-optimization of the generated path. Since path generation is a rare task in reinforcement learning, the invention develops a recurrent network model for this purpose: in most cases the earlier part of a path influences the part generated later, and this causal relationship between preceding results and subsequent reasoning in the recurrent network matches the process of sequentially generating path points perfectly. The recurrent network model is trained with the proximal policy optimization (PPO) method of reinforcement learning to plan the robot's path; the planned path yields a correspondingly optimized result for the real-time environment, with low time consumption and few required inference steps, so that much of the unknown environment can be inferred even while local information is limited, saving resources and improving efficiency.
The robot is controlled on the basis of its path planning: the control method of the invention shifts the focus of reinforcement learning application to robot path planning, using reinforcement learning to generate a locally feasible path based on the limited information the robot can obtain, and then controlling the robot to move along that feasible path. Unlike end-to-end methods that output motion control directly, the invention adds a controller that moves the robot along the generated locally feasible path, so that the target point can be found in complex scenes and the movement control of the robot is realized.
The above embodiments are merely examples, and do not limit the scope of the present invention. These embodiments may be implemented in various other ways, and various omissions, substitutions, and changes may be made without departing from the scope of the technical idea of the present invention.

Claims (7)

1. A robot path planning method combining reinforcement learning with a recurrent network, characterized by comprising:
constructing a recurrent network for generating a robot path, wherein the recurrent network sequentially generates the path points in the robot path;
training the recurrent network by a reinforcement learning method;
performing robot path planning by using the trained recurrent network;
wherein training the recurrent network with reinforcement learning includes:
starting a plurality of processes, training a plurality of robots in the simulation map simultaneously, each generating robot paths based on the recurrent network;
constructing a reward function for each robot path, and updating and optimizing the path-generating recurrent network with a reinforcement learning algorithm;
the reward function of the robot path is denoted r:
r = r_c + r_n + r_s
where r_c is the collision feedback between the generated robot path and obstacles, r_n is the approach-to-target feedback, and r_s is the path smoothness feedback;
the reward function of the robot path is determined as follows:
a robot local polar coordinate system is established, in which the robot's own coordinates are Q_o(0, 0) and the path point set consisting of the path points in the robot path is Lp = {Q_1(ρ_1, α_1), Q_2(ρ_2, α_2), …, Q_n(ρ_n, α_n)}, where Q_i is the i-th path point, ρ_i and α_i are the displacement distance and rotation angle of the i-th path point Q_i relative to the (i-1)-th path point Q_{i-1}, and n is the total number of path points constituting the robot path;
the collision feedback r_c is calculated as:
r_c = +a, if the generated path avoids the obstacles and reaches the target point; r_c = −a, if the generated path collides with an obstacle; r_c = 0, otherwise,
where a is a constant;
the approach-to-target feedback r_n is calculated as:
r_n = Σ_{i=1}^{n} (d − s_i) / i
where d is the distance from the current position of the robot to the target point, and s_i is the distance from the generated i-th path point to the target point;
the path smoothness feedback r_s is calculated as:
r_s = −b Σ_{i=1}^{n} |α_i|
where b is a constant;
and r = r_c + r_n + r_s is calculated.
2. The robot path planning method combining reinforcement learning with a recurrent network according to claim 1, wherein the recurrent network comprises a plurality of cascaded path recurrent network models for outputting path points, and the input of each path recurrent network model comprises radar information, robot target point information and the previous path point information.
3. The robot path planning method combining reinforcement learning with a recurrent network according to claim 2, wherein the specific method for generating the robot path by the recurrent network comprises the following steps:
acquiring the current radar information O_s, wherein O_s is held fixed until path generation completes;
determining the coordinates T(ρ_t, α_t) of the robot target point T in the robot local polar coordinate system, wherein T is held fixed until path generation completes;
for the k-th path point to be generated, inputting the radar information O_s, the robot target point T(ρ_t, α_t) and the (k-1)-th path point position information Q_{k-1}(ρ_{k-1}, α_{k-1}) into the path recurrent network model, which outputs the k-th path point position information Q_k(ρ_k, α_k), k = 1, 2, …, n.
4. The robot path planning method combining reinforcement learning with a recurrent network of claim 1, wherein the reinforcement learning algorithm comprises the PPO algorithm.
5. The robot path planning method combining reinforcement learning with a recurrent network of claim 1, wherein performing robot path planning by using the trained recurrent network comprises:
deploying the trained path recurrent network model on the mobile robot;
determining the current position of the robot, selecting a target point for the robot, starting the robot's laser radar, and acquiring radar information;
the robot sequentially generates path points according to its current position, the radar information, the target point information and the path recurrent network model, and the generated path points are arranged in order to form the robot path point set.
6. A robot control method combining reinforcement learning with a recurrent network, the method comprising:
performing path planning for a robot by the method according to any one of claims 1-5, and determining a robot path from the robot's current position to the target point, the robot path comprising a plurality of path points;
and controlling the robot to move sequentially along the planned path points.
7. The robot control method combining reinforcement learning with a recurrent network according to claim 6, wherein after the robot has moved along the planned path points for a period of time, path planning is performed again by the method according to any one of claims 1-5, the robot is controlled to move along the new robot path, and this process is repeated until the robot reaches the target point.
CN202210442298.6A 2022-04-25 2022-04-25 A robot path planning and control method combining reinforcement learning with recurrent networks Active CN114815828B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210442298.6A CN114815828B (en) 2022-04-25 2022-04-25 A robot path planning and control method combining reinforcement learning with recurrent networks

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202210442298.6A CN114815828B (en) 2022-04-25 2022-04-25 A robot path planning and control method combining reinforcement learning with recurrent networks

Publications (2)

Publication Number Publication Date
CN114815828A CN114815828A (en) 2022-07-29
CN114815828B true CN114815828B (en) 2025-04-04

Family

ID=82507390

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210442298.6A Active CN114815828B (en) 2022-04-25 2022-04-25 A robot path planning and control method combining reinforcement learning with recurrent networks

Country Status (1)

Country Link
CN (1) CN114815828B (en)

Family Cites Families (10)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US10343279B2 (en) * 2015-07-10 2019-07-09 Board Of Trustees Of Michigan State University Navigational control of robotic systems and other computer-implemented processes using developmental network with turing machine learning
CN106970615B (en) * 2017-03-21 2019-10-22 西北工业大学 A real-time online path planning method for deep reinforcement learning
US20190184561A1 (en) * 2017-12-15 2019-06-20 The Regents Of The University Of California Machine Learning based Fixed-Time Optimal Path Generation
CN108459614B (en) * 2018-01-17 2020-12-04 哈尔滨工程大学 A real-time collision avoidance planning method for UUV based on CW-RNN network
US11131993B2 (en) * 2019-05-29 2021-09-28 Argo AI, LLC Methods and systems for trajectory forecasting with recurrent neural networks using inertial behavioral rollout
CN110221611B (en) * 2019-06-11 2020-09-04 北京三快在线科技有限公司 Trajectory tracking control method and device and unmanned vehicle
CN110716574B (en) * 2019-09-29 2023-05-02 哈尔滨工程大学 A Real-time Collision Avoidance Planning Method for UUV Based on Deep Q-Network
DE102020200165B4 (en) * 2020-01-09 2022-05-19 Robert Bosch Gesellschaft mit beschränkter Haftung Robot controller and method for controlling a robot
CN111780777B (en) * 2020-07-13 2022-10-21 江苏中科智能制造研究院有限公司 Unmanned vehicle route planning method based on improved A-star algorithm and deep reinforcement learning
CN113741528B (en) * 2021-09-13 2023-05-23 中国人民解放军国防科技大学 Deep reinforcement learning training acceleration method for collision avoidance of multiple unmanned aerial vehicles

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
"Robot Navigation with Reinforcement Learned Path Generation and Fine-Tuned Motion Control"; Longyuan Zhang et al.; IEEE; 2023-07-08; pp. 1-8 *
"AGV Path Planning Method for Discrete Manufacturing Intelligent Factory Scenarios" (《离散制造智能工厂场景的AGV路径规划方法》); Guo Xinde et al.; Journal of Guangdong University of Technology (《广东工业大学学报》); November 2021; pp. 70-76 *

Also Published As

Publication number Publication date
CN114815828A (en) 2022-07-29

Similar Documents

Publication Publication Date Title
CN110632931B (en) A collision avoidance planning method for mobile robots based on deep reinforcement learning in dynamic environments
Zhao et al. The experience-memory Q-learning algorithm for robot path planning in unknown environment
CN116551703B (en) Motion planning method based on machine learning in complex environment
Debnath et al. A review on graph search algorithms for optimal energy efficient path planning for an unmanned air vehicle
CN108897215B (en) Multi-ocean-robot collaborative annular scanning method based on distributed model predictive control
Zhang et al. Robot navigation with reinforcement learned path generation and fine-tuned motion control
KR20240052808A (en) Multi-robot coordination using graph neural networks
Ou et al. GPU-based global path planning using genetic algorithm with near corner initialization
CN118778650A (en) Improved A-star algorithm for mobile robot motion planning
CN116578080A (en) Local path planning method based on deep reinforcement learning
CN117387635A (en) A UAV navigation method based on deep reinforcement learning and PID controller
Zhang et al. Path planning of mobile robot in dynamic obstacle avoidance environment based on deep reinforcement learning
CN113391633A (en) Urban environment-oriented mobile robot fusion path planning method
Liu et al. Path planning for material scheduling in Industrial Internet scenarios based on an improved RRT* algorithm
CN114815828B (en) A robot path planning and control method combining reinforcement learning with recurrent networks
Zhang et al. Cooperative path planning for heterogeneous UAV swarms: A Stackelberg game approach
Qin et al. Knowledge guided deep deterministic policy gradient
CN116300905A (en) A Constrained Multi-robot Reinforcement Learning Safe Formation Method Based on 2D Laser Observation
Zhou et al. Deep reinforcement learning with long-time memory capability for robot mapless navigation
CN114281087B (en) Path planning method based on lifetime planning A* and speed obstacle method
Chen et al. Improved path planning and controller design based on PRM
Zeng et al. An efficient path planning algorithm for mobile robots
CN118752484A (en) Space trajectory planning method for manipulator joints based on model-predicted path integration
Wang et al. MSGJO: a new multi-strategy AI algorithm for the mobile robot path planning
Shi et al. Improvement of Path Planning Algorithm based on Small Step Artificial Potential Field Method

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant