US20190299978A1 - Automatic Navigation Using Deep Reinforcement Learning - Google Patents
Automatic Navigation Using Deep Reinforcement Learning
- Publication number
- US20190299978A1 (application US15/944,563)
- Authority
- US
- United States
- Prior art keywords
- neural network
- autonomous vehicle
- sensor
- location
- autonomous
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Classifications
-
- B—PERFORMING OPERATIONS; TRANSPORTING
- B62—LAND VEHICLES FOR TRAVELLING OTHERWISE THAN ON RAILS
- B62D—MOTOR VEHICLES; TRAILERS
- B62D15/00—Steering not otherwise provided for
- B62D15/02—Steering position indicators ; Steering position determination; Steering aids
- B62D15/027—Parking aids, e.g. instruction means
- B62D15/0285—Parking performed automatically
-
- B—PERFORMING OPERATIONS; TRANSPORTING
- B60—VEHICLES IN GENERAL
- B60W—CONJOINT CONTROL OF VEHICLE SUB-UNITS OF DIFFERENT TYPE OR DIFFERENT FUNCTION; CONTROL SYSTEMS SPECIALLY ADAPTED FOR HYBRID VEHICLES; ROAD VEHICLE DRIVE CONTROL SYSTEMS FOR PURPOSES NOT RELATED TO THE CONTROL OF A PARTICULAR SUB-UNIT
- B60W30/00—Purposes of road vehicle drive control systems not related to the control of a particular sub-unit, e.g. of systems using conjoint control of vehicle sub-units
- B60W30/06—Automatic manoeuvring for parking
-
- G—PHYSICS
- G01—MEASURING; TESTING
- G01C—MEASURING DISTANCES, LEVELS OR BEARINGS; SURVEYING; NAVIGATION; GYROSCOPIC INSTRUMENTS; PHOTOGRAMMETRY OR VIDEOGRAMMETRY
- G01C21/00—Navigation; Navigational instruments not provided for in groups G01C1/00 - G01C19/00
-
- G—PHYSICS
- G05—CONTROLLING; REGULATING
- G05D—SYSTEMS FOR CONTROLLING OR REGULATING NON-ELECTRIC VARIABLES
- G05D1/00—Control of position, course, altitude or attitude of land, water, air or space vehicles, e.g. using automatic pilots
- G05D1/0088—Control of position, course, altitude or attitude of land, water, air or space vehicles, e.g. using automatic pilots characterized by the autonomous decision making process, e.g. artificial intelligence, predefined behaviours
-
- G—PHYSICS
- G05—CONTROLLING; REGULATING
- G05D—SYSTEMS FOR CONTROLLING OR REGULATING NON-ELECTRIC VARIABLES
- G05D1/00—Control of position, course, altitude or attitude of land, water, air or space vehicles, e.g. using automatic pilots
- G05D1/02—Control of position or course in two dimensions
- G05D1/021—Control of position or course in two dimensions specially adapted to land vehicles
- G05D1/0212—Control of position or course in two dimensions specially adapted to land vehicles with means for defining a desired trajectory
- G05D1/0221—Control of position or course in two dimensions specially adapted to land vehicles with means for defining a desired trajectory involving a learning process
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/004—Artificial life, i.e. computing arrangements simulating life
- G06N3/006—Artificial life, i.e. computing arrangements simulating life based on simulated virtual individual or collective life forms, e.g. social simulations or particle swarm optimisation [PSO]
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/045—Combinations of networks
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/08—Learning methods
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/08—Learning methods
- G06N3/092—Reinforcement learning
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/08—Learning methods
- G06N3/098—Distributed learning, e.g. federated learning
-
- B—PERFORMING OPERATIONS; TRANSPORTING
- B60—VEHICLES IN GENERAL
- B60W—CONJOINT CONTROL OF VEHICLE SUB-UNITS OF DIFFERENT TYPE OR DIFFERENT FUNCTION; CONTROL SYSTEMS SPECIALLY ADAPTED FOR HYBRID VEHICLES; ROAD VEHICLE DRIVE CONTROL SYSTEMS FOR PURPOSES NOT RELATED TO THE CONTROL OF A PARTICULAR SUB-UNIT
- B60W2556/00—Input parameters relating to data
- B60W2556/45—External transmission of data to or from the vehicle
- B60W2556/50—External transmission of data to or from the vehicle of positioning data, e.g. GPS [Global Positioning System] data
Definitions
- This invention relates to navigation for vehicles.
- Parking a vehicle, especially parallel parking, is a skill that requires much practice and trial-and-error experience. Even experienced drivers tend to avoid this task since proper maneuvering depends not only on the skill of the driver, but also on largely unpredictable environmental factors, such as the slope and area of the available parking spot and the orientation and movement of adjacent vehicles. In addition, the high costs associated with even small mistakes often deter all but the most confident drivers.
- Automatic parking technology has been developed to autonomously move a vehicle into a desired parking spot from an initial starting location, such as a traffic lane.
- Modern automatic parking systems engage in a step-by-step process in which steering angle, brake, and accelerator values are calculated in situ by an onboard vehicle network. Coordinated control of the steering angle and speed, taking into account the current pose of the vehicle and the surrounding environment, virtually ensures collision-free orientation of the vehicle in an available parking space.
- Automatic parking capability is also an integral component of autonomous vehicles. Such vehicles may be required to perform parallel parking maneuvers under the same wide range of initial conditions and/or operational parameters as human drivers. In addition, autonomous vehicles may be required to drive under special scenarios, such as accident zones or construction zones, that are not included as part of a pre-determined map. Successful navigation is critical in any case, as high costs may result from small mistakes.
- Systems and methods to train an autonomous vehicle to automatically reach a desired target location would enable the vehicle to respond efficiently and accurately across a wide range of initial locations, orientations, and operating parameters relative to a final target destination.
- Such systems and methods would also be scalable, robust, and utilize trial-and-error training to enable a network to learn from its mistakes.
- FIG. 1 is a high-level block diagram showing one example of a computing system in which a system and method in accordance with the invention may be implemented;
- FIG. 2 is a high-level block diagram showing components of a system for training an autonomous vehicle to reach a target destination in accordance with certain embodiments of the invention;
- FIG. 3 is a flow chart showing a process for automatic maneuvering in accordance with embodiments of the invention;
- FIG. 4 is a high-level schematic diagram showing training an autonomous vehicle to perform perpendicular parking in accordance with certain embodiments of the invention;
- FIG. 5 is a high-level schematic diagram showing training an autonomous vehicle to perform angled parking in accordance with certain embodiments of the invention;
- FIG. 6 is a high-level schematic diagram showing a simulated environment providing a parallel parking space and an accident zone in accordance with certain embodiments of the invention; and
- FIG. 7 is a flow chart showing a process for automatic vehicle navigation using deep reinforcement learning in accordance with certain embodiments of the invention.
- The computing system 100 is presented to show one example of an environment where a system and method in accordance with the invention may be implemented.
- The computing system 100 may be embodied as a mobile device 100 such as a smart phone or tablet, a desktop computer, a workstation, a server, or the like.
- The computing system 100 is presented by way of example and is not intended to be limiting. Indeed, the systems and methods disclosed herein may be applicable to a wide variety of different computing systems in addition to the computing system 100 shown. The systems and methods disclosed herein may also potentially be distributed across multiple computing systems 100.
- The computing system 100 includes at least one processor 102 and may include more than one processor 102.
- The processor 102 may be operably connected to a memory 104.
- The memory 104 may include one or more non-volatile storage devices such as hard drives 104a, solid state drives 104a, CD-ROM drives 104a, DVD-ROM drives 104a, tape drives 104a, or the like.
- The memory 104 may also include non-volatile memory such as a read-only memory 104b (e.g., ROM, EPROM, EEPROM, and/or Flash ROM) or volatile memory such as a random access memory 104c (RAM or operational memory).
- A bus 106, or a plurality of buses 106, may interconnect the processor 102, memory devices 104, and other devices to enable data and/or instructions to pass therebetween.
- The computing system 100 may include one or more ports 108.
- Such ports 108 may be embodied as wired ports 108 (e.g., USB ports, serial ports, FireWire ports, SCSI ports, parallel ports, etc.) or wireless ports 108 (e.g., Bluetooth, IrDA, etc.).
- The ports 108 may enable communication with one or more input devices 110 (e.g., keyboards, mice, touchscreens, cameras, microphones, scanners, storage devices, etc.) and output devices 112 (e.g., displays, monitors, speakers, printers, storage devices, etc.).
- The ports 108 may also enable communication with other computing systems 100.
- The computing system 100 includes a wired or wireless network adapter 114 to connect the computing system 100 to a network 116, such as a LAN, WAN, or the Internet.
- A network 116 may enable the computing system 100 to connect to one or more servers 118, workstations 120, personal computers 120, mobile computing devices, or other devices.
- The network 116 may also enable the computing system 100 to connect to another network by way of a router 122 or other device 122.
- A router 122 may allow the computing system 100 to communicate with servers, workstations, personal computers, or other devices located on different networks.
- Embodiments of the invention address this issue by training autonomous vehicles in a simulated environment to efficiently and accurately respond to a range of initial locations, orientations, and operating parameters of the vehicle relative to a final target destination location.
- A system for automatically navigating an autonomous vehicle using deep reinforcement learning may guide the vehicle from an initial location to a desired target location in a step-by-step process.
- Steering angle, brake, and accelerator values may be calculated in situ by an onboard neural network.
- The network may receive the current location and orientation of the vehicle as input from an array of sensors.
- A3C: asynchronous advantage actor-critic
- A system 200 for automatic navigation using deep reinforcement learning in accordance with the invention may include an autonomous vehicle having an array of sensors 208 and an automatic maneuvering system 206. These subsystems may interface with a neural network onboard the autonomous vehicle to train the neural network to reach a target destination accurately and efficiently.
- Sensors 208 may include, for example, camera sensors, lidar sensors, radar sensors, location or GPS sensors, ultrasound sensors, and the like. Information gathered from the various sensors 208 may be processed by the onboard neural network and received by the automatic maneuvering system 206 . In this manner, the sensors 208 may inform and update the automatic maneuvering system 206 substantially continuously regarding a current state of the autonomous vehicle, including its location, orientation, and status.
- The sensors 208 may provide to a display compiler 210 information regarding a current state of the autonomous vehicle. Such information may be communicated to the display compiler 210 periodically or substantially continuously via the onboard network.
- The display compiler 210 may use this information, in combination with information from pre-determined maps 212 (such as those provided by GPS data) of the surrounding area, to make real-time calculations and produce graphical representations relevant to navigation of the autonomous vehicle. This compiled data may be communicated to a dashboard 214 for display to a user, as discussed in more detail below.
- A dashboard 214 or other user interface may be visible to a user to enable activation and control of the system 200.
- The dashboard 214 may be displayed on a remotely-located computer, mobile phone, smart device, or the like, and may maintain connectivity with the neural network by way of an appropriate wireless communication technology, such as a Wi-Fi connection, cellular data connection, the Internet, or other communication technology known to those in the art.
- The dashboard 214 may enable a user to activate the system via an activation mechanism 202.
- The dashboard 214 may also include a monitor 204 or other display device to enable a user to monitor the state of the autonomous vehicle and/or its surrounding environment.
- The activation mechanism 202 may include a physical button, a virtual button on a screen, a voice command, a mouse click, a finger touch, or the like.
- The monitor 204 may provide a real-time initial location of the autonomous vehicle, and the activation mechanism 202 may operate in combination with the monitor 204 to enable the user to activate the automatic maneuvering system 206 by selecting a final destination on the monitor 204.
- Embodiments of the present invention may incorporate an automatic maneuvering system 206 that is scalable, efficient, and robust, and that can account for a wide range of initial locations and/or orientations of the autonomous vehicle relative to its final or target destination.
- The automatic maneuvering system 206 may include a deep reinforcement learning framework and may be implemented in a simulated environment where numerous trials and errors may be used to train the onboard neural network.
- The automatic maneuvering system 206 may train the onboard neural network to learn from mistakes using an exploration-exploitation tradeoff.
- An automatic maneuvering system 206 in accordance with the invention may perform certain steps of a method 300.
- The automatic maneuvering system 206 may be activated 302 by a user via an activation mechanism 202 such as a physical button, a virtual button on a screen, a voice command, a mouse click, a finger touch on a screen, or the like.
- The activation mechanism 202 may be visible and accessible to a user via a physical or virtual dashboard 214 of a remote device.
- The activation mechanism 202 may be located onboard the autonomous vehicle.
- The activation mechanism 202 may allow a user to select a target destination for the autonomous vehicle, or the user may select the target destination via a monitor 204 or other mechanism or device known to those in the art.
- The automatic maneuvering system 206 may confirm 304 the selected destination as the final destination for the autonomous vehicle by determining location and/or directional coordinates corresponding to the selected destination.
- Location coordinates may be determined by referencing data gathered by onboard sensors 208 , including GPS sensors, and/or predetermined maps 212 .
- Directional coordinates may include, for example, a final heading angle or steering angle for the autonomous vehicle.
- A final destination or target position may be expressed as (x, y, h)_F, where x and y are locations on perpendicular lateral axes and h is a final heading angle.
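The final target pose can be illustrated with a small sketch. This is a minimal illustration assuming a planar pose with a heading in degrees; the `Pose` type, tolerance values, and function names are illustrative, not from the patent:

```python
import math
from dataclasses import dataclass

@dataclass
class Pose:
    """Vehicle pose: x/y position on perpendicular lateral axes, heading h in degrees."""
    x: float
    y: float
    h: float

def reached_target(current: Pose, target: Pose,
                   pos_tol: float = 0.1, heading_tol: float = 2.0) -> bool:
    """Return True when the current pose matches the final pose (x, y, h)_F
    within position and heading tolerances."""
    dist = math.hypot(current.x - target.x, current.y - target.y)
    # Wrap the heading difference into [-180, 180) before comparing.
    dh = abs((current.h - target.h + 180.0) % 360.0 - 180.0)
    return dist <= pos_tol and dh <= heading_tol
```

In practice a position "match" would use tolerances tuned to the vehicle and its sensors, since exact equality of coordinates is rarely achievable.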
- The automatic maneuvering system 206 may ascertain 306 drive boundaries within a surrounding area to facilitate navigating the autonomous vehicle from an initial location to a final target destination without interference from objects or obstacles in the vicinity.
- Drive boundaries may include, for example, stationary objects or obstacles such as road signs, trees, buildings, bodies of water, and the like.
- Drive boundaries may be determined by referencing sensor 208 data and/or pre-determined maps 212 .
- The autonomous vehicle may be localized 308 using sensor 208 data and pre-determined maps 212. Localizing 308 the autonomous vehicle may include determining an orientation of the vehicle, a location of the vehicle, a control status, a steering angle, and the like. This information, in addition to the final destination coordinates and drive boundaries, may be received 310 by the onboard neural network via onboard sensors 208.
- The reinforcement learning control framework may include a deep Q-network that learns from mistakes using an exploration-exploitation tradeoff.
- The deep Q-network may utilize numerous trials and errors in which it is rewarded for good actions and penalized for bad actions.
- An epsilon-greedy strategy may be used for exploration versus exploitation decisions during the training of the neural networks.
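The epsilon-greedy decision rule mentioned above can be sketched as follows. The function name and the use of a plain list of Q-values are assumptions made for clarity:

```python
import random

def select_action(q_values: list[float], epsilon: float, rng: random.Random) -> int:
    """Epsilon-greedy selection: with probability epsilon take a random
    (exploratory) action; otherwise exploit the highest-valued action."""
    if rng.random() < epsilon:
        return rng.randrange(len(q_values))                      # explore
    return max(range(len(q_values)), key=q_values.__getitem__)   # exploit: argmax Q
```

Typically epsilon starts near 1 and decays over training, so early episodes explore widely while later episodes mostly exploit the learned policy.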
- The information received 310 by the onboard neural network may be processed and utilized to navigate 312 the vehicle from its initial location to its final location.
- The neural network may determine appropriate incremental adjustments to a vehicle steering angle, acceleration, and/or brake to enable the autonomous vehicle to reach the final target destination.
- The system may be initially activated 302 at a time t.
- The onboard neural network may receive 310 sensor information for the autonomous vehicle that corresponds to time t, and the reinforcement learning control framework may be utilized to process that information. Based on that information, appropriate vehicle controls or settings may be determined and used to navigate 312 the autonomous vehicle to a new position at time t+1.
- Location and directional coordinates corresponding to the new position may be compared 314 with the final destination. If the new position coordinates match the final destination coordinates, the method 300 may end. If not, the method 300 may return to localize 308 the vehicle and iterate the process 300 until the autonomous vehicle reaches the final destination.
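The localize-navigate-compare iteration of method 300 can be sketched as a generic control loop. The helper callables are hypothetical placeholders for the sensor, network, and actuation subsystems described above:

```python
def navigate_to_target(localize, choose_controls, apply_controls, at_target,
                       max_steps: int = 1000) -> int:
    """Iterate the method-300 loop: localize the vehicle, compute controls with
    the onboard network, apply them, and compare the new position with the
    final destination. Returns the number of steps taken."""
    for step in range(max_steps):
        state = localize()                 # sensor fusion: location, orientation, status
        if at_target(state):
            return step                    # position matches the final destination
        controls = choose_controls(state)  # steering angle, acceleration, brake
        apply_controls(controls)
    raise RuntimeError("target not reached within step budget")
```

A one-dimensional toy example: starting at position 0 and moving one unit per step toward a target at 5, the loop terminates after five iterations.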
- Certain embodiments of the invention may provide a simulated environment 400 having perpendicular parking spaces 404.
- A deep Q-network may be used to train an autonomous vehicle 402 to automatically occupy an available perpendicular parking space 404.
- The autonomous vehicle 402 may include an array of onboard sensors to gather data from the external environment.
- The array of sensors may include, for example, image camera sensors, depth camera sensors, infrared camera sensors, lidar sensors, radar sensors, ultrasound sensors, and the like. This data may be input into the automatic maneuvering system 206 and used in combination with predetermined map data to train the autonomous vehicle 402 to properly and efficiently maneuver into the perpendicular parking space 404.
- A user may activate the system and select a perpendicular parking space 404 as the target destination.
- The automatic maneuvering system 206 may determine location and/or directional coordinates corresponding to the perpendicular parking space 404.
- The automatic maneuvering system 206 may determine a safe driving area by identifying and locating drive boundaries in the surrounding area. As shown, for example, drive boundaries may include a curb 406 and other vehicles 408 parked in adjacent parking spaces.
- Onboard sensors may further gather information regarding a current state of the autonomous vehicle 402 , including its location and orientation.
- The automatic maneuvering system 206 may input this information into the reinforcement learning framework of the onboard neural network for processing. Based on this information, the reinforcement learning framework may output appropriate vehicle control indications or settings to the autonomous vehicle 402, such as steering angle, acceleration, and brake.
- The reinforcement learning framework may determine that the autonomous vehicle 402 should adjust its steering angle by 15 degrees and decelerate by 2 mph within a one-second period of time. These indications may be input into the vehicle control system, resulting in a vehicle action. Upon expiration of the one-second period of time, a new position of the autonomous vehicle 402 may be determined. This process may be repeated until the new position of the autonomous vehicle 402 matches the coordinates for the perpendicular parking space 404 such that the autonomous vehicle 402 is properly positioned within the perpendicular parking space 404.
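Because a deep Q-network selects among discrete actions, incremental adjustments like the 15-degree steering change and 2 mph deceleration above imply a discretized action space. One illustrative encoding, with increments chosen for the sketch rather than taken from the patent:

```python
from itertools import product

# Illustrative discrete action space for a deep Q-network controller: each
# action pairs a steering-angle adjustment (degrees) with a speed adjustment
# (mph), applied over one control interval (e.g., one second).
STEERING_DELTAS = (-15.0, -5.0, 0.0, 5.0, 15.0)
SPEED_DELTAS = (-2.0, 0.0, 2.0)

ACTIONS = tuple(product(STEERING_DELTAS, SPEED_DELTAS))  # 5 x 3 = 15 actions

def apply_action(steering: float, speed: float, action_index: int) -> tuple[float, float]:
    """Apply one discrete action to the current steering angle and speed."""
    d_steer, d_speed = ACTIONS[action_index]
    return steering + d_steer, max(0.0, speed + d_speed)  # speed floored at zero
```

The Q-network then outputs one value per entry in `ACTIONS`, and the epsilon-greedy rule picks an index into this table.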
- The reinforcement learning framework may include an actor network and a critic network.
- The first neural network, or actor network, may determine appropriate vehicle control indications or settings for implementation by the vehicle control system, while a second neural network, or critic network, may monitor actions taken by the autonomous vehicle 402 in accordance with those indications.
- The second neural network may analyze each action taken by the autonomous vehicle 402 to determine whether it was beneficial or detrimental to accurately and efficiently maneuvering the autonomous vehicle 402 into the perpendicular parking space 404 or other final target destination. If the action taken was desired, or beneficial, the second neural network may reward the first neural network by generating a certain signal. If the action taken was not desired, or was detrimental to effectively navigating the autonomous vehicle 402 to the target destination, the second neural network may penalize the first neural network via a temporal difference error signal. In this manner, the critic network trains the actor network to perform beneficial actions and to “learn” from its mistakes during the training phase.
- A replay buffer may store past vehicle states, actions taken at each state, and the corresponding rewards and penalties applied. For training, a small batch of data may be sampled from the replay buffer and used to train each neural network. When the replay buffer is full, the oldest data may be discarded and replaced by new data obtained from more recent performance episodes.
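The replay buffer described above can be sketched with a fixed-capacity deque, which discards the oldest transitions automatically once full. Class and method names are illustrative:

```python
import random
from collections import deque

class ReplayBuffer:
    """Fixed-capacity store of (state, action, reward, next_state) transitions.
    When full, the oldest transitions are discarded in favor of data from
    more recent performance episodes."""

    def __init__(self, capacity: int):
        self.buffer = deque(maxlen=capacity)

    def add(self, state, action, reward, next_state):
        self.buffer.append((state, action, reward, next_state))

    def sample(self, batch_size: int, rng: random.Random):
        """Draw a small random minibatch for one training step."""
        return rng.sample(list(self.buffer), min(batch_size, len(self.buffer)))

    def __len__(self):
        return len(self.buffer)
```

Sampling random minibatches rather than consecutive transitions breaks the temporal correlation between training examples, which is the usual motivation for a replay buffer.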
- Another embodiment of the invention may provide a simulated environment 500 having angled parking spaces 504.
- An actor-critic formulation such as A3C may be used.
- Multiple autonomous vehicles 502, 506 may navigate to corresponding angled parking spaces 504, 508 substantially simultaneously. Their resulting performances may be cumulated by a central master actor and used to train their respective neural networks.
- A first autonomous vehicle 502 may be located and oriented in a particular position relative to a first angled parking space 504.
- A second autonomous vehicle 506 may be located and oriented in the same position relative to a second angled parking space 508.
- The final target destinations for the first and second autonomous vehicles 502, 506 may be the first and second angled parking spaces 504, 508, respectively.
- An automatic maneuvering system 206 of each autonomous vehicle 502 , 506 may be activated by a user to automatically maneuver each of the autonomous vehicles 502 , 506 from their initial positions to their respective angled parking spaces 504 , 508 .
- Each automatic maneuvering system 206 may operate independently to explore the state-action space and thereby determine a good policy for navigation.
- An array of onboard sensors associated with each vehicle 502, 506 may gather information substantially continuously regarding the current state of its respective autonomous vehicle 502, 506. This information may be communicated to the onboard neural networks associated with each autonomous vehicle 502, 506 for processing.
- A designated network corresponding to one of the autonomous vehicles 502, for example, or central master actor, may update the neural networks of both autonomous vehicles 502, 506 based on information received from each autonomous vehicle 502, 506 upon exploring the same environment 500. Resulting weights or scores, after rewards and penalties have been applied by each neural network, may be shared across the different autonomous vehicle 502, 506 networks. Training multiple autonomous vehicles 502, 506 in this manner may result in faster learning, since multiple autonomous vehicles 502, 506 execute the same task in parallel across multiple threads of a network.
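One simple way for the central master actor to "cumulate" results from vehicles exploring in parallel is to average the weight vectors reported by each vehicle's network and broadcast the result. This is a deliberate simplification of A3C's asynchronous gradient updates, and the names are illustrative:

```python
def cumulate_weights(per_vehicle_weights: list[list[float]]) -> list[float]:
    """Average corresponding weights across the networks of all vehicles.
    The central master actor would then disseminate this cumulated result
    back to each vehicle's network."""
    n = len(per_vehicle_weights)
    return [sum(ws) / n for ws in zip(*per_vehicle_weights)]
```

In actual A3C, workers instead push gradients asynchronously into one shared global network and periodically copy its parameters back, but the averaging sketch captures the cumulate-and-disseminate pattern the patent describes.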
- Certain embodiments may incorporate a dual-framework system, where both a deep Q-network and an A3C actor-critic formulation may be used to train an autonomous vehicle 610 to reach a target destination in accordance with the invention.
- One embodiment of the invention may train an autonomous vehicle 610 to perform various tasks (e.g., parking, navigating accident or construction zones, or the like) utilizing both the deep Q-network framework and the A3C framework. The performance of each framework may then be analyzed to determine which framework performs better in which regions of the phase space.
- The deep Q-network framework may demonstrate better performance at the autonomous vehicle's initial location, while the A3C framework may demonstrate better performance at or near its final destination.
- This information may be stored in a look-up table that identifies various locations or regions where each of the frameworks is superior to the other in performance.
- The look-up table may be stored locally onboard the autonomous vehicle 610.
- The look-up table may be stored remotely on a server or database and communicated to the autonomous vehicle 610 via V2V communication, Wi-Fi, the Internet, or other communication method known to those in the art.
- Activation of the automatic navigation system in accordance with embodiments of the invention may also trigger activation of the better-performing framework, depending on the state of the vehicle 610 and the task to be performed.
- One embodiment of a simulated environment 600 in accordance with the invention may include an autonomous vehicle 610 having an available parallel parking space 614 as its target destination.
- Embodiments of the invention may access a look-up table to determine that the deep Q-network is superior to the A3C framework at an initial vehicle 610 location, while the A3C framework is superior to the deep Q-network until the autonomous vehicle 610 nears the parallel parking space 614.
- The deep Q-network may be automatically triggered in response to sensor data indicating that the autonomous vehicle 610 is situated in its initial position, while the A3C framework may be automatically triggered in response to changed sensor data indicating that the autonomous vehicle 610 has moved to a position nearer to the parallel parking space 614.
- An autonomous vehicle 610 may have a target destination 612 that requires the autonomous vehicle 610 to make a left-hand turn 606 through an intersection.
- A direct path from the initial location of the autonomous vehicle 610 to the target destination 612 may be obstructed, however, due to a collision 616 between a preceding vehicle 602 attempting to make the same left-hand turn and a bus 604 traveling in the opposite direction.
- Training the autonomous vehicle 610 to avoid the collision 616 in transit to its target destination 612 in accordance with embodiments of the invention may also utilize a dual framework to determine in which regions of the phase space each performs better.
- A score may be calculated for each region of the phase space and associated with the corresponding framework. As discussed above, a score may be calculated according to the rewards and penalties received for corresponding actions. The framework with the highest score for a particular region of the phase space may be identified as the better performer for that region. This information may then be recorded in a look-up table, as discussed above, and the appropriate framework may be triggered based on the region in which the autonomous vehicle 610 is located.
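The look-up table construction and framework selection described above can be sketched as follows. The region labels, framework names, and score values are illustrative:

```python
def build_lookup_table(scores: dict[str, dict[str, float]]) -> dict[str, str]:
    """For each region of the phase space, record the framework whose
    reward/penalty-derived score is highest in that region."""
    return {region: max(by_framework, key=by_framework.get)
            for region, by_framework in scores.items()}

def select_framework(table: dict[str, str], region: str) -> str:
    """Trigger the better-performing framework for the vehicle's current region."""
    return table[region]
```

For example, if the deep Q-network scores higher near the initial position and A3C scores higher near the target, the table routes control accordingly as the vehicle moves between regions.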
- A process 700 for automatic vehicle navigation using deep reinforcement learning may include detecting 702 a vehicle state.
- A vehicle state may include, for example, the vehicle's location, orientation, steering angle, control status, and the like.
- The vehicle state may be determined by referencing sensor data, as well as by referencing data from external sources such as predetermined maps of a surrounding area.
- The vehicle may then begin navigation 704 to a target destination.
- The target destination may be selected by a user, and location coordinates corresponding to the target destination may be input into the automatic maneuvering system.
- The automatic maneuvering system may process this information to enable the vehicle to take successive actions to reach the target destination.
- The process 700 may query 706 whether the action was desirable. If yes, the system may generate a signal to reward 708 the network for the action. If not, the system may generate a signal to penalize 710 the network for the action.
- The reward or penalty received may be associated with the action taken and stored 712 in a replay buffer.
- Data from the replay buffer may be sampled and used to train networks.
- The data may also be communicated 714 to a central master actor, such as a network or processor associated with a designated autonomous vehicle.
- The central master actor may process the information and cumulate it with information obtained from networks associated with other autonomous vehicles performing the same task under the same circumstances.
- The cumulated information may then be disseminated 716 back to the networks associated with those autonomous vehicles to facilitate faster learning.
- Computer-executable instructions comprise, for example, instructions and data which, when executed at a processor, cause a general purpose computer, special purpose computer, or special purpose processing device to perform a certain function or group of functions.
- The computer-executable instructions may be, for example, binaries, intermediate format instructions such as assembly language, or even source code.
- The disclosure may be practiced in network computing environments with many types of computer system configurations, including an in-dash vehicle computer, personal computers, desktop computers, laptop computers, message processors, hand-held devices, multi-processor systems, microprocessor-based or programmable consumer electronics, network PCs, minicomputers, mainframe computers, mobile telephones, PDAs, tablets, pagers, routers, switches, various storage devices, and the like.
- The disclosure may also be practiced in distributed system environments where local and remote computer systems, which are linked (either by hardwired data links, wireless data links, or by a combination of hardwired and wireless data links) through a network, both perform tasks.
- Program modules may be located in both local and remote memory storage devices.
- A sensor may include computer code configured to be executed in one or more processors, and may include hardware logic/electrical circuitry controlled by the computer code.
- Processors may include hardware logic/electrical circuitry controlled by the computer code.
- At least some embodiments of the disclosure have been directed to computer program products comprising such logic (e.g., in the form of software) stored on any computer useable medium.
- Such software when executed in one or more data processing devices, causes a device to operate as described herein.
Abstract
Description
- This invention relates to navigation for vehicles.
- Parking a vehicle, especially parallel parking, is a skill that requires much practice and trial-and-error experience. Even experienced drivers tend to avoid this task since proper maneuvering depends not only on the skill of the driver, but also on largely unpredictable environmental factors, such as the slope and area of the available parking spot and the orientation and movement of adjacent vehicles. In addition, the high costs associated with even small mistakes often deter all but the most confident drivers.
- Automatic parking technology has been developed to autonomously move a vehicle into a desired parking spot from an initial starting location, such as a traffic lane. To this end, modern automatic parking systems engage in a step-by-step process where steering angle, brake and accelerator values are calculated in situ by an onboard vehicle network. Coordinated control of the steering angle and speed, taking into account the current pose of the vehicle and surrounding environment, virtually ensures collision-free orientation of the vehicle in an available parking space.
- Though still under development, automatic parking capability is also an integral component of autonomous vehicles. Such vehicles may be required to perform parallel parking maneuvers under the same wide range of initial conditions and/or operational parameters as human drivers. In addition, autonomous vehicles may be required to drive under special scenarios, such as accident zones or construction zones, that are not included as part of a pre-determined map. Successful navigation is critical in any case, as high costs may result from small mistakes.
- In view of the foregoing, what are needed are systems and methods to train an autonomous vehicle to automatically reach a desired target location. Ideally, such systems and methods would train an autonomous vehicle to efficiently and accurately respond to a wide range of initial locations, orientations, and operating parameters of the vehicle relative to a final target destination location. Such systems and methods would also be scalable, robust, and utilize trial-and-error training to enable a network to learn from its mistakes.
- In order that the advantages of the invention will be readily understood, a more particular description of the invention briefly described above will be rendered by reference to specific embodiments illustrated in the appended drawings. Understanding that these drawings depict only typical embodiments of the invention and are not therefore to be considered limiting of its scope, the invention will be described and explained with additional specificity and detail through use of the accompanying drawings, in which:
-
FIG. 1 is a high-level block diagram showing one example of a computing system in which a system and method in accordance with the invention may be implemented; -
FIG. 2 is a high-level block diagram showing components of a system for training an autonomous vehicle to reach a target destination in accordance with certain embodiments of the invention; -
FIG. 3 is a flow chart showing a process for automatic maneuvering in accordance with embodiments of the invention; -
FIG. 4 is a high-level schematic diagram showing training an autonomous vehicle to perform perpendicular parking in accordance with certain embodiments of the invention; -
FIG. 5 is a high-level schematic diagram showing training an autonomous vehicle to perform angled parking in accordance with certain embodiments of the invention; -
FIG. 6 is a high-level schematic diagram showing a simulated environment providing a parallel parking space and an accident zone in accordance with certain embodiments of the invention; and -
FIG. 7 is a flow chart showing a process for automatic vehicle navigation using deep reinforcement learning in accordance with certain embodiments of the invention. - Referring to
FIG. 1, one example of a computing system 100 is illustrated. The computing system 100 is presented to show one example of an environment where a system and method in accordance with the invention may be implemented. The computing system 100 may be embodied as a mobile device 100 such as a smart phone or tablet, a desktop computer, a workstation, a server, or the like. The computing system 100 is presented by way of example and is not intended to be limiting. Indeed, the systems and methods disclosed herein may be applicable to a wide variety of different computing systems in addition to the computing system 100 shown. The systems and methods disclosed herein may also potentially be distributed across multiple computing systems 100. - As shown, the
computing system 100 includes at least one processor 102 and may include more than one processor 102. The processor 102 may be operably connected to a memory 104. The memory 104 may include one or more non-volatile storage devices such as hard drives 104a, solid state drives 104a, CD-ROM drives 104a, DVD-ROM drives 104a, tape drives 104a, or the like. The memory 104 may also include non-volatile memory such as a read-only memory 104b (e.g., ROM, EPROM, EEPROM, and/or Flash ROM) or volatile memory such as a random access memory 104c (RAM or operational memory). A bus 106, or plurality of buses 106, may interconnect the processor 102, memory devices 104, and other devices to enable data and/or instructions to pass therebetween. - To enable communication with external systems or devices, the
computing system 100 may include one or more ports 108. Such ports 108 may be embodied as wired ports 108 (e.g., USB ports, serial ports, Firewire ports, SCSI ports, parallel ports, etc.) or wireless ports 108 (e.g., Bluetooth, IrDA, etc.). The ports 108 may enable communication with one or more input devices 110 (e.g., keyboards, mice, touchscreens, cameras, microphones, scanners, storage devices, etc.) and output devices 112 (e.g., displays, monitors, speakers, printers, storage devices, etc.). The ports 108 may also enable communication with other computing systems 100. - In certain embodiments, the
computing system 100 includes a wired or wireless network adapter 114 to connect the computing system 100 to a network 116, such as a LAN, WAN, or the Internet. Such a network 116 may enable the computing system 100 to connect to one or more servers 118, workstations 120, personal computers 120, mobile computing devices, or other devices. The network 116 may also enable the computing system 100 to connect to another network by way of a router 122 or other device 122. Such a router 122 may allow the computing system 100 to communicate with servers, workstations, personal computers, or other devices located on different networks. - As previously mentioned, autonomous vehicle technology is currently under development with the goal of providing a fully-autonomous vehicle capable of performing the same functions and maneuvers as a human operator, with even greater precision and efficiency. Automatic parking and navigation under a variety of circumstances is critical to autonomous vehicle functionality. Embodiments of the invention address this issue by training autonomous vehicles in a simulated environment to efficiently and accurately respond to a range of initial locations, orientations, and operating parameters of the vehicle relative to a final target destination location.
- As discussed in detail below, a system for automatically navigating an autonomous vehicle using deep reinforcement learning in accordance with the invention may guide an autonomous vehicle from an initial location to a desired target location in a step-by-step process. In certain embodiments, steering angle, brake and accelerator values may be calculated in situ by an onboard neural network. The network may receive the current location and orientation of the vehicle as input from an array of sensors. Two unique deep reinforcement learning frameworks—a deep Q-network and an asynchronous advantage actor-critic (“A3C”) network—may be implemented to train the onboard network. Output from these frameworks may be fed into the control system of the autonomous vehicle in real time to execute the maneuver.
- Referring now to
FIG. 2, a system 200 for automatic navigation using deep reinforcement learning in accordance with the invention may include an autonomous vehicle having an array of sensors 208 and an automatic maneuvering system 206. These subsystems may interface with a neural network onboard the autonomous vehicle to train the neural network to reach a target destination accurately and efficiently. -
Sensors 208 may include, for example, camera sensors, lidar sensors, radar sensors, location or GPS sensors, ultrasound sensors, and the like. Information gathered from the various sensors 208 may be processed by the onboard neural network and received by the automatic maneuvering system 206. In this manner, the sensors 208 may inform and update the automatic maneuvering system 206 substantially continuously regarding a current state of the autonomous vehicle, including its location, orientation, and status. - In addition, the
sensors 208 may provide to a display compiler 210 information regarding a current state of the autonomous vehicle. Such information may be communicated to the display compiler 210 periodically or substantially continuously via the onboard network. The display compiler 210 may use this information, in combination with information from pre-determined maps 212 (such as those provided by GPS data) of the surrounding area, to make real-time calculations and produce graphical representations relevant to navigation of the autonomous vehicle. This compiled data may be communicated to a dashboard 214 for display to a user, as discussed in more detail below. - In certain embodiments, a
dashboard 214 or other user interface may be visible to a user to enable activation and control of the system 200. In some embodiments, the dashboard 214 may be displayed on a remotely-located computer, mobile phone, smart device, or the like, and may maintain connectivity with the neural network by way of an appropriate wireless communication technology, such as a Wi-Fi connection, cellular data connection, the internet, or other communication technology known to those in the art. - The
dashboard 214 may enable a user to activate the system via an activation mechanism 202. The dashboard 214 may also include a monitor 204 or other display device to enable a user to monitor the state of the autonomous vehicle and/or its surrounding environment. In certain embodiments, the activation mechanism 202 may include a physical button, a virtual button on a screen, a voice command, a mouse click, a finger touch, or the like. In some embodiments, the monitor 204 may provide a real-time initial location of the autonomous vehicle, and the activation mechanism 202 may operate in combination with the monitor 204 to enable the user to activate the automatic maneuvering system 206 by selecting a final destination on the monitor 204. - Referring now to
FIG. 3, embodiments of the present invention may incorporate an automatic maneuvering system 206 which is scalable, efficient, robust, and can account for a wide range of initial locations and/or orientations of the autonomous vehicle relative to its final or target destination. The automatic maneuvering system 206 may include a deep reinforcement learning framework, and may be implemented in a simulated environment where numerous trials and errors may be used to train the onboard neural network. In certain embodiments, the automatic maneuvering system 206 may train the onboard neural network to learn from mistakes using an exploration-exploitation tradeoff. - To this end, embodiments of an
automatic maneuvering system 206 in accordance with the invention may perform certain method 300 steps. For example, the automatic maneuvering system 206 may be activated 302 by a user via an activation mechanism 202 such as a physical button, a virtual button on a screen, a voice command, a mouse click, a finger touch on a screen, or the like. In some embodiments, the activation mechanism 202 may be visible and accessible to a user via a physical or virtual dashboard 214 of a remote device. In other embodiments, the activation mechanism 202 may be located onboard the autonomous vehicle. In certain embodiments, the activation mechanism 202 may allow a user to select a target destination for the autonomous vehicle, or the user may select the target destination via a monitor 204 or other mechanism or device known to those in the art. - The
automatic maneuvering system 206 may confirm 304 the selected destination as the final destination for the autonomous vehicle by determining location and/or directional coordinates corresponding to the selected destination. Location coordinates may be determined by referencing data gathered by onboard sensors 208, including GPS sensors, and/or predetermined maps 212. Directional coordinates may include, for example, a final heading angle or steering angle for the autonomous vehicle. In one embodiment, a final destination or target position may be expressed as (x, y, h)F, where x and y are locations on perpendicular lateral axes, and h is a final heading angle. - In some embodiments, the
automatic maneuvering system 206 may ascertain 306 drive boundaries within a surrounding area to facilitate navigating the autonomous vehicle from an initial location to a final target destination without interference from objects or obstacles in the vicinity. Drive boundaries may include, for example, stationary objects or obstacles such as road signs, trees, buildings, bodies of water, and the like. Drive boundaries may be determined by referencing sensor 208 data and/or pre-determined maps 212. - Upon determining a safe drive area based on the drive boundaries, the autonomous vehicle may be localized 308 using
sensor 208 data and pre-determined maps 212. Localizing 308 the autonomous vehicle may include determining an orientation of the vehicle, a location of the vehicle, a control status, a steering angle, and the like. This information, in addition to the final destination coordinates and drive boundaries, may be received 310 by the onboard neural network via onboard sensors 208. - In certain embodiments, the reinforcement learning control framework may include a deep Q-network that learns from mistakes using an exploration-exploitation tradeoff. As discussed in more detail below, the deep Q-network may utilize numerous trials and errors where it is rewarded for good actions and penalized for bad actions. In one embodiment, an epsilon-greedy strategy may be used for exploration versus exploitation decisions during the training of the neural networks.
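The epsilon-greedy exploration-versus-exploitation rule mentioned above can be sketched as follows. This is an illustrative sketch, not code from the patent; the function names and the linear decay schedule are assumptions.

```python
import random

def epsilon_greedy(q_values, epsilon, rng=random):
    """Pick an action index: explore with probability epsilon,
    otherwise exploit the action with the highest Q-value."""
    if rng.random() < epsilon:
        return rng.randrange(len(q_values))  # explore: random action
    return max(range(len(q_values)), key=lambda a: q_values[a])  # exploit

def decayed_epsilon(step, eps_start=1.0, eps_end=0.05, decay_steps=10_000):
    """Linearly anneal epsilon so early training explores widely
    and later training mostly exploits the learned policy."""
    frac = min(step / decay_steps, 1.0)
    return eps_start + frac * (eps_end - eps_start)
```

With `epsilon=0` the rule is purely greedy; with `epsilon=1` it is purely random, which is the usual shape of the tradeoff the patent describes.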
- The information received 310 by the onboard neural network may be processed and utilized to navigate 312 the vehicle from its initial location to its final location. In some embodiments, based on this information, the neural network may determine appropriate incremental adjustments to a vehicle steering angle, acceleration, and/or brake to enable the autonomous vehicle to reach the final target destination.
- For example, in one embodiment, the system may be initially activated 302 at time t. The onboard neural network may receive 310 sensor information for the autonomous vehicle that corresponds to time t, and the reinforcement learning control framework may be utilized to process such information. Based on that information, appropriate vehicle controls or settings may be determined and used to navigate 312 the autonomous vehicle to a new position at time t+1.
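The localize, act, and compare iteration just described can be sketched as a simple control loop. All names here are hypothetical, and a toy proportional step stands in for the onboard neural network's control output; this is not the patent's implementation.

```python
def navigate(initial_pose, target_pose, policy, step_limit=100, tol=1e-3):
    """Iterate the localize -> act -> compare loop until the vehicle
    pose matches the target (within tol) or the step limit is hit.

    `policy(pose, target)` stands in for the onboard network: it maps
    the current (x, y, heading) pose to an incremental pose update."""
    pose = initial_pose
    for _ in range(step_limit):
        if all(abs(p - t) <= tol for p, t in zip(pose, target_pose)):
            return pose  # coordinates match the final destination
        delta = policy(pose, target_pose)                 # control step
        pose = tuple(p + d for p, d in zip(pose, delta))  # vehicle moves
    return pose

def proportional_policy(pose, target):
    """Toy stand-in policy: step 50% of the remaining error each iteration."""
    return tuple(0.5 * (t - p) for p, t in zip(pose, target))
```

The early return mirrors the comparison step of method 300: when the new position coordinates match the final destination coordinates, the loop ends.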
- Location and directional coordinates corresponding to the new position may be compared 314 with the final destination. If the new position coordinates match the final destination coordinates, the
method 300 may end. If not, the method 300 may return to localize 308 the vehicle and iterate the process 300 until the autonomous vehicle reaches the final destination. - Referring now to
FIG. 4, certain embodiments of the invention may provide a simulated environment 400 having perpendicular parking spaces 404. As discussed above, in some embodiments, a deep Q-network may be used to train an autonomous vehicle 402 to automatically occupy an available perpendicular parking space 404. - The
autonomous vehicle 402 may include an array of onboard sensors to gather data from the external environment. The array of sensors may include, for example, image camera sensors, depth camera sensors, infrared camera sensors, lidar sensors, radar sensors, ultrasound sensors, and the like. This data may be input into the automatic maneuvering system 206 and used in combination with predetermined map data to train the autonomous vehicle 402 to properly and efficiently maneuver into the perpendicular parking space 404. - In some embodiments, a user may activate the system and select a
perpendicular parking space 404 as the target destination. Using data from the array of onboard sensors as well as predetermined map information, the automatic maneuvering system 206 may determine location and/or directional coordinates corresponding to the perpendicular parking space 404. The automatic maneuvering system 206 may determine a safe driving area by identifying and locating drive boundaries in the surrounding area. As shown, for example, drive boundaries may include a curb 406 and other vehicles 408 parked in adjacent parking spaces. - Onboard sensors may further gather information regarding a current state of the
autonomous vehicle 402, including its location and orientation. The automatic maneuvering system 206 may input this information into the reinforcement learning framework of the onboard neural network for processing. Based on this information, the reinforcement learning framework may output appropriate vehicle control indications or settings to the autonomous vehicle 402, such as steering angle, acceleration, and brake. - In one embodiment, for example, the reinforcement learning framework may determine that the
autonomous vehicle 402 should adjust its steering angle by 15 degrees and decelerate by 2 mph within a one second period of time. These indications may be input into the vehicle control system, resulting in a vehicle action. Upon expiration of the one second period of time, a new position of the autonomous vehicle 402 may be determined. This process may be repeated until the new position of the autonomous vehicle 402 matches the coordinates for the perpendicular parking space 404 such that the autonomous vehicle 402 is properly positioned within the perpendicular parking space 404. - In embodiments utilizing a deep Q-network during the training phase, the reinforcement learning framework may include an actor network and a critic network. The first neural network, or actor network, may determine appropriate vehicle control indications or settings for implementation by the vehicle control system, while a second neural network, or critic network, may monitor actions taken by the
autonomous vehicle 402 in accordance with those indications. - The second neural network, or critic network, may analyze each action taken by the
autonomous vehicle 402 to determine whether it was beneficial or detrimental to accurately and efficiently maneuvering the autonomous vehicle 402 into the perpendicular parking space 404 or other final target destination. If the action taken was desired, or beneficial, the second neural network may reward the first neural network by generating a certain signal. If the action taken was not desired, or detrimental, to effectively navigating the autonomous vehicle 402 to the target destination, the second neural network may penalize the first neural network via a temporal difference error signal. In this manner, the critic network trains the actor network to perform beneficial actions and to "learn" from its mistakes during the training phase.
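The temporal difference error signal the critic emits can be sketched with the conventional TD formulation, delta = r + gamma * V(s') - V(s). The patent does not spell out the formula, so this standard reading is an assumption: a positive delta reinforces the actor's action, a negative delta penalizes it.

```python
def td_error(reward, value_s, value_next, gamma=0.99, terminal=False):
    """Temporal-difference error the critic uses to score an action:
    positive -> the action did better than the critic expected (reward),
    negative -> worse than expected (penalize)."""
    bootstrap = 0.0 if terminal else gamma * value_next
    return reward + bootstrap - value_s

def update_value(value_s, delta, lr=0.1):
    """Nudge the critic's own value estimate toward the TD target."""
    return value_s + lr * delta
```

The same signed delta can serve both roles the patent describes: it adjusts the actor (reward or penalty) and improves the critic's value estimate.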
- Referring now to
FIG. 5 , another embodiment of the invention may provide asimulated environment 500 having angledparking spaces 504. In this embodiment, an actor-critic formulation such as A3C may be used. Specifically, multiple 502, 506 may navigate to a correspondingautonomous vehicles 504, 508 substantially simultaneously. Their resulting performances may be cumulated by a central master actor and used to train their respective neural networks.angled parking space - As shown, for example, a first
autonomous vehicle 502 may be located and oriented in a particular position relative to a firstangled parking space 504. A secondautonomous vehicle 506 may be located and oriented in the same position relative to a secondangled parking space 508. In each case, the final target destination for each of the first and second 502, 504 may be the first and secondautonomous vehicles 504, 508, respectively.angled parking spaces - An
automatic maneuvering system 206 of each 502, 506 may be activated by a user to automatically maneuver each of theautonomous vehicle 502, 506 from their initial positions to their respectiveautonomous vehicles 504, 508. Eachangled parking spaces automatic maneuvering system 206 may operate independently to explore the state-action space and thereby determine a good policy for navigation. As above, an array of onboard sensors associated with each 502, 506 may gather information substantially continuously regarding the current state of its respectivevehicle 502, 506. This information may be communicated to onboard neural networks associated with eachautonomous vehicle 502, 506 for processing.autonomous vehicle - A designated network corresponding to one of the
autonomous vehicles 502 for example, or central master actor, may update the neural networks of both 502, 506 based on information received from eachautonomous vehicles 502, 506 upon exploring theautonomous vehicle same environment 500. Resulting weights or scores after rewards and penalties have been applied by each neural network may be shared across the different 502, 506 networks. Training multipleautonomous vehicle 502, 506 in this manner may result in faster learning, since multipleautonomous vehicles 502, 506 execute the same task in parallel across multiple threads of a network.autonomous vehicles - Referring now to
FIG. 6 , certain embodiments may incorporate a dual-framework system, where both a deep Q-network and an A3C actor-critic formulation may be used to train anautonomous vehicle 610 to reach a target destination in accordance with the invention. - One embodiment of the invention may train an
autonomous vehicle 610 to perform various tasks (i.e., parking, navigating accident or construction zones, or the like) utilizing both the deep Q-network framework and the A3C framework. The performance of each framework may then be analyzed to determine which framework performs better in which regions of the phase space. - For example, in one embodiment, the deep Q-network framework may demonstrate better performance at the autonomous vehicle's initial location, while the A3C framework may demonstrate better performance at or near its final destination. This information may be stored in a look-up table that identifies various locations or regions where each of the frameworks is superior to the other in performance. The look-up table may be stored locally onboard the
autonomous vehicle 610. Alternatively, the look-up table may be stored remotely on a server or database, and communicated to theautonomous vehicle 610 via V2V communication, WiFi, the internet, or other communication method known to those in the art. In any case, activation of the automatic navigation system in accordance with embodiments of the invention may also trigger activation of the better-performing framework, depending on the state of thevehicle 610 and the task to be performed. - As shown, one embodiment of a
simulated environment 600 in accordance with the invention may include anautonomous vehicle 610 having an availableparallel parking space 614 as its target destination. Embodiments of the invention may access a look-up table to determine that the deep Q-network is superior to the A3C framework at aninitial vehicle 610 location, while the A3C framework is superior to the deep Q-network until theautonomous vehicle 610 nears theparallel parking space 614. Accordingly, the deep Q-network may be automatically triggered in response to sensor data indicating that theautonomous vehicle 610 is situated in its initial position, while the A3C framework may be automatically triggered in response to changed sensor data indicating that theautonomous vehicle 610 has moved to a position nearer to theparallel parking space 614. - In another embodiment, an
autonomous vehicle 610 may have atarget destination 612 that requires theautonomous vehicle 610 to make a left-hand turn 606 through an intersection. A direct path from the initial location of theautonomous vehicle 610 to thetarget destination 612 may be obstructed, however, due to acollision 616 between a precedingvehicle 602 attempting to make the same left-hand turn, and abus 604 traveling in the opposite direction. - Training the
autonomous vehicle 610 to avoid thecollision 616 in transit to itstarget destination 612 in accordance with embodiments of the invention may also utilize a dual framework to determine in which regions of the phase space each performs better. In some embodiments, a score may be calculated for each region of the phase space, and may be associated with the corresponding framework. A discussed above, a score may be calculated according to the rewards and penalties received for corresponding actions. The framework with the highest score for a particular region of the phase space may be identified as the better performer for that region. This information may then recorded in a look-up table, as discussed above, and the appropriate framework may be triggered based on the region in which theautonomous vehicle 610 is located. - Referring now to
FIG. 7 , aprocess 700 for automatic vehicle navigation using deep reinforcement learning in accordance with embodiments of the invention may include detecting 702 a vehicle state. A vehicle state may include, for example, its location, orientation, steering angle, control status, and the like. The vehicle state may be determined by referencing sensor data, as well as referencing data from external sources such as predetermined maps of a surrounding area. - The vehicle may then begin
navigation 704 to a target destination. The target destination may be selected by a user, and location coordinates corresponding to the target destination may be input into the automatic maneuvering system. The automatic maneuvering system may process this information to enable the vehicle to take successive actions to reach the target destination. For each action taken, theprocess 700 may query 706 whether the action was desirable. If yes, the system may generate a signal to reward 708 the network for the action. If not, the system may generate a signal to penalize 710 the network for the action. - In either case, the reward or penalty received may be associated with the action taken and stored 12 in a replay buffer. Data from the replay buffer may be sampled and used to train networks. In certain embodiments, the data may also be communicated 714 to a central master actor, such as a network or processor associated with a designated autonomous vehicle. The central master actor may process the information and cumulate it with information obtained from networks associated with other autonomous vehicles performing the same task under the same circumstances. The cumulated information may then be disseminated 716 back to the networks associated with those autonomous vehicles to facilitate faster learning.
- In the above disclosure, reference has been made to the accompanying drawings, which form a part hereof, and in which is shown by way of illustration specific implementations in which the disclosure may be practiced. It is understood that other implementations may be utilized and structural changes may be made without departing from the scope of the present disclosure. References in the specification to “one embodiment,” “an embodiment,” “an example embodiment,” etc., indicate that the embodiment described may include a particular feature, structure, or characteristic, but every embodiment may not necessarily include the particular feature, structure, or characteristic. Moreover, such phrases are not necessarily referring to the same embodiment. Further, when a particular feature, structure, or characteristic is described in connection with an embodiment, it is submitted that it is within the knowledge of one skilled in the art to affect such feature, structure, or characteristic in connection with other embodiments whether or not explicitly described.
- Implementations of the systems, devices, and methods disclosed herein may comprise or utilize a special purpose or general-purpose computer including computer hardware, such as, for example, one or more processors and system memory, as discussed herein. Implementations within the scope of the present disclosure may also include physical and other computer-readable media for carrying or storing computer-executable instructions and/or data structures. Such computer-readable media can be any available media that can be accessed by a general purpose or special purpose computer system. Computer-readable media that store computer-executable instructions are computer storage media (devices). Computer-readable media that carry computer-executable instructions are transmission media. Thus, by way of example, and not limitation, implementations of the disclosure can comprise at least two distinctly different kinds of computer-readable media: computer storage media (devices) and transmission media.
- Computer storage media (devices) includes RAM, ROM, EEPROM, CD-ROM, solid state drives (“SSDs”) (e.g., based on RAM), Flash memory, phase-change memory (“PCM”), other types of memory, other optical disk storage, magnetic disk storage or other magnetic storage devices, or any other medium which can be used to store desired program code means in the form of computer-executable instructions or data structures and which can be accessed by a general purpose or special purpose computer.
- An implementation of the devices, systems, and methods disclosed herein may communicate over a computer network. A “network” is defined as one or more data links that enable the transport of electronic data between computer systems and/or modules and/or other electronic devices. When information is transferred or provided over a network or another communications connection (either hardwired, wireless, or a combination of hardwired and wireless) to a computer, the computer properly views the connection as a transmission medium. Transmission media can include a network and/or data links, which can be used to carry desired program code means in the form of computer-executable instructions or data structures and which can be accessed by a general purpose or special purpose computer. Combinations of the above should also be included within the scope of computer-readable media.
- Computer-executable instructions comprise, for example, instructions and data which, when executed at a processor, cause a general purpose computer, special purpose computer, or special purpose processing device to perform a certain function or group of functions. The computer-executable instructions may be, for example, binaries, intermediate format instructions such as assembly language, or even source code. Although the subject matter has been described in language specific to structural features and/or methodological acts, it is to be understood that the subject matter defined in the appended claims is not necessarily limited to the features or acts described above. Rather, the described features and acts are disclosed as example forms of implementing the claims.
- Those skilled in the art will appreciate that the disclosure may be practiced in network computing environments with many types of computer system configurations, including, an in-dash vehicle computer, personal computers, desktop computers, laptop computers, message processors, hand-held devices, multi-processor systems, microprocessor-based or programmable consumer electronics, network PCs, minicomputers, mainframe computers, mobile telephones, PDAs, tablets, pagers, routers, switches, various storage devices, and the like. The disclosure may also be practiced in distributed system environments where local and remote computer systems, which are linked (either by hardwired data links, wireless data links, or by a combination of hardwired and wireless data links) through a network, both perform tasks. In a distributed system environment, program modules may be located in both local and remote memory storage devices.
- Further, where appropriate, functions described herein can be performed in one or more of: hardware, software, firmware, digital components, or analog components. For example, one or more application specific integrated circuits (ASICs) can be programmed to carry out one or more of the systems and procedures described herein. Certain terms are used throughout the description and claims to refer to particular system components. As one skilled in the art will appreciate, components may be referred to by different names. This document does not intend to distinguish between components that differ in name, but not function.
- It should be noted that the sensor embodiments discussed above may comprise computer hardware, software, firmware, or any combination thereof to perform at least a portion of their functions. For example, a sensor may include computer code configured to be executed in one or more processors, and may include hardware logic/electrical circuitry controlled by the computer code. These example devices are provided herein for purposes of illustration, and are not intended to be limiting. Embodiments of the present disclosure may be implemented in further types of devices, as would be known to persons skilled in the relevant art(s).
- At least some embodiments of the disclosure have been directed to computer program products comprising such logic (e.g., in the form of software) stored on any computer useable medium. Such software, when executed in one or more data processing devices, causes a device to operate as described herein.
- While various embodiments of the present disclosure have been described above, it should be understood that they have been presented by way of example only, and not limitation. It will be apparent to persons skilled in the relevant art that various changes in form and detail can be made therein without departing from the spirit and scope of the disclosure. Thus, the breadth and scope of the present disclosure should not be limited by any of the above-described exemplary embodiments, but should be defined only in accordance with the following claims and their equivalents. The foregoing description has been presented for the purposes of illustration and description. It is not intended to be exhaustive or to limit the disclosure to the precise form disclosed. Many modifications and variations are possible in light of the above teaching. Further, it should be noted that any or all of the aforementioned alternate implementations may be used in any combination desired to form additional hybrid implementations of the disclosure.
Claims (20)
Priority Applications (3)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| US15/944,563 US11613249B2 (en) | 2018-04-03 | 2018-04-03 | Automatic navigation using deep reinforcement learning |
| DE102019108477.6A DE102019108477A1 (en) | 2018-04-03 | 2019-04-01 | AUTOMATIC NAVIGATION USING DEEP REINFORCEMENT LEARNING |
| CN201910262817.9A CN110341700A (en) | 2018-04-03 | 2019-04-02 | Automatic navigation using deep reinforcement learning |
Applications Claiming Priority (1)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| US15/944,563 US11613249B2 (en) | 2018-04-03 | 2018-04-03 | Automatic navigation using deep reinforcement learning |
Publications (2)
| Publication Number | Publication Date |
|---|---|
| US20190299978A1 true US20190299978A1 (en) | 2019-10-03 |
| US11613249B2 US11613249B2 (en) | 2023-03-28 |
Family
ID=67991638
Family Applications (1)
| Application Number | Title | Priority Date | Filing Date |
|---|---|---|---|
| US15/944,563 Active 2042-01-27 US11613249B2 (en) | 2018-04-03 | 2018-04-03 | Automatic navigation using deep reinforcement learning |
Country Status (3)
| Country | Link |
|---|---|
| US (1) | US11613249B2 (en) |
| CN (1) | CN110341700A (en) |
| DE (1) | DE102019108477A1 (en) |
Families Citing this family (2)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| CN111141300A (en) * | 2019-12-18 | 2020-05-12 | 南京理工大学 | Intelligent mobile platform map-free autonomous navigation method based on deep reinforcement learning |
| DE102021107458A1 (en) | 2021-03-25 | 2022-09-29 | Dr. Ing. H.C. F. Porsche Aktiengesellschaft | Control device and method |
Citations (1)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| US20180032863A1 (en) * | 2016-07-27 | 2018-02-01 | Google Inc. | Training a policy neural network and a value neural network |
Family Cites Families (7)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| CN105109482B (en) | 2015-08-24 | 2017-09-12 | 奇瑞汽车股份有限公司 | Stop storage method and device |
| CN105527963B (en) | 2015-12-23 | 2019-01-25 | 奇瑞汽车股份有限公司 | Side azimuth parking method and system |
| US20170329331A1 (en) | 2016-05-16 | 2017-11-16 | Magna Electronics Inc. | Control system for semi-autonomous control of vehicle along learned route |
| CN107065567A (en) | 2017-05-22 | 2017-08-18 | 江南大学 | A kind of automatic stopping control system that control is constrained based on adaptive neural network |
| CN110651279B (en) * | 2017-06-28 | 2023-11-07 | 渊慧科技有限公司 | Using apprentices to train action selection neural networks |
| US10782694B2 (en) * | 2017-09-07 | 2020-09-22 | Tusimple, Inc. | Prediction-based system and method for trajectory planning of autonomous vehicles |
| US10545510B2 (en) * | 2017-12-12 | 2020-01-28 | Waymo Llc | Fleet management for autonomous vehicles |
- 2018-04-03: US US15/944,563 patent/US11613249B2/en active Active
- 2019-04-01: DE DE102019108477.6A patent/DE102019108477A1/en active Pending
- 2019-04-02: CN CN201910262817.9A patent/CN110341700A/en active Pending
Patent Citations (1)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| US20180032863A1 (en) * | 2016-07-27 | 2018-02-01 | Google Inc. | Training a policy neural network and a value neural network |
Cited By (25)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| US12071142B2 (en) | 2017-12-18 | 2024-08-27 | Plusai, Inc. | Method and system for personalized driving lane planning in autonomous driving vehicles |
| US12060066B2 (en) * | 2017-12-18 | 2024-08-13 | Plusai, Inc. | Method and system for human-like driving lane planning in autonomous driving vehicles |
| US11650586B2 (en) | 2017-12-18 | 2023-05-16 | Plusai, Inc. | Method and system for adaptive motion planning based on passenger reaction to vehicle motion in autonomous driving vehicles |
| US11643086B2 (en) | 2017-12-18 | 2023-05-09 | Plusai, Inc. | Method and system for human-like vehicle control prediction in autonomous driving vehicles |
| US20190332109A1 (en) * | 2018-04-27 | 2019-10-31 | GM Global Technology Operations LLC | Systems and methods for autonomous driving using neural network-based driver learning on tokenized sensor inputs |
| US11480971B2 (en) * | 2018-05-01 | 2022-10-25 | Honda Motor Co., Ltd. | Systems and methods for generating instructions for navigating intersections with autonomous vehicles |
| US20190344797A1 (en) * | 2018-05-10 | 2019-11-14 | GM Global Technology Operations LLC | Method and system for customizing a driving behavior of an autonomous vehicle |
| US20200139973A1 (en) * | 2018-11-01 | 2020-05-07 | GM Global Technology Operations LLC | Spatial and temporal attention-based deep reinforcement learning of hierarchical lane-change policies for controlling an autonomous vehicle |
| US10940863B2 (en) * | 2018-11-01 | 2021-03-09 | GM Global Technology Operations LLC | Spatial and temporal attention-based deep reinforcement learning of hierarchical lane-change policies for controlling an autonomous vehicle |
| US20220009510A1 (en) * | 2018-12-03 | 2022-01-13 | Psa Automobiles Sa | Method for training at least one algorithm for a control device of a motor vehicle, computer program product, and motor vehicle |
| US12164310B2 (en) | 2019-02-11 | 2024-12-10 | Tesla, Inc. | Autonomous and user controlled vehicle summon to a target |
| US11567514B2 (en) * | 2019-02-11 | 2023-01-31 | Tesla, Inc. | Autonomous and user controlled vehicle summon to a target |
| US11482015B2 (en) * | 2019-08-09 | 2022-10-25 | Otobrite Electronics Inc. | Method for recognizing parking space for vehicle and parking assistance system using the method |
| CN111098852A (en) * | 2019-12-02 | 2020-05-05 | 北京交通大学 | A Reinforcement Learning-Based Parking Path Planning Method |
| CN111275249A (en) * | 2020-01-15 | 2020-06-12 | 吉利汽车研究院(宁波)有限公司 | Driving behavior optimization method based on DQN neural network and high-precision positioning |
| US11829150B2 (en) * | 2020-06-10 | 2023-11-28 | Toyota Research Institute, Inc. | Systems and methods for using a joint feature space to identify driving behaviors |
| US20210389773A1 (en) * | 2020-06-10 | 2021-12-16 | Toyota Research Institute, Inc. | Systems and methods for using a joint feature space to identify driving behaviors |
| CN111959495A (en) * | 2020-06-29 | 2020-11-20 | 北京百度网讯科技有限公司 | Vehicle control method, device and vehicle |
| CN113264064A (en) * | 2021-03-31 | 2021-08-17 | 志行千里(北京)科技有限公司 | Automatic driving method for intersection scene and related equipment |
| US12515654B2 (en) | 2021-04-05 | 2026-01-06 | Ford Global Technologies, Llc | Counter-steering penalization during vehicle turns |
| CN113325704A (en) * | 2021-04-25 | 2021-08-31 | 北京控制工程研究所 | Spacecraft backlight approaching intelligent orbit control method and device and storage medium |
| CN113311851A (en) * | 2021-04-25 | 2021-08-27 | 北京控制工程研究所 | Spacecraft pursuit-escape intelligent orbit control method and device and storage medium |
| CN113159430A (en) * | 2021-04-27 | 2021-07-23 | 广东电网有限责任公司清远供电局 | Route planning method, device, equipment and storage medium |
| CN116461499A (en) * | 2023-03-02 | 2023-07-21 | 合众新能源汽车股份有限公司 | A parking control method and device |
| CN120003469A (en) * | 2025-03-17 | 2025-05-16 | 深蓝汽车科技有限公司 | Method, device, equipment and vehicle for constructing automatic parking path planning model based on deep reinforcement learning |
Also Published As
| Publication number | Publication date |
|---|---|
| US11613249B2 (en) | 2023-03-28 |
| DE102019108477A1 (en) | 2019-10-10 |
| CN110341700A (en) | 2019-10-18 |
Similar Documents
| Publication | Publication Date | Title |
|---|---|---|
| US11613249B2 (en) | | Automatic navigation using deep reinforcement learning |
| CN111273655B (en) | | Motion planning methods and systems for autonomous vehicles |
| CN110641472B (en) | | Neural Network Based Safety Monitoring System for Autonomous Vehicles |
| US10459441B2 (en) | | Method and system for operating autonomous driving vehicles based on motion plans |
| KR102279078B1 (en) | | A v2x communication-based vehicle lane system for autonomous vehicles |
| EP3580625B1 (en) | | Driving scenario based lane guidelines for path planning of autonomous driving vehicles |
| CN111923927B (en) | | Method and apparatus for interactive perception of traffic scene prediction |
| EP3335006B1 (en) | | Controlling error corrected planning methods for operating autonomous vehicles |
| CN108391429B (en) | | Method and system for autonomous vehicle speed following |
| CN110901656B (en) | | Experimental design method and system for autonomous vehicle control |
| CN111208814B (en) | | Memory-based optimal motion planning using dynamic models for autonomous vehicles |
| CN110389583A (en) | | The method for generating the track of automatic driving vehicle |
| CN108137015A (en) | | For the sideslip compensating control method of automatic driving vehicle |
| JP2020200033A (en) | | Detecting adversarial samples by vision based perception system |
| WO2022217209A1 (en) | | Lane changing based only on local information |
| CN113050618B (en) | | Computer-implemented method for operating an autonomous vehicle |
| US20230211800A1 (en) | | Low-speed maneuver assisting system and method |
| JP2021502915A (en) | | 3-point turn plan for self-driving vehicles based on enumeration |
| US20200257296A1 (en) | | Plan buffering for low-latency policy updates |
| US12233918B2 (en) | | Determining perceptual spatial relevancy of objects and road actors for automated driving |
| US20240278794A1 (en) | | Systems and methods for controlling a vehicle |
| Bienemann et al. | | A Perception-Based Architecture for Autonomous Convoying in GNSS-Denied Areas |
| US20250236301A1 (en) | | Calibrating a gesture-based system for a vehicle |
| US20250284287A1 (en) | | Method for modelling a navigation environment of a motor vehicle |
Legal Events
| Date | Code | Title | Description |
|---|---|---|---|
| | AS | Assignment | Owner name: FORD GLOBAL TECHNOLOGIES, LLC, MICHIGAN. Free format text: ASSIGNMENT OF ASSIGNORS INTEREST;ASSIGNORS:BALAKRISHNAN, KAUSHIK;NARAYANAN, PRAVEEN;LAKEHAL-AYAT, MOHSEN;SIGNING DATES FROM 20180309 TO 20180312;REEL/FRAME:045428/0191 |
| | FEPP | Fee payment procedure | Free format text: ENTITY STATUS SET TO UNDISCOUNTED (ORIGINAL EVENT CODE: BIG.); ENTITY STATUS OF PATENT OWNER: LARGE ENTITY |
| | STPP | Information on status: patent application and granting procedure in general | Free format text: NON FINAL ACTION MAILED |
| | STPP | Information on status: patent application and granting procedure in general | Free format text: RESPONSE TO NON-FINAL OFFICE ACTION ENTERED AND FORWARDED TO EXAMINER |
| | STPP | Information on status: patent application and granting procedure in general | Free format text: FINAL REJECTION MAILED |
| | STPP | Information on status: patent application and granting procedure in general | Free format text: RESPONSE AFTER FINAL ACTION FORWARDED TO EXAMINER |
| | STPP | Information on status: patent application and granting procedure in general | Free format text: AWAITING TC RESP., ISSUE FEE NOT PAID |
| | STPP | Information on status: patent application and granting procedure in general | Free format text: NOTICE OF ALLOWANCE MAILED -- APPLICATION RECEIVED IN OFFICE OF PUBLICATIONS |
| | STCF | Information on status: patent grant | Free format text: PATENTED CASE |