DE102020210465A1

DE102020210465A1 - Method and device for supporting maneuver planning for an at least partially automated vehicle or a robot

Info

Publication number: DE102020210465A1
Application number: DE102020210465.4A
Authority: DE
Inventors: Micha Helbig
Original assignee: Volkswagen AG
Current assignee: Volkswagen AG
Priority date: 2020-08-18
Filing date: 2020-08-18
Publication date: 2022-02-24
Also published as: CN114153199A

Abstract

The invention relates to a method for supporting maneuver planning for an at least partially automated vehicle (50) or a robot, wherein a state space (10) is described by means of a Markov decision problem. To support maneuver planning for the vehicle (50) or the robot, optimal actions are determined, starting from discrete states (11) in the state space (10), by executing at least one optimization method based on the Markov decision problem. A mapping (30) is determined with states (11) in the state space (10) as input values and optimal actions (34) in the state space (10) as output values. The determined mapping (30) is approximated by a function approximation. Elements of the approximated mapping (31) whose output values deviate from the corresponding output values of the determined mapping (30) by an error exceeding an error threshold (32) are stored in a lookup table (33), keyed by their respective input values. The approximated mapping (31) and the lookup table (33) are provided for maneuver planning. The invention also relates to a device (1), a control unit (51) and a vehicle (50) or a robot.

Description

The invention relates to a method and a device for supporting maneuver planning for an at least partially automated vehicle or a robot. The invention further relates to a control unit, a vehicle and a robot.

In automated vehicles, maneuver planning requires not only trajectory planning, i.e. providing the specific trajectory to be driven in the current situation, but also tactical maneuver planning in order to implement a higher-level strategy. A concrete example is a turning situation with several lanes and many other road users. It must then be decided when the vehicle needs to be in which lane, for example in order to complete the turn as comfortably as possible for the occupants and/or as quickly as possible, and which other road users must be overtaken to do so. The same problem arises, in principle, for robots that act in an automated manner.

Reinforcement learning methods are known that can learn the behavior of the other road users and make an optimal decision based on it. Such a method learns a mapping between a state and the corresponding optimal action with respect to an objective that is expressed as a reward. In other words, the reinforcement learning agent tries to find the action that maximizes the reward value. To find an optimal solution, a reinforcement learning agent must explore its environment thoroughly to make sure that an optimal solution is not overlooked. On the other hand, the agent can exploit situations experienced earlier in which it found a good solution with a correspondingly high reward value.
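The trade-off between exploring and exploiting described above is often handled with an ε-greedy rule. The Python sketch below is an illustrative assumption, not something the patent specifies; `epsilon_greedy` and its parameters are hypothetical names.

```python
import random


def epsilon_greedy(action_values, epsilon, rng):
    """Pick an action index: explore with probability epsilon, otherwise exploit.

    action_values: estimated action values for the current state (hypothetical input).
    epsilon: exploration probability in [0, 1].
    rng: a random.Random instance, passed in so that runs are reproducible.
    """
    if rng.random() < epsilon:
        # Exploration: try a uniformly random action so that no option is overlooked.
        return rng.randrange(len(action_values))
    # Exploitation: reuse the action with the highest value found so far.
    return max(range(len(action_values)), key=lambda a: action_values[a])


rng = random.Random(0)
print(epsilon_greedy([0.2, 1.5, -0.3], epsilon=0.0, rng=rng))  # 1
```

With `epsilon=0.0` the rule is purely greedy; larger values spend more decisions on exploration.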

Markov decision problems and dynamic programming methods are also known.

One problem with describing a state space using a Markov decision problem is that the state space grows exponentially with each additional dimension (the "curse of dimensionality"), and the memory requirement grows accordingly.
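To illustrate this growth, assume (hypothetically) that every state variable is discretized into 10 bins. A full table over d variables then has 10^d entries. The following Python sketch computes this. The bin count is an assumption for illustration only.

```python
from math import prod


def table_entries(bins_per_dimension):
    """Number of discrete states in a full grid over the given state variables."""
    return prod(bins_per_dimension)


# Hypothetical discretization: 10 bins for each of d state variables.
for d in range(1, 7):
    # Each additional variable multiplies the table size by its bin count.
    print(d, table_entries([10] * d))
```

Going from 6 to 7 variables turns one million entries into ten million. This is why storing the full mapping as a plain table becomes impractical.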

The object of the invention is to provide a method and a device for supporting maneuver planning for an at least partially automated vehicle or a robot that, in particular, achieve a lower memory requirement.

According to the invention, the object is achieved by a method having the features of claim 1, a device having the features of claim 7, a method having the features of claim 5 and a control unit having the features of claim 9. Advantageous embodiments of the invention result from the dependent claims.

In a first aspect of the invention, a method for supporting maneuver planning for an at least partially automated vehicle or a robot is provided. An action determination device describes a state space of an environment of the vehicle or the robot in discrete form using a Markov decision problem. To support maneuver planning for the vehicle or the robot, optimal (discretized) actions are determined, starting from discrete states in the state space, by executing at least one optimization method based on the Markov decision problem. A mapping is then determined with states in the state space as input values and optimal actions in the state space as output values. An approximation device approximates the determined mapping by a function approximation. Elements of the approximated mapping whose output values deviate from the corresponding output values of the determined mapping by an error exceeding a predetermined error threshold are stored in a lookup table, keyed by their respective input values. The approximated mapping and the lookup table are provided for use in maneuver planning.
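The following Python sketch shows the offline step. It assumes a one-dimensional numeric state and a numeric optimal action per state, so that the error is a simple absolute difference. The least-squares line stands in for whatever function approximation is actually used, since the patent does not fix one. `fit_line` and `compress_mapping` are hypothetical names.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit y ~ slope*x + intercept for 1-D inputs (illustrative)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return lambda x: slope * x + intercept


def compress_mapping(mapping, threshold):
    """Split a state -> optimal-action mapping into an approximator plus a lookup table.

    mapping: dict from a discrete, 1-D numeric state to a numeric optimal action.
    threshold: error threshold; states the approximator misses by more are stored exactly.
    """
    states = sorted(mapping)
    approx = fit_line(states, [mapping[s] for s in states])
    # Keep only the elements whose approximation error exceeds the threshold.
    lookup = {s: mapping[s] for s in states if abs(approx(s) - mapping[s]) > threshold}
    return approx, lookup


# Hypothetical determined mapping: mostly linear, with one outlier at state 3.
mapping = {0: 0.0, 1: 1.0, 2: 2.0, 3: 9.0, 4: 4.0}
approx, lookup = compress_mapping(mapping, threshold=1.5)
print(lookup)  # {3: 9.0, 4: 4.0}
```

The outlier at state 3 pulls the fitted line (here 1.6 * s) away from state 4 as well, so both states end up in the table. Every remaining state is served by the line with an error of at most 1.5.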

In a second aspect of the invention, a device for supporting maneuver planning for an at least partially automated vehicle or a robot is provided. It comprises an action determination device and an approximation device. The action determination device is configured to do three things: describe a state space of an environment of the vehicle or the robot in discrete form using a Markov decision problem; determine, in support of maneuver planning for the vehicle or the robot, optimal (discretized) actions starting from discrete states in the state space by executing at least one optimization method based on the Markov decision problem; and determine a mapping with states in the state space as input values and optimal actions in the state space as output values. The approximation device is configured to approximate the determined mapping by a function approximation. Elements of the approximated mapping whose output values deviate from the corresponding output values of the determined mapping by an error exceeding a predetermined error threshold are stored in a lookup table, keyed by their respective input values. The device is configured to provide the approximated mapping and the lookup table for use in maneuver planning.

In a third aspect of the invention, a method for supporting maneuver planning for an at least partially automated vehicle or a robot is provided. A control unit of the vehicle or the robot receives and/or provides an approximated mapping and a lookup table generated by a method according to the first aspect. It then provides optimal actions for maneuver planning as a function of a detected discrete state of a state space. To do so, it first checks whether an optimal action is stored in the lookup table for the detected state. If so, the stored optimal action is retrieved and provided for maneuver planning. Otherwise, an optimal action is estimated using the approximated mapping and provided.
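The query order described above (lookup table first, approximation as fallback) can be sketched in Python as follows. The toy approximator and table stand in for artifacts produced offline and are assumptions. `query_action` is a hypothetical name.

```python
def query_action(state, lookup, approx):
    """Return the action for a detected discrete state: table first, approximation otherwise."""
    if state in lookup:
        # Exact entry, stored because the approximation was too inaccurate for this state.
        return lookup[state]
    # No entry: estimate the optimal action with the approximated mapping.
    return approx(state)


# Hypothetical artifacts as produced offline: a linear approximator and one table entry.
def approx(state):
    return 1.6 * state


lookup = {3: 9.0}
print(query_action(3, lookup, approx))  # 9.0, taken from the lookup table
print(query_action(2, lookup, approx))  # 3.2, estimated by the approximator
```

A dictionary membership test keeps the table lookup cheap, so the fallback to the approximator adds no extra search.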

In a fourth aspect of the invention, a control unit for an at least partially automated vehicle or a robot is provided. The control unit is configured to receive and/or provide an approximated mapping and a lookup table generated by a method according to the first aspect. It is further configured to provide optimal actions for maneuver planning as a function of a detected discrete state of a state space. For this purpose, it first checks whether an optimal action is stored in the lookup table for the detected state. If so, it retrieves the stored optimal action and provides it for maneuver planning. Otherwise, it estimates an optimal action using the approximated mapping and provides it for maneuver planning.

The various aspects keep the memory requirement from growing exponentially, even as the state space grows. This is achieved by expressing a mapping determined for maneuver planning through both a function approximation and a lookup table. In this mapping, discrete states in the state space of a Markov decision problem (input values) are associated with optimal actions in the state space (output values). One of the basic ideas is that a large part of the determined mapping can be approximated by a function. Some elements of the approximated mapping, i.e. associations between discrete states as input values and optimal actions as output values, deviate from the corresponding elements of the (non-approximated) determined mapping by an error that exceeds an error threshold. These elements are stored in the lookup table. This makes it possible to strike a compromise between the memory requirement and the accuracy of the optimal actions provided. When the approximated mapping and the lookup table are used for maneuver planning, the lookup table is checked first to see whether an optimal action is stored for the currently detected or recognized discrete state in the state space. If an optimal action is stored, i.e. if the lookup table contains an entry for the detected discrete state, it is retrieved and provided for maneuver planning. If no optimal action is stored in the lookup table for the detected discrete state, the corresponding optimal action is estimated using the approximated mapping.

One of the advantages of the various aspects is that a compromise between memory requirement and accuracy can be found even for large and, in particular, growing state spaces. In particular, every optimal action provided deviates from the corresponding optimal action stored in the (non-approximated) mapping by no more than the predetermined error threshold.

In particular, the size of the lookup table can be controlled by specifying a suitable error threshold. The smaller the predetermined error threshold, the more closely the provided optimal actions match the corresponding optimal actions in the determined mapping. At the same time, a smaller error threshold increases the memory requirement, because the lookup table grows and occupies more storage.
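The following Python sketch counts how many entries the lookup table needs for several thresholds. It uses a hypothetical mapping and approximator. Because an entry is stored whenever its error exceeds the threshold, the count can only stay the same or grow as the threshold shrinks.

```python
def table_size_for_threshold(mapping, approx, threshold):
    """Count the entries the lookup table needs for a given error threshold."""
    return sum(1 for state, action in mapping.items() if abs(approx(state) - action) > threshold)


# Hypothetical determined mapping and approximator (approx(s) = 1.6 * s).
mapping = {0: 0.0, 1: 1.0, 2: 2.0, 3: 9.0, 4: 4.0}


def approx(state):
    return 1.6 * state


for threshold in (5.0, 2.0, 1.0, 0.5, 0.1):
    # Per-state errors here are about 0.0, 0.6, 1.2, 4.2 and 2.4.
    print(threshold, table_size_for_threshold(mapping, approx, threshold))
```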

In particular, the error threshold can be chosen so that a predetermined memory budget for storing the approximated mapping and the lookup table is not exceeded. Such a memory budget is limited or fixed in particular when the approximated mapping and the lookup table are used in a control unit of a vehicle or a robot.
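One simple way to choose the threshold is to try candidate thresholds from smallest to largest and keep the first one for which the approximator plus lookup table fits the memory budget. The Python sketch below does this. The byte sizes are hypothetical inputs, and the patent does not prescribe this particular search.

```python
def smallest_threshold_within_budget(errors, candidate_thresholds, approx_bytes,
                                     bytes_per_entry, budget_bytes):
    """Return the smallest candidate threshold whose approximator and table fit the budget.

    errors: per-state absolute errors of the approximation (hypothetical input).
    approx_bytes: memory taken by the approximator's parameters (assumed known).
    bytes_per_entry: memory for one lookup-table entry (key plus stored action).
    Returns None if no candidate fits.
    """
    for threshold in sorted(candidate_thresholds):
        entries = sum(1 for e in errors if e > threshold)
        # Smaller thresholds mean larger tables, so the first fit is the most accurate one.
        if approx_bytes + entries * bytes_per_entry <= budget_bytes:
            return threshold
    return None


errors = [0.0, 0.6, 1.2, 4.2, 2.4]
print(smallest_threshold_within_budget(errors, [0.1, 0.5, 1.0, 2.0, 5.0],
                                       approx_bytes=16, bytes_per_entry=8,
                                       budget_bytes=40))  # 1.0
```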

A Markov decision problem (Markov Decision Process, MDP) is a model of decision problems. An agent's utility depends on a sequence of decisions, and this sequence consists of successive state transitions between discrete states in a state space. The Markov assumption applies to the individual state transitions: the probability of reaching a state s' from state s depends only on s and not on the history, i.e. not on the predecessors of s. In particular, the state space represents discrete states in an environment of the vehicle or the robot. In principle, the Markov decision problem can also be formulated as a factored Markov decision problem (Factored Markov Decision Process, FMDP).
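In standard MDP notation, the transition also depends on the action a chosen in state s. With that notation, the Markov assumption above can be written as follows. The time index t is introduced here for illustration only.

```latex
P(s_{t+1} = s' \mid s_t = s,\, a_t = a,\, s_{t-1}, a_{t-1}, \dots, s_0, a_0)
  = P(s_{t+1} = s' \mid s_t = s,\, a_t = a)
```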

A state in the state space can in particular comprise several variables or properties, i.e. a state is in particular multidimensional. A state is defined as a specific combination of values of these variables or properties. In particular, the states in the state space are chosen to be discrete. The state space is in particular a higher-level state space. This means that states are not represented by raw sensor data. Instead, they are represented by higher-level features and properties derived from the raw sensor data, for example by object and/or pattern recognition. States can include, for example, obstacle positions and/or obstacle speeds and/or a type or class of obstacles in the environment. At least in a vehicle application, a state is derived in particular from sensor data recorded by at least one sensor.
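The following Python sketch turns high-level features into one discrete, multidimensional state. The feature set, bin edges and class list are hypothetical. Only the idea of discretizing derived features rather than raw sensor data comes from the text.

```python
from bisect import bisect_right
from enum import IntEnum


class ObstacleClass(IntEnum):
    """Hypothetical obstacle classes produced by upstream object recognition."""
    NONE = 0
    CAR = 1
    TRUCK = 2
    CYCLIST = 3


# Hypothetical bin edges; a real system would choose these per application.
GAP_EDGES_M = [10.0, 25.0, 50.0]        # distance to the obstacle ahead, in metres
REL_SPEED_EDGES_MPS = [-5.0, 0.0, 5.0]  # obstacle speed relative to the ego vehicle, m/s


def discretize(gap_m, rel_speed_mps, obstacle_class):
    """Map derived features to one discrete state tuple (each bin index is 0..3)."""
    return (
        bisect_right(GAP_EDGES_M, gap_m),
        bisect_right(REL_SPEED_EDGES_MPS, rel_speed_mps),
        int(obstacle_class),
    )


print(discretize(30.0, -2.0, ObstacleClass.TRUCK))  # (2, 1, 2)
```

A tuple is hashable, so it can serve directly as a key into a lookup table such as the one described above.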

To determine the optimal actions for the mapping, at least one optimization method is executed on the basis of the Markov decision problem. In particular, dynamic programming can be used to determine optimal action values for discretized actions, starting from discrete states in the state space. A reinforcement learning method then learns a mapping with states in the state space as input values and action values for actions in the state space as output values. The reinforcement learning agent is initialized with the optimal action values determined by dynamic programming, and the learned mapping is provided for maneuver planning. The advantage is that the reinforcement learning agent does not have to start learning from scratch. Instead, it can begin with a solution that is already optimal, at least for a number of discrete states in the state space. This is possible because the optimal action values for individual actions in discrete states of the state space are determined by dynamic programming before reinforcement learning is applied. The mapping learned by the reinforcement learning agent is initialized with these optimal action values, so the agent can build on them instead of starting from zero.
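The following Python sketch illustrates this two-stage idea on a hypothetical three-state toy MDP. Q-value iteration (dynamic programming) first computes optimal action values. These values then initialize the table of a tabular Q-learning agent. Tabular Q-learning is an assumption here; the text does not name a specific reinforcement learning method.

```python
# Hypothetical toy MDP with 3 states, 2 actions and deterministic transitions:
# P[s][a] = (next_state, reward). State 2 is absorbing with zero reward.
P = {
    0: {0: (0, 0.0), 1: (1, 0.0)},
    1: {0: (0, 0.0), 1: (2, 1.0)},
    2: {0: (2, 0.0), 1: (2, 0.0)},
}
GAMMA = 0.9  # discount factor (assumed)


def q_value_iteration(P, gamma, sweeps=100):
    """Dynamic programming: repeatedly apply the Bellman optimality backup to Q."""
    Q = {s: {a: 0.0 for a in P[s]} for s in P}
    for _ in range(sweeps):
        Q = {
            s: {a: r + gamma * max(Q[s2].values()) for a, (s2, r) in P[s].items()}
            for s in P
        }
    return Q


def q_learning_update(Q, s, a, r, s2, alpha, gamma):
    """One tabular Q-learning step on an observed transition; updates Q in place."""
    target = r + gamma * max(Q[s2].values())
    Q[s][a] += alpha * (target - Q[s][a])


# DP first, then initialize the RL agent's table with the DP result instead of zeros.
Q_dp = q_value_iteration(P, GAMMA)
Q_agent = {s: dict(values) for s, values in Q_dp.items()}

# One observed transition: state 1, action 1 -> state 2 with reward 1.0.
q_learning_update(Q_agent, s=1, a=1, r=1.0, s2=2, alpha=0.5, gamma=GAMMA)
print(Q_agent[1][1], Q_dp[0][1])  # 1.0 0.9
```

The toy model is exact and deterministic, so the DP values are already a fixed point of the Q-learning update, and the observed transition leaves the entry unchanged. The agent's learning effort would go into states where the real environment differs from the model.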

In principle, a reinforcement learning method can also be used on its own, without being initialized with a mapping generated by dynamic programming. The procedure is otherwise analogous to the one described above. Other optimization methods can also be used in principle. In every case, however, the at least one optimization method operates on the basis of the Markov decision problem.

Dynamic programming is a method for solving an optimization problem by breaking a complex problem down into simpler subproblems, which are solved recursively. In particular, dynamic programming is an algorithmic paradigm that describes a class of optimization methods. These methods use a perfect model of the environment, in the form of a Markov decision problem, to solve a given problem. Dynamic programming is applied in particular in the state space with discretized states. In particular, dynamic programming yields optimal action values, as a measure of reward, for discretized actions starting from the discrete states in the state space.
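The recursion that dynamic programming solves for the optimal action values is the Bellman optimality equation. The notation below is assumed: transition probabilities P, reward R and discount factor γ. Value iteration applies the recursion repeatedly until the values converge, and the optimal action is the one with the largest optimal action value:

```latex
Q^*(s,a) = \sum_{s'} P(s' \mid s,a)\,\bigl[R(s,a,s') + \gamma \max_{a'} Q^*(s',a')\bigr]

Q_{k+1}(s,a) = \sum_{s'} P(s' \mid s,a)\,\bigl[R(s,a,s') + \gamma \max_{a'} Q_k(s',a')\bigr]

a^*(s) = \arg\max_{a} Q^*(s,a)
```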

Reinforcement learning is a machine learning technique in which an agent autonomously learns a strategy that maximizes the rewards it receives. A reward can be positive or negative. Based on the rewards received, the agent approximates a reward function that describes the value of a state or an action. In the context of actions, such a value is called an action value. Reinforcement learning methods consider in particular an interaction of the agent with its environment that is formulated as a Markov decision problem. Starting from a given state, derived for example from sensor data captured by at least one sensor, the agent can move to another state by taking an action selected from a plurality of actions. Depending on the decision made, i.e., the action taken, the agent receives a reward. The agent's task is to maximize the expected future return, which is made up of discounted rewards, i.e., the total reward. At the end of the method, there is an approximated reward function for a given strategy, with which a reward value or action value can be provided or estimated for each action.
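This agent-environment loop can be sketched as tabular Q-learning. The sketch assumes a hypothetical deterministic toy environment; the state names, rewards and hyperparameters are illustrative and do not come from the source. The learned action values approach the optimal ones, and the greedy action for each state is read off from them.

```python
import random
from collections import defaultdict

# Hypothetical deterministic toy environment: state -> {action: (next_state, reward)}.
ENV = {
    "behind_slow_car": {"keep_lane": ("behind_slow_car", 0.0),
                        "change_left": ("free_lane", 1.0)},
    "free_lane": {"keep_lane": ("goal", 10.0),
                  "change_right": ("behind_slow_car", 0.0)},
    "goal": {},  # terminal
}


def q_learning(env, start, episodes=500, alpha=0.5, gamma=0.9,
               epsilon=0.2, max_steps=50, seed=0):
    """Tabular Q-learning with an epsilon-greedy behaviour policy."""
    rng = random.Random(seed)
    q = defaultdict(float)  # (state, action) -> estimated action value
    for _ in range(episodes):
        s = start
        for _ in range(max_steps):
            actions = list(env[s])
            if not actions:  # terminal state reached
                break
            if rng.random() < epsilon:
                a = rng.choice(actions)  # explore
            else:
                a = max(actions, key=lambda b: q[(s, b)])  # exploit
            s_next, r = env[s][a]  # agent observes next state and reward
            future = max((q[(s_next, b)] for b in env[s_next]), default=0.0)
            q[(s, a)] += alpha * (r + gamma * future - q[(s, a)])
            s = s_next
    return q


q = q_learning(ENV, start="behind_slow_car")
learned = {s: max(ENV[s], key=lambda a: q[(s, a)]) for s in ENV if ENV[s]}
```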

Provision can be made for the at least one optimization method to be executed on a computing device optimized for this purpose, for example a quantum computer.

For a vehicle, an action can comprise, for example: driving straight ahead with adaptive cruise control (ACC) activated (i.e., staying in the lane and not changing lanes), driving straight ahead without accelerating, driving straight ahead and braking, changing to the left lane, or changing to the right lane.

An optimal action for a given state is in particular an action with an optimal action value, i.e., an action for which an optimal action value is or was determined in the given state by means of the at least one optimization method.
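The discrete actions listed above, and the selection of the optimal action as the one with the highest action value, can be sketched as follows. The enum identifiers and the action values for the single example state are hypothetical.

```python
from enum import Enum
from typing import Dict


class Action(Enum):
    """Discrete vehicle actions from the list above (identifiers are illustrative)."""
    STRAIGHT_ACC = "straight ahead, ACC active"
    STRAIGHT_NO_ACCEL = "straight ahead, no acceleration"
    STRAIGHT_BRAKE = "straight ahead, braking"
    CHANGE_LEFT = "lane change to the left"
    CHANGE_RIGHT = "lane change to the right"


def optimal_action(action_values: Dict[Action, float]) -> Action:
    """Return the action with the highest action value for one state."""
    return max(action_values, key=action_values.get)


# Hypothetical action values for one discrete state.
values_for_state = {
    Action.STRAIGHT_ACC: 4.2,
    Action.STRAIGHT_NO_ACCEL: 3.9,
    Action.STRAIGHT_BRAKE: 1.0,
    Action.CHANGE_LEFT: 5.1,
    Action.CHANGE_RIGHT: -2.0,
}
best = optimal_action(values_for_state)  # Action.CHANGE_LEFT
```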

A reward or an action value for an action in the state space can in particular take the following influences into account: collision avoidance, path fidelity (i.e., no or only slight deviation from a path specified by a navigation device), time-optimal behavior and/or comfort or convenience for vehicle occupants.

In particular, it is provided that the determined mapping is or was determined for a predefined strategy (e.g., energy efficiency or comfort), which is expressed through the rewards or action values. This means in particular that the optimal actions stored in the determined mapping are optimal with respect to the predefined strategy.
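One common way to combine these influences and bind them to a strategy is a weighted sum of penalty terms, with the strategy selecting the weights. The sketch below assumes that form. The term definitions and all weight values are hypothetical and are not taken from the source.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass(frozen=True)
class StepOutcome:
    collision: bool          # collision avoidance
    path_deviation_m: float  # deviation from the navigation path
    duration_s: float        # time needed (time-optimal behaviour)
    jerk_mps3: float         # proxy for occupant comfort


# Hypothetical strategy-specific weights (all are penalties, hence negative).
STRATEGIES: Dict[str, Dict[str, float]] = {
    "comfort": {"collision": -1000.0, "path": -1.0, "time": -0.1, "jerk": -5.0},
    "time_optimal": {"collision": -1000.0, "path": -1.0, "time": -2.0, "jerk": -0.5},
}


def reward(o: StepOutcome, strategy: str) -> float:
    """Weighted sum of penalty terms; the strategy chooses the weights."""
    w = STRATEGIES[strategy]
    return (w["collision"] * float(o.collision)
            + w["path"] * o.path_deviation_m
            + w["time"] * o.duration_s
            + w["jerk"] * abs(o.jerk_mps3))


harsh_but_fast = StepOutcome(collision=False, path_deviation_m=0.0,
                             duration_s=1.0, jerk_mps3=2.0)
```

Because the weights change the rewards, an optimization run under a different strategy can yield a different optimal action for the same state.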

In particular, it is provided that the mapping determined by means of the at least one optimization method, in particular by means of dynamic programming and the reinforcement learning method, has a tabular form.

Alternatively, provision can be made in particular for the determined mapping to be provided by a neural network. For initialization, this neural network is trained by supervised learning on the optimal actions determined, in particular, by means of dynamic programming.

Parts of the device, in particular the action determination device and the approximation device, as well as the control unit, can be implemented individually or jointly as a combination of hardware and software, for example as program code running on a microcontroller or microprocessor.

A vehicle is in particular a motor vehicle. In principle, however, a vehicle can also be another land, water, air, rail or space vehicle. In principle, a robot can be of any design, for example a transport robot, a production robot or a care robot.

In one embodiment, it is provided that the provision comprises loading the approximated mapping and the lookup table into a memory of a control unit of at least one vehicle or at least one robot. While the at least one vehicle or robot is operating, the control unit can then provide optimal actions for recognized discrete states of a state space as follows. It first checks whether an optimal action is stored in the lookup table for the recognized state. If so, the stored optimal action is retrieved and provided for maneuver planning. Otherwise, the optimal action is estimated by means of the approximated mapping and provided for maneuver planning.
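The two-stage query on the control unit can be sketched as below. The function name, and the trivial approximation and single table entry used for illustration, are hypothetical.

```python
from typing import Callable, Dict, Hashable, TypeVar

S = TypeVar("S", bound=Hashable)
A = TypeVar("A")


def query_optimal_action(state: S, lookup_table: Dict[S, A],
                         approximated_mapping: Callable[[S], A]) -> A:
    """Lookup table first (stored outliers), approximation as fallback."""
    if state in lookup_table:
        return lookup_table[state]  # exact optimal action for an outlier state
    return approximated_mapping(state)  # estimate for all other states


# Hypothetical example: a trivial approximation plus one stored outlier.
table = {(0, 0): "g"}


def approx(state):
    return "r" if state[0] + state[1] <= 3 else "g"
```

Since only the outliers are stored, the table stays small. The dictionary membership test is a constant-time operation on average.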

The provision can in particular comprise transmitting the approximated mapping and the lookup table to at least one control unit. The transmission takes place in particular via communication interfaces of the device and of the at least one control unit that are set up for this purpose. The at least one control unit obtains (in particular receives) the approximated mapping and the lookup table and loads them into a memory. They are then available for maneuver planning, in particular so that optimal actions for recognized states can be retrieved and/or provided.
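The transmission format is not specified. Purely as an assumption, the sketch below packs the parameters of the approximated mapping and the lookup table into one JSON payload. Because JSON object keys must be strings, the lookup table is sent as a list of [state, action] pairs and rebuilt on the receiving control unit. The payload layout and the parameter names are hypothetical.

```python
import json
from typing import Any, Dict, Tuple

State = Tuple[int, ...]


def pack(approx_params: Dict[str, Any], lookup_table: Dict[State, str]) -> bytes:
    """Serialize approximation parameters and lookup table (hypothetical format)."""
    payload = {
        "approximation": approx_params,
        "lookup_table": [[list(s), a] for s, a in lookup_table.items()],
    }
    return json.dumps(payload).encode("utf-8")


def unpack(blob: bytes) -> Tuple[Dict[str, Any], Dict[State, str]]:
    """Restore parameters and lookup table on the receiving control unit."""
    payload = json.loads(blob.decode("utf-8"))
    table = {tuple(s): a for s, a in payload["lookup_table"]}
    return payload["approximation"], table


# Hypothetical parameters: a stump splitting on dimension 0 at threshold 1.
blob = pack({"dim": 0, "threshold": 1, "left": "r", "right": "g"}, {(3, 0): "r"})
params, table = unpack(blob)
```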

One embodiment provides that at least one neural network is trained and provided for the function approximation of the determined mapping. The neural network is trained in particular by supervised learning on the mapping determined, in particular, by means of dynamic programming and the reinforcement learning method. If the determined mapping was itself already formed by a trained neural network, it is provided in particular that the neural network used for the function approximation is smaller in scope and complexity than the neural network used to form the determined mapping. Smaller here means in structure, in memory requirement and in the computing power needed for execution.

In an alternative embodiment, at least one decision tree is used for the function approximation of the mapping. The procedure is basically analogous to the embodiment described above.
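As a deliberately small stand-in for the decision tree (or neural network) named above, the sketch below fits a depth-1 decision tree, a "stump", to a discrete mapping. It picks the single split that minimizes the number of misclassified states. The toy grid mapping and all names are hypothetical. A practical system would likely use a deeper tree or an established library implementation.

```python
from collections import Counter
from itertools import product
from typing import Dict, Hashable, Tuple

State = Tuple[int, ...]
Stump = Tuple[int, int, Hashable, Hashable]  # (dimension, threshold, left, right)


def fit_stump(mapping: Dict[State, Hashable]) -> Stump:
    """Choose the single axis-aligned split with the fewest misclassifications."""
    states = list(mapping)
    best = None
    for dim in range(len(states[0])):
        for thr in sorted({s[dim] for s in states}):
            left = [mapping[s] for s in states if s[dim] <= thr]
            right = [mapping[s] for s in states if s[dim] > thr]
            l_label = Counter(left).most_common(1)[0][0]  # majority label
            r_label = Counter(right).most_common(1)[0][0] if right else l_label
            errors = (sum(a != l_label for a in left)
                      + sum(a != r_label for a in right))
            if best is None or errors < best[0]:
                best = (errors, dim, thr, l_label, r_label)
    return best[1:]


def predict_stump(stump: Stump, state: State) -> Hashable:
    dim, thr, left, right = stump
    return left if state[dim] <= thr else right


# Hypothetical determined mapping on a 4x4 grid with one outlier at (3, 0).
mapping = {(a, b): ("r" if a <= 1 else "g") for a, b in product(range(4), repeat=2)}
mapping[(3, 0)] = "r"
stump = fit_stump(mapping)  # (0, 1, 'r', 'g')
```

The stump misclassifies state (3, 0). That is exactly the kind of element the method stores in the lookup table instead of enlarging the approximation.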

In principle, other methods can also be used for the function approximation of the determined mapping. The procedure is basically analogous to the embodiments described above.

In one embodiment, it is provided that the approximated mapping and the lookup table are provided by means of a backend server. This allows a powerful computer, for example a supercomputer, to carry out three tasks. First, it determines the mapping on the basis of the specified Markov decision problem by executing the at least one optimization method, in particular dynamic programming and the reinforcement learning method. Second, it approximates the mapping. Third, it generates and provides the lookup table. Applying the approximated mapping and the lookup table in a control unit of a vehicle or robot, by contrast, requires less computing power, so that resources (e.g., computing power, memory, installation space and energy) can be saved.

Accordingly, in one embodiment of the device, the device is designed as a backend server. Such a backend server can be designed, for example, as a powerful supercomputer.

In particular, a method for planning a maneuver for an at least partially automated vehicle or a robot is also provided, in which a mapping approximated according to a method of the first aspect and a lookup table are used in maneuver planning.

Provision can be made for the method for planning the maneuver to also comprise executing the maneuver. This is done by generating and/or providing control signals and/or control data for actuators of the vehicle or the robot, in particular for lateral and longitudinal guidance. Generating and providing the corresponding control signals and/or control data serves in particular to implement the respectively retrieved or estimated optimal action. A control unit of the vehicle or robot is designed accordingly to carry out these measures.

Furthermore, in particular a vehicle or a robot is provided, comprising at least one control unit according to one of the described embodiments.

Furthermore, a system is provided, comprising at least one device according to one of the described embodiments and at least one control unit according to one of the described embodiments.

Further features of the device result from the description of embodiments of the method. The advantages of the device are in each case the same as for the corresponding embodiments of the method.

The invention is explained in more detail below on the basis of preferred exemplary embodiments with reference to the figures, in which:

Fig. 1 shows a schematic representation of an embodiment of the device for supporting maneuver planning for an at least partially automated vehicle or a robot;
Fig. 2 shows a schematic representation illustrating the method for supporting maneuver planning for an at least partially automated vehicle or a robot;
Fig. 3 shows a schematic representation illustrating the method for supporting maneuver planning for an at least partially automated vehicle or a robot.

Fig. 1 shows a schematic representation of an embodiment of the device 1 for supporting maneuver planning for an at least partially automated vehicle 50. The device 1 carries out in particular the method described in this disclosure for supporting maneuver planning for an at least partially automated vehicle 50. The example shown relates to a vehicle 50; for a robot, the device 1 is designed in a basically analogous manner.

The device 1 comprises an action determination device 2 and an approximation device 3. The action determination device 2 and the approximation device 3 can be implemented individually or jointly as a combination of hardware and software, for example as program code running on a microcontroller or microprocessor. The device 1 is designed in particular as a backend server 100, which can in particular be a powerful supercomputer.

The action determination device 2 is set up to describe a state space 10 of an environment of the vehicle 50 in discrete form by means of a Markov decision problem. To support maneuver planning for the vehicle 50, the action determination device 2 executes at least one optimization method on the basis of the Markov decision problem. The at least one optimization method can in particular comprise dynamic programming and/or a reinforcement learning method.

As part of the at least one optimization method, the action determination device 2 determines optimal actions 34 for each state 11 in the state space 10. It starts from states 11 in the state space 10 and from action values that were determined for individual discrete actions in the state space 10, in each case with regard to a predefined strategy (e.g., energy efficiency or comfort). From the determined optimal actions 34, the action determination device 2 determines a mapping 30 with states 11 in the state space 10 as input values and optimal actions 34 in the state space 10 as output values. The determined mapping 30 is fed to the approximation device 3.

The approximation device 3 is set up to approximate the determined mapping 30 by means of a function approximation. Some elements of the approximated mapping 31 have output values whose error, relative to the corresponding output values of the determined mapping 30, exceeds a predefined error threshold 32. These elements are stored in a lookup table 33 as a function of the respectively associated input values. The error is determined in particular by means of a suitable distance measure between an optimal action supplied by the determined mapping 30 and the corresponding action supplied by the approximated mapping 31. In particular, once the determined mapping 30 has been approximated, an error between the determined mapping 30 and the approximated mapping 31 is computed element by element. Each element of the determined mapping 30 is compared with the corresponding element of the approximated mapping 31. For all elements whose error exceeds the predefined error threshold 32, the associated optimal action, linked to the associated state 11 in the state space 10, is stored in the lookup table 33.
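The element-wise comparison can be sketched as follows. The source only requires "a suitable distance measure". As an assumption, this sketch represents actions as numeric vectors and uses the Euclidean distance (`math.dist`, Python 3.8+), which can be swapped out via a parameter. The example vectors and the approximation are hypothetical.

```python
import math
from typing import Callable, Dict, Sequence, Tuple

State = Tuple[int, ...]
ActionVec = Tuple[float, ...]


def build_lookup_table(
    determined: Dict[State, ActionVec],
    approximated: Callable[[State], ActionVec],
    error_threshold: float,
    distance: Callable[[Sequence[float], Sequence[float]], float] = math.dist,
) -> Dict[State, ActionVec]:
    """Store every optimal action whose approximation error exceeds the threshold."""
    table = {}
    for state, optimal in determined.items():  # element-wise comparison
        error = distance(optimal, approximated(state))
        if error > error_threshold:
            table[state] = optimal
    return table


# Hypothetical actions as (longitudinal accel in m/s^2, lateral lane offset) vectors.
determined = {(0,): (1.0, 0.0), (1,): (2.0, 0.0), (2,): (0.0, 1.0)}


def approximated(state: State) -> ActionVec:
    return (float(state[0] + 1), 0.0)  # hypothetical approximation


lookup = build_lookup_table(determined, approximated, error_threshold=0.5)
```

Only state (2,) is stored: its approximation (3.0, 0.0) lies a distance of √10 ≈ 3.16 from the optimal action (0.0, 1.0), while the other two states are reproduced exactly.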

The approximated mapping 31 and the lookup table 33 are provided for use in maneuver planning via a communication interface 4 of the device 1. They are transmitted by means of the communication interface 4 to at least one vehicle 50, where they are received by a communication interface 52 of a control unit 51 of the vehicle 50.

The approximated mapping 31 and the lookup table 33 are loaded into a memory (not shown) of the control unit 51 and used there for maneuver planning. For maneuver planning, current (discretized) states 11 from the state space 10 of the Markov decision problem are fed to the control unit 51. The states 11 are derived and discretized in particular from sensor data captured by at least one sensor (not shown) of the vehicle 50, for example from camera images. Depending on the current state 11 supplied, the control unit 51 provides optimal actions 34. To this end, the control unit 51 first checks whether an optimal action 34 is stored in the lookup table 33 for the recognized state 11. If so, the stored optimal action 34 is retrieved from the lookup table 33 and provided for maneuver planning. If no optimal action 34 is stored in the lookup table 33 for the recognized state 11, an optimal action 34 is instead estimated by means of the approximated mapping 31 and provided for maneuver planning.
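The source does not detail how discrete states 11 are derived from sensor data. A minimal sketch, assuming the sensor pipeline already yields continuous features such as the gap to a lead vehicle, bins each feature. The result can then serve as the key into the lookup table 33 and as the input to the approximated mapping 31. The feature names and bin edges are hypothetical.

```python
from bisect import bisect_right
from typing import Dict, Tuple

# Hypothetical bin edges per feature; a real system would derive these
# from the state-space design of the Markov decision problem.
BIN_EDGES: Dict[str, Tuple[float, ...]] = {
    "gap_to_lead_m": (10.0, 30.0, 60.0),
    "rel_speed_mps": (-5.0, 0.0, 5.0),
}


def discretize(features: Dict[str, float]) -> Tuple[int, ...]:
    """Map continuous sensor-derived features to a discrete state (bin indices)."""
    return tuple(bisect_right(edges, features[name])
                 for name, edges in BIN_EDGES.items())


state = discretize({"gap_to_lead_m": 25.0, "rel_speed_mps": -2.0})  # (1, 1)
```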

Providing the optimal actions 34 can in particular comprise feeding them to a further control unit 53, for example a trajectory planner. The trajectory planner plans a trajectory for executing the optimal action 34 and feeds it, for example, to an actuator system of the vehicle.

In principle, it can also be provided that the device 1 is part of the vehicle 50.

Provision can be made for at least one neural network to be trained and provided for the function approximation of the determined mapping 30. The approximated mapping 31 is then provided by applying the trained neural network. Alternatively, decision trees, for example, can also be used to approximate the determined mapping 30.

Provision can also be made for the approximated mapping 31 and the lookup table 33 to be provided by means of a backend server 100.

Fig. 2 shows a schematic representation illustrating the method for supporting maneuver planning for an at least partially automated vehicle or a robot. Only a greatly simplified example is shown, but it illustrates the procedure for approximating the determined mapping 30.

In this simple example, the determined mapping 30 comprises an association between states 11 of a state space and optimal actions 34. In the example shown, the states 11 have only two dimensions, “A” and “B”, and the optimal actions 34 likewise have only two characteristics, “r” and “g”. This is a strong simplification: real states 11 can have a large number of dimensions, and real optimal actions 34 can likewise have a large number of characteristics.

As part of the method described in this disclosure, the determined mapping 30, i.e., the respective state-dependent optimal actions 34, is determined by means of at least one optimization method, in particular by means of dynamic programming and a reinforcement learning method.

The determined mapping 30 is approximated by a function approximation. In this greatly simplified example, the approximation corresponds to a classification into the characteristics “r” and “g” as a function of the dimensions “A” and “B”. The classification can be carried out in particular by a neural network that is trained on the determined mapping 30 to estimate the respective optimal action 34 (“r” or “g”) from the given states 11. The result of this training is an approximated mapping 31 that allows the optimal actions 34 to be estimated as a function of the states 11 (with dimensions “A” and “B”).

In addition, outliers 35 are determined. These are the combinations of state 11 and optimal action 34 that the approximated mapping 31 cannot capture correctly or precisely enough. In the simple example, the optimal actions take only the two values "r" and "g". The outliers 35 are therefore either states where the optimal action "g" lies in a region for which the approximated mapping 31 estimates "r", or states where the optimal action "r" lies in a region for which the approximated mapping 31 estimates "g". The difference between the estimated optimal action and the optimal action 34 in the determined mapping 30 is regarded as the error. This error is determined for each element of the determined mapping 30 and compared with an error threshold value 32. In this simple example, the error threshold value 32 is defined as a single wrong class: any state whose estimated action differs from its optimal action 34 counts as an outlier 35. In a real application of the method, the predetermined error threshold value 32 corresponds to a predetermined difference threshold value between optimal actions 34. A suitable distance measure is used in each case (e.g. a dot product if the actions can be expressed as vectors).
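The comparison against the error threshold value 32 can be sketched as follows, assuming two kinds of actions.

- Categorical actions such as "r"/"g": any mismatch counts as an error of 1, so a threshold of 0 reproduces the "single wrong class" rule of the simple example.
- Vector-valued actions: the Euclidean distance is used here as an assumed distance measure in place of the dot-product example.

The function names are hypothetical.

```python
import math


def action_error(estimated, optimal):
    """Error between an estimated action and the optimal action 34 (hypothetical measure)."""
    if isinstance(estimated, str) or isinstance(optimal, str):
        # Categorical actions such as "r"/"g": a mismatch counts as error 1.
        return 0.0 if estimated == optimal else 1.0
    # Vector-valued actions: Euclidean distance (assumed distance measure).
    return math.dist(estimated, optimal)


def is_outlier(estimated, optimal, error_threshold):
    """True if the error exceeds the error threshold value 32."""
    return action_error(estimated, optimal) > error_threshold


# Categorical case with threshold 0: any wrong class is an outlier.
print(is_outlier("r", "g", 0.0))  # True
# Vector case, e.g. (longitudinal, lateral) acceleration in m/s^2 (made-up values).
print(is_outlier((2.0, 0.1), (2.0, 0.0), 0.5))  # False
```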

A look-up table 33 is generated for the outliers 35, in which the links between the states 11 and the optimal actions 34 are stored. The entries of this look-up table 33 correspond to the determined mapping 30. The look-up table 33 contains entries only for the outliers 35; there are no entries for other states 11.
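Building the sparse look-up table 33 can be sketched as below. To keep the result deterministic, the approximated mapping 31 is replaced by a hypothetical fixed rule ("g if A > B, else r"), and the determined mapping 30 is made-up toy data. Only states whose approximated action disagrees with the stored optimal action 34 end up in the table.

```python
def approximated_mapping(state):
    """Hypothetical stand-in for the approximated mapping 31 (normally a trained model)."""
    a, b = state
    return "g" if a > b else "r"


def build_lookup_table(determined_mapping, approximate, is_outlier):
    """Keep only the states whose approximated action is an outlier 35."""
    return {
        state: optimal
        for state, optimal in determined_mapping.items()
        if is_outlier(approximate(state), optimal)
    }


# Toy determined mapping 30 (hypothetical data) on a 0..50 grid in steps of 10.
determined = {(a, b): ("g" if a > b else "r") for a in range(0, 60, 10) for b in range(0, 60, 10)}
determined[(0, 10)] = "g"   # "g" inside the region where the approximation says "r"
determined[(40, 10)] = "r"  # "r" inside the region where the approximation says "g"

# Categorical case: any wrong class exceeds the error threshold.
table = build_lookup_table(determined, approximated_mapping, lambda est, opt: est != opt)
```

Of the 36 toy states, only the two outliers are stored. This illustrates the memory saving compared with keeping the complete determined mapping 30.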

Alternatively, an action value can be determined for the optimal action that the approximated mapping 31 estimates for a state 11. To determine this action value, the estimated optimal action can be compared, for example, with the actions that were evaluated during dynamic programming to find the optimal action 34 for the associated discrete state 11. The estimated optimal action is then assigned the action value of whichever of these actions comes closest to it (in a simple example, both actions comprise the same acceleration of the vehicle of 2 m/s²). The action value assigned to the estimated optimal action in this way can then be compared with the action value of the optimal action 34 stored in the determined mapping 30. The action value of the optimal action 34 can, for example, likewise be obtained during dynamic programming. A difference threshold value for the action values, specified as the error threshold value 32, is then used to decide whether the approximated mapping 31 correctly represents the combination of state 11 and optimal action 34 from the determined mapping 30. If the difference between the action value assigned to the estimated optimal action and the action value of the stored optimal action 34 is below the difference threshold value, the optimal action 34 for the state 11 is estimated via the approximated mapping 31. If the difference reaches or exceeds the difference threshold value, the optimal action 34 for the state 11 is stored in the look-up table 33. This alternative procedure can be applied generally and is not limited to the simple example described.
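The action-value criterion can be sketched as follows. The sketch assumes that, for the state in question, dynamic programming has produced a dictionary mapping each evaluated scalar action to its action value; here the actions are hypothetical longitudinal accelerations in m/s². The estimated action is matched to the closest evaluated action. The state is stored in the look-up table 33 when the resulting value gap reaches or exceeds the difference threshold. All names and numbers are illustrative only.

```python
def nearest_action_value(estimated_action, evaluated_values):
    """Value of the DP-evaluated action closest to the estimated action."""
    closest = min(evaluated_values, key=lambda act: abs(act - estimated_action))
    return evaluated_values[closest]


def store_in_lookup_table(estimated_action, evaluated_values, optimal_action, threshold):
    """True if the value gap to the optimal action 34 reaches or exceeds the threshold."""
    optimal_value = evaluated_values[optimal_action]
    gap = abs(optimal_value - nearest_action_value(estimated_action, evaluated_values))
    return gap >= threshold


# Hypothetical DP output for one state: acceleration (m/s^2) -> action value.
values = {-2.0: 1.0, 0.0: 3.0, 2.0: 5.0}
print(store_in_lookup_table(1.8, values, 2.0, 0.5))  # False: matched to 2.0, gap 0.0
print(store_in_lookup_table(0.3, values, 2.0, 0.5))  # True: matched to 0.0, gap 2.0
```

Matching by nearest evaluated action avoids having to evaluate the value function at arbitrary continuous actions. The trade-off is that the result depends on how finely dynamic programming sampled the action space.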

As a further development of this alternative, the action values of the optimal actions estimated by the approximated mapping 31 can themselves be estimated, for example by a neural network set up and trained for this purpose. As described above, each estimated action value can then be compared with the action value of the associated optimal action 34 stored in the determined mapping 30. A difference threshold value is then used to decide whether the optimal action 34 for the associated state 11 is to be estimated or stored in the look-up table 33.

The approximated mapping 31 and the look-up table 33 are provided for maneuver planning, in particular by loading them into a memory of a control unit of at least one vehicle or at least one robot.

Fig. 3 shows a schematic representation of the method for supporting maneuver planning for an at least partially automated vehicle or a robot, as executed in a control unit of a vehicle or a robot. The example described with reference to Fig. 2 is used again for illustration.

A control unit of the vehicle or the robot obtains and/or provides an approximated mapping 31 and a look-up table 33. For example, the approximated mapping 31 and the look-up table 33 may have been generated by a backend server and transmitted to the control unit. The approximated mapping 31 and the look-up table 33 are loaded into the memory of the control unit and made available by it for maneuver planning.
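The transfer format between the backend server 100 and the control unit 51 is not specified. One possible sketch assumes JSON and a hypothetical payload layout. Because JSON object keys must be strings, each look-up-table entry is encoded as a record. The "approximator" field stands for whatever parameters the approximated mapping 31 needs; the weights shown are made up.

```python
import json


def serialize(lookup_table, approximator_params):
    """Pack the look-up table and approximator parameters into a JSON string (hypothetical layout)."""
    entries = [{"state": list(state), "action": action} for state, action in lookup_table.items()]
    return json.dumps({"approximator": approximator_params, "lookup_table": entries})


def deserialize(payload):
    """Unpack the JSON string on the control unit; states become hashable tuples again."""
    data = json.loads(payload)
    table = {tuple(entry["state"]): entry["action"] for entry in data["lookup_table"]}
    return data["approximator"], table


# Usage: round trip with made-up data.
table = {(0, 10): "g", (40, 10): "r"}
params = {"weights": [-0.5, 8.0, -8.0]}  # hypothetical approximator parameters
received_params, received_table = deserialize(serialize(table, params))
```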

Consider a current state 11 in the state space 10, for example one recognized and discretized from captured sensor data. For this state, it is checked whether an optimal action 34 is stored in the look-up table 33. If so, the stored optimal action 34 is retrieved and made available for maneuver planning (e.g. for A=0 and B=10, with the optimal action "g"). If the check shows that no optimal action 34 is stored in the look-up table 33 (e.g. for A=10 and B=20), the optimal action 34 is estimated using the approximated mapping 31 and made available for maneuver planning.
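The table-first query on the control unit can be sketched as below. Two parts are assumptions for illustration: the discretization by rounding raw values to a 10-unit grid, and the fixed stand-in for the approximated mapping 31. Only the order of the checks follows the description: look-up table 33 first, approximation second.

```python
def discretize(a, b, step=10):
    """Hypothetical discretization: round raw values of A and B to the nearest grid point."""
    return (int(round(a / step)) * step, int(round(b / step)) * step)


def optimal_action_for(state, lookup_table, approximated_mapping):
    """Return the stored action if the state is an outlier, else the approximated estimate."""
    if state in lookup_table:
        return lookup_table[state]
    return approximated_mapping(state)


def approximated_mapping(state):
    """Hypothetical stand-in for the approximated mapping 31."""
    a, b = state
    return "g" if a > b else "r"


# Toy data matching the example in the text: (A=0, B=10) is stored, (A=10, B=20) is not.
lookup_table = {(0, 10): "g", (40, 10): "r"}

print(optimal_action_for((0, 10), lookup_table, approximated_mapping))   # g (table hit)
print(optimal_action_for((10, 20), lookup_table, approximated_mapping))  # r (approximation)
```

Checking the table first matters because its entries exist precisely for the states where the approximation is known to be wrong. For (0, 10), for example, the stand-in rule would return "r", but the stored "g" takes precedence.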

The optimal action 34 is then executed, for example by planning a trajectory with a trajectory planner and controlling the actuators of the vehicle or the robot by means of a controller.

Reference list

1: device
2: action determination device
3: approximation device
4: communication interface
10: state space
11: state
30: determined mapping
31: approximated mapping
32: error threshold value
33: look-up table
34: optimal action
35: outlier
50: vehicle
51: control unit
52: communication interface
53: further control unit
100: backend server

Claims

Method for supporting maneuver planning for an at least partially automated vehicle (50) or a robot, wherein a state space (10) of an environment of the vehicle (50) or of the robot is described in discrete form by means of a Markov decision problem by an action determination device (2), wherein, to support maneuver planning for the vehicle (50) or the robot, optimal actions are determined starting from discrete states (11) in the state space (10) on the basis of the Markov decision problem by executing at least one optimization method, wherein a mapping (30) with states (11) in the state space (10) as input values and optimal actions (34) in the state space (10) as output values is determined, wherein the determined mapping (30) is approximated by a function approximation by an approximation device (3), wherein elements of the approximated mapping (31) whose output values have an error relative to the corresponding output values of the determined mapping (30) that exceeds a predetermined error threshold value (32) are stored in a look-up table (33) as a function of the respective associated input values, and wherein the approximated mapping (31) and the look-up table (33) are provided for use in maneuver planning.

Method according to claim 1, characterized in that the provision comprises loading the approximated mapping (31) and the look-up table (33) into a memory of a control unit (51) of at least one vehicle (50) or at least one robot, so that, when the at least one vehicle (50) or the at least one robot is operated, the control unit (51), in order to provide optimal actions (34) for recognized discrete states (11) of a state space (10), can first check whether an optimal action (34) is stored in the look-up table (33) for the recognized state (11); if this is the case, the stored optimal action (34) can be retrieved and made available for the maneuver planning, otherwise the optimal action (34) can be estimated using the approximated mapping (31) and made available for the maneuver planning.

Method according to claim 1 or 2, characterized in that at least one neural network is trained and provided for the function approximation of the mapping (30).

Method according to one of the preceding claims, characterized in that the provision of the approximated mapping (31) and the look-up table (33) is carried out by means of a backend server (100).

Method for supporting maneuver planning for an at least partially automated vehicle (50) or a robot, wherein a control unit (51) of the vehicle (50) or the robot obtains and/or provides an approximated mapping (31) and a look-up table (33) generated by a method according to one of Claims 1 to 4, and optimal actions (34) are provided for maneuver planning as a function of a recognized discrete state (11) of a state space (10), wherein it is first checked whether an optimal action (34) is stored in the look-up table (33) for the recognized state (11); if this is the case, the stored optimal action (34) is retrieved and made available for the maneuver planning, otherwise an optimal action (34) is estimated using the approximated mapping (31) and made available.

Method for planning a maneuver for an at least partially automated vehicle (50) or a robot, wherein an approximated mapping (31) and a look-up table (33) generated by a method according to one of Claims 1 to 4 are used in the maneuver planning.

Device (1) for supporting maneuver planning for an at least partially automated vehicle (50) or a robot, comprising an action determination device (2) and an approximation device (3), wherein the action determination device (2) is set up to describe a state space (10) of an environment of the vehicle (50) or the robot in discrete form by means of a Markov decision problem, to determine, in order to support maneuver planning for the vehicle (50) or the robot, optimal actions starting from discrete states (11) in the state space (10) on the basis of the Markov decision problem by executing at least one optimization method, and to determine a mapping (30) with states (11) in the state space (10) as input values and optimal actions (34) in the state space (10) as output values, wherein the approximation device (3) is set up to approximate the determined mapping (30) by means of a function approximation, wherein elements of the approximated mapping (31) whose output values have an error relative to the corresponding output values of the determined mapping (30) that exceeds a predetermined error threshold value (32) are stored in a look-up table (33) as a function of the respective associated input values, and wherein the device (1) is set up to provide the approximated mapping (31) and the look-up table (33) for maneuver planning.

Device (1) according to claim 7, characterized in that the device (1) is designed as a backend server (100).

Control unit (51) for an at least partially automated vehicle (50) or a robot, wherein the control unit (51) is set up to receive and/or provide an approximated mapping (31) and a look-up table (33) generated by a method according to one of Claims 1 to 4, to provide optimal actions (34) for maneuver planning as a function of a recognized discrete state (11) of a state space (10), and in doing so to first check whether an optimal action (34) is stored in the look-up table (33) for the recognized state (11); if this is the case, to retrieve the stored optimal action (34) and make it available for the maneuver planning, otherwise to estimate an optimal action (34) using the approximated mapping (31) and make it available for the maneuver planning.

Vehicle (50) or robot, comprising at least one control unit (51) according to claim 9.