
WO2020048623A1 - Estimation of a pose of a robot - Google Patents

Estimation of a pose of a robot

Info

Publication number
WO2020048623A1
WO2020048623A1 (PCT/EP2018/074232)
Authority
WO
WIPO (PCT)
Prior art keywords
pose
current
distribution
robot
estimate
Prior art date
Legal status
Ceased
Application number
PCT/EP2018/074232
Other languages
French (fr)
Inventor
Panji Setiawan
Miguel CRISTOBAL
Claudiu CAMPEANU
Current Assignee
Huawei Technologies Co Ltd
Original Assignee
Huawei Technologies Co Ltd
Priority date
Filing date
Publication date
Application filed by Huawei Technologies Co Ltd filed Critical Huawei Technologies Co Ltd
Priority to PCT/EP2018/074232 priority Critical patent/WO2020048623A1/en
Priority to CN201880096793.8A priority patent/CN112639502B/en
Publication of WO2020048623A1 publication Critical patent/WO2020048623A1/en
Anticipated expiration legal-status Critical
Ceased legal-status Critical Current

Classifications

    • GPHYSICS
    • G01MEASURING; TESTING
    • G01SRADIO DIRECTION-FINDING; RADIO NAVIGATION; DETERMINING DISTANCE OR VELOCITY BY USE OF RADIO WAVES; LOCATING OR PRESENCE-DETECTING BY USE OF THE REFLECTION OR RERADIATION OF RADIO WAVES; ANALOGOUS ARRANGEMENTS USING OTHER WAVES
    • G01S5/00Position-fixing by co-ordinating two or more direction or position line determinations; Position-fixing by co-ordinating two or more distance determinations
    • G01S5/02Position-fixing by co-ordinating two or more direction or position line determinations; Position-fixing by co-ordinating two or more distance determinations using radio waves
    • G01S5/0247Determining attitude
    • GPHYSICS
    • G01MEASURING; TESTING
    • G01CMEASURING DISTANCES, LEVELS OR BEARINGS; SURVEYING; NAVIGATION; GYROSCOPIC INSTRUMENTS; PHOTOGRAMMETRY OR VIDEOGRAMMETRY
    • G01C21/00Navigation; Navigational instruments not provided for in groups G01C1/00 - G01C19/00
    • G01C21/26Navigation; Navigational instruments not provided for in groups G01C1/00 - G01C19/00 specially adapted for navigation in a road network
    • GPHYSICS
    • G01MEASURING; TESTING
    • G01SRADIO DIRECTION-FINDING; RADIO NAVIGATION; DETERMINING DISTANCE OR VELOCITY BY USE OF RADIO WAVES; LOCATING OR PRESENCE-DETECTING BY USE OF THE REFLECTION OR RERADIATION OF RADIO WAVES; ANALOGOUS ARRANGEMENTS USING OTHER WAVES
    • G01S19/00Satellite radio beacon positioning systems; Determining position, velocity or attitude using signals transmitted by such systems
    • G01S19/38Determining a navigation solution using signals transmitted by a satellite radio beacon positioning system
    • G01S19/39Determining a navigation solution using signals transmitted by a satellite radio beacon positioning system the satellite radio beacon positioning system transmitting time-stamped messages, e.g. GPS [Global Positioning System], GLONASS [Global Orbiting Navigation Satellite System] or GALILEO
    • G01S19/53Determining attitude
    • GPHYSICS
    • G06COMPUTING OR CALCULATING; COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V20/00Scenes; Scene-specific elements
    • G06V20/10Terrestrial scenes

Definitions

  • the present disclosure relates to estimating the pose of a robot, in particular of a vehicle, in a robust and efficient way.
  • the disclosed systems and methods may be used for real-time localization of a vehicle.
  • Mobile robot localization is becoming increasingly important in robotics as robot systems operate in increasingly unstructured environments.
  • applications of mobile robot systems are as diverse as mobile platforms for planetary exploration, underwater vehicles for deep-sea exploration, robotic vehicles operating in the air or in confined spaces such as mines, cars that travel autonomously in urban environments, and androids operating in highly dynamic environments involving interaction with human beings.
  • Mobile robots in these and other applications have to be operated in environments that are inherently unpredictable, where they often have to navigate in an environment composed of static and dynamic objects.
  • even the location of the static objects is often unknown or known only with uncertainty. It is therefore crucial to localize the robot with high precision, generally using sensor data from sensors of the robot and/or external sensors.
  • the problem of localization involves estimating a robot’s coordinates and often its orientation, together forming the so-called pose, in an external reference frame or global coordinate system from the sensor data, often using a map of the environment.
  • a frequently used probabilistic approach to the localization problem involves recursive Bayesian estimation, also known as a Bayes filter.
  • in a Bayes filter, the probability density function of the robot’s state is continuously updated based on the most recently acquired sensor data or observation.
  • the recursive algorithm consists of two parts: prediction and update.
  • the true state X of the robot is assumed to be an unobserved Markov process, and the measurements Z are the observed states of a hidden Markov model.
  • the prediction step uses a system model p(X_t | X_{t-1}), also called motion model, to predict the probability distribution function p(X_t | Z_{1:t-1}), the so-called current prior, at time t given the previous observations Z_{1:t-1} from the previous probability distribution function p(X_{t-1} | Z_{1:t-1}) at time t - 1, the so-called previous posterior, wherein the predicted probability distribution function is spread due to noise.
  • the update step updates the prediction in light of the new observation data Z_t to calculate the current probability distribution function p(X_t | Z_{1:t}), the so-called current posterior.
  • the current posterior is proportional to the product of the measurement likelihood p(Z_t | X_t) and the current prior p(X_t | Z_{1:t-1}), normalized by the evidence p(Z_t | Z_{1:t-1}).
  • a measurement model expressing the conditional probability of observation Z t given the true state X t at time t enters the calculation.
  • an optimal estimate X̂_t for the true state X_t at time t, i.e. an estimate for the pose of the robot, can be determined, for instance by determining the maximum of the current probability distribution function or applying a minimum mean-square error (MMSE) approach.
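  • As an editorial illustration (not part of the original disclosure), the following minimal Python sketch shows the prediction and update steps of a recursive Bayes filter over a discretized one-dimensional state space, together with MAP and MMSE estimates; all function names, the Gaussian motion and measurement models, and the parameter values are assumptions.

    import numpy as np

    states = np.linspace(0.0, 10.0, 101)  # discretized 1-D state space for X_t

    def motion_model(x_to, x_from, u, sigma=0.3):
        # p(X_t | X_{t-1}, u_t): commanded displacement u with Gaussian noise (assumed model).
        return np.exp(-0.5 * ((x_to - (x_from + u)) / sigma) ** 2)

    def measurement_likelihood(z, x, sigma=0.5):
        # p(Z_t | X_t): noisy direct position measurement (assumed model).
        return np.exp(-0.5 * ((z - x) / sigma) ** 2)

    def predict(posterior, u):
        # Current prior p(X_t | Z_{1:t-1}): motion model applied to the previous posterior.
        prior = np.array([np.sum(motion_model(x, states, u) * posterior) for x in states])
        return prior / prior.sum()

    def update(prior, z):
        # Current posterior proportional to measurement likelihood times current prior (normalized).
        posterior = measurement_likelihood(z, states) * prior
        return posterior / posterior.sum()

    belief = np.full_like(states, 1.0 / len(states))        # uniform initial belief
    for u_t, z_t in [(1.0, 1.1), (1.0, 2.0), (1.0, 3.2)]:   # (control, observation) pairs
        belief = update(predict(belief, u_t), z_t)

    x_map = states[np.argmax(belief)]   # maximum of the posterior (MAP estimate)
    x_mmse = np.sum(states * belief)    # minimum mean-square error estimate (posterior mean)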
  • the pose estimate may then be used to operate the robot in its environment.
  • MMSE minimum mean-square error
  • under linear system and measurement models with Gaussian noise, the Bayes filter turns into a Kalman filter.
  • a local linearization using a first-order Taylor series expansion may be used to provide an extended Kalman filter (EKF).
  • the motion model, i.e. the system model, p(X_t | u_t, X_{t-1}) for the state transition probability given the previous state X_{t-1} and the control data u_t is implemented either using a velocity motion model, wherein the control data u_t is given by velocities, or using an odometry motion model, wherein the control data u_t is replaced by sensor measurements.
  • the motion model is extended by a map m of the environment of the robot to create a map-based motion model p(X_t | u_t, X_{t-1}, m) that may be approximately factorized as in Equation (1):
  • p(X_t | u_t, X_{t-1}, m) ≈ η p(X_t | u_t, X_{t-1}) p(X_t | m)    (1), where η is a normalizing factor.
  • the second term p(X_t | m) expresses the “consistency” of pose or state X_t with the map m.
  • the measurement model that describes the formation processes by which sensor measurements are generated in the physical world is also extended by the map m of the environment to define a conditional probability distribution function p(Z_t | X_t, m), where X_t is the robot pose and Z_t is the measurement at time t.
  • both the velocity motion model and the odometry motion model are subject to noise that leads to a growing uncertainty as the robot moves.
  • robot odometry is generally subject to drift and slippage such that there is no fixed coordinate transformation between the coordinates used by the robot’s internal odometry and the physical world coordinates. Determining the pose of a robot relative to a given map of the environment generally increases the certainty of the pose estimate but significantly adds to the computational complexity of the underlying algorithm. As a result, commonly known algorithms for the mobile robot localization problem generally cannot be executed in real time and often suffer from loss of precision as the robot moves.
  • ADAS Advanced Driver-Assistance Systems
  • the percentage of absolute mean errors below 1 m is 80% and the averaged absolute mean error is 1.43 m in the longitudinal direction or the driving direction and 0.58 m in the lateral direction.
  • the Internet of Vehicles (IoV) Planning and Control Division requires a maximum error of 1 m in the longitudinal direction and of 0.5 m in the lateral direction.
  • the present disclosure offers a way to significantly improve the performance of the above-mentioned low-cost systems and thereby solves the above described technical problems.
  • the disclosed methods and systems in particular provide several improvements in the framework of Bayesian filtering and increase the robustness of the localization process.
  • the disclosed methods and systems not only allow localization to be performed in several (e.g., six) degrees of freedom (DoF), but can also be configured to run at a rate of 10 Hz or more and are therefore suitable for real-time implementations. They can be used for determining the pose of an autonomous or non-autonomous vehicle or other kind of robot. Their potential field of application is thus not limited to autonomous driving.
  • Each of the devices referred to herein as an “apparatus” may be a system of cooperating devices.
  • the apparatus may comprise processing circuitry configured to perform the various data or signal processing operations associated with the respective apparatus. Those operations are described in detail below.
  • the processing circuitry may be a combination of software and hardware.
  • the processing circuitry may comprise one or more processors and a non-volatile memory in which program code executable by the one or more processors is stored. The program code causes the processing circuitry to perform the respective operations when executed by the one or more processors.
  • an apparatus for estimating a pose of a robot is provided, wherein the apparatus is configured to determine a current pose estimate of the robot based on a first pose estimate or a second pose estimate or a combination of the first pose estimate and the second pose estimate, wherein the first pose estimate is based on a current pose distribution of the robot; and wherein a contribution of the first pose estimate to the current pose estimate and a contribution of the second pose estimate to the current pose estimate are determined based on the current pose distribution.
  • the apparatus is configured to determine a current pose estimate as a weighted sum of multiple pose estimates, each of the multiple pose estimates having a respective weight in the weighted sum, wherein the multiple pose estimates include a first pose estimate, which is based on a current pose distribution, and one or more further pose estimates, and wherein the weights of the multiple pose estimates are based on the current pose distribution.
  • the second pose estimate may be based on one or more of the following: prediction from one or more previous pose estimates, or a global pose estimate derived from at least one of sensor data from a position sensor and sensor data from an orientation sensor.
  • the prediction may include dead reckoning.
  • the contribution of the first pose estimate and the contribution of the second pose estimate may be determined based on a confidence measure of the current pose distribution, in particular with regard to the first pose estimate.
  • in particular, the first pose estimate may be used as the current pose estimate if the confidence measure of the current pose distribution exceeds a threshold. The apparatus may be further configured to adapt the threshold based on the confidence measure of the current pose distribution.
  • the threshold may be adapted repeatedly, e.g., periodically or continuously.
  • the threshold may be increased in response to the confidence measure of the current pose distribution being significantly higher than the threshold, or the threshold may be decreased in response to the confidence measure of the current pose distribution being significantly lower than the threshold.
  • the confidence measure may be considered to be significantly higher than the threshold when the confidence measure exceeds the threshold plus a non-negative first offset.
  • the first offset may be zero.
  • the confidence measure may be considered to be significantly lower than the threshold when the confidence measure is lower than the threshold minus a non-negative second offset.
  • the second offset may be zero.
  • a transition from increasing the threshold to decreasing the threshold and vice versa may be delayed by a respective delay time.
  • the contribution of the first pose estimate and the contribution of the second pose estimate may alternatively be determined based on confidence measures of the respective pose estimates.
  • an apparatus for estimating a pose of a robot is provided, wherein the apparatus is configured to determine a plurality of current hypothetical poses of the robot, in particular using prediction; for each of the plurality of current hypothetical poses, to determine a weight; and to determine a current pose estimate of the robot based on the plurality of current hypothetical poses and their weights, wherein for each of the plurality of current hypothetical poses, determining the weight comprises calculating a similarity score which is a measure of similarity between a set of reference features and a set of observed features.
  • the set of observed features may comprise features detected in an environment of the robot.
  • the features may be detected by one or more sensors of the robot.
  • the sensors may include remote sensing means, e.g., a video camera, a radar sensor, a sonar sensor, or a combination of these.
  • the apparatus may comprise processing circuitry to perform the pose estimation. Thus a reliable pose estimate can be obtained.
  • Each reference feature and each observed feature may comprise one or more feature descriptors.
  • Each reference feature and each observed feature may comprise one or more feature classes, and, for each feature class, a probability value, and calculating the similarity score may be based on the one or more feature classes of the reference features and their probability values and the one or more feature classes of the observed features and their probability values.
  • each feature class may be associated with a class of real-world elements.
  • the class of real-world elements may be, for example, “tree”, “sky”, “person”, etc.
  • each reference feature may further comprise a space-fixed (SF) positional coordinate and each detected feature may further comprise a body-fixed (BF) positional coordinate, the BF positional coordinate being defined relative to the robot, wherein calculating the similarity score comprises mapping between the SF positional coordinate and the BF positional coordinate on the basis of a current hypothetical pose.
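  • As an editorial illustration only, the following sketch shows the mapping between a body-fixed (BF) and a space-fixed (SF) positional coordinate for an assumed planar pose (x, y, yaw); the 2-D simplification and the function names are assumptions and not taken from the disclosure.

    import math

    def bf_to_sf(bf_point, pose):
        # Map a body-fixed coordinate into the space-fixed frame for a hypothetical pose.
        x, y, yaw = pose
        c, s = math.cos(yaw), math.sin(yaw)
        bx, by = bf_point
        return (x + c * bx - s * by, y + s * bx + c * by)

    def sf_to_bf(sf_point, pose):
        # Inverse mapping: bring a space-fixed reference coordinate into the body-fixed frame.
        x, y, yaw = pose
        c, s = math.cos(yaw), math.sin(yaw)
        dx, dy = sf_point[0] - x, sf_point[1] - y
        return (c * dx + s * dy, -s * dx + c * dy)

    pose_hypothesis = (10.0, 5.0, math.pi / 2)      # one current hypothetical pose
    observed_bf = (2.0, 0.0)                        # feature observed 2 m ahead of the robot
    print(bf_to_sf(observed_bf, pose_hypothesis))   # approximately (10.0, 7.0)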
  • a coordinate may be multi-dimensional. For example, it may be a point in a two-dimensional space (e.g., corresponding to the earth’s surface) or a three-dimensional space (e.g., the three-dimensional space on the earth’s surface).
  • the weights of the current hypothetical poses may be determined based on a distribution of the similarity scores when the distribution fulfills a reliability condition.
  • the distribution may be a frequency distribution or a normalized frequency distribution of the similarity scores.
  • the weights of the current hypothetical poses may be determined independently of the distribution of the similarity scores when the distribution does not fulfill the reliability condition.
  • the apparatus may further comprise at least one of a position sensor and an orientation sensor, wherein the weights of the current hypothetical poses are further adapted based on a global pose estimate derived from at least one of sensor data of the position sensor and sensor data of the orientation sensor.
  • the position sensor or the orientation sensor or both may be based, for example, on vision, sound, radar, satellite signals, inertia, or a combination thereof.
  • an apparatus for estimating a pose of a robot is configured to: generate a first pose distribution of the robot based on one or more first navigational measurements, generate a second pose distribution of the robot based on the first pose distribution and on a current instance of a refined pose distribution, generate a next instance of the refined pose distribution based on the second pose distribution and on one or more second navigational measurements, and determine a pose estimate of the robot based on the next instance of the refined pose distribution.
  • a new distribution peak can be added to the existing pose distribution.
  • if the existing pose distribution, i.e. the current instance of the refined pose distribution, is erroneous (e.g., due to errors in sensor readings or absence of sensor readings, for example after a period of lack of camera or satellite data), the presence of the new, added peak in the next instance of the refined distribution can enable the apparatus to “recover”, i.e. to find an accurate, new pose estimate.
  • a reliable pose estimate can be obtained.
  • the current instance and the next instance of the refined pose distribution are each represented by a set of hypothetical poses and associated weights, wherein the set representing the current instance and the set representing the next instance comprise the same number of hypothetical poses.
  • the current instance of the refined pose distribution contributes more to the second pose distribution than the first pose distribution does.
  • the second pose distribution may be a weighted sum of the first pose distribution and the current instance of the refined pose distribution, wherein the current instance of the refined pose distribution has a greater weight than the first pose distribution.
  • the current instance of the refined pose distribution and the first pose distribution may have relative weights of 1 minus X (e.g., 0.95) and X (e.g., 0.05), respectively, wherein X is less than 0.5.
  • the current instance of the refined pose distribution may be represented by a set of, e.g., 95 samples (i.e. hypothetical poses), while the first pose distribution is represented by the remaining samples, e.g., 5 samples, of the total number of samples.
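  • A minimal sketch of this mixing, assuming planar poses (x, y, yaw) and the 95/5 split given as an example above, is shown below; the sampling routines and noise model are editorial assumptions.

    import random

    def sample_refined(refined_particles, refined_weights, n):
        # Draw n hypothetical poses from the current instance of the refined pose distribution.
        return random.choices(refined_particles, weights=refined_weights, k=n)

    def sample_first(global_pose, std, n):
        # Draw n hypothetical poses around the first, measurement-based pose distribution.
        gx, gy, gyaw = global_pose
        return [(random.gauss(gx, std), random.gauss(gy, std), gyaw) for _ in range(n)]

    def second_distribution(refined_particles, refined_weights, global_pose,
                            total=100, mix=0.05, std=1.0):
        n_first = int(round(mix * total))  # e.g. 5 samples from the first distribution
        samples = sample_refined(refined_particles, refined_weights, total - n_first)
        samples += sample_first(global_pose, std, n_first)  # adds a possible recovery peak
        return samples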
  • the apparatus is configured to generate the first pose distribution independently of the refined pose distribution.
  • the apparatus is configured to generate the one or more first navigational measurements by one or more of the following: satellite-based pose estimation, inertia-based pose estimation, vision-based pose estimation, or user input.
  • the first navigational measurements may comprise a global pose estimate.
  • the global pose estimate may be derived, for example, from at least one of sensor data from a position sensor and sensor data from an orientation sensor.
  • the apparatus is configured to generate the one or more second navigational measurements by one or more of the following: satellite-based pose estimation, inertia-based pose estimation, vision-based pose estimation, or odometric pose estimation.
  • a robot, in particular a vehicle, especially an autonomous vehicle, is provided that comprises the apparatus according to any one of the above aspects.
  • a method for estimating a pose of a robot comprises determining a current pose estimate of the robot based on a first pose estimate or a second pose estimate or a combination of the first pose estimate and the second pose estimate, wherein the first pose estimate is based on a current pose distribution of the robot, and wherein a contribution of the first pose estimate and a contribution of the second pose estimate to the current pose estimate are determined based on the current pose distribution.
  • the method may further comprise determining the current pose distribution, in particular using particle filtering.
  • the second pose estimate may be determined by one or more of the following: prediction from one or more previous pose estimates, or deriving a global pose estimate from at least one of sensor data from a position sensor and sensor data from an orientation sensor.
  • the contribution of the first pose estimate and the contribution of the second pose estimate may be determined based on a confidence measure of the current pose distribution, in particular with regard to the first pose estimate.
  • in particular, the first pose estimate may be used as the current pose estimate if the confidence measure of the current pose distribution exceeds a threshold. The method may further comprise adapting the threshold based on the confidence measure of the current pose distribution.
  • the threshold may be adapted repeatedly, e.g., periodically or continuously.
  • the threshold may be increased in response to the confidence measure of the current pose distribution being significantly higher than the threshold, or the threshold may be decreased in response to the confidence measure of the current pose distribution being significantly lower than the threshold.
  • the confidence measure may be considered to be significantly higher than the threshold when the confidence measure exceeds the threshold plus a non-negative first offset.
  • the first offset may be zero.
  • the confidence measure may be considered to be significantly lower than the threshold when the confidence measure is lower than the threshold minus a non-negative second offset.
  • the second offset may be zero.
  • a transition from increasing the threshold to decreasing the threshold and vice versa may be delayed by a respective delay time.
  • the contribution of the first pose estimate and the contribution of the second pose estimate may alternatively be determined based on confidence measures of the respective pose estimates.
  • a method for estimating a pose of a robot comprises determining a plurality of current hypothetical poses of the robot, in particular using prediction; for each of the plurality of current hypothetical poses, determining a weight; and determining a current pose estimate of the robot based on the plurality of current hypothetical poses and their weights, wherein for each of the plurality of current hypothetical poses, determining the weight comprises calculating a similarity score which is a measure of similarity between a set of reference features and a set of observed features.
  • the set of observed features may comprise features detected in an environment of the robot.
  • the features may be detected by one or more sensors of the robot.
  • the sensors may include remote sensing means, e.g., a video camera, a radar sensor, a sonar sensor, or a combination of these.
  • Each reference feature and each observed feature may comprise one or more feature descriptors.
  • Each reference feature and each observed feature may comprise one or more feature classes, and, for each feature class, a probability value, and calculating the similarity score may be based on the one or more feature classes of the reference features and their probability values and the one or more feature classes of the observed features and their probability values.
  • each feature class may be associated with a class of real-world elements.
  • the class of real-world elements may be, for example, “tree”, “sky”, “person”, etc.
  • each reference feature may further comprise a space-fixed (SF) positional coordinate and each detected feature may further comprise a body-fixed (BF) positional coordinate, the BF positional coordinate being defined relative to the robot, wherein calculating the similarity score comprises mapping between the SF positional coordinate and the BF positional coordinate on the basis of a current hypothetical pose.
  • a coordinate may be multi-dimensional. For example, it may be a point in a two-dimensional space (e.g., corresponding to the earth’s surface) or a three-dimensional space (e.g., the three-dimensional space on the earth’s surface).
  • the weights of the current hypothetical poses may be determined based on a distribution of the similarity scores when the distribution fulfills a reliability condition.
  • the distribution may be a frequency distribution or a normalized frequency distribution of the similarity scores.
  • the weights of the current hypothetical poses may be determined independently of the distribution of the similarity scores when the distribution does not fulfill the reliability condition.
  • the method may further comprise adapting the weights of the current hypothetical poses based on a global pose estimate derived from at least one of sensor data of a position sensor and sensor data of an orientation sensor.
  • the position sensor or the orientation sensor or both may be based, for example, on vision, sound, radar, satellite signals, inertia, or a combination thereof.
  • a method of estimating a pose of a robot comprises: generating a first pose distribution of the robot based on one or more first navigational measurements, generating a second pose distribution of the robot based on the first pose distribution and on a current instance of a refined pose distribution, generating a next instance of the refined pose distribution based on the second pose distribution and one or more second navigational measurements, and determining a pose estimate of the robot based on the next instance of the refined pose distribution.
  • a computer-readable medium is provided for storing instructions that, when executed on a processor, cause the processor to perform a method according to any one of the above described aspects.
  • Figure 1 shows a basic particle filtering process used to introduce the present disclosure.
  • Figure 2 shows the relevant steps of the basic particle filtering process of Figure 1.
  • Figure 3 shows a modified particle filtering process according to the present invention as the basic framework of the present disclosure.
  • Figure 4 shows the main steps of the modified particle filtering process according to Figure 3, including pose estimation according to a first embodiment of the present invention.
  • Figure 5 shows the main steps of the modified particle filtering process according to Figure 3, including pose estimation according to a second embodiment of the present invention.
  • Figure 6 depicts the temporal behavior of a threshold for the confidence measure of the current pose distribution for a test case based on real data.
  • Figure 7 shows details of a first phase of the weight update of the correction block of Figures 4 and 5.
  • Figure 8 shows details of a second phase of the correction block according to a first embodiment of the weight update processing.
  • Figure 9 shows details of the second phase of the correction block according to a second embodiment of the weight update processing.
  • Figure 10 shows details of the second phase of the correction block according to a third embodiment of the weight update processing.
  • Figure 11 shows a vehicle with a localization system according to the present disclosure.
  • the present disclosure relates to the general technical field of mobile robot localization, in particular to the real-time localization of vehicles, especially autonomous vehicles. It offers a way to significantly improve the performance of low-cost systems by improving the mapping step, stabilizing the pose estimates and making the underlying algorithm ready for real-time application.
  • the present disclosure provides several improvements in the framework of Bayesian filtering as described above with respect to the mobile robot localization problem.
  • the basic particle filtering process used for the implementation of a Bayes filter is shown in Figure 1 to introduce the present disclosure.
  • the depicted particle filtering process is based on the well-known Monte Carlo localization, also known as particle filter localization, that localizes a mobile robot using a particle filter.
  • the process uses a particle filter to represent the distribution of likely states, with each particle representing a possible state, i.e., a hypothesis of where the robot is, also called a hypothetical pose of the robot.
  • the particle filtering approach can represent any arbitrary distribution and can keep track of several hypothetical poses at the same time. The particles are resampled based on recursive Bayesian estimation.
  • the odometry measurement O t is obtained from one or more odometry sensors of the robot to estimate a change in the position and/or the orientation of the robot over time.
  • the one or more odometry sensors may be provided to measure the change in at least one positional coordinate and/or the change in at least one angular coordinate, such as pitch, roll, and yaw of the robot.
  • Typical examples for odometry sensors are motion sensors such as wheel encoders, rotational encoders, linear encoders, speedometers, accelerometers, gyroscopes, and inertial measurement units (IMU).
  • IMUs may be used to simultaneously determine up to 6 DOFs, i.e., the full three-dimensional position and the full three-dimensional orientation of the pose of the robot.
  • a vision-based sensor including a remote sensing means, e.g., a video camera, a radar sensor, a sonar sensor, or a combination of these, may be used to calculate odometry using a technique called visual odometry.
  • odometry sensors may be provided to determine changes in the pose of the robot with the same dimensionality as the pose.
  • Odometry sensors may include internal sensors provided with the robot that do not require measurements of the environment.
  • the odometry measurement O_t may alternatively or additionally include the determination of at least one positional coordinate of the robot using one or more satellite-based sensors of the robot, i.e., external sensors.
  • satellite-based sensors determine the global position of the robot, i.e., the position of the robot relative to a global coordinate system, using global navigation satellite systems (GNSS), such as GPS, GLONASS, BeiDou, and Galileo.
  • GNSS global navigation satellite systems
  • a current weight w_t^m proportional to said probability is assigned to each predicted particle, wherein a normalization constant α is applied to normalize the weights.
  • the measurement model is exclusively based on a mapping between a set of observed features Y t and a set of reference features Y t MAP determined from a map of the environment. Details of this map matching process will be described below with reference to Figures 7 to 10.
  • the observed features Y t are extracted from the sensor data of at least one vision-based sensor of the robot as also described in more detail below.
  • Typical examples for vision-based sensors also referred to as remote sensing means in the following, are mono and stereo cameras, radar sensors, light detection and ranging (LiDAR) sensors, e.g., using pulsed lasers, ultrasonic sensors, infrared sensors, or any other sensors adapted to provide imaging measurements of the environment of the robot.
  • LiDAR light detection and ranging
  • the sensor data output by such vision-based sensors may be analyzed to extract the above mentioned features Y t .
  • the map may in particular contain information about landmarks, lane markings, buildings, curbs, and the road geometry. If multiple vision-based sensors are used, different maps based on different frequency ranges, such as optical and radio, may be used.
  • a new set of samples for the next iteration or frame t + 1 is generated by resampling particles according to their current weights.
  • the terms “current hypothetical poses” and “current weights” refer to the particles and corresponding weights before and after the importance resampling, as applicable, as the resampling step largely maintains the probability distribution function.
  • a current pose of the robot for the current iteration of frame t may be estimated by applying a (global or local) maximum a posteriori estimation method.
  • a minimum mean-square error criterion may be used.
  • a simple average (mean) and a mean within a window around the maximum a posteriori estimate (robust mean) may be used to determine the current pose estimate X t .
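  • The following editorial sketch illustrates these three estimators (MAP, mean, robust mean) on a set of weighted particles; the planar pose layout, the window size, and the simplified (non-circular) averaging of the yaw angle are assumptions.

    import numpy as np

    def pose_estimates(particles, weights, window=1.0):
        particles = np.asarray(particles, float)   # shape (M, 3): x, y, yaw per particle
        weights = np.asarray(weights, float)
        weights = weights / weights.sum()

        x_map = particles[np.argmax(weights)]      # maximum a posteriori estimate

        x_mean = weights @ particles               # simple weighted mean (yaw averaged naively)

        # Robust mean: weighted mean over particles within `window` of the MAP position.
        near = np.linalg.norm(particles[:, :2] - x_map[:2], axis=1) <= window
        w_near = weights[near] / weights[near].sum()
        x_robust = w_near @ particles[near]
        return x_map, x_mean, x_robust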
  • Figure 2 shows the relevant steps of the basic particle filtering process of Figure 1.
  • the blocks 210, 220, and 230 share the same concepts as in Figure 1 and are denoted as Prediction, Correction, and Resampling.
  • the sequence of prediction 210, correction 220, and resampling 230 is iterated for each time step or each frame t of the vision-based sensor data.
  • the inputs 1 to 4 of each iteration are highlighted by dashed frames in Figure 2.
  • Outputs 11, 12, 13, and 15 of the blocks 210, 220, 230, and 250 are also shown in the figure.
  • the output 13 represents a refined pose distribution.
  • the refined pose distribution is used to compute a current pose of the robot.
  • the values of a pose distribution at different points in time may be referred to as instances of the pose distribution.
  • the current instance (i.e. the most recent available instance) of the refined pose distribution may also be referred to herein as the current pose distribution.
  • a current pose estimate X t is determined in the pose estimation step 250.
  • Mobile robot localization often has to be performed in dynamic environments where objects and/or subjects other than the robot may change their location or configuration over time. Examples of more persistent changes that may affect the pose estimation are people, changing daylight, movable furniture, other vehicles, in particular parked vehicles, doors, and the like. These dynamic objects are generally not represented by reference features in the, generally static, reference map and may therefore cause mapping errors when performing the above described update step. In addition, features of different objects, such as the edge of a table or a chair, may not be discriminable using standard feature vectors. The presence of dynamic objects in real-world environments and the general issue of observation discriminability cause mismatches during the map matching process. Therefore, the above described process may lead to incorrect pose estimates.
  • the particle filtering process as described above with respect to Figures 1 and 2 may be unstable in some situations due to its correction mechanism.
  • a satellite-based sensor signal may not be available everywhere.
  • tall buildings, tunnels, and trees may shield a GPS sensor of a vehicle from at least some of the GPS satellites. In at least some urban areas, sufficient GPS signals are therefore generally not available. As a result, odometry-based prediction is prone to drift errors. It is therefore desirable to also enable the use of the above described particle filtering process in those areas where satellite-based sensor signals are not available.
  • the prediction step 310 that determines a set of predicted samples based on the odometry measurement O_t of at least one corresponding odometry sensor of the robot at time t and the last set of samples is the same as the prediction step 110 in Figure 1, such that a repeated description is omitted for the sake of clarity.
  • observation data Z t from at least one satellite-based sensor and/or at least one inertia-based sensor may be taken into account to include corresponding probability scores.
  • the current weights may be determined according to Equation (2) as follows:
  • w_t^m = α p(Y_t | X_t^m, Y_t^MAP) p(Z_t | X_t^m)    (2)
  • α denotes a normalization factor
  • Y_t denotes a set of observed features
  • Y_t^MAP denotes a set of reference features.
  • X_t = (x_t^position, x_t^rotation)^T, with T denoting the transpose, x_t^position denoting one, two, or three positional coordinates, such as x, y, and z coordinates in a global coordinate system, and x_t^rotation denoting one, two, or three rotational coordinates, such as pitch, roll, and yaw with respect to the orientation of a global coordinate system.
  • z_t^position denotes a measurement of a corresponding number of positional coordinates in the global coordinate system
  • z_t^rotation denotes a measurement of a corresponding number of rotational coordinates in the global coordinate system.
  • the measurement Z_t refers to a measurement of the global pose using at least one satellite-based sensor and/or at least one inertia-based sensor.
  • the position z_t^position may be measured using a position sensor such as a GPS sensor and/or an accelerometer.
  • the orientation z_t^rotation may be measured using a rotation sensor such as a gyroscope.
  • the complete global pose Z t may be measured up to 6 DOFs.
  • multiple measurements from different sensors may be included in the determination of the updated weights.
  • the measured sensor data may be submitted to a filtering process before being used in the determination.
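  • A hedged sketch of the weight update of Equation (2) is given below: each particle weight is the product of a map-matching likelihood and a global-pose likelihood, normalized over all particles. The Gaussian form assumed for p(Z_t | X_t^m) and the map_matching_likelihoods placeholder are editorial assumptions.

    import numpy as np

    def global_pose_likelihood(particle, z_global, sigma_pos=2.0, sigma_yaw=0.2):
        # Assumed Gaussian model for p(Z_t | X_t^m) given a global position/orientation measurement.
        p, z = np.asarray(particle, float), np.asarray(z_global, float)
        dp = np.linalg.norm(p[:2] - z[:2])
        dyaw = (p[2] - z[2] + np.pi) % (2 * np.pi) - np.pi   # wrapped yaw difference
        return np.exp(-0.5 * (dp / sigma_pos) ** 2) * np.exp(-0.5 * (dyaw / sigma_yaw) ** 2)

    def update_weights(particles, map_matching_likelihoods, z_global):
        # map_matching_likelihoods[m] stands in for p(Y_t | X_t^m, Y_t^MAP) from the map matching.
        w = np.array([lik * global_pose_likelihood(p, z_global)
                      for p, lik in zip(particles, map_matching_likelihoods)])
        return w / w.sum()   # normalization corresponds to the factor alpha in Equation (2)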
  • the resampling step 130 of Figure 1 is further modified as shown in the resampling step 330 of Figure 3 by generating only a fraction of the total number M of particles, such as 95%, from the current belief bel(X_t), while some of the particles, such as 5%, are resampled using the global pose measurement Z_t and/or a global pose measurement G_t that is not based on satellite-based sensor data.
  • the global pose measurement G t may for instance, be based on image processing using vision-based sensor data and/or based on inertia-based sensor data.
  • the global pose measurement G t may in particular be independent of the global pose measurement Z t , i.e., based on sensor data not included in the global pose measurement Z t .
  • the global pose and the correspondingly resampled particles can also be determined if the satellite-based sensor of the robot cannot receive GPS signals.
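  • A minimal editorial sketch of this modified resampling is shown below; the planar pose format, the noise level, and drawing the recovery particles from a single Gaussian around the global pose are assumptions.

    import random

    def resample_with_recovery(particles, weights, global_pose, n_recovery, noise=1.0):
        # Resample M - N particles from the current belief bel(X_t) according to their weights.
        m_total = len(particles)
        kept = random.choices(particles, weights=weights, k=m_total - n_recovery)
        # Generate N recovery particles around an independent global pose measurement (Z_t and/or G_t).
        gx, gy, gyaw = global_pose
        recovery = [(random.gauss(gx, noise), random.gauss(gy, noise), gyaw)
                    for _ in range(n_recovery)]
        return kept + recovery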
  • Figure 4 shows the main steps of the modified particle filtering process according to Figure 3 including pose estimation according to a first embodiment of the present invention.
  • the modified particle filtering process according to Figure 4 comprises a loop iterating over time or frame number t.
  • the inputs 1 to 5 of this iteration are shown in Figure 4 in dashed boxes.
  • a previous pose estimate X_{t-1} at time t - 1 is provided as input 6 to the pose prediction step 460.
  • the modified process according to the first embodiment of the present disclosure produces the outputs 11 to 14, 15a, 15b, 18, and 19 as shown in Figure 4.
  • the correction step 420 takes into account a global pose measurement Z_t.
  • Different from the resampling step 230 of the basic particle filtering process, the resampling step 430 only generates a reduced number M - N of particles from the current belief. The remaining N particles are determined independently of the current pose distribution using a recovery particle generation step 480 as shown in Figure 4.
  • the global pose measurement G t may be acquired using image processing based on vision-based sensor data and/or using inertia-based sensor data.
  • the global pose measurement G t may be based exclusively on vision-based sensor data.
  • Adding N particles which are sampled from other reliable global poses Z_t and/or G_t, with the corresponding importance weights {w_t^m}, m = M - N + 1, ..., M, determined based on the reliability, in particular a covariance, of those poses, ensures that pose estimates which are not based on the particle filtering process will be considered.
  • the presence of such pose estimates can improve the accuracy of the overall localization process and help in cases such as relocalization or missing or bad GPS signals.
  • a confidence measure of the current pose distribution, expressing the confidence that the pose of the robot can be unambiguously determined from the pose distribution, is calculated in step 440.
  • the function f used for the pose estimation may be the maximum a posteriori estimate.
  • Another possibility for the function f is to identify several clusters representing local maxima of the posterior probability, wherein the pose estimate is selected to be the weighted average of the most probable local maximum cluster.
  • M_c is the number of weights w_t^{c,m} from the current pose distribution that belong to the local maximum cluster.
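  • As one possible reading (an editorial assumption, since the exact equations are not reproduced here), the confidence measure could be the summed normalized weight of the cluster of particles around the most probable pose, as sketched below; the cluster radius and the simple distance-based clustering are assumptions.

    import numpy as np

    def confidence_measure(particles, weights, radius=1.0):
        # High value: most of the probability mass lies in one cluster, i.e. the pose is unambiguous.
        particles = np.asarray(particles, float)
        weights = np.asarray(weights, float)
        weights = weights / weights.sum()
        center = particles[np.argmax(weights)][:2]                 # most probable position
        in_cluster = np.linalg.norm(particles[:, :2] - center, axis=1) <= radius
        return weights[in_cluster].sum()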
  • the localization process may output a first pose estimate 15a that is based on the current pose distribution, a second pose estimate that may, in particular, be determined independently of the current pose distribution, or a combination of the first and the second pose estimates.
  • the current pose estimate derived from the current pose distribution is stored in a storage space such as a memory unit in step 455.
  • the present localization process according to the first embodiment may additionally determine an independent second pose estimate 15b by prediction in step 460.
  • the prediction may be made based on one or more previous pose estimates and/or using other global pose estimates.
  • the second pose estimate 15b may be extrapolated from two or more previous pose estimates.
  • the function g denotes a deterministic motion model that predicts the pose X_t of the robot at time t from the pose X_{t-1} of the robot at time t - 1 by applying the changes in the pose derived from the odometry measurement O_t.
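  • A minimal editorial sketch of such a deterministic motion model g is given below, assuming a planar pose and an odometry increment consisting of a travelled distance and a heading change; this parameterization is an assumption, not the disclosure’s definition of g.

    import math

    def dead_reckoning(prev_pose, odometry):
        # Predict the pose at time t from the pose at time t-1 and the odometry increment O_t.
        x, y, yaw = prev_pose
        ds, dyaw = odometry                            # travelled distance and heading change
        x_new = x + ds * math.cos(yaw + 0.5 * dyaw)    # midpoint heading for the step
        y_new = y + ds * math.sin(yaw + 0.5 * dyaw)
        return (x_new, y_new, yaw + dyaw)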
  • the second pose estimate 15b may also be stored in the storage 455.
  • either the first pose estimate 15a or the second pose estimate 15b is output as the current pose estimate X t .
  • the present disclosure is, however, not limited to this alternative output but may output a combination of the first pose estimate 15a and the second pose estimate 15b wherein the contribution of the first pose estimate and the second pose estimate to the combination may be determined based on the confidence measure of the current pose distribution.
  • the combined pose estimate X t may be determined according to Equation (4):
  • X_t^1 represents the first pose estimate 15a and X_t^2 represents the second pose estimate 15b.
  • the method of dead reckoning has been used for the prediction step of a particle filter as, for instance, described in E.J. Krakiwsky, C.B. Harris and R.V.C. Wong, “A Kalman filter for integrating dead reckoning, map matching and GPS positioning”, Position Location and Navigation Symposium, 1988, Record. Navigation into the 21st Century, IEEE PLANS’88, IEEE, Orlando, FL, 1988, pp. 39-46.
  • dead reckoning in the prediction step only, however, does not solve the above-mentioned instability problem of the first pose estimate.
  • the localization process according to the first embodiment therefore combines the independent second pose estimate 15b with the first pose estimate 15a or replaces the first pose estimate 15a with the second pose estimate 15b based on the confidence measure of the current pose distribution.
  • the first pose estimate 15a is used as the current pose estimate X_t if the confidence measure w_t of the current pose distribution exceeds a threshold θ, which may be time-dependent. Otherwise, the second pose estimate 15b may be used as the current pose estimate X_t.
  • a combination of the second pose estimate 15b derived from dead reckoning with a further pose estimate, which may be independently derived, may be used if the confidence measure is smaller than or equal to the threshold.
  • both the confidence measure and the threshold are positive scalars.
  • the above described condition may be implemented for general scalars or even vectors by calculating the absolute value or applying a norm before evaluating the condition.
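  • The selection rule described above may be sketched as follows (an editorial illustration; the norm-based handling of non-scalar values simply mirrors the remark in the previous item):

    import numpy as np

    def select_pose_estimate(first_estimate, second_estimate, confidence, threshold):
        # Use the particle-filter estimate if the confidence exceeds the threshold, else dead reckoning.
        conf = np.linalg.norm(np.atleast_1d(confidence))
        thr = np.linalg.norm(np.atleast_1d(threshold))
        return first_estimate if conf > thr else second_estimate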
  • the threshold may then be repeatedly adapted, e.g., periodically or continuously, according to a function f_θ, in particular based on the confidence measure of the current pose distribution.
  • the threshold may be increased or decreased based on whether the confidence measure of the current pose distribution exceeds the threshold or not.
  • the threshold may, for instance, change according to Equation (5), wherein f_θ^up(t^up) is a monotonically increasing function of the time t^up and f_θ^down(t^down) is a monotonically decreasing function of the time t^down.
  • a possible structure for both functions may be of the form of an exponential function as in Equation (6), wherein θ_offset denotes a fixed offset, a denotes the size of the change, and c denotes a decay factor.
  • the threshold may also be adapted only if the confidence measure is higher than the threshold by a non-negative first offset value or lower than the threshold by a non-negative second offset value, and be kept constant otherwise.
  • a transition from increasing the threshold to decreasing the threshold and vice versa may be delayed by a respective delay time.
  • Such an added delay applied to the adaptive threshold function serves the purpose of avoiding spurious instantaneous changes between the pose estimates that may cause jumps with respect to the pose of the robot. The delay will force an estimate to be used continuously within a time period Δt before switching to a different pose estimate.
  • An example for applying delay times to the adaptation of the threshold is given in Equation (7):
  • t^up is reset to 0 when the pose estimation based on dead reckoning is active and t^down is reset to 0 when the pose estimation based on particle filtering is active.
  • t^up is increased when w_t > θ is satisfied; otherwise t^down is increased.
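  • Since Equations (5) to (7) are only referenced above, the following sketch is an editorial approximation of the described behavior: the threshold rises exponentially towards an upper bound while particle filtering is trusted, falls towards a lower bound during dead reckoning, and a switch between the two estimates is only allowed after it has been pending for a delay; all parameter values are assumptions.

    class AdaptiveThreshold:
        def __init__(self, theta=0.5, lo=0.2, hi=0.8, decay=0.1, delay=5):
            self.theta, self.lo, self.hi = theta, lo, hi
            self.decay, self.delay = decay, delay
            self.pending = 0                   # steps for which the other estimate has been favored
            self.use_particle_filter = True    # currently active pose estimate

        def step(self, confidence):
            favored = confidence > self.theta  # particle-filter estimate favored by the confidence
            self.pending = self.pending + 1 if favored != self.use_particle_filter else 0
            if self.pending >= self.delay:     # switch only after the delay has elapsed
                self.use_particle_filter = favored
                self.pending = 0
            if self.use_particle_filter:       # raise the threshold during the particle filtering phase
                self.theta += self.decay * (self.hi - self.theta)
            else:                              # lower the threshold during the dead reckoning phase
                self.theta += self.decay * (self.lo - self.theta)
            return self.use_particle_filter, self.theta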
  • Figure 6 shows the temporal behavior of the threshold for the confidence measure of the current pose distribution for a test case based on real data.
  • Two exemplary phases for determining the pose estimate based on particle filtering and based on dead reckoning are indicated in the figure by dashed vertical lines.
  • the threshold θ is increased according to the function f_θ^up(t^up).
  • the threshold θ is decreased during the dead reckoning phase according to the function f_θ^down(t^down). Using a confidence measure of the current pose distribution to decide whether to apply a first pose estimate based on particle filtering or a second pose estimate based on dead reckoning, or another estimation process independent of the particle filtering, combines the strengths of both estimates.
  • Dead reckoning is known to provide a stable pose estimate within a short period of time, but to be prone to the “drifting” phenomenon if applied over a longer period of time.
  • Particle filtering, on the other hand, is free from such a “drifting” phenomenon but suffers from jumps or instability, i.e. a significant change of the pose estimate, due to unreliable measurements or observation updates in the correction step.
  • Combining two pose estimates from different approaches as in the present embodiment increases the overall stability of the localization process.
  • Figure 5 shows the main steps of the modified particle filtering process according to Figure 3 including pose estimation according to a second embodiment of the present invention.
  • the method steps annotated with the same reference signs as in Figure 4 are identical to those described above with respect to the first embodiment and are therefore not described again.
  • the second embodiment according to Figure 5 determines both the first pose estimate 15a based on particle filtering in pose estimation step 450 and the second pose estimate 15b based on prediction, in particular dead reckoning, in prediction step 460.
  • the first and second pose estimates are herein determined as described above with respect to Figure 4.
  • the current pose estimate 16 is calculated as a combination of the first pose estimate 15a and the second pose estimate 15b and output in step 470.
  • the respective contributions of the first pose estimate and the second pose estimate to this combination are determined based on confidence measures of the respective pose estimates in step 470.
  • the current pose estimate X t may be determined according to Equation (8):
  • the respective contributions may be determined according to Equation (9) as follows:
  • the covariances of the pose estimates, or quantities derived from them, can be considered as confidence measures of the respective pose estimates.
  • the covariance of a pose estimate can, for instance, be estimated by first taking the Jacobian of the reprojection error cost function e around its convergence point, as shown in Equation (10):
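  • Since Equations (8) to (10) are only referenced here, the following standard inverse-covariance (information-form) fusion is shown purely as an editorial sketch of how covariances can weight the two pose estimates:

    import numpy as np

    def fuse_pose_estimates(x1, cov1, x2, cov2):
        # Weight each estimate by its inverse covariance (its information matrix).
        x1, x2 = np.asarray(x1, float), np.asarray(x2, float)
        info1, info2 = np.linalg.inv(cov1), np.linalg.inv(cov2)
        cov = np.linalg.inv(info1 + info2)           # covariance of the combined estimate
        x = cov @ (info1 @ x1 + info2 @ x2)          # combined pose estimate
        return x, cov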
  • Figure 7 shows details of a first phase of the weight update process of the correction block 420 of Figures 4 and 5 according to the present disclosure.
  • State-of-the-art methods calculate a likelihood function based on the distance score between a set of observed feature points and a set of reference feature points in a map according to the nearest neighbor principle when performing map matching.
  • the update processes according to Figures 7 to 10 of the present disclosure significantly extend and modify this concept by exploiting additional knowledge on the observed and reference features.
  • the first phase of the update process as shown in Figure 7 uses three nested loops, where initially the indices m, p, and q in the respective loop are set to 1. The outer loop iterates over the M particles.
  • the intermediate loop iterates over the set of observed features Y_t, wherein P denotes the total number of observed features.
  • the inner loop finally iterates over the set of reference features Y_t^MAP, wherein Q denotes the total number of reference features taken from the map.
  • the set of observed features is extracted from sensor data of at least one sensor of the robot by performing feature detection and description on at least one frame, also called keyframe, of the sensor data.
  • the sensors may be based on remote sensing and are referred to in the present disclosure as vision-based sensors.
  • the vision-based sensor may involve a camera, in particular a stereo camera, a radar sensor, a light detection and ranging (LiDAR) sensor, e.g., using a pulsed laser, an ultrasonic sensor, an infrared sensor, or any other sensors adapted to provide imaging measurements, also called ranging, of the environment of the robot.
  • the resulting sensor data may be organized into frames according to a time index. Some or all of these frames may be analyzed to detect and to extract features that may then be compared with reference features that are generally extracted offline from one or several reference maps of the environment of the robot.
  • SIFT Scale-Invariant Feature Transform
  • SURF Speed Up Robust Feature
  • BRIEF Binary Robust Independent Elementary Features
  • ORB Oriented FAST and Rotated BRIEF
  • An advanced ORB technique that also involves Simultaneous Localization and Mapping (SLAM) is for instance described in the article“ORB-SLAM2: an Open-Source SLAM System for Monocular, Stereo and RGB-D Cameras” by R. Mur-Artal and J.D. Tardos in IEEE Transactions on Robotics, vol. 33, no. 5, pp. 1255-1262, Oct. 2017.
  • the above-mentioned extraction methods typically process the sensor data to extract features at salient keypoint locations in the form of individual or clustered feature points.
  • feature descriptors that describe the properties of the corresponding feature points are determined.
  • Each of the above-mentioned observed features may therefore comprise one or several feature points and their corresponding feature descriptors.
  • the update process exploits semantic information related to the features in terms of feature classes. Consequently, each feature may additionally comprise one or more feature classes as described below in more detail.
  • the semantic information may in particular be used to distinguish between particular types or object classes of real-world elements such as cars, traffic signs, roads, buildings, or the like.
  • typical features may simply identify lines, corners, edges, distinct patterns or objects of distinct appearance.
  • a feature descriptor is provided as a feature vector that may have a high dimensionality.
  • the observed features and the reference features are typically structured to include the same information, i.e. feature points and corresponding feature descriptors, and potentially feature classes, to allow for a matching between the observed and the reference features that yields a correspondence associated with a likelihood p(Y_t | X_t^m, Y_t^MAP), which is referred to here and in the following as map matching.
  • the set of reference features Y t MAP is generally extracted from a set of reference images acquired using the same imaging method as the one used by the at least one vision-based sensor and related to the environment of the robot.
  • the extracted reference features may then be stored as a so-called feature-based map in a database or a storage medium.
  • maps may be feature-based or location-based. While the location-based maps are volumetric, in that they include feature descriptors for any location inside the area of the map, feature-based maps only specify features at specific locations, in particular locations of key objects contained in the map.
  • the update process according to the present embodiment is described within the context of feature-based maps but may easily be modified to be applied to location-based maps. While feature-based models extract relatively little information, by virtue of the fact that feature extractors project high-dimensional sensor measurements into a lower-dimensional space, this disadvantage is offset by the superior computational properties of feature-based representations.
  • mobile robots are generally located in dynamic environments that change over time, both with respect to the location of other objects or subjects and with respect to environmental conditions such as changing daylight.
  • one potential problem of the matching process between the observed features and the reference features lies in the fact that dynamic objects present in reference images used for the detection of the reference features may not be present in the current environment of the robot and vice versa.
  • Feature extraction and matching based on such dynamic objects thus introduces errors in the update process.
  • State-of-the-art methods therefore suggested applying dedicated filters to the sensor data of the at least one vision-based sensor of the robot before performing the feature extraction and matching. Such filtering, however, further contributes to the already heavy computational load of the underlying processing.
  • the feature/map matching process therefore devises an alternative approach wherein the reference features, more specifically the reference images, are filtered offline to remove dynamic objects such as pedestrians and other vehicles.
  • the dynamic objects may be identified for instance, based on semantic information such as the above-mentioned feature classes.
  • the map of the reference features has been created before performing the localization process and is available from a storage medium or a database for the below described matching process.
  • Situations may, however, exist where the mobile robot enters a terrain or area of which no such map exists.
  • the described method may be modified to perform simultaneous localization and mapping (SLAM) as it is generally known in the art.
  • SLAM simultaneous localization and mapping
  • the robot acquires a map of its environment while simultaneously localizing itself relative to this map.
  • when a new object is detected, a SLAM algorithm must reason about the relation of this object to previously detected objects. Information that helps localize the robot is propagated through the map, and as a result improves the localization of other features in the map.
  • the update process according to the present embodiment may be applied to the SLAM problem.
  • the set of reference features includes those features already extracted from the current map of the environment of the robot. Consequently, the reference features may include dynamic objects that may be removed from the set based on semantic information as described above. Alternatively, the dynamic objects may be kept as reference features under the assumption that they remain in the environment of the robot during the SLAM process.
  • the algorithm may further selectively remove some of the dynamic objects, such as pedestrians and moving vehicles that are highly dynamic based on the corresponding semantic information or a feature class while keeping other less dynamic objects such as parked vehicles.
  • each reference feature comprises at least one global, i.e. space-fixed, positional coordinate, whereas the observed features are initially given in local, i.e. body-fixed, coordinates of the robot; the observed features may therefore be transformed into the global coordinate system based on the hypothetical pose of the respective particle.
  • This transformation may of course also involve rotations to map between rotational coordinates of the local and global coordinate systems.
  • the transformation may be performed in the inverse direction to map the reference features into the local coordinate system of the robot. After performing the coordinate transform, a likelihood distance score and a similarity score are calculated in step 422 according to the present disclosure.
  • the distance score D_t^{m,p,q} may, for instance, be computed for each particle m, each observed feature p, and each reference feature q using a nearest neighbor based likelihood scoring method as known in the art.
  • the similarity score S_t^{m,p,q} may, for instance, be computed using the Hamming distance for an ORB feature descriptor. The similarity score may then be used to penalize the distance score in step 423 according to Equation (12) if the features are not similar.
  • D_MIN is a minimum distance score assigned according to Equation (12) for the case that the similarity score exceeds a threshold θ_S for the similarity score, which may, for instance, be chosen within the range of the Hamming distance.
  • each feature comprises one or more feature classes and, for each feature class, a probability value that conveys the semantic information.
  • calculating the similarity scores takes into account the feature classes of the reference features and the observed features and their respective probabilities.
  • each feature class may be associated with a class of real-world elements and each probability value expresses the probability of the feature belonging to the respective feature class.
  • the similarity score S_t^{m,p,q} may be determined as the probability of the reference feature q and the observed feature p having the same association, i.e. the same semantic label.
  • the threshold θ_S for this case may simply express a particular probability value.
  • Calculation of the similarity scores may be performed separately on feature descriptors and feature classes or on a combination of feature descriptors and feature classes.
  • the feature classes may, in particular, be integrated into the feature descriptors.
  • the penalty for the distance score in step 423 according to Equation (12) may be applied to feature descriptors and feature classes in a like way. If the semantic information is included in the calculation of the similarity score based on the feature classes, dynamic objects do not have to be removed separately from the sensor data of the at least one vision-based sensor if the map of reference features was pre-processed to remove dynamic objects.
  • the similarity scores are then further processed by calculating a similarity score distribution in step 426, for instance by fitting the set of similarity scores to a given distribution model, such as a one-sided distribution with mean μ_t^S and standard deviation σ_t^S.
  • the distribution may be a frequency distribution or a normalized frequency distribution of the scores. Based on the parameters μ_t^S and σ_t^S of the distribution of the similarity scores, a reliability condition for the distribution may be defined as in Equation (15), for instance μ_t^S < θ_R.
  • θ_R is a threshold that may be determined as the tail threshold of the similarity distribution.
  • the updated weights 12a are output by the process. According to the embodiment shown in Figure 8, the current weights 12a are therefore determined according to Equation (16); a simplified code sketch of this scoring and weighting logic is given after this list.
  • the current importance weights may be obtained by further applying a weighting based on a global rotation estimate which may come from at least one rotation sensor, such as an inertia-based sensor.
  • Steps 426 to 428 and 429a are identical to those in Figure 8 so that their description is not repeated.
  • the weight is further adjusted based on a global rotation estimate z_t^rotation from the global pose measurement Z_t in step 529c to produce updated importance weights 12a according to Equation (17).
  • the assigned weight may then further be adjusted in step 529d based on a global rotation estimate z_t^rotation from the global pose measurement Z_t to produce updated importance weights 12b according to Equation (17).
  • the update process according to the second embodiment in Figure 9 has two advantages. First, in the case that the reliability condition is not satisfied, the importance weights are determined by the global position estimate, which can provide a certain degree of reliability. Second, the additional weighting based on a global rotation estimate, applied regardless of whether the reliability condition is satisfied, further increases the reliability of the updated weights. According to the second embodiment, two additional separate global estimates for position and rotation, for instance derived from a GPS sensor and an IMU sensor, are used to increase the reliability of the updated weights and, consequently, the current pose estimate.
  • a global pose measurement G_t that comes from position and/or rotation sensors other than the satellite-based sensors, and possibly the inertia-based sensors, e.g., from vision-based sensors, is used to further increase the reliability of the updated importance weights.
  • the weight w_t^m = p(Z_t, G_t | X_t^m) may be assigned in step 629b based on the global pose estimates Z_t and G_t to produce the updated importance weights 12b.
  • the weights determined in step 429a are further adjusted based on the global pose estimates Z_t and G_t to produce the updated importance weights 12a according to Equation (19).
  • either the global position estimate or the global rotation estimate or both may be used in steps 629b and 629c of the third embodiment for the global pose estimates Z_t and G_t.
  • the current weights may be further adapted based on a global pose estimate derived from at least one of sensor data of a position sensor and sensor data of a rotation sensor, e.g., from at least one of sensor data of a satellite-based sensor, sensor data of an inertia-based sensor, and sensor data of the at least one vision-based sensor.
  • knowledge of a feature descriptor and optionally a feature class for the observed and reference features is included to increase the probability of finding the correct nearest neighbor within a given search area and to adjust the particle weights calculation.
  • Feature points which do not satisfy a threshold criterion for the similarity score calculated for these feature descriptors and feature classes are penalized.
  • the predicted particles are weighted based on the distribution of the similarity scores if the corresponding distribution fulfills a reliability condition and the distance scores for the predicted particles are discarded and replaced with one or more global pose estimates if the distribution does not fulfill the reliability condition.
  • Usage of feature descriptors and optionally feature classes for the observed and reference features further increases filter accuracy against dynamic objects, such as passing vehicles, pedestrians, and the like, and increases feature discriminability. Consequently, the resulting current pose estimate becomes more reliable.
  • Figure 11 finally shows a vehicle implementing the present disclosure according to any of the above described embodiments.
  • the vehicle 700 is equipped with wheel encoders 792 on the front wheels as odometry sensors that measure the rotation of the front wheels from which a change in the position of the vehicle can be determined.
  • the vehicle 700 further includes an inertial measurement unit (IMU) 790 as an inertia-based sensor configured to determine changes in 6 DOFs, i.e. in the positional coordinates as well as the orientation of the vehicle.
  • the IMU 790 thus constitutes a combination of a position sensor and a rotation sensor.
  • the vehicle is equipped with a GPS sensor 796 as a satellite-based sensor or position sensor for measuring a global position z_t^position based on a GPS signal.
  • the vehicle 700 is equipped with a stereoscopic camera 794 as a vision-based sensor that records stereoscopic images of the environment of the vehicle. The images recorded by the camera 794 are then processed as described above to extract observed features in the environment of the vehicle.
  • the vehicle is equipped with processing circuitry 780 configured to execute any one of the above described methods.
  • the sensor signals from the odometry sensors 792, the IMU 790, the GPS sensor 796 and the camera 794 are transmitted to the processing circuitry 780 via cable or wirelessly.
  • the processing circuitry 780 then processes the sensor data as described above to perform localization of the vehicle 700 in the global coordinate system as indicated in Figure 11 with dashed lines.
  • Figure 11 shows the x- and y-axes of the global coordinate system wherein the z-axis coincides with the z-axis of the local coordinate system of the vehicle 700.
  • the global coordinates are also called space-fixed coordinates while the local coordinates, represented by the x’- and y’- axes as well as the z-axis, are also called body-fixed coordinates.
  • the heading of the vehicle 700 is indicated by the x'-axis in the figure. The heading conveniently defines the x'-axis of the local coordinate system, in which the rotation angles with respect to roll, pitch, and yaw are given, i.e., the vehicle-fixed local coordinate system.
  • the processing circuitry 780 is configured to transform between positional and rotational coordinates in the body-fixed local coordinate system and positional and rotational coordinates in the space-fixed global coordinate system.
  • a global pose and a global pose estimate therefore always refer to space-fixed global coordinates in the present disclosure.
  • Figure 11 further schematically indicates the velocity of the vehicle 700 as a vector whose direction may differ from the heading of the vehicle due to sideslip of the vehicle. Such a sideslip may be one source of errors in the localization process as it is generally not accounted for by the wheel encoders 792.
  • the processes and methods described in the present disclosure may be implemented in a system comprising a processing circuitry configured to perform the described processes and methods.
  • the system may comprise a combination of software and hardware.
  • the prediction, correction, and resampling steps of the particle filtering process according to Figures 4 and 5 may be implemented as software modules or as separate units of processing circuitry.
  • any of the blocks in the Figures 2, 4, 5, and 7 to 10 can be implemented as hardware units or software modules.
  • the described processing may be performed by a chip, such as a general purpose processor, a CPU, a GPU, a digital signal processor (DSP), or a field programmable gate array (FPGA), or the like.
  • the present disclosure is not limited to implementation on programmable hardware. It may be implemented on an application-specific integrated circuit (ASIC) or by a combination of the above-mentioned hardware components.
  • the storage 455 in Figures 4 and 5 may be implemented using any of the storage units known in the art, such as a memory unit, in particular RAM, ROM, EEPROM, or the like, a storage medium, in particular a DVD, CD, USB (flash) drive, hard disk, or the like, a server storage available via a network, etc.
  • the processing circuitry 780 may be configured to determine a confidence measure of the current pose distribution in step 440 of Figure 4 or to determine confidence measures of independently determined pose estimates in step 470 of Figure 5.
  • the processing circuitry may be configured to perform pose estimation in step 450 based on the particle filtering and to perform an independent pose estimation based on prediction in step 460.
  • the above-described localization processes and sub-processes may also be implemented by a program including instructions stored on a computer-readable medium.
  • the instructions when executed on a processor, cause the processor to perform the above described processes and methods.
  • the computer-readable medium can be any medium on which the instructions are stored such as a DVD, CD, USB (flash) drive, hard disk, server storage available via a network, or the like.
  • the present disclosure offers ways of improving the performance of low-cost systems for mobile robot localization.
  • the processes according to the present disclosure have been extensively tested in real time using car prototypes having GPS, IMU and stereo camera sensors. Preliminary results show that the percentage of absolute mean errors below 1 m is approximately 90% and the averaged absolute mean error is 0.75 m or below in the longitudinal direction and less than 0.4 m in the lateral direction. These errors lie within the required specification of the IoV Planning and Control Division and are therefore suitable for commercial deployment.
  • the methods and systems of the present disclosure significantly improve the localization accuracy performance while maintaining the low-cost feature. They may be implemented in low-cost systems having a stereoscopic camera, a GPS sensor and an IMU sensor.
  • the described methods and systems provide high accuracy in all 6 DOFs of the vehicle pose including the altitude and the rotation (roll, pitch, and yaw). Due to their low requirements on processing power, the disclosed processes are suitable for real-time implementation. First tests have demonstrated that pose estimation can be performed at around 10 Hz.
  • the described methods and systems solve many of the problems of low-cost vehicle localization systems by solving the problem of dynamic objects and observation discriminability by means of an extended feature descriptor in the particle filtering process, by solving the problem of pose estimate instability by interchangeably using particle filtering and dead reckoning, and by solving the problem of intermittency of GPS signals by adding a global pose estimate based on non-GPS sensors.
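To make the interplay of the similarity threshold, the reliability condition, and the weight update described in the list above more concrete, the following Python sketch reproduces the logic of Equations (12), (15), and (16) in a heavily simplified form. It is an illustration only, not the claimed implementation: the function names, the use of the mean as the reliability statistic, the aggregation of the per-feature scores by summation, and all numeric values are assumptions chosen for readability.

```python
import numpy as np

def penalized_distance_scores(dist_scores, sim_scores, theta_s, d_min=1e-9):
    """Equation (12), simplified: keep a nearest-neighbour distance score only
    when the corresponding similarity score stays below theta_s; otherwise the
    feature pair is considered dissimilar and the score is replaced by D_MIN."""
    return np.where(sim_scores <= theta_s, dist_scores, d_min)

def reliability_condition(sim_scores, theta_r):
    """Equation (15), simplified: treat the similarity-score distribution as
    reliable when its mean stays below the tail threshold theta_r."""
    return sim_scores.mean() < theta_r

def update_weights(dist_scores, sim_scores, global_pos_likelihood, theta_s, theta_r):
    """Sketch of Equation (16): weights from the penalized distance scores when
    the similarity distribution is reliable, otherwise from a global position
    estimate (e.g., a GPS-based likelihood per particle)."""
    if reliability_condition(sim_scores, theta_r):
        w = penalized_distance_scores(dist_scores, sim_scores, theta_s).sum(axis=1)
    else:
        w = global_pos_likelihood
    return w / w.sum()  # normalization factor alpha

# toy example: 3 particles, 4 matched observed/reference feature pairs each
rng = np.random.default_rng(0)
dist = rng.random((3, 4))               # nearest-neighbour likelihood scores
sim = rng.integers(0, 64, size=(3, 4))  # e.g. Hamming distances of ORB descriptors
gps = np.array([0.2, 0.5, 0.3])         # p(z_t^position | particle)
print(update_weights(dist, sim, gps, theta_s=32, theta_r=40))
```

In this simplified picture, the further weighting of Equations (17) and (19) would amount to multiplying the resulting weights by additional likelihood terms for the global rotation estimate z_t^rotation and for the non-satellite global pose measurement G_t.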


Abstract

The present disclosure relates to an apparatus and a method for estimating a pose of a robot, wherein a current pose estimate of the robot is determined based on a first pose estimate, which is based on a current pose distribution, or a second pose estimate, or a combination of the first pose estimate and the second pose estimate, and wherein a contribution of the first pose estimate to the current pose estimate and a contribution of the second pose estimate to the current pose estimate are determined based on the current pose distribution. Consequently, the strengths of both pose estimates are combined in the current pose estimate. Furthermore, improved particle filtering methods and systems are described wherein the weights of the particles are updated based on similarity scores between a set of reference features and a set of observed features, the set of observed features being detected in an environment of the robot from sensor data of one or more sensors of the robot. Using similarity scores solves the problem of dynamic objects in the environment of the robot and increases feature discriminability.

Description

ESTIMATION OF A POSE OF A ROBOT
The present disclosure relates to estimating the pose of a robot, in particular of a vehicle, in a robust and efficient way. The disclosed systems and methods may be used for real-time localization of a vehicle.
BACKGROUND
Mobile robot localization is becoming increasingly important in robotics as robot systems operate in increasingly unstructured environments. Today's applications of mobile robot systems are as diverse as mobile platforms for planetary exploration, underwater vehicles for deep-sea exploration, robotic vehicles in the air or in confined spaces such as mines, cars that travel autonomously in urban environments, and androids operating in highly dynamic environments involving interaction with human beings.
Mobile robots in these and other applications have to be operated in environments that are inherently unpredictable, where they often have to navigate in an environment composed of static and dynamic objects. In addition, even the location of the static objects is often unknown or known only with uncertainty. It is therefore crucial to localize the robot with high precision, generally using sensor data from sensors of the robot and/or external sensors. The problem of localization involves estimating a robot’s coordinates and often its orientation, together forming the so-called pose, in an external reference frame or global coordinate system from the sensor data, often using a map of the environment.
To account for the inherent uncertainties of this localization process, which include unavoidable measurement errors and sensor noise, generally a probabilistic approach is used, wherein the estimate of the robot's momentary pose, also called belief, is represented by a probability density function over the space of all locations, and potentially orientations, the so-called state space.
A frequently used probabilistic approach to the localization problem involves recursive Bayesian estimation, also known as a Bayes filter. Using a Bayes filter, the probability density function of a robot is continuously updated based on the most recently acquired sensor data or observation. The recursive algorithm consists of two parts: prediction and update. The true state X of the robot is assumed to be an unobserved Markov process, and the measurements Z are the observed states of a hidden Markov model. The prediction step uses a system model p(X_t | X_{t-1}), also called motion model, to predict the probability distribution function p(X_t | Z_{1:t-1}), the so-called current prior, at time t given the previous observations Z_{1:t-1} from the previous probability distribution function p(X_{t-1} | Z_{1:t-1}) at time t - 1, the so-called previous posterior, wherein the predicted probability distribution function is spread due to noise. The update step updates the prediction in light of the new observation data to calculate the current probability distribution function p(X_t | Z_{1:t}) given the observations Z_{1:t} up to the current time, the so-called current posterior.
The current posterior is proportional to the product of the measurement likelihood p(Z_t | X_t) and the current prior p(X_t | Z_{1:t-1}), normalized by the evidence p(Z_t | Z_{1:t-1}). Through the likelihood p(Z_t | X_t), a measurement model expressing the conditional probability of observation Z_t given the true state X_t at time t enters the calculation. From the current posterior p(X_t | Z_{1:t}), an optimal estimate X̂_t for the true state X_t at time t, i.e. an estimate for the pose of the robot, can be determined, for instance by determining the maximum of the current probability distribution function or applying a minimum mean-square error (MMSE) approach. The pose estimate may then be used to operate the robot in its environment.
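For illustration, the prediction and update steps of the Bayes filter recursion described above can be written out for a one-dimensional, discretized state space. The following Python sketch is a generic textbook example and not part of the disclosure; the motion kernel and the measurement likelihood are arbitrary assumptions.

```python
import numpy as np

def bayes_filter_step(prev_posterior, motion_kernel, likelihood):
    """One recursion of the Bayes filter on a discrete 1-D state grid.

    prev_posterior: p(X_{t-1} | Z_{1:t-1}) as a probability vector
    motion_kernel:  p(X_t | X_{t-1}) as a column-stochastic (states x states) matrix
    likelihood:     p(Z_t | X_t) evaluated on the same grid
    """
    prior = motion_kernel @ prev_posterior   # prediction: the current prior
    posterior = likelihood * prior           # update with the new observation
    return posterior / posterior.sum()       # normalization by the evidence

# toy example: 5 grid cells, the robot tends to move one cell to the right
states = 5
motion = np.zeros((states, states))
for i in range(states):
    motion[i, i] = 0.2                 # probability of staying in place
    motion[(i + 1) % states, i] = 0.8  # probability of moving right (wrap-around)

belief = np.full(states, 1.0 / states)  # uniform initial belief
measurement_likelihood = np.array([0.05, 0.05, 0.7, 0.15, 0.05])
belief = bayes_filter_step(belief, motion, measurement_likelihood)
print(belief)
```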
If the system and measurement models are linear and the posterior is Gaussian, the Bayes filter turns into a Kalman filter. For a nonlinear system with additive noise, a local linearization using a first-order Taylor series expansion may be used to provide an extended Kalman filter (EKF).
An extended Kalman filter localization algorithm is for instance described in Chapter 7 of the book "Probabilistic Robotics" by S. Thrun, W. Burgard, and D. Fox, The MIT Press, 2018. The motion model, i.e. the system model, p(X_t | u_t, X_{t-1}) for the state transition probability given the previous state X_{t-1} and the control data u_t is implemented either using a velocity motion model, wherein the control data u_t is given by velocities, or using an odometry motion model, wherein the control data u_t is replaced by sensor measurements. Furthermore, the motion model is extended by a map m of the environment of the robot to create a map-based motion model p(X_t | u_t, X_{t-1}, m) that may be approximately factorized as in Equation (1):
p(X_t | u_t, X_{t-1}, m) = η · p(X_t | u_t, X_{t-1}) · p(X_t | m)    (1)
where η is a normalizing factor. The second term p(X_t | m) expresses the "consistency" of pose or state X_t with the map m.
The measurement model that describes the formation processes by which sensor measurements are generated in the physical world is also extended by the map m of the environment to define a conditional probability distribution function p(Z_t | X_t, m), where X_t is the robot pose and Z_t is the measurement at time t.
Both, the velocity motion model and the odometry motion model are subject to noise that leads to a growing uncertainty as the robot moves. In addition, robot odometry is generally subject to drift and slippage such that there is no fixed coordinate transformation between the coordinates used by the robot’s internal odometry and the physical world coordinates. Determining the pose of a robot relative to a given map of the environment generally increases the certainty of the pose estimate but significantly adds to the computational complexity of the underlying algorithm. As a result, commonly known algorithms for the mobile robot localization problem generally cannot be executed in real time and often suffer from loss of precision as the robot moves.
This is particularly problematic in the context of autonomous vehicles, where high precision and accuracy of the vehicle positioning or localization are crucial for safety reasons. In addition, the economic aspect plays a major role in ensuring a feasible commercial deployment in vehicles, which is usually implemented as part of an Advanced Driver-Assistance Systems (ADAS) solution.
The recent review paper "A Survey of the State-of-the-Art Localization Techniques and Their Potentials for Autonomous Vehicle Applications" by S. Kuutti, S. Fallah, K. Katsaros, M. Dianati, F. Mccullough, and A. Mouzakitis in IEEE Internet of Things Journal, vol. 5, no. 2, pp. 829-846, April 2018 reviews the state-of-the-art vehicle localization systems to support autonomous driving. Low-cost systems mostly use sensors such as GPS (Global Positioning System), IMU (Inertial Measurement Unit), and (mono or stereo) cameras. The problem with these low-cost systems is that their accuracy is rather low. By way of example, the percentage of absolute mean errors below 1 m is 80% and the averaged absolute mean error is 1.43 m in the longitudinal direction or the driving direction and 0.58 m in the lateral direction. The Internet of Vehicles (IoV) Planning and Control Division, however, requires a maximum error of 1 m in the longitudinal direction and of 0.5 m in the lateral direction. Although low-cost systems are appealing as the involved sensors are in most cases already embedded in today's vehicles, the lack of accuracy prevents widespread implementation in autonomous vehicles.
Furthermore, current state-of-the-art systems only estimate the pose of the vehicle with respect to two-dimensional planar coordinates and potentially the bearing of the vehicle, i.e. at maximum three degrees of freedom. In a potentially unknown three-dimensional terrain, estimation of the full 6 degrees of freedom (DoF) pose, i.e. a three-dimensional position and a three-dimensional orientation, for instance including roll, pitch, and yaw, is desirable. Finally, sensor data from GPS sensors is frequently unavailable due to blockage of the GPS signal by buildings or trees and odometry measurements using an IMU sensor suffer from an inherent drift. Consequently, a robust vehicle localization method and system are needed to fulfill the requirements with respect to safety of autonomous vehicles.
SUMMARY OF INVENTION
The present disclosure offers a way to significantly improve the performance of the above- mentioned low-cost systems and thereby solves the above described technical problems. The disclosed methods and systems in particular provide several improvements in the framework of Bayesian filtering and increase the robustness of the localization process.
The disclosed methods and systems not only allow to perform localization in several (e.g., six) degrees of freedom (DoF), but can also be configured for running at a rate of 10 Hz or more and are therefore suitable for real-time implementations. They can be used for determining the pose of an autonomous or non-autonomous vehicle or other kind of robot. Their potential field of application is thus not limited to autonomous driving. Each of the devices referred to herein as an "apparatus" may be a system of cooperating devices. The apparatus may comprise processing circuitry configured to perform the various data or signal processing operations associated with the respective apparatus. Those operations are described in detail below. The processing circuitry may be a combination of software and hardware. For example, the processing circuitry may comprise one or more processors and a non-volatile memory in which program code executable by the one or more processors is stored. The program code causes the processing circuitry to perform the respective operations when executed by the one or more processors.
According to one aspect of the present disclosure, an apparatus for estimating a pose of a robot is provided, wherein the apparatus is configured to determine a current pose estimate of the robot based on a first pose estimate or a second pose estimate or a combination of the first pose estimate and the second pose estimate, wherein the first pose estimate is based on a current pose distribution of the robot; and wherein a contribution of the first pose estimate to the current pose estimate and a contribution of the second pose estimate to the current pose estimate are determined based on the current pose distribution. Thus a more accurate and more reliable pose estimate can be obtained. In an embodiment, the apparatus is configured to determine a current pose estimate as a weighted sum of multiple pose estimates, each of the multiple pose estimates having a respective weight in the weighted sum, wherein the multiple pose estimates include a first pose estimate, which is based on a current pose distribution, and one or more further pose estimates, and wherein the weights of the multiple pose estimates are based on the current pose distribution.
The second pose estimate may be based on one or more of the following: prediction from one or more previous pose estimates, or a global pose estimate derived from at least one of sensor data from a position sensor and sensor data from an orientation sensor. The prediction may include dead reckoning.
The contribution of the first pose estimate and the contribution of the second pose estimate may be determined based on a confidence measure of the current pose distribution, in particular with regard to the first pose estimate.
Upon determination that the confidence measure of the current pose distribution exceeds a threshold, only the first pose estimate contributes to the current pose estimate.
According to a further aspect, the apparatus may be further configured to adapt the threshold based on the confidence measure of the current pose distribution. The threshold may be adapted repeatedly, e.g., periodically or continuously.
According to a further aspect, the threshold may be increased in response to the confidence measure of the current pose distribution being significantly higher than the threshold, or the threshold may be decreased in response to the confidence measure of the current pose distribution being significantly lower than the threshold. The confidence measure may be considered to be significantly higher than the threshold when the confidence measure exceeds the threshold plus a non-negative first offset. The first offset may be zero. Similarly, the confidence measure may be considered to be significantly lower than the threshold when the confidence measure is lower than the threshold minus a non-negative second offset. The second offset may be zero.
A transition from increasing the threshold to decreasing the threshold and vice versa may be delayed by a respective delay time.
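The following Python sketch illustrates, under simplifying assumptions, how a first pose estimate based on the current pose distribution and a second pose estimate (e.g., obtained by dead reckoning) might be combined using a confidence measure and an adaptive threshold as described above. The blending rule, the adaptation step size, and the offsets are illustrative assumptions and not values prescribed by the disclosure; the delay of threshold transitions is omitted for brevity.

```python
import numpy as np

class PoseCombiner:
    """Selects or blends two pose estimates based on a confidence measure of the
    current pose distribution and an adaptive threshold (simplified sketch)."""

    def __init__(self, threshold=0.5, step=0.01, offset_hi=0.1, offset_lo=0.1):
        self.threshold = threshold
        self.step = step            # how fast the threshold is adapted
        self.offset_hi = offset_hi  # margin for "significantly higher"
        self.offset_lo = offset_lo  # margin for "significantly lower"

    def combine(self, first_estimate, second_estimate, confidence):
        # adapt the threshold towards the currently observed confidence level
        if confidence > self.threshold + self.offset_hi:
            self.threshold += self.step
        elif confidence < self.threshold - self.offset_lo:
            self.threshold -= self.step

        if confidence > self.threshold:
            return first_estimate   # only the distribution-based estimate contributes
        # otherwise blend: the lower the confidence, the more the second estimate counts
        w = confidence / self.threshold
        return w * first_estimate + (1.0 - w) * second_estimate

combiner = PoseCombiner()
pf_pose = np.array([10.0, 5.0, 0.30])   # x, y, yaw from the current pose distribution
dr_pose = np.array([10.4, 5.2, 0.28])   # x, y, yaw from dead reckoning
print(combiner.combine(pf_pose, dr_pose, confidence=0.3))
```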
The contribution of the first pose estimate and the contribution of the second pose estimate may alternatively be determined based on confidence measures of the respective pose estimates.
According to another aspect of the present disclosure, an apparatus for estimating a pose of a robot is provided, wherein the apparatus is configured to determine a plurality of current hypothetical poses of the robot, in particular using prediction; for each of the plurality of current hypothetical poses, to determine a weight; and to determine a current pose estimate of the robot based on the plurality of current hypothetical poses and their weights, wherein for each of the plurality of current hypothetical poses, determining the weight comprises calculating a similarity score which is a measure of similarity between a set of reference features and a set of observed features. The set of observed features may comprise features detected in an environment of the robot. The features may be detected by one or more sensors of the robot. The sensors may include remote sensing means, e.g., a video camera, a radar sensor, a sonar sensor, or a combination of these. The apparatus may comprise processing circuitry to perform the pose estimation. Thus a reliable pose estimate can be obtained.
Each reference feature and each observed feature may comprise one or more feature descriptors.
Each reference feature and each observed feature may comprise one or more feature classes, and, for each feature class, a probability value, and calculating the similarity score may be based on the one or more feature classes of the reference features and their probability values and the one or more feature classes of the observed features and their probability values.
According to a further aspect, each feature class may be associated with a class of real-world elements. The class of real-world elements may be, for example, "tree", "sky", "person", "vehicle", or "building".
According to a further aspect, each reference feature may further comprise a space-fixed (SF) positional coordinate and each detected feature may further comprise a body-fixed (BF) positional coordinate, the BF positional coordinate being defined relative to the robot, wherein calculating the similarity score comprises mapping between the SF positional coordinate and the BF positional coordinate on the basis of a current hypothetical pose. A coordinate may be multi-dimensional. For example, it may be a point in a two-dimensional space (e.g., corresponding to the earth’s surface) or a three-dimensional space (e.g., the three-dimensional space on the earth’s surface).
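The mapping between body-fixed (BF) and space-fixed (SF) positional coordinates on the basis of a hypothetical pose is, in the planar case, a rotation by the hypothetical heading followed by a translation to the hypothetical position. The following Python sketch shows this generic rigid-body transform for an assumed 2-D pose (x, y, yaw); it is not code from the disclosure.

```python
import numpy as np

def body_to_space(bf_point, pose):
    """Transform a body-fixed (BF) point into space-fixed (SF) coordinates for a
    planar hypothetical pose (x, y, yaw)."""
    x, y, yaw = pose
    rot = np.array([[np.cos(yaw), -np.sin(yaw)],
                    [np.sin(yaw),  np.cos(yaw)]])
    return rot @ np.asarray(bf_point) + np.array([x, y])

def space_to_body(sf_point, pose):
    """Inverse transform: map a space-fixed point into the body-fixed frame."""
    x, y, yaw = pose
    rot = np.array([[np.cos(yaw), -np.sin(yaw)],
                    [np.sin(yaw),  np.cos(yaw)]])
    return rot.T @ (np.asarray(sf_point) - np.array([x, y]))

pose = (2.0, 1.0, np.pi / 2)   # hypothetical particle pose
observed_bf = [3.0, 0.0]       # feature observed 3 m ahead of the robot
print(body_to_space(observed_bf, pose))  # approximately [2., 4.]
```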
The weights of the current hypothetical poses may be determined based on a distribution of the similarity scores when the distribution fulfills a reliability condition. The distribution may be a frequency distribution or a normalized frequency distribution of the similarity scores.
According to a further aspect, the weights of the current hypothetical poses may be determined independently of the distribution of the similarity scores when the distribution does not fulfill the reliability condition.
According to a further aspect, the apparatus may further comprise at least one of a position sensor and an orientation sensor, wherein the weights of the current hypothetical poses are further adapted based on a global pose estimate derived from at least one of sensor data of the position sensor and sensor data of the orientation sensor. The position sensor or the orientation sensor or both may be based, for example, on vision, sound, radar, satellite signals, inertia, or a combination thereof.
According to another aspect of the disclosure, an apparatus for estimating a pose of a robot is configured to: generate a first pose distribution of the robot based on one or more first navigational measurements, generate a second pose distribution of the robot based on the first pose distribution and on a current instance of a refined pose distribution, generate a next instance of the refined pose distribution based on the second pose distribution and on one or more second navigational measurements, and determine a pose estimate of the robot based on the next instance of the refined pose distribution. Thus a new distribution peak can be added to the existing pose distribution. In a situation in which the existing pose distribution (i.e. the current instance of the refined pose distribution) is erroneous (e.g., due to errors in sensor readings or absence of sensor readings, for example after a period of lack of camera or satellite data), the presence of the new, added peak in the next instance of the refined distribution can enable the apparatus to“recover”, i.e. to find an accurate, new pose estimate. Thus a reliable pose estimate can be obtained.
According to a further aspect, the current instance and the next instance of the refined pose distribution are each represented by a set of hypothetical poses and associated weights, wherein the set representing the current instance and the set representing the next instance comprise the same number of hypothetical poses.
According to a further aspect, in generating the second pose distribution, the current instance of the refined pose distribution contributes more to the second pose distribution than the first pose distribution. For example, the second pose distribution may be a weighted sum of the first pose distribution and the current instance of the refined pose distribution, wherein the current instance of the refined pose distribution has a greater weight than the first pose distribution. For example, the current instance of the refined pose distribution and the first pose distribution may have relative weights of 1 minus X (e.g., 0.95) and X (e.g., 0.05), respectively, wherein X is less than 0.5. For example, the current instance of the refined pose distribution may be represented by a set of, e.g., 95 samples (i.e. 95 hypothetical poses) while the first pose distribution is represented by a set of, e.g., 5 samples, with all the samples (a total of 100 samples in this example) having the same sample weight (e.g., 0.01 ). In this example, the second pose distribution can then be taken as the set consisting of all the samples from the two sets. According to a further aspect, the apparatus is configured to generate the first pose distribution independently of the refined pose distribution.
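The mixing of the two distributions in this example can be sketched as follows: a small fraction of particles is drawn around a global pose estimate and combined with particles representing the current instance of the refined pose distribution, all with equal sample weights. The Python fragment below is a simplified illustration; the planar (x, y, yaw) state, the noise levels, and the particle counts are assumptions chosen in line with the example above, not mandatory choices.

```python
import numpy as np

def mix_pose_distributions(refined_particles, global_pose, fraction=0.05,
                           noise_std=(1.0, 1.0, 0.1), rng=None):
    """Build the second pose distribution: keep roughly (1 - fraction) of the
    particles of the refined distribution and inject `fraction` new particles
    sampled around a global pose estimate (e.g., from GPS and IMU data)."""
    rng = rng or np.random.default_rng()
    m = len(refined_particles)
    n_new = max(1, int(round(fraction * m)))
    keep = rng.choice(m, size=m - n_new, replace=False)
    injected = global_pose + rng.normal(0.0, noise_std, size=(n_new, 3))
    particles = np.vstack([refined_particles[keep], injected])
    weights = np.full(m, 1.0 / m)   # equal sample weights, e.g. 0.01 for M = 100
    return particles, weights

rng = np.random.default_rng(1)
refined = rng.normal([5.0, 3.0, 0.2], [0.5, 0.5, 0.05], size=(100, 3))  # 100 poses
gps_pose = np.array([5.2, 3.1, 0.25])
particles, weights = mix_pose_distributions(refined, gps_pose, rng=rng)
print(particles.shape, weights[:3])
```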
According to a further aspect, the apparatus is configured to generate the one or more first navigational measurements by one or more of the following: satellite-based pose estimation, inertia-based pose estimation, vision-based pose estimation, or user input. The first navigational measurements may comprise a global pose estimate. The global pose estimate may be derived, for example, from at least one of sensor data from a position sensor and sensor data from an orientation sensor.
According to a further aspect, the apparatus is configured to generate the one or more second navigational measurements by one or more of the following: satellite-based pose estimation, inertia-based pose estimation, vision-based pose estimation, or odometric pose estimation.
According to one aspect of the present disclosure, a robot, in particular a vehicle, especially an autonomous vehicle, is provided that comprises the apparatus according to any one of the above aspects.
According to another aspect of the present disclosure, a method for estimating a pose of a robot is provided, wherein the method comprises determining a current pose estimate of the robot based on a first pose estimate or a second pose estimate or a combination of the first pose estimate and the second pose estimate, wherein the first pose estimate is based on a current pose distribution of the robot, and wherein a contribution of the first pose estimate and a contribution of the second pose estimate to the current pose estimate are determined based on the current pose distribution. The method may further comprise determining the current pose distribution, in particular using particle filtering.
The second pose estimate may be determined by one or more of the following: prediction from one or more previous pose estimates, or deriving a global pose estimate from at least one of sensor data from a position sensor and sensor data from an orientation sensor.
The contribution of the first pose estimate and the contribution of the second pose estimate may be determined based on a confidence measure of the current pose distribution, in particular with regard to the first pose estimate.
Upon determination that the confidence measure of the current pose distribution exceeds a threshold, only the first pose estimate contributes to the current pose estimate. According to a further aspect, the method may further comprise adapting the threshold based on the confidence measure of the current pose distribution. The threshold may be adapted repeatedly, e.g., periodically or continuously.
According to a further aspect, the threshold may be increased in response to the confidence measure of the current pose distribution being significantly higher than the threshold, or the threshold may be decreased in response to the confidence measure of the current pose distribution being significantly lower than the threshold. The confidence measure may be considered to be significantly higher than the threshold when the confidence measure exceeds the threshold plus a non-negative first offset. The first offset may be zero. Similarly, the confidence measure may be considered to be significantly lower than the threshold when the confidence measure is lower than the threshold minus a non-negative second offset. The second offset may be zero.
A transition from increasing the threshold to decreasing the threshold and vice versa may be delayed by a respective delay time.
The contribution of the first pose estimate and the contribution of the second pose estimate may alternatively be determined based on confidence measures of the respective pose estimates.
According to one aspect of the present disclosure, a method for estimating a pose of a robot is provided, wherein the method comprises determining a plurality of current hypothetical poses of the robot, in particular using prediction; for each of the plurality of current hypothetical poses, determining a weight; and determining a current pose estimate of the robot based on the plurality of current hypothetical poses and their weights, wherein for each of the plurality of current hypothetical poses, determining the weight comprises calculating a similarity score which is a measure of similarity between a set of reference features and a set of observed features. The set of observed features may comprise features detected in an environment of the robot. The features may be detected by one or more sensors of the robot. The sensors may include remote sensing means, e.g., a video camera, a radar sensor, a sonar sensor, or a combination of these.
Each reference feature and each observed feature may comprise one or more feature descriptors.
Each reference feature and each observed feature may comprise one or more feature classes, and, for each feature class, a probability value, and calculating the similarity score may be based on the one or more feature classes of the reference features and their probability values and the one or more feature classes of the observed features and their probability values. According to a further aspect, each feature class may be associated with a class of real-world elements. The class of real-world elements may be, for example, "tree", "sky", "person", "vehicle", or "building".
According to a further aspect, each reference feature may further comprise a space-fixed (SF) positional coordinate and each detected feature may further comprise a body-fixed (BF) positional coordinate, the BF positional coordinate being defined relative to the robot, wherein calculating the similarity score comprises mapping between the SF positional coordinate and the BF positional coordinate on the basis of a current hypothetical pose. A coordinate may be multi-dimensional. For example, it may be a point in a two-dimensional space (e.g., corresponding to the earth’s surface) or a three-dimensional space (e.g., the three-dimensional space on the earth’s surface).
The weights of the current hypothetical poses may be determined based on a distribution of the similarity scores when the distribution fulfills a reliability condition. The distribution may be a frequency distribution or a normalized frequency distribution of the similarity scores.
According to a further aspect, the weights of the current hypothetical poses may be determined independently of the distribution of the similarity scores when the distribution does not fulfill the reliability condition.
According to a further aspect, the method may comprise further adapting the weights of the current hypothetical poses based on a global pose estimate derived from at least one of sensor data of a position sensor and sensor data of an orientation sensor. The position sensor or the orientation sensor or both may be based, for example, on vision, sound, radar, satellite signals, inertia, or a combination thereof.
According to one aspect, a method of estimating a pose of a robot comprises: generating a first pose distribution of the robot based on one or more first navigational measurements, generating a second pose distribution of the robot based on the first pose distribution and on a current instance of a refined pose distribution, generating a next instance of the refined pose distribution based on the second pose distribution and one or more second navigational measurements, and determining a pose estimate of the robot based on the next instance of the refined pose distribution.
According to one aspect of the present disclosure, a computer-readable medium is provided for storing instructions that when executed on a processor cause the processor to perform a method according to any one of the above described aspects.
BRIEF DESCRIPTION OF THE DRAWINGS
In the following, exemplary embodiments are described in more detail with reference to the attached figures and drawings, in which:
Figure 1 shows a basic particle filtering process used to introduce the present disclosure.
Figure 2 shows the relevant steps of the basic particle filtering process of Figure 1.
Figure 3 shows a modified particle filtering process according to the present invention as the basic framework of the present disclosure.
Figure 4 shows the main steps of the modified particle filtering process according to Figure 3 including pose estimation according to a first embodiment of the present invention.
Figure 5 shows the main steps of the modified particle filtering process according to Figure 3 including pose estimation according to a second embodiment of the present invention.
Figure 6 depicts the temporal behavior of a threshold for the confidence measure of the current pose distribution for a test case based on real data.
Figure 7 shows details of a first phase of the weight update of the correction block of Figures 4 and 5.
Figure 8 shows details of a second phase of the correction block according to a first embodiment of the weight update processing.
Figure 9 shows details of the second phase of the correction block according to a second embodiment of the weight update processing.
Figure 10 shows details of the second phase of the correction block according to a third embodiment of the weight update processing.
Figure 11 shows a vehicle with a localization system according to the present disclosure.
DETAILED DESCRIPTION
The present disclosure relates to the general technical field of mobile robot localization, in particular to the real-time localization of vehicles, especially autonomous vehicles. It offers a way to significantly improve the performance of low-cost systems by improving the mapping step, stabilizing the pose estimates and making the underlying algorithm ready for real-time application.
More specifically, the present disclosure provides several improvements in the framework of Bayesian filtering as described above with respect to the mobile robot localization problem.
The basic particle filtering process used for the implementation of a Bayes filter is shown in Figure 1 to introduce the present disclosure. The depicted particle filtering process is based on the well-known Monte Carlo localization, also known as particle filter localization, that localizes a mobile robot using a particle filter. The process uses a particle filter to represent the distribution of likely states, with each particle representing a possible state, i.e. , a hypothesis of where the robot is, also called a hypothetical pose of the robot.
The posterior probability distribution function (also called probability density function) or posterior belief bel(X_t) = p(X_t | Z_{1:t}) is represented by a set of randomly chosen weighted samples (particles) {X_t^m, w_t^m}_{m=1}^M with M hypothetical poses {X_t^m}_{m=1}^M and corresponding weights {w_t^m}_{m=1}^M. For a very large number M of samples, the characterization becomes an equivalent representation of the true probability distribution function. The particle filtering approach can represent any arbitrary distribution and can keep track of several hypothetical poses at the same time. The particles are resampled based on recursive Bayesian estimation.
In the prediction step 110, the current prior bel(X_t) = p(X_t | Z_{1:t-1}) is determined by applying a simulated motion to each of the particles {X_{t-1}^m}_{m=1}^M at time t - 1. According to the present disclosure, an odometry motion model p(X_t | O_t, X_{t-1}) based on an odometry measurement O_t of at least one corresponding sensor of the robot at time t is used to obtain a set of predicted samples {X_t^m}_{m=1}^M based on the last set of samples {X_{t-1}^m}_{m=1}^M.
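A minimal sketch of such a prediction step is given below: each particle is propagated by the measured odometry increment plus sampled noise. The planar (x, y, yaw) state and the noise magnitudes are simplifying assumptions made for this illustration; the disclosure itself considers poses with up to 6 DOFs.

```python
import numpy as np

def predict_particles(particles, odometry, noise_std=(0.05, 0.05, 0.01), rng=None):
    """Odometry motion model p(X_t | O_t, X_{t-1}) for planar poses (x, y, yaw).

    particles: (M, 3) array of previous hypothetical poses
    odometry:  measured increment (dx, dy, dyaw) in the robot's body-fixed frame
    """
    rng = rng or np.random.default_rng()
    noisy = np.asarray(odometry) + rng.normal(0.0, noise_std, size=particles.shape)
    yaw = particles[:, 2]
    predicted = particles.copy()
    # rotate the noisy translational increment into the global frame of each particle
    predicted[:, 0] += np.cos(yaw) * noisy[:, 0] - np.sin(yaw) * noisy[:, 1]
    predicted[:, 1] += np.sin(yaw) * noisy[:, 0] + np.cos(yaw) * noisy[:, 1]
    predicted[:, 2] += noisy[:, 2]
    return predicted

rng = np.random.default_rng(0)
prev_particles = np.zeros((5, 3))  # five particles at the origin
print(predict_particles(prev_particles, (1.0, 0.0, 0.1), rng=rng))
```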
The odometry measurement Ot is obtained from one or more odometry sensors of the robot to estimate a change in the position and/or the orientation of the robot over time. The one or more odometry sensors may be provided to measure the change in at least one positional coordinate and/or the change in at least one angular coordinate, such as pitch, roll, and yaw of the robot. Typical examples for odometry sensors are motion sensors such as wheel encoders, rotational encoders, linear encoders, speedometers, accelerometers, gyroscopes, and inertial measurement units (IMU). IMUs may be used to simultaneously determine up to 6 DOFs, i.e., the full three-dimensional position and the full three-dimensional orientation of the pose of the robot. In addition, a vision-based sensor, including a remote sensing means, e.g., a video camera, a radar sensor, a sonar sensor, or a combination of these, may be used to calculate odometry using a technique called visual odometry. Generally, odometry sensors may be provided to determine changes in the pose of the robot with the same dimensionality as the pose. Odometry sensors may include internal sensors provided with the robot that do not require measurements of the environment.
In addition to the motion sensors, the odometry measurement Ot may alternatively or additionally include the determination of at least one positional coordinate of the robot using one or more satellite-based sensors of the robot, i.e, external sensors. Such satellite-based sensors determine the global position of the robot, i.e., the position of the robot relative to a global coordinate system, using global navigation satellite systems (GNSS), such as GPS, GLONASS, BeiDou, and Galileo.
In the update step 120, the current weights {w_t^m}_{m=1}^M, also called current importance weights, are updated from the previous weights {w_{t-1}^m}_{m=1}^M of the previous posterior based on the measurement model p(Y_t | X_t, Y_t^MAP). For each current hypothetical pose of the robot, i.e., each predicted particle X_t^m, the probability p(Y_t | X_t^m, Y_t^MAP) that, had the robot been at the state of the particle, the robot would perceive what its sensors have actually sensed, is calculated. Then, a current weight w_t^m proportional to said probability is assigned to each predicted particle, wherein a normalization constant α is applied to normalize the weights.
In the basic process according to Figure 1 , the measurement model is exclusively based on a mapping between a set of observed features Yt and a set of reference features Yt MAP determined from a map of the environment. Details of this map matching process will be described below with reference to Figures 7 to 10. The observed features Yt are extracted from the sensor data of at least one vision-based sensor of the robot as also described in more detail below. Typical examples for vision-based sensors, also referred to as remote sensing means in the following, are mono and stereo cameras, radar sensors, light detection and ranging (LiDAR) sensors, e.g., using pulsed lasers, ultrasonic sensors, infrared sensors, or any other sensors adapted to provide imaging measurements of the environment of the robot. The sensor data output by such vision-based sensors may be analyzed to extract the above mentioned features Yt . The map may in particular contain information about landmarks, lane markings, buildings, curbs, and the road geometry. If multiple vision-based sensors are used, different maps based on different frequency ranges, such as optical and radio, may be used.
The current hypothetical poses {X_t^m}_{m=1}^M and the corresponding current weights {w_t^m}_{m=1}^M may further be subjected to resampling in step 130 to avoid degeneracy of the probability distribution function. In the importance resampling step 130 of Figure 1, a new set of samples for the next iteration or frame t + 1 is generated by resampling particles based on the current belief bel(X_t), i.e., the current pose distribution or current posterior p(X_t | Z_{1:t}). As known in the art, particles with high importance weights are multiplied during resampling and particles with low importance weights are eliminated to generate the new set of samples. As a result, the number M of particles is kept constant during the localization process and the weights of the particles are kept finite. Here and in the following, the terms "current hypothetical poses" and "current weights" refer to the particles and corresponding weights before and after the importance resampling, as applicable, as the resampling step largely maintains the probability distribution function.
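The importance resampling just described can be sketched compactly. The following Python fragment uses low-variance (systematic) resampling, which is one common choice; the disclosure does not prescribe this particular scheme.

```python
import numpy as np

def systematic_resample(particles, weights, rng=None):
    """Draw M new particles with probability proportional to their importance
    weights; heavy particles are duplicated and light ones tend to be dropped."""
    rng = rng or np.random.default_rng()
    m = len(weights)
    positions = (rng.random() + np.arange(m)) / m
    cumulative = np.cumsum(weights)
    indices = np.minimum(np.searchsorted(cumulative, positions), m - 1)
    return particles[indices], np.full(m, 1.0 / m)  # balanced weights 1/M afterwards

rng = np.random.default_rng(2)
particles = rng.normal(size=(100, 3))
weights = rng.random(100)
weights /= weights.sum()
new_particles, new_weights = systematic_resample(particles, weights, rng=rng)
print(new_particles.shape, new_weights[0])
```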
Based on the current hypothetical poses and the respective current weights, a current pose of the robot for the current iteration or frame t may be estimated by applying a (global or local) maximum a posteriori estimation method. Alternatively, a minimum mean-square error criterion may be used. Furthermore, a simple average (mean) and a mean within a window around the maximum a posteriori estimate (robust mean) may be used to determine the current pose estimate X̂_t.
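The mean and robust-mean estimates mentioned above can be computed directly from the particle set, as the following simplified Python sketch shows. The window radius around the highest-weighted particle is an assumed parameter, and circular quantities such as yaw would require special averaging in practice.

```python
import numpy as np

def mean_estimate(particles, weights):
    """Weighted mean of all particles (a simple average if weights are equal)."""
    return np.average(particles, axis=0, weights=weights)

def robust_mean_estimate(particles, weights, radius=1.0):
    """Mean restricted to a window around the highest-weighted particle, i.e.
    around the (approximate) maximum a posteriori estimate."""
    best = particles[np.argmax(weights)]
    mask = np.linalg.norm(particles[:, :2] - best[:2], axis=1) <= radius
    return np.average(particles[mask], axis=0, weights=weights[mask])

rng = np.random.default_rng(3)
particles = rng.normal([4.0, 2.0, 0.1], [0.3, 0.3, 0.02], size=(200, 3))
weights = rng.random(200)
weights /= weights.sum()
print(mean_estimate(particles, weights))
print(robust_mean_estimate(particles, weights))
```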
Figure 2 shows the relevant steps of the basic particle filtering process of Figure 1. The blocks 210, 220, and 230 share the same concepts as in Figure 1 and are denoted as Prediction, Correction, and Resampling. The sequence of prediction 210, correction 220, and resampling 230 is iterated for each time step or each frame t of the vision-based sensor data. The inputs 1 to 4 of each iteration are highlighted by dashed frames in Figure 2. Outputs 11, 12, 13, and 15 of the blocks 210, 220, 230, and 250 are also shown in the figure. The output 13 represents a refined pose distribution. The refined pose distribution is used to compute a current pose of the robot. In this disclosure, the values of a pose distribution at different points in time may be referred to as instances of the pose distribution. The current instance (i.e. the most recent available instance) of the refined pose distribution may also be referred to herein as the current pose distribution.
Starting from the particle filtering representation of the previous posterior probability distribution function, i.e., the previous particles {X_{t-1}^m}_{m=1}^M and the corresponding previous weights {w_{t-1}^m}_{m=1}^M, a motion model using the odometry measurement O_t is applied in the prediction step 210 to determine predicted samples {X_t^m}_{m=1}^M. Using the predicted samples {X_t^m}_{m=1}^M and the previous weights {w_{t-1}^m}_{m=1}^M as input, the correction step 220 performs map matching between observed features Y_t and map or reference features Y_t^MAP to determine updated weights {w_t^m}_{m=1}^M. The predicted samples and corresponding updated weights are then resampled in resampling step 230 to produce resampled particles {X_t^m}_{m=1}^M with
corresponding balanced weights {w_t^m}_{m=1}^M = 1/M. Based on these resampled current hypothetical poses and corresponding current weights, a current pose estimate X̂_t is determined in the pose estimation step 250.
Mobile robot localization often has to be performed in dynamic environments where other objects and/or subjects than the robot may change their location or configuration over time. Examples of more persistent changes that may affect the pose estimation are people, changing daylight, movable furniture, other vehicles, in particular parked vehicles, doors, and the like. These dynamic objects are generally not represented by reference features in the, generally static, reference map and may therefore cause mapping errors when performing the above described update step. In addition, features of different objects, such as the edge of a table or a chair, may not be discriminable using standard feature vectors. The presence of dynamic objects in real-world environments and the general issue of observation discriminability cause mismatches during the map matching process. Therefore, the above described process may lead to incorrect pose estimates.
Furthermore, the particle filtering process as described above with respect to Figures 1 and 2 may be unstable in some situations due to its correction mechanism. Finally, a satellite-based sensor signal may not be available everywhere. By way of example, tall buildings, tunnels, and trees may shield a GPS sensor of a vehicle from at least some of the GPS satellites. In at least some urban areas, sufficient GPS signals are therefore generally not available. As a result, odometry-based prediction is prone to drift errors. It is therefore desirable to also enable the use of the above described particle filtering process in those areas where satellite-based sensor signals are not available.
To solve the above described technical problems, the present disclosure modifies the particle filtering process as shown in Figure 3. The prediction step 310, which determines a set of predicted samples {X_t^m}_{m=1}^M based on the odometry measurement O_t of at least one corresponding odometry sensor of the robot at time t and the last set of samples {X_{t-1}^m}_{m=1}^M, is the same as the prediction step 110 in Figure 1, such that a repeated description is omitted for the sake of clarity.
The update step 320, however, is extended as compared to the update step 120 by taking additional measurements $Z_t$ at time $t$ into account when determining the updated weights $\{w_t^m\}_{m=1}^{M}$. In particular, observation data $Z_t$ from at least one satellite-based sensor and/or at least one inertia-based sensor may be taken into account to include corresponding probability scores. As shown in Figure 3, the current weights may be determined according to Equation (2) as follows:
$$\{w_t^m\}_{m=1}^{M} = \alpha \cdot p(Y_t, Z_t \mid X_t^m, Y_t^{MAP}) = \alpha \cdot p(Y_t \mid X_t^m, Y_t^{MAP}) \cdot p(Z_t \mid X_t^m) \qquad (2)$$
wherein $\alpha$ denotes a normalization factor, $Y_t$ denotes a set of observed features, and $Y_t^{MAP}$ denotes a set of reference features. Furthermore, the pose of the robot is written as vector $X_t = (x_t^{position}, x_t^{rotation})^T$ with $T$ denoting the transpose, $x_t^{position}$ denoting one, two, or three positional coordinates such as x, y, and z coordinates in a global coordinate system, and $x_t^{rotation}$ denoting one, two, or three rotational coordinates, such as pitch, roll, and yaw with respect to the orientation of the global coordinate system. Likewise, $z_t^{position}$ denotes a measurement of a corresponding number of positional coordinates in the global coordinate system, and $z_t^{rotation}$ denotes a measurement of a corresponding number of rotational coordinates in the global coordinate system. Consequently, the measurement vector can be written as $Z_t = (z_t^{position}, z_t^{rotation})^T$.
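By way of illustration only, the following Python sketch shows one possible form of such an update, assuming that a map-matching likelihood per particle is already available and that $p(Z_t \mid X_t^m)$ is modelled as a Gaussian around the measured global pose; all function and variable names are assumptions of this example and not part of the present disclosure.

```python
import numpy as np

def update_weights(particles, map_match_likelihood, z_pose, z_cov):
    """Weight update in the spirit of Equation (2): a map-matching likelihood per
    particle multiplied by a Gaussian likelihood of the global pose measurement Z_t."""
    # p(Y_t | X_t^m, Y_t^MAP): one likelihood value per particle (assumed given)
    p_map = map_match_likelihood(particles)               # shape (M,)

    # p(Z_t | X_t^m): Gaussian centred on the measured global pose
    diff = particles - z_pose                              # shape (M, D)
    inv_cov = np.linalg.inv(z_cov)
    maha = np.einsum('md,dk,mk->m', diff, inv_cov, diff)   # squared Mahalanobis distances
    p_z = np.exp(-0.5 * maha)

    w = p_map * p_z
    return w / np.sum(w)                                   # alpha: normalization factor
```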
The measurement $Z_t$ refers to a measurement of the global pose using at least one satellite-based sensor and/or at least one inertia-based sensor. By way of example, the position $z_t^{position}$ may be measured using a position sensor such as a GPS sensor and/or an accelerometer. Similarly, the orientation $z_t^{rotation}$ may be measured using a rotation sensor such as a gyroscope. Using an inertial measurement unit, the complete global pose $Z_t$ may be measured in up to 6 DOFs. In addition, multiple measurements from different sensors may be included in the determination of the updated weights. The measured sensor data may be submitted to a filtering process before being used in the determination.
The resampling step 130 of Figure 1 is further modified as shown in the resampling step 330 of Figure 3 by generating only a fraction of the total number $M$ of particles, such as 95%, from the current belief $bel(X_t)$, while some of the particles, such as 5%, are resampled using the global pose measurement $Z_t$ and/or a global pose measurement $G_t$ that is not based on satellite-based sensor data. The global pose measurement $G_t$ may, for instance, be based on image processing using vision-based sensor data and/or based on inertia-based sensor data. The global pose measurement $G_t$ may in particular be independent of the global pose measurement $Z_t$, i.e., based on sensor data not included in the global pose measurement $Z_t$. As the global pose $G_t$ is not derived from satellite-based sensor data, the global pose and the correspondingly resampled particles can also be determined if the satellite-based sensor of the robot cannot receive GPS signals.
Figure 4 shows the main steps of the modified particle filtering process according to Figure 3 including pose estimation according to a first embodiment of the present invention. As in the basic particle filtering process according to Figure 2, the modified particle filtering process according to Figure 4 comprises a loop iterating over time or frame number $t$. The inputs 1 to 5 of this iteration are shown in Figure 4 in dashed boxes. In addition, a previous pose estimate $\hat{X}_{t-1}$ at time $t-1$ is provided as input 6 to the pose prediction step 460. The modified process according to the first embodiment of the present disclosure produces the outputs 11 to 14, 15a, 15b, 18, and 19 as shown in Figure 4.
Starting from the particle filtering representation of the previous posterior probability distribution function, i.e., the previous particles $\{X_{t-1}^m\}_{m=1}^{M}$ and the corresponding previous weights $\{w_{t-1}^m\}_{m=1}^{M}$, a motion model using the odometry measurement $O_t$ is applied in the prediction step 410, as in the prediction step 210, to determine predicted samples $\{\bar{X}_t^m\}_{m=1}^{M}$. Using the predicted samples $\{\bar{X}_t^m\}_{m=1}^{M}$ and the previous weights $\{w_{t-1}^m\}_{m=1}^{M}$ as input, the correction step 420 performs map matching between observed features $Y_t$ and map or reference features $Y_t^{MAP}$ to determine updated weights $\{w_t^m\}_{m=1}^{M}$. In addition to the map matching, the correction step 420, however, takes into account a global pose measurement $Z_t = (z_t^{position}, z_t^{rotation})^T$ as described above with respect to Figure 3, using at least one satellite-based sensor and/or at least one inertia-based sensor, and/or a global pose measurement $G_t = (g_t^{position}, g_t^{rotation})^T$ that is performed without the use of satellite-based sensors.
The predicted samples and corresponding updated weights are then resampled in resampling step 430 to produce resampled particles $\{X_t^{m,\mathrm{resampled}}\}$ and corresponding balanced weights $\{w_t^m\}_{m=1}^{M} = 1/M$. However, different from the resampling step 230 of the basic particle filtering process, resampling step 430 only generates a reduced number $M - N$ of particles from the current belief. The remaining $N$ particles are determined independently of the current pose distribution using a recovery particle generation step 480 as shown in Figure 4. These recovered particles $\{X_t^m\}_{m=M-N+1}^{M}$ and their respective weights $\{w_t^m\}_{m=M-N+1}^{M}$ are created from a global pose measurement $Z_t = (z_t^{position}, z_t^{rotation})^T$ and/or a global pose measurement $G_t = (g_t^{position}, g_t^{rotation})^T$ that is performed without the use of satellite-based sensors. As described above, the global pose measurement $G_t$ may be acquired using image processing based on vision-based sensor data and/or using inertia-based sensor data. According to one particular embodiment, the global pose measurement $G_t$ may be based exclusively on vision-based sensor data.
The set of resampled particles $\{X_t^{m,\mathrm{resampled}}\}_{m=1}^{M-N}$ is supplemented in augmentation step 485 with the particles $\{X_t^m\}_{m=M-N+1}^{M}$ recovered in step 480 from the global pose measurement(s) to generate the current hypothetical poses $\{X_t^m\}_{m=1}^{M}$ and their respective current weights $\{w_t^m\}_{m=1}^{M}$ that will be provided as input 1 to the next iteration of the loop. The set of predicted samples $\{\bar{X}_t^m\}_{m=1}^{M}$ may be reduced to $M - N$ samples by first ordering the predicted samples according to their weights $\{w_t^m\}_{m=1}^{M}$ and then discarding the samples with the smallest weights.
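As an illustration only, the following sketch shows one possible way of combining resampling from the current belief with recovery particles drawn around a global pose measurement; the Gaussian sampling around $Z_t$, the reliability heuristic for the recovered weights, and all names are assumptions of this example rather than features of the disclosure.

```python
import numpy as np

def resample_with_recovery(particles, weights, z_pose, z_cov, n_recover, rng):
    """Resample M - N particles from the current belief and draw N recovery
    particles around the global pose measurement Z_t (or G_t)."""
    M = len(particles)
    keep = M - n_recover

    # Resample M - N particles from the belief (weights assumed normalized)
    idx = rng.choice(M, size=keep, p=weights)
    resampled = particles[idx]

    # N recovery particles sampled around the global pose measurement
    recovered = rng.multivariate_normal(z_pose, z_cov, size=n_recover)

    # Augmentation: balanced weights for the resampled part, a reliability-based
    # weight (here derived from the measurement covariance) for the recovered part
    new_particles = np.vstack([resampled, recovered])
    w_recover = 1.0 / (1.0 + np.sqrt(np.linalg.det(z_cov)))
    new_weights = np.concatenate([np.full(keep, 1.0 / M),
                                  np.full(n_recover, w_recover)])
    return new_particles, new_weights / new_weights.sum()
```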
Adding $N$ particles which are sampled from other reliable global poses $Z_t$ and/or $G_t$, with the corresponding importance weights $\{w_t^m\}_{m=M-N+1}^{M}$ determined based on the reliability, in particular a covariance, of those poses, ensures that pose estimates which are not based on the particle filtering process will also be considered. The presence of such pose estimates can improve the accuracy of the overall localization process and help in cases such as relocalization or missing or bad GPS signals.
From the current pose distribution $\{\bar{X}_t^m, w_t^m\}_{m=1}^{M}$ before resampling 430 or the current pose distribution $\{X_t^{m,\mathrm{resampled}}, w_t^m\}_{m=1}^{M}$ after resampling 430, a confidence measure of the current pose distribution, expressing the confidence that the pose of the robot can be unambiguously determined from the pose distribution, is calculated in step 440. The confidence measure may be expressed as the posterior probability of the current pose estimate $\hat{X}_t = f(bel(X_t))$ derived as a function $f$ of the current belief, i.e., the posterior probability $bel(X_t)$. As described above, the function may be the maximum a posteriori estimate. Another possibility for the function $f$ is to identify several clusters representing local maxima of the posterior probability, wherein the pose estimate is selected to be the weighted average of the most probable local maximum cluster.
When using such a clustering method, the confidence measure or posterior probability $\bar{w}_t$ of the current pose estimate may be calculated as the accumulated weights $w_t^c = \{w_t^{c,m}\}_{m=1}^{M_c}$ of the most probable local maximum cluster according to Equation (3):
$$\bar{w}_t = \sum_{m=1}^{M_c} w_t^{c,m} \qquad (3)$$
wherein $M_c$ is the number of weights $w_t^{c,m}$ from the current pose distribution that belong to the local maximum cluster.
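A minimal sketch of such a cluster-based confidence measure is given below, assuming that particles are clustered by a simple spatial binning of their positional coordinates; the binning scheme and all names are illustrative assumptions only.

```python
import numpy as np

def cluster_confidence(particles, weights, bin_size=1.0):
    """Confidence measure in the spirit of Equation (3): accumulate the weights
    of the most probable cluster and return its weighted-average pose."""
    # Assign each particle to a coarse spatial bin based on its x/y coordinates
    bins = np.floor(particles[:, :2] / bin_size).astype(int)
    labels = {tuple(b) for b in bins}

    best_conf, best_pose = 0.0, None
    for lab in labels:
        mask = np.all(bins == np.array(lab), axis=1)
        conf = weights[mask].sum()                    # accumulated cluster weight
        if conf > best_conf:
            best_conf = conf
            best_pose = np.average(particles[mask], axis=0, weights=weights[mask])
    return best_conf, best_pose
```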
Based on the confidence measure of the current pose distribution, the localization process according to the first embodiment shown in Figure 4 may output a first pose estimate 15a that is based on the current pose distribution, a second pose estimate 15b that may, in particular, be determined independently of the current pose distribution, or a combination of the first and the second pose estimates. The first pose estimate $\hat{X}_t = f(bel(X_t))$ is determined in the pose estimation step 450 from the current pose distribution $\{X_t^{m,\mathrm{resampled}}, w_t^m\}_{m=1}^{M}$ as described above, wherein the order of the calculation of the confidence measure 440 and the pose estimation 450 may be inverted. In addition to outputting the current pose estimate 15a, the current pose estimate derived from the current pose distribution is stored in a storage space such as a memory unit in step 455.
The present localization process according to the first embodiment may additionally determine an independent second pose estimate 15b by prediction in step 460. The prediction may be made based on one or more previous pose estimates and/or using other global pose estimates. By way of example, the second pose estimate 15b may be extrapolated from two or more previous pose estimates. In a particular embodiment, dead reckoning may be performed to determine the second pose estimate $\hat{X}_t = g(O_t, \hat{X}_{t-1})$ based on the odometry measurement $O_t$ and the stored previous pose estimate $\hat{X}_{t-1}$. Here, the function $g$ denotes a deterministic motion model that predicts the pose $\hat{X}_t$ of the robot at time $t$ from the pose $\hat{X}_{t-1}$ of the robot at time $t-1$ by applying the changes in the pose derived from the odometry measurement $O_t$. The second pose estimate 15b may also be stored in the storage 455.
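A minimal dead-reckoning sketch for a planar pose $(x, y, \mathrm{yaw})$ is shown below, assuming that the odometry measurement provides a travelled distance and a yaw increment in the body-fixed frame; these input conventions and all names are assumptions of this example.

```python
import numpy as np

def dead_reckoning(prev_pose, odom):
    """Deterministic motion model g(O_t, X_{t-1}) for a planar pose.

    prev_pose: (x, y, yaw) in the global frame
    odom:      (delta_s, delta_yaw) travelled distance and yaw change
    """
    x, y, yaw = prev_pose
    delta_s, delta_yaw = odom
    # Apply the body-fixed displacement in the global frame
    x += delta_s * np.cos(yaw + 0.5 * delta_yaw)
    y += delta_s * np.sin(yaw + 0.5 * delta_yaw)
    yaw = (yaw + delta_yaw + np.pi) % (2.0 * np.pi) - np.pi   # wrap to [-pi, pi)
    return np.array([x, y, yaw])
```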
According to the specific embodiment shown in Figure 4, either the first pose estimate 15a or the second pose estimate 15b is output as the current pose estimate $\hat{X}_t$. The present disclosure is, however, not limited to this alternative output but may output a combination of the first pose estimate 15a and the second pose estimate 15b, wherein the contribution of the first pose estimate and the second pose estimate to the combination may be determined based on the confidence measure of the current pose distribution. By way of example, the combined pose estimate $\hat{X}_t$ may be determined according to Equation (4):
$$\hat{X}_t = \bar{w}_t \hat{X}_t^1 + (1 - \bar{w}_t)\hat{X}_t^2 \qquad (4)$$
wherein $\hat{X}_t^1$ represents the first pose estimate 15a and $\hat{X}_t^2$ represents the second pose estimate 15b.
In the state of the art, the method of dead reckoning has been used for the prediction step of a particle filter, as for instance described in E.J. Krakiwsky, C.B. Harris and R.V.C. Wong, "A Kalman filter for integrating dead reckoning, map matching and GPS positioning", Position Location and Navigation Symposium, 1988. Record. Navigation into the 21st Century, IEEE PLANS '88, IEEE, Orlando, FL, 1988, pp. 39-46. Using dead reckoning in the prediction step only, however, does not solve the above-mentioned instability problem of the first pose estimate. The localization process according to the first embodiment therefore combines the independent second pose estimate 15b with the first pose estimate 15a or replaces the first pose estimate 15a with the second pose estimate 15b based on the confidence measure of the current pose distribution. According to the specific embodiment depicted in Figure 4, the first pose estimate 15a is used as the current pose estimate $\hat{X}_t$ if the confidence measure $\bar{w}_t$ of the current pose distribution exceeds a threshold $\theta_t^w$ that may be time dependent. Otherwise, the second pose estimate 15b may be used as the current pose estimate $\hat{X}_t$. Also, a combination of the second pose estimate 15b derived from dead reckoning with a further pose estimate, that may be independently derived, may be used if the confidence measure is smaller than or equal to the threshold. Here and in the following, it is assumed that both the confidence measure and the threshold are positive scalars. The above described condition may be implemented for general scalars or even vectors by calculating the absolute value or applying a norm before evaluating the condition.
As mentioned above, the threshold $\theta_t^w$ may be time dependent and initialized with a predetermined value at time $t = 0$. The threshold may then be repeatedly adapted, e.g., periodically or continuously, according to a function $f_{\theta^w}$, in particular based on the confidence measure of the current pose distribution. By way of example, the threshold may be increased or decreased based on whether the confidence measure of the current pose distribution exceeds the threshold or not. The threshold may, for instance, change according to Equation (5):
$$\theta_t^w = \begin{cases} f_{\theta^w}^{up}(t^{up}) & \text{while the first pose estimate is used,} \\ f_{\theta^w}^{down}(t^{down}) & \text{while the second pose estimate is used,} \end{cases} \qquad (5)$$
wherein $f_{\theta^w}^{up}(t^{up})$ is a monotonically increasing function of the time $t^{up}$ and $f_{\theta^w}^{down}(t^{down})$ is a monotonically decreasing function of the time $t^{down}$. A possible structure for both functions may be of the form of an exponential function as in Equation (6):
$$f_{\theta^w}(t) = \theta_{offset}^w + \Delta_{\theta^w} \cdot \left(1 - e^{-c \cdot t}\right) \qquad (6)$$
wherein $\theta_{offset}^w$ denotes a fixed offset, $\Delta_{\theta^w}$ denotes a size of the change, and $c$ signifies a decay factor. The threshold may also be adapted only if the confidence measure is higher than the threshold by a non-negative first offset value or lower than the threshold by a non-negative second offset value, and otherwise be kept constant. A transition from increasing the threshold to decreasing the threshold and vice versa may be delayed by a respective delay time. Such an added delay applied to the adaptive threshold function serves the purpose of avoiding spurious instantaneous changes between the pose estimates that may cause jumps with respect to the pose of the robot. The delay forces an estimate to be used continuously within a time period $\Delta t$ before switching to a different pose estimate.
An example for applying delay times to the adaptation of the threshold is given in Equation (7):
$$\theta_t^w = \begin{cases} f_{\theta^w}^{up}(t^{up}) & \text{if } t^{up} \geq \Delta t, \\ f_{\theta^w}^{down}(t^{down}) & \text{if } t^{down} \geq \Delta t, \\ \theta_{t-1}^w & \text{otherwise,} \end{cases} \qquad (7)$$
wherein $t^{up}$ is reset to 0 when the pose estimation based on dead reckoning is active and $t^{down}$ is reset to 0 when the pose estimation based on particle filtering is active. $t^{up}$ is increased when $\bar{w}_t > \theta_t^w$ is satisfied, otherwise $t^{down}$ is increased.
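A simplified sketch of such a hysteresis-like switching rule with an adaptive threshold is shown below; the concrete update rule, the delay handling, and all names are assumptions made for illustration and do not reproduce Equations (5) to (7) exactly.

```python
import math

class AdaptiveThreshold:
    """Switches between the particle-filter estimate and dead reckoning using a
    time-dependent threshold and a delay against spurious switching."""

    def __init__(self, theta0=0.5, delta=0.2, decay=0.1, delay_steps=5):
        self.theta = theta0         # current threshold value
        self.offset = theta0        # fixed offset of the exponential update
        self.delta = delta          # size of the change
        self.decay = decay          # decay factor
        self.delay = delay_steps    # minimum dwell time before switching
        self.use_pf = True          # True: particle filter estimate, False: dead reckoning
        self.t_up = 0
        self.t_down = 0

    def step(self, confidence):
        """Update the counters and the threshold; return which estimate to use."""
        if confidence > self.theta:
            self.t_up += 1
            self.t_down = 0
        else:
            self.t_down += 1
            self.t_up = 0

        # Switch only after the condition has held for the delay period
        if self.use_pf and self.t_down >= self.delay:
            self.use_pf = False
        elif not self.use_pf and self.t_up >= self.delay:
            self.use_pf = True

        # Exponential adaptation: increase the threshold while the particle
        # filter is trusted, decrease it while dead reckoning is active
        t = self.t_up if self.use_pf else self.t_down
        sign = 1.0 if self.use_pf else -1.0
        self.theta = self.offset + sign * self.delta * (1.0 - math.exp(-self.decay * t))
        return self.use_pf
```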
Figure 6 shows the temporal behavior of the threshold $\theta_t^w$ for the confidence measure of the current pose distribution for a test case based on real data. Two exemplary phases for determining the pose estimate based on particle filtering and based on dead reckoning are indicated in the figure by dashed vertical lines. During the particle filtering phase, the threshold $\theta_t^w$ is increased according to the function $f_{\theta^w}^{up}(t^{up})$. Equivalently, the threshold $\theta_t^w$ is decreased during the dead reckoning phase according to the function $f_{\theta^w}^{down}(t^{down})$. Determining from a confidence measure of the current pose distribution whether to interchangeably apply a first pose estimate based on particle filtering or a second pose estimate based on dead reckoning or another estimation process independent of the particle filtering combines the strengths of both estimates. Dead reckoning is known to be a stable pose estimate within a short period of time, but to be prone to the "drifting" phenomenon if applied over a longer period of time. Particle filtering, on the other hand, is free from such a "drifting" phenomenon but suffers from jumps or instability, i.e., a significant change of the pose estimate, due to unreliable measurements or observation updates in the correction step. Combining two pose estimates from different approaches as in the present embodiment increases the overall stability of the localization process.
Figure 5 shows the main steps of the modified particle filtering process according to Figure 3 including pose estimation according to a second embodiment of the present invention. The method steps annotated with the same reference signs as in Figure 4 are identical to those described above with respect to the first embodiment and are therefore not described again. Different from the first embodiment as shown in Figure 4, the second embodiment according to Figure 5, however, always determines both the first pose estimate 15a based on particle filtering in pose estimation step 450 and the second pose estimate 15b based on prediction, in particular dead reckoning, in prediction step 460. The first and second pose estimates are herein determined as described above with respect to Figure 4.
According to the second embodiment, the current pose estimate 16 is calculated as a combination of the first pose estimate 15a and the second pose estimate 15b and output in step 470. The respective contributions of the first pose estimate and the second pose estimate to this combination are determined based on confidence measures of the respective pose estimates in step 470. By way of example, the current pose estimate $\hat{X}_t$ may be determined according to Equation (8):
$$\hat{X}_t = w^1 \hat{X}_t^1 + w^2 \hat{X}_t^2 \qquad (8)$$
wherein $\hat{X}_t^1$ denotes the first pose estimate 15a and $\hat{X}_t^2$ denotes the second pose estimate 15b, and the respective contributions are determined from covariance estimates $\sigma_1^2$ and $\sigma_2^2$ with respect to the corresponding pose estimates. According to a particular example, the respective contributions may be determined according to Equation (9) as follows:
$$w^1 = \frac{\sigma_2^2}{\sigma_1^2 + \sigma_2^2}, \qquad w^2 = \frac{\sigma_1^2}{\sigma_1^2 + \sigma_2^2} \qquad (9)$$
The covariance estimates $\sigma_1^2$ and $\sigma_2^2$ can be considered as confidence measures of the respective pose estimates.
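The following sketch illustrates such an inverse-covariance weighting of two pose estimates; the use of a single scalar covariance per estimate and all names are assumptions of this example.

```python
import numpy as np

def fuse_pose_estimates(pose1, var1, pose2, var2):
    """Combine two pose estimates with contributions derived from their
    covariance estimates, in the spirit of Equations (8) and (9)."""
    w1 = var2 / (var1 + var2)     # the more certain estimate gets the larger weight
    w2 = var1 / (var1 + var2)
    return w1 * np.asarray(pose1) + w2 * np.asarray(pose2)

# Example: a confident particle-filter estimate dominates the fused pose
fused = fuse_pose_estimates([10.0, 5.0, 0.1], 0.2, [10.4, 5.3, 0.12], 1.0)
```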
The covariance of a pose estimate can, for instance, be estimated by first taking the Jacobian of the reprojection error cost function $e$ around its convergence point as shown in Equation (10):
$$J = \frac{\partial e}{\partial X_t}\bigg|_{X_t = \hat{X}_t^1} \qquad (10)$$
Secondly, a monotonically increasing function $h$ is applied to the inverse of the Jacobian and the residual $r$, i.e., the error value at its convergence point $(x, y, z, \mathrm{roll}, \mathrm{pitch}, \mathrm{yaw})$, to determine the covariance of the pose estimate $\hat{X}_t^1$ as shown in Equation (11):
$$\sigma_1^2 = h\left(J^{-1}, r\right) \qquad (11)$$
As mentioned above, particle filtering suffers from the effect of dynamic objects in the environment of the robot and the general issue of observation discriminability. The localization process according to the present disclosure addresses these problems by modifying the update step 220 in Figure 2 of the basic particle filtering process as shown in Figures 7 to 10.
Figure 7 shows details of a first phase of the weight update process of the correction block 420 of Figures 4 and 5 according to the present disclosure. State-of-the-art methods calculate a likelihood function based on the distance score between a set of observed feature points and a set of reference feature points in a map according to the nearest neighbor principle when performing map matching. The update processes according to Figures 7 to 10 of the present disclosure significantly extend and modify this concept by exploiting additional knowledge on the observed and reference features.
The first phase of the update process as shown in Figure 7 uses three nested loops, where initially the indices $m$, $p$, and $q$ in the respective loops are set to 1. The outer loop iterates over the set of predicted particles $\{\bar{X}_t^m\}_{m=1}^{M}$ provided from the prediction step 410 as input 11 to the transformation step 421, wherein $M$ denotes the total number of particles. The intermediate loop iterates over the set of observed features $Y_t = \{y_t^p\}_{p=1}^{P}$, wherein $P$ denotes the total number of observed features. The inner loop finally iterates over the set of reference features $Y_t^{MAP} = \{y_t^{MAP,q}\}_{q=1}^{Q}$, wherein $Q$ denotes the total number of reference features taken from the map.
The set of reference features is extracted from sensor data of at least one sensor of the robot by performing feature detection and description on at least one frame, also called keyframe, of the sensor data. The sensors may be based on remote sensing and are referred to in the present disclosure as vision-based sensors. As described above, the vision-based sensor may involve a camera, in particular a stereo camera, a radar sensor, a light detection and ranging (LiDAR) sensor, e.g., using a pulsed laser, an ultrasonic sensor, an infrared sensor, or any other sensors adapted to provide imaging measurements, also called ranging, of the environment of the robot. The resulting sensor data may be organized into frames according to a time index. Some or all of these frames may be analyzed to detect and to extract features that may then be compared with reference features that are generally extracted offline from one or several reference maps of the environment of the robot.
As feature detection and matching are two important problems in machine vision and robotics, a large number of methods for feature extraction are known in the art. Some of the more widespread methods are the Scale-Invariant Feature Transform (SIFT), the Speeded Up Robust Features (SURF) technique, the Binary Robust Independent Elementary Features (BRIEF) method, and the Oriented FAST and Rotated BRIEF (ORB) method. An advanced ORB technique that also involves Simultaneous Localization and Mapping (SLAM) is for instance described in the article "ORB-SLAM2: an Open-Source SLAM System for Monocular, Stereo and RGB-D Cameras" by R. Mur-Artal and J.D. Tardos in IEEE Transactions on Robotics, vol. 33, no. 5, pp. 1255-1262, Oct. 2017.
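By way of illustration, ORB keypoints and binary descriptors of this kind can be extracted with a library such as OpenCV; the snippet below is only a sketch of that step, is not part of the disclosed method, and uses a placeholder file name.

```python
import cv2

# Detect ORB keypoints and compute 256-bit binary descriptors for one frame
img = cv2.imread("frame.png", cv2.IMREAD_GRAYSCALE)   # placeholder file name
orb = cv2.ORB_create(nfeatures=1000)
keypoints, descriptors = orb.detectAndCompute(img, None)

# Observed descriptors can be matched against reference (map) descriptors by
# Hamming distance, for example with a brute-force matcher:
matcher = cv2.BFMatcher(cv2.NORM_HAMMING, crossCheck=True)
# matches = matcher.match(descriptors, map_descriptors)   # map_descriptors: hypothetical
```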
The above-mentioned extraction methods typically process the sensor data to extract features at salient keypoint locations in the form of individual or clustered feature points. As part of the process, feature descriptors that describe the properties of the corresponding feature points are determined. Each of the above-mentioned observed features may therefore comprise one or several feature points and their corresponding feature descriptors. In addition, the update process according to a specific embodiment of the present disclosure exploits semantic information related to the features in terms of feature classes. Consequently, each feature may additionally comprise one or more feature classes as described below in more detail. The semantic information may in particular be used to distinguish between particular types or object classes of real-world elements such as cars, traffic signs, roads, buildings, or the like. In contrast, typical features may simply identify lines, corners, edges, or distinct patterns or objects of distinct appearance. Generally, a feature descriptor is provided as a feature vector that may have a high dimensionality. Both the observed features and the reference features are typically structured to include the same information, i.e., feature points and corresponding feature descriptors, and potentially feature classes, to allow for a matching between the observed and the reference features that yields a correspondence associated with a likelihood $p(Y_t \mid X_t^m, Y_t^{MAP})$, which is referred to here and in the following as map matching.
The set of reference features $Y_t^{MAP}$ is generally extracted from a set of reference images acquired using the same imaging method as the one used by the at least one vision-based sensor and related to the environment of the robot. The extracted reference features may then be stored as a so-called feature-based map in a database or on a storage medium.
Generally, maps may be feature-based or location-based. While location-based maps are volumetric, in that they include feature descriptors for any location inside the area of the map, feature-based maps only specify features at specific locations, in particular locations of key objects contained in the map. The update process according to the present embodiment is described within the context of feature-based maps but may easily be modified to be applied to location-based maps. While feature-based models extract relatively little information, by virtue of the fact that feature extractors project high-dimensional sensor measurements into a lower-dimensional space, this disadvantage is offset by the superior computational properties of feature-based representations.
As mentioned above, mobile robots are generally located in dynamic environments that change over time, both with respect to the location of other objects or subjects and with respect to environmental conditions such as changing daylight. Considering dynamic objects, one potential problem of the matching process between the observed features and the reference features lies in the fact that dynamic objects present in the reference images used for the detection of the reference features may not be present in the current environment of the robot and vice versa. Feature extraction and matching based on such dynamic objects thus introduces errors in the update process. State-of-the-art methods have therefore suggested applying dedicated filters to the sensor data of the at least one vision-based sensor of the robot before performing the feature extraction and matching. Such filtering, however, further contributes to the already heavy computational load of the underlying processing.
The feature/map matching process according to the present disclosure therefore devises an alternative approach wherein the reference features, more specifically the reference images, are filtered offline to remove dynamic objects such as pedestrians and other vehicles. The dynamic objects may be identified, for instance, based on semantic information such as the above-mentioned feature classes. Ideally, the map of the reference features has been created before performing the localization process and is available from a storage medium or a database for the below described matching process. Situations may, however, exist where the mobile robot enters a terrain or area of which no such map exists. In these cases, the described method may be modified to perform simultaneous localization and mapping (SLAM) as it is generally known in the art. In SLAM, the robot acquires a map of its environment while simultaneously localizing itself relative to this map. When an object is detected in the environment of the robot, a SLAM algorithm must reason about the relation of this object to previously detected objects. Information that helps localize the robot is propagated through the map and, as a result, improves the localization of other features in the map.
The update process according to the present embodiment may be applied to the SLAM problem. In this case, the set of reference features includes those features already extracted from the current map of the environment of the robot. Consequently, the reference features may include dynamic objects, which may be removed from the set based on semantic information as described above. Alternatively, the dynamic objects may be kept as reference features under the assumption that they remain in the environment of the robot during the SLAM process. The algorithm may further selectively remove some of the dynamic objects that are highly dynamic, such as pedestrians and moving vehicles, based on the corresponding semantic information or a feature class, while keeping other, less dynamic objects such as parked vehicles.
To perform matching between the set of observed features and the set of reference features, the process according to the present disclosure first transforms the observed features $Y_t$, which are necessarily observed with respect to the local coordinate system of the robot, into the global coordinate system of the reference features $Y_t^{MAP}$ in step 421 in Figure 7. This transformation is performed for each current hypothetical pose $X_t^m$ to produce corresponding transformed observed features $\hat{y}_t^{m,p}$. In other words, each reference feature comprises at least one global, i.e., space-fixed, positional coordinate, and each observed feature comprises at least one body-fixed positional coordinate defined relative to the robot, wherein a mapping between the space-fixed positional coordinates and the body-fixed positional coordinates is performed in the transformation step 421 on the basis of the current hypothetical poses $\{X_t^m\}_{m=1}^{M}$. This transformation may of course also involve rotations to map between rotational coordinates of the local and global coordinate systems. Furthermore, the transformation may be performed in the inverse direction to map the reference features into the local coordinate system of the robot. After performing the coordinate transform, a likelihood distance score and a similarity score are calculated in step 422 according to the present disclosure. The distance score $D_t^{m,p,q}$ may for instance be computed for each particle $m$, each observed feature $p$, and each reference feature $q$ using a nearest neighbor based likelihood scoring method as known in the art. The similarity score $S_t^{m,p,q}$ may for instance be computed using the Hamming distance for an ORB feature descriptor. The similarity score may then be used to penalize the distance score in step 423 according to Equation (12) as follows if the features are not similar:
$$D_t^{m,p,q} = \begin{cases} D_t^{m,p,q} & \text{if } S_t^{m,p,q} \leq \theta_S, \\ D_{MIN} & \text{if } S_t^{m,p,q} > \theta_S, \end{cases} \qquad (12)$$
wherein $D_{MIN}$ is a minimum distance score for the case that the similarity score exceeds a threshold $\theta_S$ for the similarity score, which may, for instance, be chosen within the range of the Hamming distance.
According to an alternative embodiment, each feature comprises one or more feature classes and, for each feature class, a probability value that conveys the semantic information, and calculating the similarity scores takes into account the feature classes of the reference features and the observed features and their respective probabilities. In particular, each feature class may be associated with a class of real-world elements and each probability value expresses the probability of the feature belonging to the respective feature class. In this case, the similarity score $S_t^{m,p,q}$ may be determined as the probability of the reference feature $q$ and the observed feature $p$ having the same association, i.e., semantic label. The threshold $\theta_S$ for this case may simply express a particular probability value.
Calculation of the similarity scores may be performed separately on feature descriptors and feature classes or on a combination of feature descriptors and feature classes. The feature classes may, in particular, be integrated into the feature descriptors. As the distance score is only meaningful for those pairs of features $(p, q)$ that share a certain degree of similarity, the penalty for the distance score in step 423 according to Equation (12) may be applied to feature descriptors and feature classes in a like way. If the semantic information is included in the calculation of the similarity score based on the feature classes, dynamic objects do not have to be removed separately from the sensor data of the at least one vision-based sensor if the map of reference features was pre-processed to remove dynamic objects. By way of example, observed features related to pedestrians will not find a match in the map with a sufficient similarity to yield a meaningful pair. The similarity based approach according to the present embodiment is therefore highly efficient and suitable for real-time implementations. In step 424, the scores $\{D_t^{m,p}, S_t^{m,p}\}$ related to the nearest reference feature $\hat{q}$ according to Equation (13):
$$\hat{q} = \arg\min_q D_t^{m,p,q} \qquad (13)$$
are determined and stored for each particle $m$ and each observed feature $p$. The resulting distance and similarity scores $\{D_t^{m,p}, S_t^{m,p}\}_{p=1}^{P}$ are then accumulated for each particle $m$ in step 425 according to Equation (14):
$$D_t^m = \sum_{p=1}^{P} D_t^{m,p}, \qquad S_t^m = \sum_{p=1}^{P} S_t^{m,p} \qquad (14)$$
As a result of the first phase of the update process shown in Figure 7, the set of distance and similarity scores $\{D_t^m, S_t^m\}_{m=1}^{M}$ for the current hypothetical poses $\{\bar{X}_t^m\}_{m=1}^{M}$ is output. This set of distance and similarity scores may be processed in different ways to update the previous weights $\{w_{t-1}^m\}_{m=1}^{M}$.
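A simplified sketch of this first scoring phase is shown below, using Euclidean distances between 2D feature positions and Hamming distances between descriptors given as bit arrays; the concrete scores, the way the penalty excludes dissimilar pairs from the nearest-neighbour search (a variant of Equation (12)), and all names are assumptions of this example.

```python
import numpy as np

def score_particle(obs_pos, obs_desc, ref_pos, ref_desc, theta_s=64):
    """First scoring phase for one particle: for every observed feature, find the
    nearest reference feature among sufficiently similar candidates and
    accumulate distance and similarity scores (cf. Equations (12) to (14))."""
    D_m, S_m = 0.0, 0.0
    for p in range(len(obs_pos)):
        dists = np.linalg.norm(ref_pos - obs_pos[p], axis=1)        # distance scores
        sims = np.count_nonzero(ref_desc != obs_desc[p], axis=1)    # Hamming distances
        # Penalty: pairs whose descriptors are not similar enough are excluded
        dists = np.where(sims <= theta_s, dists, np.inf)
        q_hat = int(np.argmin(dists))                               # nearest reference feature
        if np.isfinite(dists[q_hat]):
            D_m += dists[q_hat]                                     # accumulation per particle
            S_m += sims[q_hat]
    return D_m, S_m
```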
Updating of the previous weights according to a first embodiment of the weight updating process is shown in Figure 8. First, the set of distance and similarity scores is separated into a set of distance scores $\{D_t^m\}_{m=1}^{M}$ and a set of similarity scores $\{S_t^m\}_{m=1}^{M}$. The similarity scores are then further processed by calculating a similarity score distribution in step 426, for instance by fitting the set of similarity scores to a given distribution model, such as a one-sided distribution, so that $\{S_t^m\}_{m=1}^{M} \sim f(\mu_t^S, \sigma_t^S)$ with mean $\mu_t^S$ and standard deviation $\sigma_t^S$ of the distribution. The distribution may be a frequency distribution or a normalized frequency distribution of the scores. Based on the parameters $\mu_t^S$ and $\sigma_t^S$ of the distribution of the similarity scores, a reliability condition for the distribution may be defined as in Equation (15):
$$\mu_t^S < \theta_R \qquad (15)$$
wherein $\theta_R$ is a threshold that may be determined as the tail threshold of the similarity distribution.
For the current set of particles $\{\bar{X}_t^m\}_{m=1}^{M}$, it is determined in step 427 whether the distribution of the similarity scores fulfills the reliability condition. If the distribution does not fulfill the reliability condition, the calculated distance scores are discarded and the updated importance weights $\{w_t^m\}_{m=1}^{M}$ are determined independently of the distribution of the similarity scores. In this case, the previous weights $\{w_{t-1}^m\}_{m=1}^{M}$ may be used or a uniform distribution may be assigned in step 429b to produce the updated weights 12b. By using the previous weights or a uniform distribution, a distance score associated with an unreliable similarity score is discarded. This improves the stability of the underlying update process. In case the distribution of the similarity scores fulfills the reliability condition, the current weights $\{w_t^m\}_{m=1}^{M}$ are determined based on the distribution $f(\mu_t^S, \sigma_t^S)$ of the similarity scores. According to the embodiment shown in Figure 8, a weight based on the distance score $D_t^m$ is first assigned to each particle in step 428 and then weighted with the similarity score probability $f(S_t^m; \mu_t^S, \sigma_t^S)$ in step 429a to produce the updated, not yet normalized weight $w_t^m$. Consequently, the weighting function for the particle can be taken as the probability value of the one-sided distribution of the similarity scores. If the reliability condition is fulfilled, the updated weights 12a are output by the process. According to the embodiment shown in Figure 8, the current weights 12a are therefore determined according to Equation (16):
$$\{w_t^m\}_{m=1}^{M} = \alpha \cdot p(Y_t \mid X_t^m, Y_t^{MAP}) \qquad (16)$$
wherein $\alpha$ is a normalization factor.
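The following sketch illustrates the reliability gate of this first embodiment, assuming a Gaussian weighting of the accumulated distance scores and a Gaussian model of the similarity scores; the distribution choices and all names are assumptions of this example.

```python
import numpy as np

def update_weights_first_embodiment(D, S, prev_w, theta_r, sigma_d=1.0):
    """Weight update of Figure 8: gate on the reliability of the similarity
    score distribution, otherwise fall back to the previous weights."""
    mu_s = np.mean(S)
    sigma_s = np.std(S) + 1e-9

    if mu_s >= theta_r:
        # Reliability condition (15) violated: discard the distance scores
        return prev_w / prev_w.sum()

    # Weight from the accumulated distance score (step 428, Gaussian model assumed)
    w = np.exp(-0.5 * (D / sigma_d) ** 2)
    # Weighting with the similarity score distribution (step 429a)
    w *= np.exp(-0.5 * ((S - mu_s) / sigma_s) ** 2)
    return w / w.sum()
```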
According to a second embodiment of the update process as shown in Figure 9, the current importance weights may be obtained by further applying a weighting based on a global rotation estimate, which may come from at least one rotation sensor, such as an inertia-based sensor. Steps 426 to 428 and 429a are identical to those in Figure 8 so that their description is not repeated. In addition to applying the similarity score weighting in step 429a when the distribution of the similarity scores fulfills the reliability condition in step 427, the weight is further adjusted based on a global rotation estimate $z_t^{rotation}$ from the global pose measurement $Z_t$ in step 529c to produce updated importance weights 12a according to Equation (17):
$$w_t^m = w_t^m \cdot p(z_t^{rotation} \mid X_t^m) \qquad (17)$$
Furthermore, when the above described reliability condition is not satisfied, a reset is performed according to this embodiment by assigning the weight $w_t^m = p(z_t^{position} \mid X_t^m)$ in step 529b based on a global position estimate $z_t^{position}$ from the global pose measurement $Z_t$, which may come from at least one position sensor, such as a satellite-based sensor, in particular a GPS sensor. The assigned weight may then further be adjusted in step 529d based on a global rotation estimate $z_t^{rotation}$ from the global pose measurement $Z_t$ to produce updated importance weights 12b according to Equation (17).
As compared to the first embodiment shown in Figure 8, the update process according to the second embodiment in Figure 9 has two advantages. First, in the case that the reliability condition is not satisfied, the importance weights will be determined by the global position estimate, which can provide a certain degree of reliability. Secondly, further weighting based on a global rotation estimate, regardless of whether the reliability condition is satisfied or not, further increases the reliability of the updated weights. According to the second embodiment, two additional separate global estimates for position and rotation, for instance derived from a GPS sensor and an IMU sensor, are used to increase the reliability of the updated weights and, consequently, of the current pose estimate. According to the second embodiment, the updated importance weights are determined according to Equation (18) when the reliability condition is fulfilled:
$$\{w_t^m\}_{m=1}^{M} = \alpha \cdot p(Y_t \mid X_t^m, Y_t^{MAP}) \cdot p(Z_t \mid X_t^m) \qquad (18)$$
Finally, according to a third embodiment of the update process as shown in Figure 10, in addition to the global pose measurement $Z_t$ from satellite-based sensors and/or inertia-based sensors, a global pose measurement $G_t$ that comes from position and/or rotation sensors other than the satellite-based sensors, and possibly the inertia-based sensors, e.g., from vision-based sensors, is used to further increase the reliability of the updated importance weights.
As the steps 426 to 428 and 429a in Figure 10 are identical to those in Figure 8, their description is not repeated here. When the reliability condition is not fulfilled, the weight $w_t^m = p(Z_t, G_t \mid X_t^m)$ may be assigned in step 629b based on the global pose estimates $Z_t$ and $G_t$ to produce the updated importance weights 12b. In case the distribution of the similarity scores fulfills the reliability condition, the weights determined in step 429a are further adjusted based on the global pose estimates $Z_t$ and $G_t$ to produce the updated importance weights 12a according to Equation (19):
$$w_t^m = w_t^m \cdot p(Z_t, G_t \mid X_t^m) \qquad (19)$$
In this case, the updated importance weights are consequently determined according to Equation (20) as follows:
$$\{w_t^m\}_{m=1}^{M} = \alpha \cdot p(Y_t \mid X_t^m, Y_t^{MAP}) \cdot p(Z_t, G_t \mid X_t^m) \qquad (20)$$
Depending on the accuracy of the pose estimates, either the global position estimate or the global rotation estimate or both may be used in steps 629b and 629c of the third embodiment for the global pose estimates Zt and Gt.
In summary, the current weights may be further adapted based on a global pose estimate derived from at least one of sensor data of a position sensor and sensor data of a rotation sensor, e.g., from at least one of sensor data of a satellite-based sensor, sensor data of an inertia-based sensor, and sensor data of the at least one vision-based sensor.
In the above described embodiments of the update process, knowledge of a feature descriptor and optionally a feature class for the observed and reference features is included to increase the probability of finding the correct nearest neighbor within a given search area and to adjust the particle weights calculation. Feature points which do not satisfy a threshold criterion for the similarity score calculated for these feature descriptors and feature classes are penalized. In addition, the predicted particles are weighted based on the distribution of the similarity scores if the corresponding distribution fulfills a reliability condition and the distance scores for the predicted particles are discarded and replaced with one or more global pose estimates if the distribution does not fulfill the reliability condition.
Usage of feature descriptors and optionally feature classes for the observed and reference features further increases filter accuracy against dynamic objects, such as passing vehicles, pedestrians, and the like, and increases feature discriminability. Consequently, the resulting current pose estimate becomes more reliable.
Figure 11 finally shows a vehicle implementing the present disclosure according to any of the above described embodiments. Without limitation, the vehicle 700 is equipped with wheel encoders 792 on the front wheels as odometry sensors that measure the rotation of the front wheels, from which a change in the position of the vehicle can be determined. The vehicle 700 further includes an inertial measurement unit (IMU) 790 as an inertia-based sensor configured to determine changes in 6 DOFs, i.e., in the positional coordinates as well as the orientation of the vehicle. The IMU 790 thus constitutes a combination of a position sensor and a rotation sensor. Furthermore, the vehicle is equipped with a GPS sensor 796 as a satellite-based sensor or position sensor for measuring a global position $z_t^{position}$ based on a GPS signal. Finally, the vehicle 700 is equipped with a stereoscopic camera 794 as a vision-based sensor that records stereoscopic images of the environment of the vehicle. The images recorded by the camera 794 are then processed as described above to extract observed features in the environment of the vehicle.
To perform the localization process described above with respect to the embodiments of the present disclosure, the vehicle is equipped with processing circuitry 780 configured to execute any one of the above described methods. The sensor signals from the odometry sensors 792, the IMU 790, the GPS sensor 796, and the camera 794 are transmitted to the processing circuitry 780 via cable or wirelessly. The processing circuitry 780 then processes the sensor data as described above to perform localization of the vehicle 700 in the global coordinate system as indicated in Figure 11 with dashed lines.
Figure 11 shows the x- and y-axes of the global coordinate system, wherein the z-axis coincides with the z-axis of the local coordinate system of the vehicle 700. The global coordinates are also called space-fixed coordinates, while the local coordinates, represented by the x'- and y'-axes as well as the z-axis, are also called body-fixed coordinates. The heading of the vehicle 700 is indicated by the x'-axis in the figure. In a convenient way, this heading can be used to define the x'-axis of the local coordinate system, wherein the rotation angles with respect to roll, pitch, and yaw are shown in the vehicle-fixed local coordinate system. As described above, the processing circuitry 780 is configured to transform between positional and rotational coordinates in the body-fixed local coordinate system and positional and rotational coordinates in the space-fixed global coordinate system. A global pose and a global pose estimate therefore always refer to space-fixed global coordinates in the present disclosure. Figure 11 further schematically indicates the velocity of the vehicle 700 as a vector whose direction may differ from the heading of the vehicle due to sideslip of the vehicle. Such a sideslip may be one source of errors in the localization process, as it is generally not accounted for by the wheel encoders 792.
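For a planar case, the mapping between body-fixed and space-fixed positional coordinates referred to above can be sketched as follows; the 2D simplification and all names are assumptions of this example.

```python
import numpy as np

def body_to_space(point_bf, pose):
    """Map a body-fixed 2D point into space-fixed coordinates for a pose (x, y, yaw)."""
    x, y, yaw = pose
    c, s = np.cos(yaw), np.sin(yaw)
    R = np.array([[c, -s], [s, c]])          # rotation from the body to the space frame
    return R @ np.asarray(point_bf) + np.array([x, y])

# An observed feature 2 m ahead of the robot, expressed in the global frame
feature_sf = body_to_space([2.0, 0.0], [10.0, 5.0, np.pi / 2])
```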
The processes and methods described in the present disclosure, in particular with respect to Figures 1 to 10, may be implemented in a system comprising processing circuitry configured to perform the described processes and methods. The system may comprise a combination of software and hardware. By way of example, the prediction, correction, and resampling steps of the particle filtering process according to Figures 4 and 5 may be implemented as software modules or as separate units of processing circuitry. In fact, any of the blocks in Figures 2, 4, 5, and 7 to 10 can be implemented as hardware units or software modules. The described processing may be performed by a chip, such as a general purpose processor, a CPU, a GPU, a digital signal processor (DSP), or a field programmable gate array (FPGA), or the like. However, the present disclosure is not limited to implementation on programmable hardware. It may be implemented on an application-specific integrated circuit (ASIC) or by a combination of the above-mentioned hardware components.
The storage 455 in Figures 4 and 5 may be implemented using any of the storage units known in the art, such as a memory unit, in particular RAM, ROM, EEPROM, or the like, a storage medium, in particular a DVD, CD, USB (flash) drive, hard disk, or the like, a server storage available via a network, etc.
The processing circuitry 780 may, in particular, be configured to determine a plurality of current hypothetical poses $\{X_t^m\}_{m=1}^{M}$ of the robot, in particular the vehicle 700, using odometry measurements $O_t$ in the prediction step 410, to determine the respective updated weights in the correction step 420, in particular based on similarity scores as shown in Figures 7 to 10, to resample the particles in the resampling step 430, and potentially to recover particles in step 480 and supplement them in the augmentation step 485. Furthermore, the processing circuitry 780 may be configured to determine a confidence measure of the current pose distribution in step 440 of Figure 4 or to determine confidence measures of independently determined pose estimates in step 470 of Figure 5. In addition, the processing circuitry may be configured to perform pose estimation in step 450 based on the particle filtering and to perform an independent pose estimation based on prediction in step 460.
The above-described localization processes and sub-processes may also be implemented by a program including instructions stored on a computer-readable medium. The instructions, when executed on a processor, cause the processor to perform the above described processes and methods. The computer-readable medium can be any medium on which the instructions are stored such as a DVD, CD, USB (flash) drive, hard disk, server storage available via a network, or the like.
Summarizing, the present disclosure offers ways of improving the performance of low-cost systems for mobile robot localization. The processes according to the present disclosure have been extensively tested in real time using car prototypes having GPS, IMU, and stereo camera sensors. Preliminary results show that the percentage of absolute mean errors below 1 m is approximately 90% and that the averaged absolute mean error is 0.75 m or below in the longitudinal direction and less than 0.4 m in the lateral direction. These errors lie within the required specification of the IoV Planning and Control Division and are therefore suitable for commercial deployment.
The methods and systems of the present disclosure significantly improve the localization accuracy performance while maintaining the low-cost feature. They may be implemented in low-cost systems having a stereoscopic camera, a GPS sensor and an IMU sensor. The described methods and systems provide high accuracy in all 6 DOFs of the vehicle pose including the altitude and the rotation (roll, pitch, and yaw). Due to their low requirements on processing power, the disclosed processes are suitable for real-time implementation. First tests have demonstrated that pose estimation can be performed at around 10 Hz.
The described methods and systems address many of the problems of low-cost vehicle localization systems: the problem of dynamic objects and observation discriminability is solved by means of an extended feature descriptor in the particle filtering process, the problem of pose estimate instability by interchangeably using particle filtering and dead reckoning, and the problem of intermittent GPS signals by adding a global pose estimate based on non-GPS sensors.

Claims

1. An apparatus for estimating a pose of a robot (700), configured to: determine (450, 460, 470) a current pose estimate of the robot based on a first pose estimate (15a) or a second pose estimate (15b) or a combination (16) of the first pose estimate and the second pose estimate, wherein the first pose estimate is based on a current pose distribution (13) of the robot; and wherein a contribution of the first pose estimate to the current pose estimate and a contribution of the second pose estimate to the current pose estimate are determined based on the current pose distribution.
2. The apparatus of claim 1 , wherein the second pose estimate (15b) is based on one or more of the following: prediction (460) from one or more previous pose estimates (6), or a global pose estimate derived from at least one of sensor data from a position sensor (796) and sensor data from an orientation sensor (790).
3. The apparatus of claim 1 or 2, wherein the contribution of the first pose estimate (15a) and the contribution of the second pose estimate (15b) are determined based on a confidence measure (440) of the current pose distribution.
4. The apparatus of claim 3, wherein, upon determination that the confidence measure (440) of the current pose distribution exceeds a threshold, only the first pose estimate (15a) contributes to the current pose estimate.
5. The apparatus of claim 4, further configured to adapt the threshold based on the confidence measure (440) of the current pose distribution.
6. The apparatus of claim 5, wherein the threshold is increased in response to the confidence measure (440) of the current pose distribution being significantly higher than the threshold, or wherein the threshold is decreased in response to the confidence measure (440) of the current pose distribution being significantly lower than the threshold.
7. The apparatus of claim 6, wherein a transition from increasing the threshold to decreasing the threshold and vice versa is delayed by a respective delay time.
8. The apparatus of claim 1 or 2, wherein the contribution of the first pose estimate (15a) and the contribution of the second pose estimate (15b) are determined based on confidence measures (470) of the respective pose estimates.
9. An apparatus for estimating a pose of a robot (700), configured to: determine (410) a plurality of current hypothetical poses (1 1 ) of the robot; for each of the plurality of current hypothetical poses, determine (420) a weight
(12); and determine (450) a current pose estimate (15a) of the robot based on the plurality of current hypothetical poses and their weights; wherein for each of the plurality of current hypothetical poses, determining (420) the weight (12) comprises calculating (422 - 425) a similarity score which is a measure of similarity between a set of reference features (4) and a set of observed features (3).
10. The apparatus of claim 9, wherein each reference feature and each observed feature comprises one or more feature descriptors.
11. The apparatus of claim 9 or 10, wherein each reference feature and each observed feature comprises one or more feature classes and, for each feature class, a probability value, and wherein calculating (422 - 425) the similarity score is based on the one or more feature classes of the reference features and their probability values and the one or more feature classes of the observed features and their probability values.
12. The apparatus of claim 11, wherein each feature class is associated with a class of real-world elements.
13. The apparatus of any one of claims 10 to 12, wherein each reference feature further comprises a space-fixed, SF, positional coordinate and each observed feature further comprises a body-fixed, BF, positional coordinate, the BF positional coordinate being defined relative to the robot; and wherein calculating (422 - 425) the similarity score comprises mapping (421) between the SF positional coordinate and the BF positional coordinate on the basis of a current hypothetical pose (11).
14. The apparatus of any one of claims 9 to 13, wherein the weights of the current hypothetical poses are determined (429a) based on a distribution of the similarity scores when the distribution fulfills a reliability condition (427).
15. The apparatus of claim 14, wherein the weights of the current hypothetical poses are determined (429b, 529b, 629b) independently of the distribution of the similarity scores when the distribution does not fulfill the reliability condition (427).
16. The apparatus of claim 14 or 15, further comprising at least one of a position sensor (796) and an orientation sensor (790), wherein the weights of the current hypothetical poses are further adapted (529c, 529d, 629c) based on a global pose estimate derived from at least one of sensor data of the position sensor (796) and sensor data of the orientation sensor (790).
17. An apparatus for estimating a pose of a robot (700), configured to: generate a first pose distribution (18) of the robot based on one or more first navigational measurements (5), generate a second pose distribution (1) of the robot based on the first pose distribution (18) and on a current instance of a refined pose distribution (13), generate a next instance of the refined pose distribution (13) based on the second pose distribution (1) and on one or more second navigational measurements (2, 3), and determine a pose estimate of the robot based on the next instance of the refined pose distribution (13).
18. The apparatus of claim 17, wherein the current instance and the next instance of the refined pose distribution (13) are each represented by a set of hypothetical poses and associated weights, wherein the set representing the current instance and the set representing the next instance comprise the same number of hypothetical poses.
19. The apparatus of claim 17 or 18, wherein in generating the second pose distribution (1), the current instance of the refined pose distribution (13) contributes more to the second pose distribution (1) than the first pose distribution (18).
20. The apparatus of any one of claims 17 to 19, configured to generate the first pose distribution (18) independently of the refined pose distribution.
21. The apparatus of any one of claims 17 to 20, configured to generate the one or more first navigational measurements by one or more of the following: satellite-based pose estimation, inertia-based pose estimation, vision-based pose estimation, or user input.
22. The apparatus of any one of claims 17 to 21 , configured to generate the one or more second navigational measurements by one or more of the following: satellite-based pose estimation, inertia-based pose estimation, vision-based pose estimation, or odometric pose estimation.
23. A robot (700), in particular a vehicle, comprising the apparatus of any one of claims 1 to 22.
24. A method for estimating a pose of a robot (700), the method comprising: determining (450, 460, 470) a current pose estimate of the robot based on a first pose estimate (15a) or a second pose estimate (15b) or a combination (16) of the first pose estimate and the second pose estimate; wherein the first pose estimate is based on a current pose distribution (13) of the robot; and wherein a contribution of the first pose estimate to the current pose estimate and a contribution of the second pose estimate to the current pose estimate are determined based on the current pose distribution.
25. A method for estimating a pose of a robot (700), the method comprising: determining (410) a plurality of current hypothetical poses (1 1 ) of the robot; for each of the plurality of current hypothetical poses, determining (420) a weight (12); and determining (450) a current pose estimate (15a) of the robot based on the plurality of current hypothetical poses and their weights; wherein for each of the plurality of current hypothetical poses, determining (420) the weight (12) comprises calculating (422 - 425) a similarity score which is a measure of similarity between a set of reference features (4) and a set of observed features (3).
26. A method of estimating a pose of a robot (700), comprising: generating a first pose distribution (18) of the robot based on one or more first navigational measurements (5), generating a second pose distribution (1) of the robot based on the first pose distribution (18) and on a current instance of a refined pose distribution (13), generating a next instance of the refined pose distribution (13) based on the second pose distribution (1) and one or more second navigational measurements (2, 3), and determining a pose estimate of the robot based on the next instance of the refined pose distribution (13).
27. A computer-readable medium storing instructions that when executed on a processor cause the processor to perform the method of any one of claims 24 to 26.
PCT/EP2018/074232 2018-09-07 2018-09-07 Estimation of a pose of a robot Ceased WO2020048623A1 (en)

Priority Applications (2)

Application Number Priority Date Filing Date Title
PCT/EP2018/074232 WO2020048623A1 (en) 2018-09-07 2018-09-07 Estimation of a pose of a robot
CN201880096793.8A CN112639502B (en) 2018-09-07 2018-09-07 Robot pose estimation

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
PCT/EP2018/074232 WO2020048623A1 (en) 2018-09-07 2018-09-07 Estimation of a pose of a robot

Publications (1)

Publication Number Publication Date
WO2020048623A1 true WO2020048623A1 (en) 2020-03-12

Family

ID=63556324

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/EP2018/074232 Ceased WO2020048623A1 (en) 2018-09-07 2018-09-07 Estimation of a pose of a robot

Country Status (2)

Country Link
CN (1) CN112639502B (en)
WO (1) WO2020048623A1 (en)

Cited By (20)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
RU2682242C1 (en) * 2018-03-19 2019-03-18 Федеральное государственное бюджетное образовательное учреждение высшего образования "Липецкий государственный технический университет" Two-phase ac drive controlling method using the three-phase bridge inverter
CN111708047A (en) * 2020-06-16 2020-09-25 浙江大华技术股份有限公司 Robot positioning evaluation method, robot and computer storage medium
CN111765883A (en) * 2020-06-18 2020-10-13 浙江大华技术股份有限公司 Monte Carlo positioning method and equipment for robot and storage medium
CN112180382A (en) * 2020-09-28 2021-01-05 知行汽车科技(苏州)有限公司 Self-adaptive 3D-LSLAM positioning method, device and system based on constant-speed model
CN113075686A (en) * 2021-03-19 2021-07-06 长沙理工大学 Cable trench intelligent inspection robot mapping method based on multi-sensor fusion
CN113155121A (en) * 2021-03-22 2021-07-23 珠海深圳清华大学研究院创新中心 Vehicle positioning method and device and electronic equipment
CN113465620A (en) * 2021-06-02 2021-10-01 上海追势科技有限公司 Parking lot particle filter positioning method based on semantic information
WO2022025786A1 (en) * 2020-07-31 2022-02-03 Harman International Industries, Incorporated Vision-based location and turn marker prediction
US11331801B2 (en) * 2018-11-28 2022-05-17 Mitsubishi Electric Research Laboratories, Inc. System and method for probabilistic multi-robot positioning
CN114693783A (en) * 2020-12-31 2022-07-01 上海湃星信息科技有限公司 Satellite autonomous pose determination method, system and storage medium
CN114719864A (en) * 2022-04-24 2022-07-08 上海思岚科技有限公司 Robot self-positioning method, equipment and computer readable medium
US11474204B2 (en) * 2019-01-29 2022-10-18 Ubtech Robotics Corp Ltd Localization method and robot using the same
CN116069018A (en) * 2022-11-30 2023-05-05 北京顺造科技有限公司 A mowing method and mowing system for improving the success rate of mowing a lawnmower out of trouble
CN116202551A (en) * 2021-11-30 2023-06-02 珠海一微半导体股份有限公司 Visual robot road sign positioning effective detection method
CN116222588A (en) * 2023-05-08 2023-06-06 睿羿科技(山东)有限公司 Positioning method for integrating GPS and vehicle-mounted odometer
WO2023131048A1 (en) * 2022-01-06 2023-07-13 上海安亭地平线智能交通技术有限公司 Position and attitude information determining method and apparatus, electronic device, and storage medium
CN117406259A (en) * 2023-12-14 2024-01-16 江西北斗云智慧科技有限公司 Beidou-based intelligent construction site vehicle positioning method and system
CN118274849A (en) * 2024-06-04 2024-07-02 江苏智搬机器人科技有限公司 A method and system for positioning an intelligent handling robot based on multi-feature fusion
CN119937545A (en) * 2024-12-26 2025-05-06 中联重科股份有限公司 Mobile robot and motion positioning control method, device, system and medium thereof
DE102023213076A1 (en) 2023-12-20 2025-06-26 Robert Bosch Gesellschaft mit beschränkter Haftung Method for locating a vehicle in an environment

Families Citing this family (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112985417B (en) * 2021-04-19 2021-07-27 长沙万为机器人有限公司 Pose correction method for particle filter positioning of mobile robot and mobile robot
CN115507836B (en) * 2021-06-23 2024-02-02 同方威视技术股份有限公司 Methods for determining the position of a robot and the robot
CN113295174B (en) * 2021-07-27 2021-10-08 腾讯科技(深圳)有限公司 Lane-level positioning method, related device, equipment and storage medium
CN113674324B (en) * 2021-08-27 2024-10-18 常州唯实智能物联创新中心有限公司 Class level 6D pose tracking method, system and device based on meta learning
CN115601432B (en) * 2022-11-08 2023-05-30 肇庆学院 Robot position optimal estimation method and system based on FPGA
CN116252581B (en) * 2023-03-15 2024-01-16 吉林大学 Vehicle body vertical and pitch motion information estimation system and method under straight-line driving conditions

Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20030028340A1 (en) * 2001-06-26 2003-02-06 Etienne Brunstein Hybrid inertial navigation method and device
US20090024251A1 (en) * 2007-07-18 2009-01-22 Samsung Electronics Co., Ltd. Method and apparatus for estimating pose of mobile robot using particle filter
US20120029698A1 (en) * 2006-11-16 2012-02-02 Samsung Electronics Co., Ltd Method, apparatus, and medium for estimating pose of mobile robot using particle filter
US20120150437A1 (en) * 2010-12-13 2012-06-14 Gm Global Technology Operations Llc. Systems and Methods for Precise Sub-Lane Vehicle Positioning
WO2017016799A1 (en) * 2015-07-29 2017-02-02 Volkswagen Aktiengesellschaft Determining arrangement information for a vehicle
CN107991683A (en) * 2017-11-08 2018-05-04 华中科技大学 A kind of robot autonomous localization method based on laser radar
US20180253107A1 (en) * 2015-11-02 2018-09-06 Starship Technologies Oü Mobile robot system and method for autonomous localization using straight lines extracted from visual images

Family Cites Families (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US9160383B2 (en) * 2013-11-12 2015-10-13 Huawei Technologies Co., Ltd. Method for estimating covariance matrices and use thereof
CN107167148A (en) * 2017-05-24 2017-09-15 安科机器人有限公司 Synchronous positioning and map construction method and device

Patent Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20030028340A1 (en) * 2001-06-26 2003-02-06 Etienne Brunstein Hybrid inertial navigation method and device
US20120029698A1 (en) * 2006-11-16 2012-02-02 Samsung Electronics Co., Ltd Method, apparatus, and medium for estimating pose of mobile robot using particle filter
US20090024251A1 (en) * 2007-07-18 2009-01-22 Samsung Electronics Co., Ltd. Method and apparatus for estimating pose of mobile robot using particle filter
US20120150437A1 (en) * 2010-12-13 2012-06-14 Gm Global Technology Operations Llc. Systems and Methods for Precise Sub-Lane Vehicle Positioning
WO2017016799A1 (en) * 2015-07-29 2017-02-02 Volkswagen Aktiengesellschaft Determining arrangement information for a vehicle
US20180253107A1 (en) * 2015-11-02 2018-09-06 Starship Technologies Oü Mobile robot system and method for autonomous localization using straight lines extracted from visual images
CN107991683A (en) * 2017-11-08 2018-05-04 华中科技大学 A kind of robot autonomous localization method based on laser radar

Non-Patent Citations (4)

* Cited by examiner, † Cited by third party
Title
E.J. KRAKIWSKY; C.B. HARRIS; R.V.C. WONG: "A Kalman filter for integrating dead reckoning, map matching and GPS positioning", POSITION LOCATION AND NAVIGATION SYMPOSIUM, 1988
R. MUR-ARTAL; J.D. TARDOS: "ORB-SLAM2: an Open-Source SLAM System for Monocular, Stereo and RGB-D Cameras", IEEE TRANSACTIONS ON ROBOTICS, vol. 33, no. 5, October 2017 (2017-10-01), pages 1255 - 1262
RECORD: "IEEE PLANS '88", 1988, IEEE, article "Navigation into the 21st Century", pages: 39 - 46
S. KUUTTI; S. FALLAH; K. KATSAROS; M. DIANATI; F. MCCULLOUGH; A. MOUZAKITIS: "A Survey of the State-of-the-Art Localization Techniques and Their Potentials for Autonomous Vehicle Applications", IEEE INTERNET OF THINGS JOURNAL, vol. 5, no. 2, April 2018 (2018-04-01), pages 829 - 846, XP011680881, DOI: doi:10.1109/JIOT.2018.2812300

Cited By (29)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
RU2682242C1 (en) * 2018-03-19 2019-03-18 Федеральное государственное бюджетное образовательное учреждение высшего образования "Липецкий государственный технический университет" Two-phase ac drive controlling method using the three-phase bridge inverter
US11331801B2 (en) * 2018-11-28 2022-05-17 Mitsubishi Electric Research Laboratories, Inc. System and method for probabilistic multi-robot positioning
US11474204B2 (en) * 2019-01-29 2022-10-18 Ubtech Robotics Corp Ltd Localization method and robot using the same
CN111708047B (en) * 2020-06-16 2023-02-28 浙江华睿科技股份有限公司 Robot positioning evaluation method, robot and computer storage medium
CN111708047A (en) * 2020-06-16 2020-09-25 浙江大华技术股份有限公司 Robot positioning evaluation method, robot and computer storage medium
CN111765883A (en) * 2020-06-18 2020-10-13 浙江大华技术股份有限公司 Monte Carlo positioning method and equipment for robot and storage medium
CN111765883B (en) * 2020-06-18 2023-12-15 浙江华睿科技股份有限公司 Robot Monte Carlo positioning method, equipment and storage medium
CN115917255A (en) * 2020-07-31 2023-04-04 哈曼国际工业有限公司 Vision-Based Prediction of Position and Turn Markings
WO2022025786A1 (en) * 2020-07-31 2022-02-03 Harman International Industries, Incorporated Vision-based location and turn marker prediction
CN115917255B (en) * 2020-07-31 2025-11-21 哈曼国际工业有限公司 Vision-based position and turn marker prediction
CN112180382A (en) * 2020-09-28 2021-01-05 知行汽车科技(苏州)有限公司 Self-adaptive 3D-LSLAM positioning method, device and system based on constant-speed model
CN112180382B (en) * 2020-09-28 2024-03-08 知行汽车科技(苏州)股份有限公司 Constant-speed model-based self-adaptive 3D-LSLAM positioning method, device and system
CN114693783A (en) * 2020-12-31 2022-07-01 上海湃星信息科技有限公司 Satellite autonomous pose determination method, system and storage medium
CN113075686B (en) * 2021-03-19 2024-01-12 长沙理工大学 Cable trench intelligent inspection robot graph building method based on multi-sensor fusion
CN113075686A (en) * 2021-03-19 2021-07-06 长沙理工大学 Cable trench intelligent inspection robot mapping method based on multi-sensor fusion
CN113155121B (en) * 2021-03-22 2024-04-02 珠海深圳清华大学研究院创新中心 Vehicle positioning method and device and electronic equipment
CN113155121A (en) * 2021-03-22 2021-07-23 珠海深圳清华大学研究院创新中心 Vehicle positioning method and device and electronic equipment
CN113465620A (en) * 2021-06-02 2021-10-01 上海追势科技有限公司 Parking lot particle filter positioning method based on semantic information
CN116202551A (en) * 2021-11-30 2023-06-02 珠海一微半导体股份有限公司 Visual robot road sign positioning effective detection method
WO2023131048A1 (en) * 2022-01-06 2023-07-13 上海安亭地平线智能交通技术有限公司 Position and attitude information determining method and apparatus, electronic device, and storage medium
CN114719864A (en) * 2022-04-24 2022-07-08 上海思岚科技有限公司 Robot self-positioning method, equipment and computer readable medium
CN116069018A (en) * 2022-11-30 2023-05-05 北京顺造科技有限公司 A mowing method and mowing system for improving the success rate of mowing a lawnmower out of trouble
CN116222588B (en) * 2023-05-08 2023-08-04 睿羿科技(山东)有限公司 A positioning method based on fusion of GPS and vehicle odometer
CN116222588A (en) * 2023-05-08 2023-06-06 睿羿科技(山东)有限公司 Positioning method for integrating GPS and vehicle-mounted odometer
CN117406259A (en) * 2023-12-14 2024-01-16 江西北斗云智慧科技有限公司 Beidou-based intelligent construction site vehicle positioning method and system
CN117406259B (en) * 2023-12-14 2024-03-22 江西北斗云智慧科技有限公司 Beidou-based intelligent construction site vehicle positioning method and system
DE102023213076A1 (en) 2023-12-20 2025-06-26 Robert Bosch Gesellschaft mit beschränkter Haftung Method for locating a vehicle in an environment
CN118274849A (en) * 2024-06-04 2024-07-02 江苏智搬机器人科技有限公司 A method and system for positioning an intelligent handling robot based on multi-feature fusion
CN119937545A (en) * 2024-12-26 2025-05-06 中联重科股份有限公司 Mobile robot and motion positioning control method, device, system and medium thereof

Also Published As

Publication number Publication date
CN112639502A (en) 2021-04-09
CN112639502B (en) 2024-07-30

Similar Documents

Publication Publication Date Title
WO2020048623A1 (en) Estimation of a pose of a robot
US12529563B2 (en) Vision-aided inertial navigation
US10677932B2 (en) Systems, methods, and devices for geo-localization
Langelaan State estimation for autonomous flight in cluttered environments
US8401783B2 (en) Method of building map of mobile platform in dynamic environment
US20170176191A1 (en) Method and system for using offline map information aided enhanced portable navigation
EP3707466A1 (en) Method of computer vision based localisation and navigation and system for performing the same
US11561553B1 (en) System and method of providing a multi-modal localization for an object
CN114111774B (en) Vehicle positioning method, system, device and computer readable storage medium
CN112740274A (en) System and method for VSLAM scale estimation on robotic devices using optical flow sensors
CN115135963A (en) Method for generating 3D reference point in scene map
CN114061611A (en) Target object positioning method, apparatus, storage medium and computer program product
Miller et al. Particle filtering for map-aided localization in sparse GPS environments
Mostafa et al. A smart hybrid vision aided inertial navigation system approach for UAVs in a GNSS denied environment
JP4984659B2 (en) Own vehicle position estimation device
Munguía et al. A visual-aided inertial navigation and mapping system
Hong et al. Visual inertial odometry using coupled nonlinear optimization
Ho et al. Smartphone level indoor/outdoor ubiquitous pedestrian positioning 3DMA GNSS/VINS integration using FGO
Bryson et al. Inertial sensor-based simultaneous localization and mapping for UAVs
CN112923934A (en) Laser SLAM technology suitable for combining inertial navigation in unstructured scene
Zhang et al. IC-GLI: a real-time, INS-centric GNSS-LiDAR-IMU localization system for intelligent vehicles
Wang et al. Indoor UAV height estimation with multiple model-detecting particle filters
Caspers Robotic navigation and mapping in gps-denied environments with 3d lidar and inertial navigation utilizing a sensor fusion algorithm
Xu et al. A localization system for autonomous driving: Global and local location matching based on mono-SLAM
Malinchock et al. Q-Loc: Visual Cue-Based Ground Vehicle Localization Using Long Short-Term Memory

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 18768849

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 18768849

Country of ref document: EP

Kind code of ref document: A1