
GB2641767A - Method for verifying ground-truth data in a scene and system - Google Patents

Method for verifying ground-truth data in a scene and system

Info

Publication number
GB2641767A
Authority
GB
United Kingdom
Prior art keywords
data
scene
ground
truth data
labels
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
GB2408372.7A
Other versions
GB202408372D0 (en)
Inventor
Monninger Thomas
Brown Aaron
Antol Stanislaw
Bauer Peter
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Mercedes Benz Group AG
Original Assignee
Mercedes Benz Group AG
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Mercedes Benz Group AG filed Critical Mercedes Benz Group AG
Priority to GB2408372.7A priority Critical patent/GB2641767A/en
Publication of GB202408372D0 publication Critical patent/GB202408372D0/en
Priority to DE102024130910.5A priority patent/DE102024130910A1/en
Publication of GB2641767A publication Critical patent/GB2641767A/en
Pending legal-status Critical Current

Classifications

    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G06V10/70 Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/77 Processing image or video features in feature spaces; using data integration or data reduction, e.g. principal component analysis [PCA] or independent component analysis [ICA] or self-organising maps [SOM]; Blind source separation
    • G06V10/774 Generating sets of training patterns; Bootstrap methods, e.g. bagging or boosting
    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G06V10/70 Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/82 Arrangements for image or video recognition or understanding using pattern recognition or machine learning using neural networks
    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V20/00 Scenes; Scene-specific elements
    • G06V20/50 Context or environment of the image
    • G06V20/56 Context or environment of the image exterior to a vehicle by using sensors mounted on the vehicle

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Multimedia (AREA)
  • Evolutionary Computation (AREA)
  • General Health & Medical Sciences (AREA)
  • Health & Medical Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Software Systems (AREA)
  • Computing Systems (AREA)
  • Medical Informatics (AREA)
  • Databases & Information Systems (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Data Mining & Analysis (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Molecular Biology (AREA)
  • General Engineering & Computer Science (AREA)
  • Mathematical Physics (AREA)
  • Image Analysis (AREA)
  • Traffic Control Systems (AREA)
  • Image Processing (AREA)
  • Radar Systems Or Details Thereof (AREA)

Abstract

Method for verifying ground-truth data in a scene 13 and training an auto-labelling pipeline, comprising: capturing images of objects in a scene using a capture device 11 such as a camera, LIDAR, RADAR, or ultrasonic sensor; generating ground-truth data by applying an auto-labelling pipeline 12 to the images; verifying the positional alignment between the manually obtained labels and the automatically generated labels for each object of the ground-truth data; identifying data points in the ground-truth data that do not align with the manual labels; and utilizing the deviating data points either as examples for improving the auto-labelling pipeline or disregarding them. A tool with a simple user interface may be used for documenting the scene setup information without additional burden. The automatically generated labels may be obtained from a machine learning and/or image processing process. The method may be applied to a vehicle 14 advanced driver assistance system (ADAS).

Description

[0001] Mercedes-Benz Group AG
[0002] METHOD FOR VERIFYING GROUND-TRUTH DATA IN A SCENE AND SYSTEM
[0003] FIELD OF THE INVENTION
[0004] The invention relates to the field of automobiles. More specifically, the present invention relates to a method for verifying Ground-Truth Data in a scene according to claim 1. Furthermore, the present invention relates to a system, a corresponding computer program product, and a corresponding non-transitory computer-readable storage medium.
[0005] BACKGROUND INFORMATION
[0006] Learning-based models require Ground-Truth (GT) data, specifically labeled with the desired output for a given input. The model learns the underlying data distribution by minimizing the loss, i.e., the difference between prediction and GT. A problem can arise when the GT data lacks the desired/needed quality.
[0007] One specific application is as follows: A perception system in ADAS/AD aims to generate a 3D geometric environment representation from cost-effective sensor data such as cameras (which only provide a 2D projection), radar, and ultrasonic (both with coarse detections). To generate precise GT labels (the 3D geometric environment), data collection vehicles are equipped with LiDAR as an expert sensor. Automatic labeling is used to generate the labels. However, even the expert sensor can generate erroneous GT data, for example, False Negative (FN) issues due to filtering steps or False Positive (FP) issues due to reflections.
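As an illustrative sketch (not part of the patent text), a documented scene setup could be used to flag exactly these FN/FP issues by comparing the expected per-class object counts against the counts produced by the auto-labeling pipeline. All names and the label format are assumptions:

```python
from collections import Counter


def flag_count_mismatch(expected_counts, auto_labels):
    """Flag potential FN/FP issues per object class.

    expected_counts: dict mapping class name -> count documented in the scene setup.
    auto_labels: list of dicts with a "class" key, as produced by an auto-labeling
    pipeline (hypothetical format).
    Returns a dict mapping class name -> "FN" (missed objects) or "FP" (spurious
    objects) for every class whose auto-labeled count deviates from the documented one.
    """
    auto_counts = Counter(label["class"] for label in auto_labels)
    flags = {}
    for cls, expected in expected_counts.items():
        got = auto_counts.get(cls, 0)
        if got < expected:
            flags[cls] = "FN"  # fewer detections than documented: likely filtered out
        elif got > expected:
            flags[cls] = "FP"  # more detections than documented: likely reflections
    return flags
```

A scene documented with three cones and one pedestrian, but auto-labeled with two cones and two pedestrians, would be flagged as a cone FN and a pedestrian FP.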
[0008] Some low-quality labels in the GT could significantly impair the entire training process. Since data generation is inexpensive, it is already sufficient to identify and remove these poor-quality labels from the training set.
[0009] The objective of the invention is to develop a method for capturing Ground-Truth data with high quality standards in order to achieve better performance when using these data.
[0010] This objective is solved by means of an inventive method with the features of claim 1, as well as by means of an inventive system. Advantageous embodiments of the invention and further developments can be found in the dependent claims and in the description.
[0011] SUMMARY OF THE INVENTION
[0012] One aspect of the invention relates to a method for verifying Ground-Truth data in a scene using a learning-based system.
[0013] A scene for verifying Ground-Truth data using a learning-based system could encompass various contexts depending on the application. For instance, it may be a traffic monitoring scenario at a bustling intersection where vehicles traverse, pedestrians cross the street, and traffic lights regulate flow. The scene may be carefully observed to capture relevant features and information essential for the specific application of the learning-based system.
[0014] Ground-Truth refers to the true or correct data used as a reference for evaluating or validating algorithms or systems. These data serve as a benchmark for assessing the accuracy and performance of models or systems.
[0015] The procedure for verifying Ground-Truth data in a scene using a learning-based system begins with the capture of manual data points using a sensing device. For data capture, various sensing devices such as sensors and camera systems could be employed. In addition, an extended data collection may be implemented, in which the scene setup is documented for each data point. This includes the number of vehicles and other relevant objects in the scene (for parking use cases, for example, the number of cones, the number of pedestrians, etc.). This can be tracked in a tool with a very simple UI and no overhead for the data collection team.
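The scene setup documentation described above could be captured in a minimal record such as the following sketch; all field names are illustrative and not taken from the patent:

```python
from dataclasses import dataclass


@dataclass
class SceneSetup:
    """Minimal per-data-point record of the documented scene setup
    (hypothetical structure; the patent does not specify a schema)."""
    scene_id: str
    num_vehicles: int = 0
    num_pedestrians: int = 0
    num_cones: int = 0
    notes: str = ""

    def expected_counts(self):
        """Expected per-class object counts, usable as a reference
        when checking auto-generated labels."""
        return {
            "vehicle": self.num_vehicles,
            "pedestrian": self.num_pedestrians,
            "cone": self.num_cones,
        }
```

A record this small could plausibly be filled in via a simple UI by the data collection team without measurable overhead.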
[0016] These manual data points serve as a reference for the GT data, representing the actual features in the scene.
[0017] To ensure the accuracy of the GT data, an Auto-Labeling Pipeline is employed to generate additional documentation of the scene for each manual data point. This involves automatically assigning labels to the captured objects or features.
[0018] Subsequently, the correspondence between the manual and automatic labels for each data point is verified. Discrepancies between these labels are identified using comparison and evaluation algorithms. Additionally, an expanded data collection takes place, documenting not only the capture of each data point but also the scene setup. This allows for a more comprehensive analysis of the scene and potential improvement in the quality of GT data.
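The correspondence check described above could, as one hedged illustration, be a simple nearest-neighbour distance comparison: a manual data point is considered deviating when no automatic label lies within a positional tolerance. The point format and threshold below are assumptions, not specified by the patent:

```python
import math


def find_deviating_points(manual_points, auto_labels, max_dist=0.5):
    """Return the manual data points that have no automatic label
    within max_dist (same positional units as the points).

    manual_points, auto_labels: sequences of coordinate tuples, e.g. (x, y).
    """
    deviating = []
    for mp in manual_points:
        # math.dist computes the Euclidean distance between two points.
        if not any(math.dist(mp, al) <= max_dist for al in auto_labels):
            deviating.append(mp)
    return deviating
```

Real pipelines would more likely match 3D bounding boxes via IoU, but the principle of thresholded positional agreement is the same.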
[0019] The identified deviating data points can either be used as training data to enhance the Auto-Labeling Pipeline or be disregarded. This decision is made through algorithms for data selection and prioritization, aimed at optimizing the quality of GT data.
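One conceivable selection rule for this decision, sketched here under assumptions (the patent does not fix a criterion): deviating points whose objects are confirmed by the documented scene setup are kept as training examples for improving the auto-labeling pipeline, while unconfirmed ones are discarded. All names are hypothetical:

```python
def triage_deviations(deviating_points, confirmed_object_ids):
    """Split deviating data points into pipeline-improvement examples
    and discarded points.

    deviating_points: list of dicts with an "object_id" key (illustrative format).
    confirmed_object_ids: set of object IDs confirmed by the documented scene setup.
    Returns (improve, discard) lists.
    """
    improve, discard = [], []
    for point in deviating_points:
        if point["object_id"] in confirmed_object_ids:
            improve.append(point)   # real object the pipeline missed or misplaced
        else:
            discard.append(point)   # not corroborated by the scene documentation
    return improve, discard
```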
[0020] Following this, an automated process is implemented, utilizing the manual scene annotations and the generated GT data to verify the correspondence between manual and automatic labels. Through the application of machine learning algorithms and pattern recognition techniques, the accuracy of the GT data is efficiently ensured, leading to enhanced performance of the learning-based system.
[0021] Further advantages, features, and details of the invention derive from the following description of preferred embodiments as well as from the drawing. The features and feature combinations previously mentioned in the description as well as the features and feature combinations mentioned in the following description of the figure and/or shown in the figure alone can be employed not only in the respectively indicated combination but also in any other combination or taken alone without leaving the scope of the invention.
[0022] BRIEF DESCRIPTION OF THE DRAWING
[0023] The novel features and characteristic of the disclosure are set forth in the appended claims. The accompanying drawings, which are incorporated in and constitute a part of this disclosure, illustrate exemplary embodiments and together with the description, serve to explain the disclosed principles. In the figures, the left-most digit(s) of a reference number identifies the figure in which the reference number first appears. The same numbers are used throughout the figures to reference like features and components. Some embodiments of system and/or methods in accordance with embodiments of the present subject matter are now described below, by way of example only, and with reference to the accompanying figures.
[0024] The drawing shows in: Fig. 1 the process of verifying ground-truth data in a scene 13 using a machine-learning based system 10.
[0025] DETAILED DESCRIPTION
[0026] In the present document, the word "exemplary" is used herein to mean "serving as an example, instance, or illustration". Any embodiment or implementation of the present subject matter described herein as "exemplary" is not necessarily to be construed as preferred or advantageous over other embodiments.
[0027] While the disclosure is susceptible to various modifications and alternative forms, specific embodiments thereof have been shown by way of example in the drawing and will be described in detail below. It should be understood, however, that it is not intended to limit the disclosure to the particular forms disclosed, but on the contrary, the disclosure is to cover all modifications, equivalents, and alternatives falling within the scope of the disclosure.
[0028] The terms "comprises", "comprising", or any other variations thereof, are intended to cover a non-exclusive inclusion so that a setup, device or method that comprises a list of components or steps does not include only those components or steps but may include other components or steps not expressly listed or inherent to such setup or device or method. In other words, one or more elements in a system or apparatus preceded by "comprises" or "comprise" does not or do not, without more constraints, preclude the existence of other elements or additional elements in the system or method.
[0029] In the following detailed description of the embodiment of the disclosure, reference is made to the accompanying drawing that forms part hereof, and in which is shown by way of illustration a specific embodiment in which the disclosure may be practiced. This embodiment is described in sufficient detail to enable those skilled in the art to practice the disclosure, and it is to be understood that other embodiments may be utilized and that changes may be made without departing from the scope of the present disclosure. The following description is, therefore, not to be taken in a limiting sense.
[0030] Fig. 1 illustrates the process of verifying ground-truth data in a scene 13 using a machine-learning based system 10, for example in a vehicle 14. The diagram marks several components that clarify the process and the interaction of the elements. The data capture device 11 of the system 10 (for example an optical capture device or a sensor device) is the central unit for manual data collection and serves as the primary data source. The auto-labeling pipeline 12 uses these manual inputs to generate automatic labels that then serve as ground-truth data. The scene 13, where this data collection occurs, is documented by both the data capture device 11 and the auto-labeling pipeline 12. The manual data points are collected directly in this scene 13 by the capture device 11 and form the basis for the automatic labels produced by the pipeline 12.
[0031] These labels are combined with the manual data points to form the ground-truth data, with each additional piece of information contributing to the documentation of the scene 13. A critical step in the process is verifying the match between manual and automatic labels to ensure data accuracy. This also involves identifying deviant data points, which indicate where the auto-labeling pipeline 12 could be improved.
[0032] Finally, it is decided whether these deviant data points will be used to enhance the pipeline 12 or ignored, directly affecting the system's further optimization. This structured and detailed depiction facilitates understanding of the complex procedures for data verification and optimization within the system 10.
[0033] In summary, the invention proposes a system 10 and a method to annotate and verify ground truth generated by auto-labeling.
[0034] List of reference signs
10 system
11 capture device
12 auto-labeling pipeline
13 scene
14 vehicle

Claims (6)

1. Method for verifying Ground-Truth data in a scene (13) using a learning-based system (10), comprising the following steps:
- Capture of manual data points using a system's capture device (11);
- Generation of Ground-Truth data by additional documentation of the scene for each manual data point using an auto-labeling pipeline (12);
- Verification of the alignment between the manual and automatic labels for each individual manual data point and the corresponding data points of the Ground-Truth data to ensure the expected accuracy of the Ground-Truth data;
- Identification of data points in the Ground-Truth data that do not align with the manual labels;
- Utilization of these deviating data points either as examples for improving the auto-labeling pipeline (12) or disregarding them.
2. Method according to claim 1, characterized by the utilization of a tool with a simple user interface for documenting the required scene setup information without additional burden.
3. Method according to any one of claims 1 or 2, characterized in that the automatic generation of the automatic labels is performed using techniques of machine learning and image processing.
4. System configured to execute a method for verifying Ground-Truth data in a scene (13) according to any one of the preceding claims.
5. Computer program product comprising program code means for performing a method according to any one of claims 1 to 3.
6. A non-transitory computer-readable storage medium comprising at least the computer program product according to claim 5.
GB2408372.7A 2024-06-12 2024-06-12 Method for verifying ground-truth data in a scene and system Pending GB2641767A (en)

Priority Applications (2)

Application Number Priority Date Filing Date Title
GB2408372.7A GB2641767A (en) 2024-06-12 2024-06-12 Method for verifying ground-truth data in a scene and system
DE102024130910.5A DE102024130910A1 (en) 2024-06-12 2024-10-23 METHOD FOR VERIFYING GROUND TRUTH DATA IN A SCENE AND SYSTEM

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
GB2408372.7A GB2641767A (en) 2024-06-12 2024-06-12 Method for verifying ground-truth data in a scene and system

Publications (2)

Publication Number Publication Date
GB202408372D0 GB202408372D0 (en) 2024-07-24
GB2641767A true GB2641767A (en) 2025-12-17

Family

ID=91948675

Family Applications (1)

Application Number Title Priority Date Filing Date
GB2408372.7A Pending GB2641767A (en) 2024-06-12 2024-06-12 Method for verifying ground-truth data in a scene and system

Country Status (2)

Country Link
DE (1) DE102024130910A1 (en)
GB (1) GB2641767A (en)

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20180373980A1 (en) * 2017-06-27 2018-12-27 drive.ai Inc. Method for training and refining an artificial intelligence
US20190102656A1 (en) * 2017-09-29 2019-04-04 Here Global B.V. Method, apparatus, and system for providing quality assurance for training a feature prediction model
WO2019232641A1 (en) * 2018-06-07 2019-12-12 Element Ai Inc. Automated labeling of data with user validation
US10699192B1 (en) * 2019-01-31 2020-06-30 StradVision, Inc. Method for optimizing hyperparameters of auto-labeling device which auto-labels training images for use in deep learning network to analyze images with high precision, and optimizing device using the same
CN111820890A (en) * 2020-07-24 2020-10-27 武汉中旗生物医疗电子有限公司 Electrocardiosignal quality labeling method and device

Also Published As

Publication number Publication date
GB202408372D0 (en) 2024-07-24
DE102024130910A1 (en) 2025-12-18
