
EP4670059A1 - METHOD AND DEVICE FOR CLASSIFYING MATERIALS USING INDIRECT TIME-OF-FLIGHT DATA - Google Patents

METHOD AND DEVICE FOR CLASSIFYING MATERIALS USING INDIRECT TIME-OF-FLIGHT DATA

Info

Publication number
EP4670059A1
Authority
EP
European Patent Office
Prior art keywords
spot
information processing
flight
machine learning
data
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
EP24706135.1A
Other languages
German (de)
French (fr)
Inventor
Jose Manuel GIL-CACHO
Thomas ALLAIN
Frederic DE GROEF
Jernej PERHAVC
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Sony Depthsensing Solutions NV SA
Sony Semiconductor Solutions Corp
Original Assignee
Sony Depthsensing Solutions NV SA
Sony Semiconductor Solutions Corp
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Sony Depthsensing Solutions NV SA, Sony Semiconductor Solutions Corp filed Critical Sony Depthsensing Solutions NV SA
Publication of EP4670059A1 publication Critical patent/EP4670059A1/en
Pending legal-status Critical Current

Classifications

    • GPHYSICS
    • G06COMPUTING OR CALCULATING; COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/21Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • GPHYSICS
    • G01MEASURING; TESTING
    • G01SRADIO DIRECTION-FINDING; RADIO NAVIGATION; DETERMINING DISTANCE OR VELOCITY BY USE OF RADIO WAVES; LOCATING OR PRESENCE-DETECTING BY USE OF THE REFLECTION OR RERADIATION OF RADIO WAVES; ANALOGOUS ARRANGEMENTS USING OTHER WAVES
    • G01S17/00Systems using the reflection or reradiation of electromagnetic waves other than radio waves, e.g. lidar systems
    • G01S17/88Lidar systems specially adapted for specific applications
    • G01S17/89Lidar systems specially adapted for specific applications for mapping or imaging
    • G01S17/8943D imaging with simultaneous measurement of time-of-flight at a 2D array of receiver pixels, e.g. time-of-flight cameras or flash lidar
    • GPHYSICS
    • G01MEASURING; TESTING
    • G01SRADIO DIRECTION-FINDING; RADIO NAVIGATION; DETERMINING DISTANCE OR VELOCITY BY USE OF RADIO WAVES; LOCATING OR PRESENCE-DETECTING BY USE OF THE REFLECTION OR RERADIATION OF RADIO WAVES; ANALOGOUS ARRANGEMENTS USING OTHER WAVES
    • G01S7/00Details of systems according to groups G01S13/00, G01S15/00, G01S17/00
    • G01S7/48Details of systems according to groups G01S13/00, G01S15/00, G01S17/00 of systems according to group G01S17/00
    • G01S7/4802Details of systems according to groups G01S13/00, G01S15/00, G01S17/00 of systems according to group G01S17/00 using analysis of echo signal for target characterisation; Target signature; Target cross-section
    • GPHYSICS
    • G06COMPUTING OR CALCULATING; COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/10Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
    • G06V40/16Human faces, e.g. facial parts, sketches or expressions
    • G06V40/168Feature extraction; Face representation
    • GPHYSICS
    • G06COMPUTING OR CALCULATING; COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/10Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
    • G06V40/16Human faces, e.g. facial parts, sketches or expressions
    • G06V40/172Classification, e.g. identification

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • Computer Networks & Wireless Communication (AREA)
  • Radar, Positioning & Navigation (AREA)
  • Remote Sensing (AREA)
  • Data Mining & Analysis (AREA)
  • Health & Medical Sciences (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Oral & Maxillofacial Surgery (AREA)
  • General Health & Medical Sciences (AREA)
  • Multimedia (AREA)
  • Human Computer Interaction (AREA)
  • Electromagnetism (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Evolutionary Biology (AREA)
  • Evolutionary Computation (AREA)
  • General Engineering & Computer Science (AREA)
  • Image Analysis (AREA)

Abstract

An information processing device for classifying materials, comprising circuitry configured to: obtain spot indirect time-of-flight data acquired in a time-of-flight measurement of light thrown back by a material; compute, based on the spot indirect time-of-flight data, feature data according to a plurality of predefined features; and input the feature data into a machine learning algorithm, wherein the machine learning algorithm is trained to classify the material based on the feature data.

Description

METHOD AND DEVICE FOR CLASSIFYING MATERIALS USING INDIRECT TIME OF FLIGHT DATA
TECHNICAL FIELD
The present disclosure generally pertains to an information processing device and an information processing method for classifying materials.
TECHNICAL BACKGROUND
Time-of-flight (ToF) devices are known which determine a depth map of a scene based on a ToF of light that is emitted by an illuminator of the ToF device, thrown back by an object in the scene and detected by a ToF sensor of the ToF device, wherein the ToF of the light is determined based on the round-trip time (direct ToF device) or the phase of the detected light (indirect ToF device).
Moreover, spot ToF devices are known in which the illuminator emits spotted light to the scene, for example, a light pattern of separated high-intensity and low-intensity light areas such as a pattern of light dots.
Some known methods for material sensing are based on multispectral sensors where the reflectance of the material is analyzed at different wavelengths.
Generally, it is known to perform facial recognition based on 2D images, e.g., for authenticating a user for unlocking a mobile device. However, some of the known methods may be prone to spoofing.
Although there exist techniques for material sensing and facial recognition, it is generally desirable to improve the existing techniques.
SUMMARY
According to a first aspect the disclosure provides an information processing device for classifying materials, comprising circuitry configured to: obtain spot indirect time-of-flight data acquired in a time-of-flight measurement of light thrown back by a material; compute, based on the spot indirect time-of-flight data, feature data according to a plurality of predefined features; and input the feature data into a machine learning algorithm, wherein the machine learning algorithm is trained to classify the material based on the feature data. According to a second aspect the disclosure provides an information processing method for classifying materials, comprising: obtaining spot indirect time-of-flight data acquired in a time-of-flight measurement of light thrown back by a material; computing, based on the spot indirect time-of-flight data, feature data according to a plurality of predefined features; and inputting the feature data into a machine learning algorithm, wherein the machine learning algorithm is trained to classify the material based on the feature data.
Further aspects are set forth in the dependent claims, the drawings and the following description.
BRIEF DESCRIPTION OF THE DRAWINGS
Embodiments are explained by way of example with respect to the accompanying drawings, in which:
Fig. 1 schematically illustrates in a block diagram an embodiment of an information processing device;
Fig. 2 schematically illustrates an embodiment of spot and valley detection;
Fig. 3 schematically illustrates an embodiment of direct global separation;
Fig. 4 schematically illustrates an embodiment of an information processing device for facial recognition;
Fig. 5 schematically illustrates an embodiment of a training of a machine learning algorithm;
Fig. 6 schematically illustrates an embodiment of normalized mean feature values for different material classes;
Fig. 7 schematically illustrates in a flow diagram an embodiment of an information processing method; and
Fig. 8 schematically illustrates a multi-purpose computer which can be used for implementing an information processing device.
DETAILED DESCRIPTION OF EMBODIMENTS
Before a detailed description of the embodiments under reference of Fig. 1 is given, general explanations are made.
As mentioned in the outset, time-of-flight (ToF) devices are known which determine a depth map of a scene based on a ToF of light that is emitted by a light source of the ToF device, thrown back by an object in the scene and detected by a ToF sensor of the ToF device, wherein the ToF of the light is determined based on the round-trip time (direct ToF device (dToF)) or the phase of the detected light (indirect ToF device (iToF)).
Moreover, spot ToF devices are known in which the light source emits spotted light to the scene, for example, a light pattern of separated high-intensity and low-intensity light areas such as a pattern of light dots.
As also mentioned in the outset, some known methods for material sensing are based on multispectral sensors where the reflectance of the material is analyzed at different wavelengths.
Generally, as further mentioned in the outset, it is known to perform facial recognition based on 2D images, e.g., for authenticating a user for unlocking a mobile device.
For example, the image (e.g., a 2D infrared (IR) or RGB (red-green-blue) image) may be fed into a convolutional neural network (CNN) for automatic feature extraction and classification.
However, in some cases, some of the known methods may be prone to spoofing.
It has been recognized that facial recognition based on 2D images may be spoofed by presenting a printed image of the legitimate user to the camera or by wearing/presenting a mask with the characteristics of the legitimate user to the camera of the mobile device.
It has thus been recognized that the facial recognition should be based on skin identification to avoid spoofing.
Moreover, it has been recognized that different materials react differently to incident light. This is called sub-surface scattering: the light penetrates the material and, depending on the material's thickness and transparency, travels a longer or shorter distance inside it.
Hence, it has been recognized that spot indirect ToF devices, which typically use actively modulated (temporally and spatially) IR light, may be used for skin identification to improve facial recognition. Thereby, depth information and material-sensitive information would be available at once.
It has further been recognized that traditional machine learning with engineered features may be used, wherein the features are selected based on an understanding of the underlying physics and the machine learning algorithm is a classifier based on the engineered features.
Furthermore, it has been recognized that such an approach may be generalized to identify other materials than skin, e.g., to classify a material into a material group. Hence, some embodiments pertain to an information processing device for classifying materials, wherein the information processing device includes circuitry configured to: obtain spot indirect time-of-flight data acquired in a time-of-flight measurement of light thrown back by a material; compute, based on the spot indirect time-of-flight data, feature data according to a plurality of predefined features; and input the feature data into a machine learning algorithm, wherein the machine learning algorithm is trained to classify the material based on the feature data.
The information processing device may be a data processing module, a computer, a server, a mobile device (such as a smartphone, a tablet, a laptop), a virtual reality device or the like.
The circuitry may be based on or may include or may be implemented as integrated circuitry logic or may be implemented by a CPU (central processing unit), an application processor, a graphical processing unit (GPU), a microcontroller, an FPGA (field programmable gate array), an ASIC (application specific integrated circuit) or the like or a combination thereof.
The functionality may be implemented by software executed by a processor such as a microprocessor or the like. The circuitry may be based on or may include or may be implemented by typical electronic components configured to achieve the functionality as described herein. The circuitry may be based on or may include or may be implemented in parts by typical electronic components and integrated circuitry logic and in parts by software.
The circuitry may include data storage capabilities to store data such as memory which may be based on semiconductor storage technology (e.g., RAM, EPROM, etc.) or magnetic storage technology (e.g., a hard disk drive) or the like.
The circuitry may include a data bus for receiving and transmitting data over the data bus. The circuitry may implement communication protocols for receiving and transmitting the data over the data bus.
In some embodiments, the information processing device includes a spot indirect time-of-flight device to acquire the spot indirect time-of-flight data. In some embodiments, the spot indirect time-of-flight device includes the information processing device. In some embodiments, the information processing device and the spot indirect time-of-flight device are separate devices.
The spot indirect time-of-flight device includes a spot illuminator configured to illuminate a scene with spotted light. The spot illuminator may include a light emitting diode (LED), a laser, a laser diode, an LED array, a laser diode array, a diffractive optical element, a lens system, etc.
The spotted light has a spatial light pattern including high-intensity light areas and low-intensity light areas and, thus, a plurality of light spots corresponding to the high-intensity light areas is projected onto the scene. The light spots may be dots, stripes, a checker pattern or the like.
The spotted light is further temporally intensity-modulated with a configured modulation frequency in accordance with an applied modulation signal. The applied modulation signal is a periodic signal which may be a sinusoidal signal, a rectangular signal, or the like.
The spot indirect time-of-flight device includes a time-of-flight sensor configured to detect light thrown back by an object in a scene.
The time-of-flight sensor includes a time-of-flight pixel array including a plurality of time-of-flight pixels arranged in rows and columns, wherein the time-of-flight pixel array may be a CAPD (Current-Assisted Photonic Demodulator) pixel array including a plurality of CAPD pixels (e.g., one-tapped, two-tapped, etc.) arranged in rows and columns.
The time-of-flight sensor may be embedded in a time-of-flight camera which may further include optical elements such as a(n) (adaptive) lens system, color filters, a diffractive optical element or the like.
The spot indirect time-of-flight device includes a control configured to control the overall operation of the spot indirect time-of-flight device.
The time-of-flight measurement may include four correlation measurements, wherein for each correlation measurement a different phase shift between a modulation signal applied to the spot illuminator and a demodulation signal applied to the time-of-flight sensor is utilized (e.g., 0°, 90°, 180° and 270°), wherein the modulation signal and the demodulation signal are synchronized.
In some embodiments, for each correlation measurement, the control outputs one frame, wherein the frame includes digital representations of the output voltages of each CAPD pixel.
In other embodiments, the control performs pre-processing by computing IQ values (I: in-phase component; Q: quadrature component) for each CAPD pixel, wherein the IQ values are based on the frames of the correlation measurements, thereby computing one frame with IQ values for each CAPD pixel, which is output. The spot indirect time-of-flight data may thus include one or more frames including digital representations of output voltages or IQ values.
The circuitry computes, based on the spot indirect time-of-flight data, feature data according to a plurality of predefined features.
Thus, the plurality of predefined features basically corresponds to a plurality of computation rules indicating how to compute the feature data from the spot indirect time-of-flight data to obtain properties of the material which allow the material to be classified, in particular properties which are unique for different materials. These material properties are represented in the spot indirect time-of-flight data. Basically, any computation rule may be valid, for example, computation rules with which time-of-flight properties are computed such as amplitude, confidence, intensity, phase or depth of each time-of-flight pixel. Each predefined feature of the plurality of predefined features thus corresponds to a computation rule for computing feature values representing a part of the feature data.
However, it has been recognized that some of the predefined features may be correlated such that accuracy and robustness of the material classification may be reduced and processing time may be increased when too many correlated predefined features are used.
Hence, it has been recognized that the predefined features should be engineered and selected according to the underlying physics, their importance for the classification and their correlation with other predefined features.
As mentioned above, it has been recognized that different materials react differently to incident light. This is called sub-surface scattering: the light penetrates the material and, depending on the material's thickness and transparency, travels a longer or shorter distance inside it.
Hence, the features should be selected to be unique in different materials because sub-surface scattering is different in those materials.
In general, the light signal received at each time-of-flight pixel has typically a direct light component and a global light component, wherein the direct light component corresponds to light that is emitted by the illuminator, thrown back by an object in a scene and imaged onto a certain time-of-flight pixel. However, the global light component is a sum of multiple reflections, scattering and stray light due to the time-of-flight camera itself (e.g., lenses, filters, etc.), geometric scene features (e.g., corners, concave regions, etc.) and material features (e.g., scattering, transparency, etc.). The global light component causes Multipath Interference (MPI), as it mixes with the direct light component.
It has been recognized that different sub-surface scattering of the materials results in different effects in the direct and global light component, since the more the material lets the light penetrate, the higher typically the magnitude of time-of-flight quantities - such as phase and amplitude - of the global light component.
Hence, it has been recognized that sub-surface scattering affects spots (e.g., dots) and valleys in a unique way in different materials.
Thus, in some embodiments, the circuitry is configured to detect spots and valleys in the spot indirect time-of-flight data, as will be discussed under reference of Fig. 2.
As a result of the spot/valley detection, the circuitry obtains the pixel positions of the spots and the corresponding valleys.
In some embodiments, the circuitry is configured to perform direct and global separation (DGS), as will be discussed under reference of Fig. 3.
In some embodiments, the feature data are computed based on properties of the detected spots and valleys.
The plurality of predefined features may thus correspond to computation rules which require properties of the detected spots and valleys as input, for example, the computation rule may require computation of a sum, a difference, a product or a ratio of spot and corresponding valley properties, e.g., difference or ratio between cartesian depth/phase of a spot (before or after DGS correction) and that of the corresponding valley, or a ratio between confidence of a spot and that of the corresponding valley.
The circuitry obtains an input feature for the trained machine learning algorithm for each predefined feature at the end of the computation according to the predefined feature, wherein each input feature includes one or more feature values (having a data format such as binary, integer, floating point, double precision or the like) and, thus, each input feature includes a single feature value or an array/vector of feature values.
The feature data may thus include a plurality of input features, e.g., an array/vector of different input features, wherein each input feature includes one or more feature values.
It has been recognized that the following features per spot may provide a robust material classification. In some embodiments, the predefined features are predefined features per detected spot requiring computation of reflectance of the spot, reflectance of the corresponding valley, the spot size, ratio between cartesian depth or phase of the spot after direct global separation and cartesian depth or phase of the corresponding valley, ratio between amplitude or confidence of the spot after direct global separation and amplitude or confidence of the corresponding valley, ratio between variance of cartesian depth or phase of the spot and variance of cartesian depth or phase of the corresponding valley, as will be discussed in more detail under reference of Figs. 1 to 6.
However, as mentioned above, other predefined features may be appropriate as well, since the possible combinations of, for example, time-of-flight properties and/or properties of the detected spots and valleys across one or more ROIs (regions of interest) are basically infinite.
The circuitry inputs the feature data into a machine learning algorithm, wherein the machine learning algorithm is trained to classify the material based on the feature data.
The training of the machine learning algorithm will be discussed under reference of Fig. 5.
The classifier may be a binary classifier such that the machine learning algorithm outputs whether the material belongs to a certain material or not.
The material class may correspond to a larger material group such as metal, textile, paper, plastic, silicone, latex, rubber, wax, wood or skin or the like.
The classifier may be a multiclass classifier such that the machine learning algorithm outputs, for example, for each material class, a probability that the material belongs to this material class.
The classifier may be a multiclass classifier such that the machine learning algorithm outputs, for example, the material class to which the material belongs.
As mentioned above, traditional machine learning (which may also be referred to as statistical learning) is utilized using predefined (engineered) features in contrast to artificial neural networks which automatically find the relevant features on their own during training.
Hence, in some embodiments, the machine learning algorithm is one of a support vector classifier, a random forest, a decision tree, a k-nearest neighbor algorithm, a naive Bayes classifier and AdaBoost. Other machine learning algorithms may be appropriate as well.
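For illustration, a minimal sketch of such a traditional-machine-learning classifier is given below, using scikit-learn's support vector classifier as a stand-in; the feature matrix, labels and parameters are hypothetical and not taken from the disclosure.

```python
# Minimal sketch (assumed setup, not the patented implementation): a support vector
# classifier trained on engineered per-measurement feature vectors.
import numpy as np
from sklearn.svm import SVC
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score

rng = np.random.default_rng(0)
X = rng.normal(size=(500, 6))        # hypothetical feature data: six input features per sample
y = rng.integers(0, 2, size=500)     # hypothetical labels: 1 = skin, 0 = non-skin

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=0)

clf = SVC(kernel="rbf", probability=True)  # one of the classifiers listed above
clf.fit(X_train, y_train)                  # learn the mapping from features to material class

print("accuracy:", accuracy_score(y_test, clf.predict(X_test)))
```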
As mentioned above, the facial recognition may be improved by including skin identification to avoid spoofing.
Hence, in some embodiments, the machine learning algorithm is trained to classify the material into skin or non-skin. In some embodiments, the circuitry is configured to perform facial recognition using the classification result.
In such embodiments, the circuitry determines that facial recognition has failed when the machine learning algorithm classifies the material as not being skin.
In such embodiments, when the machine learning algorithm classifies the material as being skin, the circuitry performs the facial recognition based on the spot indirect time-of-flight data and/or based on one or more 2D images such as a 2D infrared (IR) or RGB (red-green-blue) image. The 2D images may be acquired with an image sensor (e.g., a CCD (“Charge-Coupled Device”) sensor or an active pixel sensor). The image sensor may be part of the information processing device.
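A minimal sketch of this gating logic is shown below; the function names, the label encoding and the face matcher are hypothetical placeholders and not the actual firmware interface.

```python
# Minimal sketch (assumed interfaces): facial recognition only proceeds when the
# material classifier reports skin; otherwise recognition is treated as failed.
def authenticate(spot_itof_data, rgb_image, material_classifier, face_matcher, compute_features):
    features = compute_features(spot_itof_data)                # engineered feature vector
    is_skin = material_classifier.predict([features])[0] == 1  # hypothetical encoding: 1 = skin
    if not is_skin:
        return False                                           # possible spoof: recognition fails
    # Material classified as skin: run the ordinary face matching on depth and/or 2D image data.
    return face_matcher(spot_itof_data, rgb_image)
```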
In some embodiments, the circuitry may compute an amplitude image, based on the spot indirect time-of-flight data, which is comparable to a 2D image, and extract 2D facial features from the amplitude image, wherein 2D facial feature extraction from 2D images is generally known.
In some embodiments, the circuitry is configured to compute, based on the spot indirect time-of-flight data, a depth map and to perform the facial recognition further based on the depth map.
In such embodiments, the circuitry may extract structural 3D features from the depth map to perform the facial recognition and may correlate the structural 3D features with the 2D facial features to perform the facial recognition.
The circuitry may store corresponding reference features of the legitimate user. The extracted 2D facial features - which are extracted based on the 2D images - and/or the structural 3D features, which are extracted based on the spot indirect time-of-flight data, may be compared with the stored reference features of the legitimate user for authentication.
It has further been recognized that classifying the material as skin or not for improving facial recognition may benefit from using only input features from certain regions of the human face, since taking feature data of the whole face may blur the distinguishability of skin with respect to other materials.
However, with predefined region-of-interest, as will be discussed under reference of Fig. 4, the classification accuracy may increase.
Hence, in some embodiments, the circuitry is configured to compute the feature data in accordance with a plurality of predefined regions-of-interest.
Some embodiments pertain to a (corresponding) information processing method for classifying materials, wherein the information processing method includes: obtaining spot indirect time-of-flight data acquired in a time-of-flight measurement of light thrown back by a material; computing, based on the spot indirect time-of-flight data, feature data according to a plurality of predefined features; and inputting the feature data into a machine learning algorithm, wherein the machine learning algorithm is trained to classify the material based on the feature data.
The information processing method may be performed by the information processing device as described herein.
The methods as described herein are also implemented in some embodiments as a computer program causing a computer and/or a processor to perform the method, when being carried out on the computer and/or processor. In some embodiments, also a non-transitory computer-readable recording medium is provided that stores therein a computer program product, which, when executed by a processor, such as the processor described above, causes the methods described herein to be performed.
Returning to Fig. 1, there is schematically illustrated in a block diagram an embodiment of an information processing device 1, which is discussed in the following under reference of Figs. 1 to 6.
The information processing device 1 is a mobile device, here a smartphone, and includes a spot indirect time-of-flight device 100 (spot iToF device 100 in the following), a data bus 12 and circuitry 13.
The spot iToF device 100 includes a spot illuminator 2, a ToF camera 3, a control 4 and a communication interface 5.
The spot illuminator 2 includes, e.g., a diode laser array as a light source or a laser and a diffractive optical element or the like to illuminate a scene 6 with spotted light. The scene 6 includes an object 7 which at least partially throws back the spotted light. The object is made of or covered with a certain material which is to be classified.
The spotted light has a spatial light pattern including high-intensity areas 8 and low-intensity areas 9 and, thus, a plurality of light spots corresponding to the high-intensity areas 8 is projected onto the scene 6. The light pattern or the light spots may be dots, stripes, or a checker pattern or the like. Moreover, the spot illuminator 2 emits the temporal intensity modulated light with a configured modulation frequency, in accordance with an applied modulation signal, to the scene 6. The applied modulation signal is a periodic signal such as a sinusoidal signal, a rectangular signal or the like.
The ToF camera 3 includes, e.g., a lens system, an aperture and a ToF sensor (not shown) to detect light thrown back by the object 7 in the scene 6.
The ToF sensor includes a plurality of two-tapped current-assisted photonic demodulator (CAPD) pixels arranged in rows and columns. Each CAPD pixel generates, in accordance with an applied demodulation signal, an output voltage in accordance with the phase of the received light signal.
The demodulation signal is further phase-shifted by 180° between a first tap and a second tap of the two-tapped CAPD pixel and the difference of the voltage of the two taps is output to decrease an ambient light contribution to the output voltage, pixel offset, etc. The applied demodulation signal is a periodic signal such as a sinusoidal signal, a rectangular signal or the like in accordance with the modulation signal.
The integration time of the ToF sensor (of each two-tapped CAPD pixel) is controlled, for example, by controlling a number of modulation periods T over which the output voltage is generated.
The control 4 executes software by a processor, for example, a control block 10 and, optionally, a pre-processing block 11.
The control block 10 includes procedures having instructions to control the overall operation of the spot iToF device 100.
The control block 10 includes instructions to perform a ToF measurement to acquire spot iToF data, for example, the ToF measurement includes four correlation measurements, wherein for each correlation measurement a different phase shift between the modulation signal applied to the spot illuminator 2 and the demodulation signal applied to the ToF sensor is utilized (e.g., 0°, 90°, 180° and 270°).
The control block 10 obtains for each correlation measurement a frame including digital representations of the output voltages of the plurality of CAPD pixels from the ToF sensor.
Optionally, the pre-processing block 11 computes IQ values (I: in-phase component; Q: quadrature component) based on the digital representations of the output voltages generated in each correlation measurement. The control 4 outputs the four frames of the ToF measurement as spot iToF data via the communication interface 5 over the data bus 12 (e.g., a data bus in accordance with MIPI (Mobile Industry Processor Interface) specifications) to the circuitry 13.
The circuitry 13 includes, e.g., a processor (e.g., an application processor) and data storage configured to perform the functions as described in the following.
The circuitry 13 executes software including a data processing block 14 which includes procedures having instructions to cause the circuitry 13 to perform the following functions.
The circuitry 13 computes, based on the spot iToF data, ToF properties for each pixel of the ToF sensor, at least:
(1) I = V0° - V90°,
(2) Q = V180° - V270°, and
(3) the amplitude, which is computed based on I and Q.
Here, V0° is the digital representation of the pixel output voltage for a phase shift of 0° between the modulation and demodulation signal, V90° is the digital representation of the pixel output voltage for a phase shift of 90°, V180° is the digital representation of the pixel output voltage for a phase shift of 180°, and V270° is the digital representation of the pixel output voltage for a phase shift of 270° between the modulation and demodulation signal.
Moreover, the circuitry 13 may compute the intensity for each ToF pixel, which is the squared amplitude, for detecting spots and valleys, as will be discussed under reference of Fig. 2.
Furthermore, the circuitry 13 may compute the confidence for each pixel, which is proportional to the amplitude, as generally known.
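The per-pixel computation can be sketched as follows; the amplitude convention (magnitude of the IQ vector) and the use of a four-quadrant arctangent are assumptions consistent with, but not stated verbatim in, the description.

```python
# Minimal numpy sketch (assumed conventions) of the per-pixel quantities derived from
# the four correlation frames, following equations (1), (2) and (4).
import numpy as np

def tof_properties(v0, v90, v180, v270):
    """v0..v270: 2D arrays of digital pixel output voltages per phase shift."""
    i = v0.astype(np.float64) - v90     # in-phase component, equation (1)
    q = v180.astype(np.float64) - v270  # quadrature component, equation (2)
    amplitude = np.hypot(i, q)          # magnitude of the IQ vector; confidence is proportional to it
    intensity = amplitude ** 2          # squared amplitude, used for spot and valley detection
    phase = np.arctan2(q, i)            # equation (4), evaluated as a four-quadrant arctangent
    return i, q, amplitude, intensity, phase
```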
The data processing block 14 includes a trained machine learning algorithm 15, which is trained to classify the material of the object 7 based on feature data.
In the following, spot and valley detection as well as direct and global separation (DGS) are discussed under reference of Figs. 2 and 3 to prepare the computation of the feature data input to the trained machine learning algorithm 15.
Fig. 2 schematically illustrates an embodiment of spot and valley detection.
A ToF sensor 20 (e.g., of the ToF camera 3 in Fig. 1) is schematically illustrated in Fig. 2, which has a plurality of two-tapped CAPD pixels arranged in rows (R-1, ..., R-m, ..., R-M; wherein M is an integer) and columns (C-1, ..., C-n, ..., C-N; wherein N is an integer). Each CAPD pixel has a pixel position identified by its row index (X) and its column index (Y).
The circuitry 13, executing the data processing block 14, obtains spot iToF data acquired by the spot iToF device 100 in a ToF measurement from the control 4.
The spot iToF data include a plurality of spots 21 and valleys 22 characterized by the amplitude/intensity/confidence values (Z) associated with each pixel position.
The lower graph in Fig. 2 schematically depicts intensity values along the line L1 parallel to the row R-m in the vicinity of the column C-n.
As illustrated there, a spot may spread over one or more pixels, e.g., centered at column C-n - the spot is illustrated by reference number 21-L1 - until the valley corresponding to the spot 21-L1 - a valley pixel of the valley is illustrated by reference number 22-L1.
The circuitry 13 detects the spot 21-L1 - here a dot - based on a first predetermined intensity threshold Zth1. Furthermore, the circuitry 13 detects whether the spot 21-L1 is saturated based on a second predetermined intensity threshold Zth2.
Once the spot 21-L1 is detected, the circuitry 13 associates the spot 21-L1 with a pixel position, for example, with the position of the center spot pixel given by X = R-m and Y = C-n.
Moreover, the circuitry 13 computes a spot size - here dot size - of the spot 21-L1 (for each spot 21, but illustrated only for spot 21-L1, as the skilled person will appreciate), which is the number of spot pixels included in the spot 21-L1 which have an intensity value above the first predetermined intensity threshold Zth1.
Once the spot size is computed, the circuitry 13 sets a spot pixel window - here dot pixel window - around the center of the spot 21-L1 (center at X = R-m, Y = C-n). Here, for illustration, the dot pixel window is one additional pixel to the spot size (here dot size) in each direction (X, Y).
All pixels which are within the spot size are part of the spot 21-L1 and are referred to as spot pixels.
All pixels which are not within the spot size but within the spot pixel window are part of the valley corresponding to the spot 21-L1 and are referred to as valley pixels (as, for example, the valley pixel 22-L1).
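A minimal sketch of this spot/valley split is given below; the connected-component labelling, the window margin of one pixel and the data layout are assumptions used for illustration only.

```python
# Minimal sketch (assumed implementation details) of detecting spots above a first
# intensity threshold and assigning the surrounding valley pixels within a window.
import numpy as np
from scipy import ndimage

def detect_spots_and_valleys(intensity, z_th1, margin=1):
    """intensity: 2D array of pixel intensities; returns one record per detected spot."""
    labels, n_spots = ndimage.label(intensity > z_th1)  # group adjacent above-threshold pixels
    spots = []
    for lbl in range(1, n_spots + 1):
        rows, cols = np.where(labels == lbl)
        k = np.argmax(intensity[rows, cols])             # center spot pixel = maximum intensity
        center = (int(rows[k]), int(cols[k]))
        # spot pixel window: the spot extent plus 'margin' pixels in each direction
        r0, r1 = max(rows.min() - margin, 0), min(rows.max() + margin, intensity.shape[0] - 1)
        c0, c1 = max(cols.min() - margin, 0), min(cols.max() + margin, intensity.shape[1] - 1)
        spot_pixels = set(zip(rows.tolist(), cols.tolist()))
        valley_pixels = [(r, c) for r in range(r0, r1 + 1) for c in range(c0, c1 + 1)
                         if (r, c) not in spot_pixels]
        spots.append({"center": center, "spot_pixels": sorted(spot_pixels),
                      "valley_pixels": valley_pixels, "spot_size": len(spot_pixels)})
    return spots
```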
The circuitry 13, executing the data processing block 14, further performs DGS for each spot 21, as will be discussed in the following. Fig. 3 schematically illustrates an embodiment of DGS.
The spot 21-L1 is chosen for illustration, but DGS is performed for each detected spot 21.
Fig. 3 depicts the IQ values of spot 21-L1 (including three spot pixels along L1) in the IQ-plane.
As generally known, in the IQ-plane the angle corresponds to the phase and the distance to the origin corresponds to the amplitude (which is proportional to the confidence).
The IQ values of spot 21-L1 have a different phase due to the global light component present in addition to the direct light component.
The IQ value of valley pixel 22-L1 includes only the global light component.
The circuitry 13 computes for spot 21-L1 (and for all other detected spots 21 similarly) the difference between the IQ values of spot 21-L1 (for each spot pixel) and the IQ values of valley pixel 22-L1 to get rid of the global light component in the IQ values of the spot 21-L1, thereby the DGS-corrected IQ values 21-L1-DGS of the spot 21-L1 (for each spot pixel) are obtained.
The circuitry 13 computes for each detected spot 21 the amplitude/confidence/intensity based on the DGS corrected IQ values.
Generally, the phase in iToF is computed by:
(4) phase = arctan(Q / I).
The circuitry 13 computes for each detected spot 21 the phase based on IQ values before and after the DGS correction.
The circuitry 13 computes for each valley 22 (for each valley pixel in the valley 22) the amplitude or confidence and the phase.
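The DGS correction and the quantities derived from it can be sketched as below; representing the IQ pairs as complex numbers is an implementation choice for illustration only.

```python
# Minimal sketch (assumed data layout): the valley IQ value, which carries only the
# global light component, is subtracted from each spot pixel's IQ value.
import numpy as np

def dgs_correct(spot_iq, valley_iq):
    """spot_iq: complex array I + 1j*Q per spot pixel; valley_iq: complex IQ value of a valley pixel."""
    iq_dgs = spot_iq - valley_iq      # remove the global light component, keep the direct one
    confidence_dgs = np.abs(iq_dgs)   # amplitude/confidence after DGS
    phase_dgs = np.angle(iq_dgs)      # phase after DGS, cf. equation (4)
    return iq_dgs, confidence_dgs, phase_dgs
```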
Thus, for computing the feature data, the circuitry 13 stores - after spot and valley detection and DGS - for each detected spot 21 the following values: pixel position of the center spot pixel of the spot, pixel positions of each spot pixel (within the spot size) of the spot, amplitude or confidence of each spot pixel before and after DGS, phase of each spot pixel before and after DGS and the spot size. These are referred to as pre-feature spot data in the following.
Moreover, for computing the feature data, the circuitry 13 stores - after spot and valley detection and DGS - for each detected valley 22 the following values: pixel position of the corresponding center spot pixel, pixel positions of the valley pixels within the spot pixel window of the corresponding spot, the amplitude or confidence of each valley pixel and the phase of each valley pixel. These are referred to as pre-feature valley data in the following.
Returning to Fig. 1, as mentioned above, the data processing block 14 includes a trained machine learning algorithm, which is trained to classify the material of the object 7 based on feature data.
In the following the computation of the feature data is discussed.
The feature data including a plurality of input features (IF1) to (IF6) are computed by the circuitry 13 based on the detected spots and valleys using the pre-feature spot data and the pre-feature valley data.
The feature data include a plurality of input features (IF1) to (IF6) computed according to a plurality of predefined features, wherein each input feature (IF1) to (IF6) includes one or more feature values (depending on the number of detected spots).
In this embodiment, the input features (IF1) to (IF6) are computed after choosing the corresponding predefined features from a (possibly infinite) pool of predefined features after analyzing the accuracy of the classification results. However, other predefined features may be appropriate as well in other embodiments.
(IF1) Reflectance of the spot: r_DGS_SPOT = Z_DGS · confidence_DGS.
This input feature may be understood as a distance-normalized confidence at the maximum spot pixel position (e.g., the center spot pixel position).
(IF2) Reflectance of the corresponding valley: r_DGS_VALLEY = Z_DGS · confidence_VALLEY.
This input feature may be understood as a distance-normalized confidence around the maximum spot pixel position (e.g., the center spot pixel position). The confidence of any valley pixel may be used for the computation.
(IF3) Spot size: the number of spot pixels in the spot.
(IF4) Ratio between cartesian depth (or phase) of the spot after DGS and cartesian depth (or phase) of the corresponding valley: zDIV_DGS_VAL = Z_DGS / Z_VALLEY.
(IF5) Ratio between confidence (or amplitude) of the spot after DGS and confidence (or amplitude) of the corresponding valley: conf_DGS_VAL = confidence_DGS / confidence_VALLEY.
The confidence of the spot is the confidence of the center spot pixel (e.g., the spot pixel with the highest intensity value in the spot). The confidence of any valley pixel may be used for the computation.
(IF6) Ratio between variance of cartesian depth (or phase) of the spot and variance of cartesian depth (or phase) of the corresponding valley: variance_RATIO_SPOT_VALLEY = Var(Z_SPOT) / Var(Z_VALLEY).
The variances are based on all spot pixels of the spot and all valley pixels corresponding to the spot, respectively.
In (IF1) to (IF6), Z_DOT is the cartesian depth measured at the center spot pixel (e.g., the spot pixel with the highest intensity value in the spot), Z_DGS is the cartesian depth (Z_DOT) after DGS, and Z_VALLEY is the cartesian depth measured around the spot (any valley pixel may be used).
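A minimal sketch of assembling the six input features from the stored pre-feature spot and valley data is given below; the dictionary keys and the exact form of the distance normalization follow the reconstruction above and should be treated as assumptions.

```python
# Minimal sketch (assumed pre-feature data layout) of computing (IF1) to (IF6) per detected spot.
import numpy as np

def input_features(spot):
    """spot: dict with pre-feature spot/valley data for one detected spot."""
    if1 = spot["z_dgs"] * spot["confidence_dgs"]                # (IF1) reflectance of the spot
    if2 = spot["z_dgs"] * spot["confidence_valley"]             # (IF2) reflectance of the valley
    if3 = spot["spot_size"]                                     # (IF3) number of spot pixels
    if4 = spot["z_dgs"] / spot["z_valley"]                      # (IF4) depth ratio after DGS
    if5 = spot["confidence_dgs"] / spot["confidence_valley"]    # (IF5) confidence ratio after DGS
    if6 = np.var(spot["z_spot_pixels"]) / np.var(spot["z_valley_pixels"])  # (IF6) depth variance ratio
    return np.array([if1, if2, if3, if4, if5, if6], dtype=np.float64)
```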
The circuitry 13 computes the cartesian depth as follows: from the IQ values the phase can be computed (see (4)), then a calibrated phase is computed therefrom, then a radial depth is computed therefrom, and the cartesian depth is computed therefrom.
The cartesian depth is a lens-distortion-corrected version of the radial depth, which is obtained by projecting the radial depth values to the cartesian depth plane. A camera model obtained during camera calibration process is used for the projection, as discussed for example in:
J. Weng, P. Cohen, and M. Herniou. Camera calibration with distortion models and accuracy evaluation. IEEE Transactions on Pattern Analysis and Machine Intelligence, 14(10):965-980, Oct. 1992.
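A minimal sketch of this chain, under assumed constants and a simple pinhole camera model without distortion terms, is given below; the device itself uses a calibrated camera model as referenced above, so this is an illustration only.

```python
# Minimal sketch (assumptions: single modulation frequency, no phase calibration or
# distortion terms) of going from phase to radial depth and then to cartesian depth.
import numpy as np

C = 299_792_458.0                                     # speed of light in m/s

def radial_depth(phase, f_mod):
    """Radial depth within the unambiguous range from a phase in [0, 2*pi)."""
    return (phase / (2.0 * np.pi)) * C / (2.0 * f_mod)

def cartesian_depth(radial, u, v, fx, fy, cx, cy):
    """Project radial depth at pixel (u, v) onto the optical axis (pinhole model)."""
    xn, yn = (u - cx) / fx, (v - cy) / fy             # normalized image coordinates
    return radial / np.sqrt(xn ** 2 + yn ** 2 + 1.0)  # z component of the back-projected point
```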
Once the circuitry 13 has computed the input features (IF1) to (IF6) according to the plurality of predefined features, the circuitry 13 inputs the feature data including the input features (IF1) to (IF6) into the trained machine learning algorithm, which is trained to classify the material based on the feature data.
Fig. 4 schematically illustrates an embodiment of the information processing device 1 for facial recognition, which is discussed in the following.
The information processing device 1 is the information processing device 1 of Fig. 1 - a smartphone -, wherein the machine learning algorithm 15 is trained to classify the material into skin or non-skin.
The information processing device 1 includes the spot iToF device 100 (not shown) with the spot illuminator 2 and the ToF camera 3. The information processing device 1 includes the circuitry 13 (not shown). The information processing device 1 includes further a touch-display 30 and a RGB camera 31. The touch-display 30 may include an LCD (Liquid-Crystal Display), an IPS-LCD (In-Plane Switching-Liquid-Crystal Display), an OLED (Organic Light-Emitting Diode) display or an AMOLED (Active-Matrix Organic Light-Emitting Diode) display.
The touch functionality of the touch-display 30 may be based on a capacitive or resistive touch screen.
The RGB camera 31 may include optical parts (e.g., a lens) and an image sensor (e.g., a CCD (“Charge-Coupled Device”) sensor or an active pixel sensor).
A user may wish to unlock the smartphone 1 which requires authentication of the user. For example, two-factor authentication may be employed which requires the user, e.g., to enter a PIN and to perform facial recognition.
Assuming the user has entered the correct PIN, the smartphone starts the facial recognition.
The user may hold the smartphone 1 in front of his face and press a button 32 depicted on the touch-display such that the smartphone 1 in response thereto acquires an image 33 (as illustrated by the dashed box) of the user’s face with the RGB camera 31 and spot iToF data of the user’s face (the field-of-view of the RGB camera 31 and of the spot iToF device 100 overlap).
The user may press again the button 32 to confirm that the image 33 and the spot iToF data should be used for the facial recognition.
As mentioned in the general explanations, it has been recognized that classifying the material as skin or non-skin for improving facial recognition may benefit from using only input features from certain regions of the human face, since taking feature data of the whole face may blur the distinguishability of skin with respect to other materials.
Hence, the feature data is only computed for pixels (of the ToF sensor 20) which correspond to the regions-of-interest ROI (as illustrated by the dotted boxes).
The ROIs cover forehead, nose, left/right cheek and lips of the face.
The reason is that sub-surface scattering is different in these areas of a human face due to the underlying bones in the forehead and nose region while the left/right cheek and the lips have a softer tissue.
In contrast, for example, for masks made of plastic or silicone formed like the user’s face, these differences are not present.
Hence, the smartphone 1 (the circuitry 13 thereof) computes the input features (IF1) to (IF6) only for the ROIs to obtain the feature data. Then, the circuitry 13 inputs the feature data into the trained machine learning algorithm 15 which outputs a classification result indicating whether the light in the ToF measurement has been thrown back by a material that is skin or non-skin.
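A minimal sketch of restricting the feature computation to the face ROIs is shown below; the representation of ROIs as pixel rectangles and the spot record layout are assumptions for illustration.

```python
# Minimal sketch (assumed data layout): only spots whose center pixel lies inside one of
# the predefined ROI rectangles (forehead, nose, cheeks, lips) contribute feature data.
def spots_in_rois(spots, rois):
    """spots: list of dicts with a 'center' (row, col); rois: list of (r0, r1, c0, c1) boxes."""
    selected = []
    for spot in spots:
        r, c = spot["center"]
        if any(r0 <= r <= r1 and c0 <= c <= c1 for (r0, r1, c0, c1) in rois):
            selected.append(spot)
    return selected
```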
If the material is non-skin, the smartphone 1 determines that facial recognition has failed.
If the material is skin, the smartphone 1 proceeds with the facial recognition based on the image 33 and the spot iToF data from which it computes a depth map, as discussed in the general explanations.
Hence, the accuracy of facial recognition may be improved, and spoofing may be avoided.
Fig. 5 schematically illustrates an embodiment of a training of the trained machine learning algorithm 15 of Fig. 1, which is discussed in the following.
The machine learning algorithm 15-t is in the training stage.
The machine learning algorithm 15-t is one of a support vector classifier, a random forest, a decision tree, a k-nearest neighbor algorithm, a naive Bayes classifier and AdaBoost.
The training is based on a plurality of training datasets 50, each training dataset including spot iToF data 51a and a label 51b.
For binary classification, the label 51b indicates whether the measured material is a certain material or belongs to a certain material class, in particular whether the measured material is skin or non-skin.
For multiclass classification, the label 51b indicates the material that was measured or the material class the measured material belongs to.
The plurality of training datasets 50 includes training datasets which have been acquired under various different conditions to increase robustness of the classification.
The various different conditions include, e.g., scenes with a plurality of different objects made of or being covered with different materials; a plurality of different mask (face mask) materials (e.g., silicone, plastic, rubber, wax, wood, paper, latex); a plurality of different persons (e.g., different age, gender, ethnicity, wearing glasses or not, having beard or not); a plurality of different rotations of the objects, persons or masks; and a plurality of different distances between spot iToF device and object, person or mask.
The various different conditions include further a plurality of different hardware, e.g., a plurality of different mobile devices (e.g., under-display camera phones, conventional smartphones), spot illuminators, ToF cameras or the like. Each training dataset is used for training the machine learning algorithm 15-t in the training stage.
A feature data generator 52 obtains the spot iToF data 51a of the respective training dataset and computes the feature data 53 including the input features (IF1) to (IF6).
The feature data 53 is input to the machine learning algorithm 15-t in the training stage.
For binary classification:
The machine learning algorithm 15-t in the training stage outputs a binary value 54 indicating whether the measured material belongs to a predetermined class (e.g., skin) or not.
The training data for binary classification may thus, in some embodiments, contain the same amount of examples of skin vs. non-skin (50%). This type of classifier may reduce the probability of false negatives but may increase the risk of false positives.
For multiclass classification:
For example, in some embodiments, the machine learning algorithm 15-t in the training stage outputs for each class a probability 54 that the measured material belongs to the respective class.
For example, in some embodiments, the machine learning algorithm 15-t in the training stage outputs the material class 54 to which the material belongs.
The training data for multiclass classification may have a ratio of skin examples to the number of other classes (e.g., 1/#classes · 100%). This type of classifier may reduce the probability of false positives but may increase the risk of false negatives.
A loss function 55, e.g., cross entropy, as generally known, generates hyperparameter updates 56 based on a difference between the classification of the machine learning algorithm 15-t in the training stage and the label 51b.
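For classical classifiers, hyperparameter updates driven by a cross-entropy criterion can be realized, for example, by a grid search scored with the log loss, as sketched below; the search space and the synthetic stand-in data are assumptions, not taken from the disclosure.

```python
# Minimal sketch (assumed search space and stand-in data): selecting hyperparameters of a
# classical classifier by minimizing the cross-entropy (log loss) on held-out folds.
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV

X_train, y_train = make_classification(n_samples=300, n_features=6, random_state=0)  # stand-in data

param_grid = {"n_estimators": [50, 100, 200], "max_depth": [None, 5, 10]}
search = GridSearchCV(RandomForestClassifier(random_state=0), param_grid,
                      scoring="neg_log_loss", cv=5)
search.fit(X_train, y_train)   # in practice: feature data 53 and labels 51b
print(search.best_params_)     # hyperparameters minimizing the cross-entropy
```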
Once training is completed, the trained machine learning algorithm 15 is obtained.
Fig. 6 schematically illustrates an embodiment of normalized mean feature values for different material classes, which is discussed in the following.
In Fig. 6 a diagram is shown in which the normalized mean feature values of the input features spot size, r_DGS_SPOT, r_DGS_VALLEY, zDIV_DGS_VAL and conf_DGS_VAL are compared for the material class skin (dotted pattern) and plastic (striped pattern). While r_DGS_SPOT is basically the same for both materials, the other input features are different and, thus, both material classes can be distinguished based on these input features, which are computed according to predefined features.
Fig. 7 schematically illustrates in a flow diagram an embodiment of an information processing method 300.
The information processing method 300 may be performed by the information processing device as described herein, e.g., by the information processing device 1 of Fig. 1 and 4.
At 301, spot indirect time-of-flight data is acquired, as discussed herein.
At 302, the spot indirect time-of-flight data acquired in a time-of-flight measurement of light thrown back by a material is obtained, as discussed herein.
At 303, spots and valleys are detected in the spot indirect time-of-flight data, as discussed herein.
At 304, based on properties of the detected spots and valleys, feature data are computed according to a plurality of predefined features, wherein the feature data are computed in accordance with a plurality of predefined region-of-interests, as discussed herein.
At 305, the feature data is input into a machine learning algorithm, wherein the machine learning algorithm is trained to classify the material based on the feature data, wherein the machine learning algorithm is trained to classify the material into skin or non-skin, as discussed herein.
At 306, facial recognition using the classification result is performed, wherein, based on the spot indirect time-of-flight data, a depth map is computed and the facial recognition is performed further based on the depth map, as discussed herein.
Fig. 8 schematically illustrates a multi-purpose computer 130 which can be used for implementing a circuitry.
The computer 130 can be implemented such that it can basically function as any type of information processing device as described herein. The computer has components 131 to 141, which can form a circuitry, such as any one of the circuitries of the information processing devices as described herein.
Embodiments which use software, firmware, programs or the like for performing the methods as described herein can be installed on computer 130, which is then configured to be suitable for the concrete embodiment.
The computer 130 has a CPU 131 (Central Processing Unit), which can execute various types of procedures and methods as described herein, for example, in accordance with programs stored in a read-only memory (ROM) 132, stored in a storage 137 and loaded into a random access memory (RAM) 133, stored on a medium 140 which can be inserted in a respective drive 139, etc.
The CPU 131, the ROM 132 and the RAM 133 are connected with a bus 141, which in turn is connected to an input/output interface 134. The number of CPUs, memories and storages is only exemplary, and the skilled person will appreciate that the computer 130 can be adapted and configured accordingly for meeting specific requirements which arise, when it functions as an information processing device.
At the input/output interface 134, several components are connected: an input 135, an output 136, the storage 137, a communication interface 138 and the drive 139, into which a medium 140 (compact disc, digital video disc, compact flash memory, or the like) can be inserted.
The input 135 can be a pointer device (mouse, graphics tablet, or the like), a keyboard, a microphone, a camera, a touchscreen, etc.
The output 136 can have a display (liquid crystal display, cathode ray tube display, light emitting diode display, etc.), loudspeakers, etc.
The storage 137 can have a hard disk, a solid state drive and the like.
The communication interface 138 can be adapted to communicate, for example, via a local area network (LAN), wireless local area network (WLAN), mobile telecommunications system (GSM, UMTS, LTE, NR etc.), Bluetooth, infrared, a data bus (e.g., according to MIPI specifications), etc.
It should be noted that the description above only pertains to an example configuration of computer 130. Alternative configurations may be implemented with additional or other sensors, storage devices, interfaces or the like. For example, the communication interface 138 may receive spot iToF data via a data bus. The computer 130 may include an RGB camera, a touch-display and a spot iToF device.
When the computer 130 functions as a base station, the communication interface 138 can further have a respective air interface (providing e.g. E-UTRA protocols OFDMA (downlink) and SC-FDMA (uplink)) and network interfaces (implementing for example protocols such as S1-AP, GTP-U, S1-MME, X2-AP, or the like). Moreover, the computer 130 may have one or more antennas and/or an antenna array. The present disclosure is not limited to any particularities of such protocols.
It should be recognized that the embodiments describe methods with an exemplary ordering of method steps. The specific ordering of method steps is however given for illustrative purposes only and should not be construed as binding.
All units and entities described in this specification and claimed in the appended claims can, if not stated otherwise, be implemented as integrated circuit logic, for example on a chip, and functionality provided by such units and entities can, if not stated otherwise, be implemented by software.
In so far as the embodiments of the disclosure described above are implemented, at least in part, using software-controlled data processing apparatus, it will be appreciated that a computer program providing such software control and a transmission, storage or other medium by which such a computer program is provided are envisaged as aspects of the present disclosure.
Note that the present technology can also be configured as described below.
(1) An information processing device for classifying materials, including circuitry configured to: obtain spot indirect time-of-flight data acquired in a time-of-flight measurement of light thrown back by a material; compute, based on the spot indirect time-of-flight data, feature data according to a plurality of predefined features; and input the feature data into a machine learning algorithm, wherein the machine learning algorithm is trained to classify the material based on the feature data.
(2) The information processing device of (1), wherein the circuitry is configured to detect spots and valleys in the spot indirect time-of-flight data.
(3) The information processing device of (2), wherein the feature data are computed based on properties of the detected spots and valleys.
(4) The information processing device of (3), wherein the predefined features are predefined features per detected spot requiring computation of reflectance of the spot, reflectance of the corresponding valley, the spot size, ratio between cartesian depth or phase of the spot after direct global separation and cartesian depth or phase of the corresponding valley, ratio between amplitude or confidence of the spot after direct global separation and amplitude or confidence of the corresponding valley, ratio between variance of cartesian depth or phase of the spot and variance of cartesian depth or phase of the corresponding valley.
(5) The information processing device of anyone of (1) to (4), wherein the machine learning algorithm is trained to classify the material into skin or non-skin.
(6) The information processing device of (5), wherein the circuitry is configured to perform facial recognition using the classification result.
(7) The information processing device of (6), wherein the circuitry is configured to compute, based on the spot indirect time-of-flight data, a depth map and to perform the facial recognition further based on the depth map.
(8) The information processing device of any one of (5) to (7), wherein the circuitry is configured to compute the feature data in accordance with a plurality of predefined regions of interest.
(9) The information processing device of any one of (1) to (8), wherein the machine learning algorithm is one of a support vector classifier, a random forest, a decision tree, a k-nearest neighbor algorithm, a naive Bayes classifier and AdaBoost.
(10) The information processing device of any one of (1) to (9), including a spot indirect time-of-flight device to acquire the spot indirect time-of-flight data.
(11) An information processing method for classifying materials, including: obtaining spot indirect time-of-flight data acquired in a time-of-flight measurement of light reflected back by a material; computing, based on the spot indirect time-of-flight data, feature data according to a plurality of predefined features; and inputting the feature data into a machine learning algorithm, wherein the machine learning algorithm is trained to classify the material based on the feature data.
(12) The information processing method of (11), including detecting spots and valleys in the spot indirect time-of-flight data.
(13) The information processing method of (12), wherein the feature data are computed based on properties of the detected spots and valleys.
(14) The information processing method of (13), wherein the predefined features are predefined features per detected spot requiring computation of: reflectance of the spot, reflectance of the corresponding valley, the spot size, ratio between Cartesian depth or phase of the spot after direct global separation and Cartesian depth or phase of the corresponding valley, ratio between amplitude or confidence of the spot after direct global separation and amplitude or confidence of the corresponding valley, ratio between variance of Cartesian depth or phase of the spot and variance of Cartesian depth or phase of the corresponding valley.
(15) The information processing method of any one of (11) to (14), wherein the machine learning algorithm is trained to classify the material into skin or non-skin.
(16) The information processing method of (15), including performing facial recognition using the classification result.
(17) The information processing method of (16), including computing, based on the spot indirect time-of-flight data, a depth map and performing the facial recognition further based on the depth map.
(18) The information processing method of any one of (15) to (17), including computing the feature data in accordance with a plurality of predefined regions of interest.
(19) The information processing method of any one of (11) to (18), wherein the machine learning algorithm is one of a support vector classifier, a random forest, a decision tree, a k-nearest neighbor algorithm, a naive Bayes classifier and AdaBoost.
(20) The information processing method of any one of (11) to (19), including acquiring the spot indirect time-of-flight data.
(21) A computer program comprising program code which, when carried out on a computer, causes the computer to perform the method according to any one of (11) to (20).
(22) A non-transitory computer-readable recording medium that stores therein a computer program product, which, when executed by a processor, causes the method according to any one of (11) to (20) to be performed.
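By way of illustration only, the following minimal sketch shows how the per-spot features listed in items (4) and (14) above could be assembled into a feature vector. The field names of the spot and valley records, the epsilon guard against division by zero, and the availability of direct/global-separated depth and amplitude values are assumptions made for the sake of the example and are not prescribed by the present disclosure.

```python
import numpy as np

def per_spot_features(spot: dict, valley: dict) -> np.ndarray:
    """Assemble one feature vector for a detected spot and its corresponding valley."""
    eps = 1e-12  # guard against division by zero in the ratio features
    return np.array([
        spot["reflectance"],                                     # reflectance of the spot
        valley["reflectance"],                                   # reflectance of the corresponding valley
        spot["size"],                                            # spot size
        spot["depth_direct"] / (valley["depth"] + eps),          # depth (or phase) ratio after direct/global separation
        spot["amplitude_direct"] / (valley["amplitude"] + eps),  # amplitude (or confidence) ratio after direct/global separation
        spot["depth_var"] / (valley["depth_var"] + eps),         # ratio of depth (or phase) variances
    ])

# Hypothetical example values for one spot/valley pair:
spot = {"reflectance": 0.42, "size": 9.0, "depth_direct": 0.61,
        "amplitude_direct": 830.0, "depth_var": 1.3e-4}
valley = {"reflectance": 0.05, "depth": 0.64, "amplitude": 95.0, "depth_var": 4.1e-4}
print(per_spot_features(spot, valley))
```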

Claims

1. An information processing device for classifying materials, comprising circuitry configured to: obtain spot indirect time-of-flight data acquired in a time-of-flight measurement of light reflected back by a material; compute, based on the spot indirect time-of-flight data, feature data according to a plurality of predefined features; and input the feature data into a machine learning algorithm, wherein the machine learning algorithm is trained to classify the material based on the feature data.
2. The information processing device according to claim 1, wherein the circuitry is configured to detect spots and valleys in the spot indirect time-of-flight data.
3. The information processing device according to claim 2, wherein the feature data are computed based on properties of the detected spots and valleys.
4. The information processing device according to claim 3, wherein the predefined features are predefined features per detected spot requiring computation of: reflectance of the spot, reflectance of the corresponding valley, the spot size, ratio between Cartesian depth or phase of the spot after direct global separation and Cartesian depth or phase of the corresponding valley, ratio between amplitude or confidence of the spot after direct global separation and amplitude or confidence of the corresponding valley, ratio between variance of Cartesian depth or phase of the spot and variance of Cartesian depth or phase of the corresponding valley.
5. The information processing device according to claim 1, wherein the machine learning algorithm is trained to classify the material into skin or non-skin.
6. The information processing device according to claim 5, wherein the circuitry is configured to perform facial recognition using the classification result.
7. The information processing device according to claim 6, wherein the circuitry is configured to compute, based on the spot indirect time-of-flight data, a depth map and to perform the facial recognition further based on the depth map.
8. The information processing device according to claim 5, wherein the circuitry is configured to compute the feature data in accordance with a plurality of predefined regions of interest.
9. The information processing device according to claim 1, wherein the machine learning algorithm is one of a support vector classifier, a random forest, a decision tree, a k-nearest neighbor algorithm, a naive Bayes classifier and AdaBoost.
10. The information processing device according to claim 1, comprising a spot indirect time-of-flight device to acquire the spot indirect time-of-flight data.
11. An information processing method for classifying materials, comprising: obtaining spot indirect time-of-flight data acquired in a time-of-flight measurement of light reflected back by a material; computing, based on the spot indirect time-of-flight data, feature data according to a plurality of predefined features; and inputting the feature data into a machine learning algorithm, wherein the machine learning algorithm is trained to classify the material based on the feature data.
12. The information processing method according to claim 11, comprising detecting spots and valleys in the spot indirect time-of-flight data.
13. The information processing method according to claim 12, wherein the feature data are computed based on properties of the detected spots and valleys.
14. The information processing method according to claim 13, wherein the predefined features are predefined features per detected spot requiring computation of: reflectance of the spot, reflectance of the corresponding valley, the spot size, ratio between Cartesian depth or phase of the spot after direct global separation and Cartesian depth or phase of the corresponding valley, ratio between amplitude or confidence of the spot after direct global separation and amplitude or confidence of the corresponding valley, ratio between variance of Cartesian depth or phase of the spot and variance of Cartesian depth or phase of the corresponding valley.
15. The information processing method according to claim 11, wherein the machine learning algorithm is trained to classify the material into skin or non-skin.
16. The information processing method according to claim 15, comprising performing facial recognition using the classification result.
17. The information processing method according to claim 16, comprising computing, based on the spot indirect time-of-flight data, a depth map and performing the facial recognition further based on the depth map.
18. The information processing method according to claim 15, comprising computing the feature data in accordance with a plurality of predefined regions of interest.
19. The information processing method according to claim 11, wherein the machine learning algorithm is one of a support vector classifier, a random forest, a decision tree, a k-nearest neighbor algorithm, a naive Bayes classifier and AdaBoost.
20. The information processing method according to claim 11, comprising acquiring the spot indirect time-of-flight data.
EP24706135.1A 2023-02-24 2024-02-22 METHOD AND DEVICE FOR CLASSIFYING MATERIALS USING INDIRECT TIME OF FLIGHT DATA Pending EP4670059A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
EP23158593 2023-02-24
PCT/EP2024/054508 WO2024175706A1 (en) 2023-02-24 2024-02-22 Method and device for classifying materials using indirect time of flight data

Publications (1)

Publication Number Publication Date
EP4670059A1 2025-12-31

Family

ID=85383006

Family Applications (1)

Application Number Title Priority Date Filing Date
EP24706135.1A Pending EP4670059A1 (en) 2023-02-24 2024-02-22 METHOD AND DEVICE FOR CLASSIFIING MATERIALS USING INDIRECT FLIGHT TIME DATA

Country Status (2)

Country Link
EP (1) EP4670059A1 (en)
WO (1) WO2024175706A1 (en)

Family Cites Families (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
EP3791210A1 (en) * 2018-05-09 2021-03-17 Sony Semiconductor Solutions Corporation Device and method
EP4267987A1 (en) * 2020-12-22 2023-11-01 Sony Semiconductor Solutions Corporation Electronic device and method
CN116745646A (en) * 2021-01-27 2023-09-12 索尼半导体解决方案公司 3D image capture based on time-of-flight measurement and spot pattern measurement

Also Published As

Publication number Publication date
WO2024175706A1 (en) 2024-08-29

Legal Events

Date Code Title Description
STAA Information on the status of an ep patent application or granted ep patent

Free format text: STATUS: UNKNOWN

STAA Information on the status of an ep patent application or granted ep patent

Free format text: STATUS: THE INTERNATIONAL PUBLICATION HAS BEEN MADE

PUAI Public reference made under article 153(3) epc to a published international application that has entered the european phase

Free format text: ORIGINAL CODE: 0009012

STAA Information on the status of an ep patent application or granted ep patent

Free format text: STATUS: REQUEST FOR EXAMINATION WAS MADE

17P Request for examination filed

Effective date: 20250922

AK Designated contracting states

Kind code of ref document: A1

Designated state(s): AL AT BE BG CH CY CZ DE DK EE ES FI FR GB GR HR HU IE IS IT LI LT LU LV MC ME MK MT NL NO PL PT RO RS SE SI SK SM TR