TWI871681B - System and method for presenting three-dimensional content and three-dimensional content calculation apparatus
- Publication number: TWI871681B (Application number: TW112123704A)
- Authority: TW (Taiwan)
- Prior art keywords: machine learning, content, representation, learning model, value
- Landscapes: Image Analysis (AREA)
Description
The present invention relates to a content representation system and method, and in particular to a three-dimensional content representation system and method and a three-dimensional content computing device.
The learning capacity of a deep neural network can be positively correlated with the number of parameters and the relative precision of those parameters. Therefore, for general or non-specific input data sets, a large deep neural network (e.g., a deep neural network whose number of parameters exceeds a threshold) can achieve high accuracy. However, large deep neural networks may come with many disadvantages. For example, a large deep neural network may occupy a large amount of memory, consume substantial processing resources, take a long time to process a data set (which can be a problem for real-time operation), require a large training data set to reach the necessary accuracy, take a long time to train, and so on. Therefore, although a large deep neural network can be trained to acceptable accuracy for some data sets, it may not be suitable in many cases.
Described herein is a three-dimensional content representation system that includes a content delivery network, a client device, and a computing device. The content delivery network provides representation content that includes one or more images. The client device sends a request for the representation content. The computing device, connected to the content delivery network and the client device, is configured to: receive the request from the client device; receive the representation content from the content delivery network according to the request; process one or more images in the representation content using a first machine learning model to generate a first prediction result; process the images using multiple machine learning models to generate at least a second prediction result and a third prediction result, wherein the first machine learning model is larger than the multiple machine learning models; select a second machine learning model from the multiple machine learning models based on a comparison of the first prediction result with at least the second prediction result and the third prediction result; and process the images using the second machine learning model and transmit the processing result to the client device, where the client device uses the processing result to generate a three-dimensional representation of the content.
Described herein is a three-dimensional content representation method applicable to a three-dimensional content representation system that includes a content delivery network, a client device, and a computing device. The method includes the following steps: the computing device receives, from the client device, a request for representation content that includes one or more images; the computing device receives the representation content from the content delivery network according to the request; the computing device processes one or more images in the representation content using a first machine learning model to generate a first prediction result; the computing device processes the images using multiple machine learning models to generate at least a second prediction result and a third prediction result, wherein the first machine learning model is larger than the multiple machine learning models; the computing device selects a second machine learning model from the multiple machine learning models based on a comparison of the first prediction result with the at least second and third prediction results; the computing device processes the images using the second machine learning model and transmits the processing result to the client device; and the client device uses the processing result to generate a three-dimensional representation of the content.
Described herein is a three-dimensional content computing device that includes a communication interface, a storage device, and a processor. The communication interface communicates with a client device and a content delivery network. The storage device stores instructions for operations performed by the processor. The processor, coupled to the communication interface and the storage device, is configured to access and execute the instructions stored in the storage device to: receive, through the communication interface, a request from the client device for representation content that includes one or more images; receive the representation content from the content delivery network through the communication interface according to the request; process one or more images in the representation content using a first machine learning model to generate a first prediction result; process the images using multiple machine learning models to generate at least a second prediction result and a third prediction result, wherein the first machine learning model is larger than the multiple machine learning models; select a second machine learning model from the multiple machine learning models based on a comparison of the first prediction result with the at least second and third prediction results; and process the one or more images using the second machine learning model and transmit the processing result to the client device, so that the client device uses the processing result to generate a three-dimensional representation of the representation content.
These illustrative examples are mentioned not to limit or define the present disclosure, but to aid understanding of it. Additional embodiments are discussed in the detailed description, where further explanation is provided.
The learning capacity of a machine learning model can be related to the number of parameters or layers in the model. Increasing the learning capacity (e.g., by increasing the number of parameters or layers) can enable a machine learning model to learn from a broader range of data sets. For example, increasing the number of parameters of a classifier can increase the number of classifications the classifier can reliably distinguish. Increasing the number of parameters or layers of a machine learning model can also increase the processing cost of executing the model (e.g., processing load, execution time, training time, etc.), which may make the machine learning model unable to operate under certain conditions (e.g., real-time operation).
Described herein are methods and systems for machine learning model selection for discrete processing tasks. Multiple small machine learning models can be instantiated and trained to process an input data set in place of a large machine learning model. Because each small machine learning model may include fewer parameters or layers than the large machine learning model, a small machine learning model can be configured to achieve, for a given portion of the input data set, the same degree of accuracy as the large machine learning model. Each small machine learning model can be configured to process a particular input data set (e.g., a data set with particular characteristics) or to produce particular outputs (e.g., a subset of the possible outputs the large machine learning model is configured to produce). Together, the multiple small machine learning models can be configured to process the same input data sets the large machine learning model is configured to process, with similar accuracy and/or loss. However, because they have fewer parameters or layers than the large machine learning model, the small machine learning models can operate more efficiently (e.g., use fewer processing resources for storage and/or execution, train on smaller training data sets, train faster, etc.).
For example, a large classifier can be configured to classify an input image into multiple different categories based on the objects within the image. A first small machine learning model can be instantiated to classify input images according to a subset of the different categories, and a second machine learning model can be instantiated to classify input images according to the remaining categories. Alternatively or additionally, the first machine learning model can be instantiated to classify input images characterized by natural lighting (e.g., daylight), and the second machine learning model can be instantiated to classify input images characterized by synthetic lighting (e.g., flash, incandescent, fluorescent, etc.).
In some examples, a large machine learning model can be compressed into small machine learning models. Compressing a machine learning model can reduce its number of parameters or layers, which can make the compressed model suitable for processing a portion of the input data sets the large model can process. Once compressed, the multiple small machine learning models can be instantiated by training the compressed machine learning model with different training data sets. Each small machine learning model can be trained to process a range of the input data sets that the corresponding large machine learning model is expected to process.
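For illustration, a minimal sketch of one such compression route, knowledge distillation, is shown below. It assumes PyTorch-style models, and the names `teacher`, `student`, and `loader` are hypothetical stand-ins for the large model, the compressed small model, and a training data set; the patent does not prescribe this particular procedure.

```python
import torch
import torch.nn.functional as F

def distill_epoch(teacher, student, loader, optimizer, temperature=4.0):
    """One epoch of knowledge distillation: the small student model is
    trained to imitate the outputs of the large teacher model."""
    teacher.eval()
    student.train()
    for inputs, _ in loader:
        with torch.no_grad():
            t_logits = teacher(inputs)  # soft targets from the large model
        s_logits = student(inputs)
        # KL divergence between temperature-softened output distributions.
        loss = F.kl_div(
            F.log_softmax(s_logits / temperature, dim=-1),
            F.softmax(t_logits / temperature, dim=-1),
            reduction="batchmean",
        ) * temperature ** 2
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
```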
Once trained, the large machine learning model and the multiple small machine learning models can be used to process input data sets. For any input data set, a model selector can determine which machine learning model (among the large machine learning model and the multiple small machine learning models) should process that particular data set. In some examples, the model selector can sample the input data set to produce a test feature vector and pass the test feature vector as input to the machine learning models to produce corresponding test outputs. For a deep neural network (DNN), one or more initial layers of the DNN can operate as a feature extractor. The test output from the large machine learning model can be labeled as pseudo ground truth (e.g., assumed to be true). The model selector can then compare the test output from each small machine learning model against the test output from the large machine learning model. In some examples, the model selector can use an accuracy metric or loss function (e.g., accuracy, precision, area under the curve, logarithmic loss, F1 score, weighted human disagreement rate, cross entropy, mean absolute error, mean square error, etc.). The model selector can identify the particular small machine learning model, among the multiple small machine learning models, that has the highest accuracy metric or the lowest loss (according to the loss function). The model selector can then use that particular small machine learning model to process the remainder of the particular input data set.
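As a concrete illustration of the selection step, a minimal sketch follows. It assumes each model exposes a hypothetical `predict(features)` method returning an array, and uses mean absolute error as the loss; any of the metrics listed above could be substituted.

```python
import numpy as np

def select_small_model(large_model, small_models, test_features):
    """Select the small model whose test output best matches the large
    model's test output, which is treated as pseudo ground truth."""
    pseudo_ground_truth = large_model.predict(test_features)
    losses = []
    for model in small_models:
        prediction = model.predict(test_features)
        # Mean absolute error against the pseudo ground truth.
        losses.append(np.mean(np.abs(prediction - pseudo_ground_truth)))
    return small_models[int(np.argmin(losses))]
```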
In an illustrative example, monocular depth estimation can be performed by one or more machine learning models (e.g., DNNs or similar models) of the kind used for various computer vision operations (e.g., but not limited to, classification, semantic segmentation, object detection, instance segmentation, depth estimation, etc.) and applications (e.g., autonomous driving for driverless cars, virtual reality, augmented reality, three-dimensional (3D) simulation, target acquisition, etc.). A computing device can instantiate (define and train) a large DNN to process various images, video frames, or video clips to produce a depth map or an inverse depth map using monocular depth estimation. A depth map can represent each pixel of an image as the distance (e.g., a real number) between the camera and the location in the environment that the pixel represents. Multiple small DNNs can also be trained to process various input images, video frames, and/or video clips. In some examples, each small DNN can be produced by compressing the large DNN and training the compressed large DNN.
The computing device can receive multiple images. The multiple images can be distinct images, images extracted from video frames, images extracted from a video clip, or the like. The images can be received from a content delivery network, a client device, another computing device, a camera (e.g., a live camera stream or previously stored images captured by a camera), a server, and so on. Alternatively, the computing device can receive the images by extracting them from a video clip stored in the computing device's memory.
The computing device can select one or more images from the multiple images. In some examples, the computing device samples the multiple images to derive the one or more images.
The computing device can process the one or more images using the large DNN to produce a first prediction result corresponding to the large DNN's output. For example, the computing device can generate a feature vector from the one or more images and pass the feature vector as input to the large DNN. The large DNN can process the feature vector and output the first prediction result (e.g., a depth map or an inverse depth map). In some examples, the computing device can treat the first prediction result as pseudo ground truth.
The computing device can process the one or more images using the multiple small DNNs to produce additional prediction results. For example, a first small DNN can process the one or more images to produce a second prediction result, a second small DNN can process the one or more images to produce a third prediction result, and so on. Each of the small DNNs can be smaller than the large DNN (e.g., fewer parameters and/or layers).
The computing device can select a small DNN from the multiple small DNNs based on a comparison of the first prediction result with the second prediction result, the third prediction result, and so on. In some examples, the computing device can use one or more accuracy metrics and/or loss functions to compare the second prediction result, the third prediction result, etc. against the first prediction result. For example, because the prediction results include a depth map or an inverse depth map (e.g., representing each pixel as a real-valued distance from the camera), a loss function can be used to determine the difference between the first prediction result (which is labeled as pseudo ground truth) and the second prediction result, between the first prediction result and the third prediction result, and so on. Examples of loss functions include, but are not limited to, adaptive robust loss, mean square error, mean absolute error, cross entropy, weighted human disagreement rate (WHDR), combinations thereof, or similar functions. The computing device can select the particular small DNN with the highest accuracy metric, the lowest error, the lowest loss, etc.
The computing device can then process the multiple images using the particular small DNN to produce depth maps or inverse depth maps from the multiple images. In some examples, the computing device can process each of the multiple images. In other examples, the computing device can process a portion of the multiple images by sampling them. For example, the computing device may process every nth image of the multiple images.
In some examples, the model selection process can be repeated to ensure that the particular small DNN is still the most efficient small DNN for processing the multiple images. The model selection process can be re-executed at regular intervals, upon detecting an event, upon detecting user input, after a predetermined number of executions of the particular small DNN, upon detecting a change in one or more characteristics of the multiple images (e.g., a change in average pixel value), under a combination of the above, or in similar situations. The computing device can thereby continuously ensure that the most efficient small DNN is used to process the multiple images.
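One way such re-selection triggers could be wired together is sketched below, reusing the `select_small_model` sketch above. The frame interval and the drift threshold on average pixel value are assumed values, not values from the patent.

```python
import numpy as np

RESELECT_EVERY = 500   # frames between scheduled re-selections (assumed)
PIXEL_DRIFT = 25.0     # change in mean pixel value that forces re-selection (assumed)

def process_stream(frames, large_model, small_models):
    current = None
    baseline_mean = None
    for i, frame in enumerate(frames):
        mean_value = float(frame.mean())
        drifted = (baseline_mean is not None
                   and abs(mean_value - baseline_mean) > PIXEL_DRIFT)
        if current is None or i % RESELECT_EVERY == 0 or drifted:
            # Re-run the model selection process on the most recent input.
            current = select_small_model(large_model, small_models, frame[None, ...])
            baseline_mean = mean_value
        yield current.predict(frame[None, ...])  # e.g., one depth map per frame
```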
The model selection process can be applied to a wide variety of machine learning models to determine an efficient way to process disparate data sets. In this way, the techniques described herein can be applied to deep neural networks (as described above) as well as any other type of machine learning model.
FIG. 1 shows a block diagram of an exemplary system for selecting a machine learning model configured to process disparate data sets, according to aspects of the present disclosure. The computing device 104 can be configured to process disparate data sets for nearby devices (e.g., devices operating within the same network) and/or remote devices (e.g., devices operating within other networks). The computing device 104 can include a central processing unit (CPU) 108, memory 112 (e.g., volatile memory such as random access memory, and non-volatile memory such as flash memory, hard disks, etc.), an input/output interface 116, a network interface 120, and a data processor 124, connected by a bus or the like. In some implementations, the computing device 104 can include additional or fewer components.
The input/output interface 116 can include one or more hardware and/or software interfaces configured to receive data from and/or transmit data to one or more devices 132 connected to the computing device 104, such as, but not limited to, display devices, keyboards and mice, sensors, peripheral devices, media streaming devices, augmented reality devices, virtual reality devices, and/or similar devices. In an illustrative example, a first device of the one or more devices 132 can be a virtual reality display device configured to project a three-dimensional representation of media (e.g., video, a video game, one or more images, etc.). If the media does not include three-dimensional data (e.g., the media is two-dimensional), the computing device 104 can use the data processor 124 to perform monocular depth estimation to produce a depth map and generate a three-dimensional representation of the media from the depth map. The computing device 104 can then transmit the three-dimensional representation of the media to the virtual reality display via the input/output interface 116. The one or more devices 132 can connect to the computing device 104 through a wired connection (e.g., universal serial bus (USB) Type-A, Type-B, or Type-C; high-definition multimedia interface (HDMI); digital visual interface (DVI); DisplayPort; etc.) or a wireless connection (e.g., but not limited to, wireless fidelity (Wi-Fi), Bluetooth, Zigbee, Z-Wave, infrared, ultra-wideband, etc.).
The network interface 120 can enable connections to one or more remote devices over a network 128 (e.g., the Internet, a local area network, a wide area network, a cloud network, etc.). In some examples, the computing device 104 can receive, through the network interface 120, a request to process data using the data processor 124. Upon receiving the request, the computing device 104 can store the data in the memory 112, process the data using the data processor 124, and transmit the output over the network 128 to the requesting device (or one or more other devices). Alternatively or additionally, the output can be presented through the one or more devices 132. In some examples, the data processor 124 can process received data in real time. In these examples, the data processor 124 can process streamed data as it is received (via the network interface 120 or the input/output interface 116), or can store a portion of the stream in a buffer in the memory 112 and process the buffered portion of the streamed data whenever the buffer is full.
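The buffer-and-process behavior can be sketched as follows; `stream` and `process_batch` are hypothetical stand-ins for the incoming streamed data and the data processor's entry point, and the buffer size is an assumed value.

```python
from collections import deque

BUFFER_SIZE = 64  # number of stream items per batch (assumed)

def buffered_process(stream, process_batch):
    """Buffer streamed items and process the buffered portion whenever the
    buffer is full, flushing any remainder at the end of the stream."""
    buffer = deque()
    for item in stream:
        buffer.append(item)
        if len(buffer) == BUFFER_SIZE:
            process_batch(list(buffer))
            buffer.clear()
    if buffer:
        process_batch(list(buffer))
```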
In some implementations, the data processor 124 can be a standalone component of the computing device 104, connected via the bus to the CPU 108, the memory 112, the input/output interface 116, and the network interface 120. The data processor 124 can be configured to operate within the computing device 104 or can operate independently of the computing device 104. For example, the data processor 124 can be an application specific integrated circuit (ASIC), a field programmable gate array, a mask programmable gate array, a microcontroller, or a similar device configured to process instructions stored in the data processor 124's memory. Alternatively, the data processor 124 can be non-volatile memory (either a standalone component connected to the bus or a subcomponent of the memory 112) that stores instructions configured to process various data sets. The instructions can be executed by the CPU 108 (and/or other components of the computing device 104).
The data processor 124 can include: a model selector 136 configured to select a particular machine learning model for processing a particular data set; a feature extractor 140 configured to produce input feature vectors for the selected machine learning models (e.g., models that do not include their own feature extractor); training data 144, which stores training data for the machine learning models; a large machine learning (ML) model 148; and one or more small machine learning models (e.g., small ML model 1 152 through small ML model n 156, where n can be any integer greater than 1).
The data processor 124 can use two or more machine learning models to process various types of data sets. The two or more machine learning models can have different sizes, which allows the data processor 124 to dynamically select the most efficient machine learning model for processing a given data set based on the current state of the data processor 124 and/or the computing device 104, or to dynamically switch to a different machine learning model. The two or more machine learning models can include a large machine learning model (e.g., a machine learning model with a number of parameters or layers greater than a threshold) and one or more small machine learning models (e.g., machine learning models with a number of parameters or layers less than the threshold).
The size of a machine learning model (e.g., a neural network's number of parameters, number of layers, etc.) can indicate the model's learning potential. A large machine learning model can be trained to process general data sets (e.g., data sets that may not correspond to any taxonomy or may not share any particular characteristics). For example, a large image classifier trained to classify objects within images can classify randomly sampled input images (e.g., daylight, indoor, nighttime or low-light images; images in which the object to be classified is occluded or far from the camera; images in which the object is clear and close to the camera; etc.). A small machine learning model may have lower accuracy and/or higher loss when classifying particular types of images. For example, a small image classifier trained to classify objects within images may classify images that share a particular characteristic (e.g., images captured during the day or under abundant lighting) while having lower accuracy or higher loss when classifying images with different characteristics (e.g., images captured at night or in low-light conditions).
A large machine learning model can have a larger memory footprint than the corresponding small machine learning models and can use more processing resources (e.g., CPU 108, cache or volatile memory, non-volatile memory, bandwidth, etc.). A large machine learning model can also have different training intervals than small machine learning models and execute over longer intervals, making the use of a large machine learning model more complicated for time-sensitive operations.
The machine learning models 148 through 156 can be any type of machine learning model, including but not limited to neural networks, deep neural networks, transformers, classifiers, support vector machines, decision trees, and so on. In some examples, the machine learning models 152 through 156 can be produced by compressing the large machine learning model 148 (before, during, or after the large machine learning model 148 is trained). In these examples, the large machine learning model 148 can be compressed through pruning (e.g., removing unnecessary parameters or layers), quantization (e.g., reducing the memory footprint of the parameters), knowledge distillation (e.g., training a small machine learning model to imitate the large machine learning model), low-rank factorization, and so on.
The large machine learning model 148 can be trained and/or compressed by the data processor 124. The one or more small machine learning models 152 through 156 can be produced by compressing the large machine learning model 148 and/or trained by the data processor 124. For a selected processing task, the model selector 136 can determine the type of machine learning model that is to perform the task. The model selector 136 can pass a training request to the feature extractor 140. The feature extractor 140 can produce a training data set to train a machine learning model of the type that is to perform the processing task. The data processor 124 can train the machine learning model to perform one or more operations using training data stored in the training data 144, generated training data (e.g., procedurally generated by the feature extractor 140 or received from user input), or training data received from one or more remote devices (e.g., the one or more devices 132, one or more remote devices connected via the network 128, etc.). The training data 144 can store data configured to train a machine learning model to process a particular type of input data. For example, the training data 144 can store image data so that machine learning models can be trained to process images (e.g., produce depth maps, classify images, detect objects, etc.). The training data 144 can also store historical data (e.g., data associated with historical executions of the machine learning models 148 through 156), generated data, received data, and so on. If the one or more small machine learning models 152 through 156 are to be trained independently of the large machine learning model 148, the feature extractor 140 can produce training sets for the one or more small machine learning models 152 through 156 based on the type of machine learning model and the size of the model to be trained. The training data sets used to train the one or more small machine learning models 152 through 156 can be similar or identical to the training data set used to train the large machine learning model 148.
The feature extractor 140 can train the machine learning models 148 through 156 using the training data sets. The machine learning models 148 through 156 can be trained for a predetermined number of iterations, over predetermined time intervals, until a target accuracy metric is reached, until a target loss value is reached (according to one or more loss functions), and so on.
The data processor 124 can receive a data set to be processed using one or more of the trained machine learning models 148 through 156. The data set can be received via the input/output interface 116 or the network interface 120, or can be stored in the memory 112. The data set can be discrete (e.g., of definite size and/or length) or a continuous stream (e.g., broadcast media, a video game, or other media of indeterminate size or length). The model selector 136 can determine which of the machine learning models 148 through 156 will most efficiently process the received data set (or a portion of it) by sampling the data set, processing the samples with the machine learning models, and comparing the results to determine which machine learning model should process the data set.
The model selector 136 can sample the data set by extracting a portion of it. The model selector 136 can sample an initial portion of the data set (e.g., a first number of bits, a first number of images or video frames, a first predetermined number of seconds of audio, etc.). Alternatively or additionally, the model selector 136 can obtain a random sample of the data set by randomly selecting a portion of it, for example using a random number generator. The model selector 136 can send an identification of the samples and an indication of which machine learning models are to be used to the feature extractor 140.
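A minimal sketch of this sampling, assuming the data set is a Python sequence (e.g., a list of video frames); the sample sizes are illustrative only.

```python
import random

def sample_dataset(dataset, head=16, random_count=16, seed=None):
    """Sample a data set as an initial portion plus a random portion drawn
    with a random number generator."""
    rng = random.Random(seed)
    head_sample = list(dataset[:head])   # initial portion of the data set
    rest = list(dataset[head:])
    random_sample = rng.sample(rest, min(random_count, len(rest)))
    return head_sample + random_sample
```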
The feature extractor 140 can produce feature vectors for the selected machine learning models (e.g., the large machine learning model 148 and any of the one or more machine learning models 152 through 156 that do not include internal feature extraction). The large machine learning model 148 can execute using the feature vector from the feature extractor 140 and produce a first output (e.g., a first prediction result). Small machine learning model 1 152 through small machine learning model n 156 can also execute using the same feature vector (or feature vectors customized by the feature extractor 140 for the respective small machine learning models) to produce a second output (e.g., a prediction result from small machine learning model 1 152) through an nth output (from small machine learning model n 156, etc.).
The model selector 136 can compare the first output with the second through nth outputs to determine which of the small machine learning models should be processing the data set. For example, the model selector 136 can label the first output as ground truth and then use the first output to measure the accuracy and/or loss of the second through nth outputs, determining, for the particular data set, the accuracy (e.g., using an accuracy metric) and/or loss (e.g., using a loss function) of each small machine learning model 152 through 156 relative to the large machine learning model 148. The model selector 136 can select the small machine learning model 152 through 156 with the highest accuracy and/or lowest loss. Alternatively, the model selector 136 can measure the second through nth outputs relative to the first output and/or relative to one another, producing a distribution of relative outputs (e.g., the second output relative to the third output, the second output relative to the fourth output, the second output relative to the nth output, etc.), from which a particular output can be selected as preferred over the others. The small machine learning model corresponding to that particular output can then be selected to process the data set. Alternatively, the model selector 136 can measure each of the second through nth outputs independently of the other outputs to determine the small machine learning model that should process the data set. In this example, the large machine learning model 148 may not be used (e.g., the first output may not be produced). The model selector 136 can measure the outputs relative to one another using any accuracy metric and/or loss function.
Alternatively, the model selector 136 can determine that the data set should be processed using the large machine learning model 148 rather than the small machine learning models 152 through 156. The model selector 136 may determine that the small machine learning models 152 through 156 have accuracy metrics below a first threshold and/or loss-function values above a second threshold. In that case, the model selector 136 can determine that the large machine learning model 148 will be the most efficient machine learning model for processing the data set. The model selector 136 can select a machine learning model by balancing processing efficiency (e.g., the small machine learning models 152 through 156 may be more efficient because they use fewer processing resources) against accuracy (the large machine learning model 148 may sometimes be more accurate than the small machine learning models 152 through 156).
The model selector 136 can select a small machine learning model as long as the selected small machine learning model's accuracy relative to the large machine learning model 148 is: 1) greater than that of the other small machine learning models under consideration; and 2) greater than a first threshold. Examples of accuracy metrics and/or loss functions include, but are not limited to, accuracy, precision, area under the curve, logarithmic loss, F1 score, weighted human disagreement rate, cross entropy, mean absolute error, mean square error, or similar functions. The model selector 136 can begin processing the remainder of the data set using the small machine learning model with the highest accuracy and/or lowest loss.
In some examples, the model selector can execute the machine learning model selection process again during processing of the data set to ensure that the selected small machine learning model is still the most efficient machine learning model for processing the data set. For example, the data processor 124 can process a video stream to produce an estimated depth map for each video frame (or every nth frame, etc.). The first few video frames of the video stream may include high-light conditions, for which small machine learning model 1 152 is shown to be the most efficient (based on an execution of the aforementioned model selection process). A subsequent portion of the video stream may include video frames with low-light conditions, for which small machine learning model 1 152 may not be the most efficient (e.g., small machine learning model 1 152 may have lower accuracy and/or higher loss when processing low-light video frames). The model selector 136 can re-execute the model selection process using one or more recent inputs to small machine learning model 1 152 and select the large machine learning model 148 and/or one of small machine learning model 2 (not shown) through small machine learning model n 156 to take over processing of the video stream.
The model selection process can be re-executed at regular intervals (e.g., every n video frames, every n seconds, etc.), upon detecting an event, upon receiving user input, upon detecting a change in one or more characteristics of the portion of the data set being input to the selected small machine learning model (during a particular iteration) and/or of the output from the selected small machine learning model (e.g., a change in average pixel value, as in the previous example) or of an accuracy metric and/or loss function, under a combination of the above, or in similar situations. The model selector 136 can continuously monitor the execution of the selected small machine learning model for a given data set to ensure that the most efficient small machine learning model is being executed.
FIG. 2 shows an exemplary three-dimensional content representation system according to aspects of the present disclosure. In some examples, the computing device 104 can operate as a load balancer by providing processing services to one or more client devices (e.g., client device 204). For example, the client device 204 can be any processing device, such as, but not limited to, a desktop or laptop computer, a mobile device (e.g., a smartphone, tablet, etc.), a video game console, a server, and so on. The client device 204 can operate a processing-intensive application. The client device 204 can use the resources of the computing device 104 by transmitting and/or streaming data sets to the computing device 104. The computing device 104 can use the data processor 124 to select a small machine learning model configured to process the data set and produce an output (or output stream). The computing device 104 can transmit (or stream) the output back to the client device 204.
In other examples, the computing device 104 can process data sets that may not be processable locally by the client device 204. For example, the computing device 104 can operate a virtual reality application configured to present three-dimensional representations of various media (e.g., movies, video games, simulations, etc.). The computing device 104 can receive content associated with the virtual reality application from a content delivery network 212 over a network 208. In another example, the computing device 104 can receive images from a live camera feed (or images aggregated from a live camera feed); if the content is not already a three-dimensional representation, the computing device 104 uses monocular depth estimation to convert the content into a three-dimensional representation. Monocular depth estimation is a process for determining the approximate distance between surfaces represented in an image (or video frame) and the camera that captured the image. In some examples, monocular depth estimation can be performed for each pixel of an image (or video frame), producing a depth map. The distances can be used to produce a three-dimensional representation of a two-dimensional image. Three-dimensional representations can be used in computer vision applications such as augmented reality, virtual reality, 3D television, video games, mapping of three-dimensional environments, simulation, vehicle automation (e.g., driverless cars), and so on.
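For illustration, one common way to turn a per-pixel depth map into a three-dimensional representation is to back-project it through a pinhole camera model. The sketch below assumes known focal lengths (`fx`, `fy`) and principal point (`cx`, `cy`); these intrinsics, and the function itself, are illustrative rather than taken from the patent.

```python
import numpy as np

def depth_map_to_points(depth, fx, fy, cx, cy):
    """Back-project an H x W depth map into an (H*W) x 3 point cloud.

    A pixel (u, v) with depth z maps to camera-space coordinates
    x = (u - cx) * z / fx and y = (v - cy) * z / fy.
    """
    h, w = depth.shape
    u, v = np.meshgrid(np.arange(w), np.arange(h))  # pixel coordinate grids
    x = (u - cx) * depth / fx
    y = (v - cy) * depth / fy
    return np.stack([x, y, depth], axis=-1).reshape(-1, 3)
```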
The computing device 104 can receive a request for three-dimensional content from the client device 204. The computing device 104 can request the content from the content delivery network 212 and process the content (in real time). In some examples, the computing device 104 can transmit to the client device 204 the content and the depth maps produced by the computing device 104's data processor (e.g., data processor 124). The client device 204 can use the content and the depth maps to produce a three-dimensional representation of the content for a virtual reality application. Alternatively, the client device 204 can receive the content directly from the content delivery network 212 and receive the depth maps from the computing device 104. Each depth map can be associated with metadata indicating the position within the content to which the depth map corresponds. In other examples, the computing device 104 can produce the three-dimensional representation of the content and transmit or stream it to the client device 204. For example, the client device 204 can connect to the computing device 104 and stream various three-dimensional representations, produced by the computing device 104, of content from the content delivery network.
FIG. 3 shows a block diagram of an exemplary distributed data processing network according to aspects of the present disclosure. The computing device 104 can operate within a distributed network configured to provide processing services to one or more devices (e.g., client device 204, other devices, servers, networks, etc.). The computing device 104 can include a data processor 124 configured to process various data sets. The data processor 124 can apply the model selection process to a large machine learning model and one or more small machine learning models to determine the most efficient machine learning model to use when processing a particular data set. The model selection process can balance or reduce the processing load of the computing device 104 while achieving overall accuracy in selecting the machine learning model that is to process a particular data set. In some examples, the computing device 104 can operate multiple large machine learning models and corresponding sets of one or more small machine learning models to enable parallel processing of similar and/or disparate data sets.
In some examples, the computing device 104 can operate as a node in a distributed data processing network. Any number of additional computing devices (e.g., computing device 104-1, computing device 104-2, computing device 104-3, computing device 104-n, etc.) can also operate in the distributed data processing network. Each of the computing devices 104 and 104-1 through 104-n can include a data processor (e.g., data processor 124) with a large machine learning model and one or more small machine learning models, as well as a model selector configured to identify the most efficient small machine learning model capable of processing a given data set with a threshold accuracy and/or loss.
The computing device 104 can also include a load balancer configured to identify a particular computing device capable of processing a particular data set. For example, the client device 204 can transmit to the computing device 104 a request with an identification of the particular data set to be processed. The load balancer can select, from the computing device 104 and the computing devices 104-1 through 104-n, a computing device capable of processing the particular data set. The load balancer can select the computing device using the particular data set and one or more characteristics of the computing device 104 and the computing devices 104-1 through 104-n, such as, but not limited to, each respective computing device's processing load, the data type of the particular data set, the expected output, the data types that can be processed by the respective computing device, network bandwidth, transmission paths (e.g., for transmitting the particular data set to each respective computing device and for transmitting the output back to the client device 204), the accuracy and/or loss of the machine learning models configured to process the particular data set (e.g., as determined using the model selection process described above), combinations thereof, or similar characteristics. In some examples, the one or more characteristics can be weighted, with the weights adjusted continuously based on the state of the distributed data processing network. For example, a high weight can be assigned to the characteristics corresponding to a computing device's capability with respect to the particular data set, to ensure that the selected computing device can process the particular data set (e.g., if the particular data set includes image data, the selected computing device's small machine learning models are trained to process image data, etc.). Other characteristics can be weighted to balance the processing load across the distributed data processing network.
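A minimal sketch of such weighted selection follows; the characteristic names, their normalization to [0, 1], and the weight values are assumptions for illustration.

```python
def pick_device(devices, weights):
    """Score candidate computing devices by weighted characteristics and
    return the highest-scoring device."""
    def score(device):
        return sum(weights[name] * device[name] for name in weights)
    # A device that cannot handle the data type scores 0 on "capability";
    # a large capability weight effectively turns that into a veto.
    return max(devices, key=score)

# Illustrative usage with assumed characteristics and weights:
devices = [
    {"name": "104-1", "capability": 1.0, "idle": 0.2, "bandwidth": 0.9, "accuracy": 0.8},
    {"name": "104-2", "capability": 1.0, "idle": 0.7, "bandwidth": 0.6, "accuracy": 0.7},
]
weights = {"capability": 10.0, "idle": 1.0, "bandwidth": 0.5, "accuracy": 2.0}
best = pick_device(devices, weights)
```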
In some examples, a data set can be processed by more than one computing device. For example, a large data set or media stream can be processed in discrete segments (e.g., per image or video frame, per n-second chunk, per n bits of data, etc.). The computing device 104 can produce a sequence of feature vectors. Alternatively, for real-time operation, the computing device can produce feature vectors as the data is received. Each feature vector can be associated with a sequence identifier indicating the portion of the data set from which the feature vector was derived. The computing device 104 can then transmit the feature vectors to the computing devices selected by the load balancer, which process the feature vectors and produce outputs. The computing device 104 can receive, from each selected computing device, the output along with the identification of the feature vector and/or the sequence identifier. The computing device 104 can then assemble the outputs received from the computing devices processing the data set into an output sequence (when processing non-real-time data), or transmit each output to the client device 204 as it is produced. By distributing the data set across the computing devices of the distributed data processing network, the computing device 104 can reduce the processing load of the distributed data processing network, reduce processing latency by processing portions of the data set in parallel, maintain the accuracy of the data set being processed, and so on.
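The reassembly by sequence identifier can be sketched as follows; the tuple format for worker results is an assumption for illustration.

```python
def assemble_outputs(results):
    """Order distributed outputs by their sequence identifiers so the output
    sequence matches the order of the original data set."""
    return [output for _, output in sorted(results, key=lambda pair: pair[0])]

# Illustrative usage: outputs arrive out of order from devices 104-1..104-n.
results = [(2, "depth_map_2"), (0, "depth_map_0"), (1, "depth_map_1")]
ordered = assemble_outputs(results)  # ["depth_map_0", "depth_map_1", "depth_map_2"]
```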
In other examples, the computing device 104 can transmit the selected small machine learning model to the client device 204, enabling the client device 204 to process the data set locally. In these examples, the data processor of the computing device 104 can execute the model selection process using samples of the data set to identify the particular small machine learning model capable of processing the data set with a threshold accuracy or loss (as described above). The computing device 104 can then transmit the selected small machine learning model to the client device 204. The client device can process the remainder of the data set using the selected small machine learning model.
FIG. 4 shows a block diagram of an exemplary model selection process for selecting a small machine learning model for processing a data set, according to aspects of the present disclosure. The model selection process can identify the particular machine learning model to be used to process a given data set. A large machine learning model 408 (e.g., with a number of parameters and/or layers greater than a threshold) can be trained to process general data sets. The large machine learning model 408 can be compressed to produce one or more small machine learning models (e.g., small model 1 412, small model 2 416, small model 3 420, small model n 424, etc.). The large machine learning model 408 can be compressed before, during, or after training. The large machine learning model 408 can be compressed through pruning (e.g., removing unnecessary parameters or layers), quantization (e.g., reducing the memory footprint of the parameters), knowledge distillation (e.g., training a small machine learning model to imitate the large machine learning model), low-rank factorization, or any other compression algorithm. Alternatively, the one or more small machine learning models can be defined and trained independently. Any number of small machine learning models can be produced (through compression or through independent training), where n is any integer greater than 1.
The model selection process can begin when a request to process a particular data set is received. The particular data set can be sampled to produce one or more discrete portions of the data set that can be processed by the machine learning models 408 through 424. One or more feature vectors can be derived from each of the one or more discrete portions of the data set. In some examples, a single feature vector can be derived for the machine learning models 408 through 424. In other examples, a feature vector can be derived for each machine learning model, customized for that model (e.g., based on the model's number of parameters and/or layers). Each machine learning model 408 through 424 can be executed using the one or more feature vectors to produce a corresponding model output. The large machine learning model 408 can process a feature vector to produce model output 428. Small machine learning model 1 412 can process a feature vector to produce model 1 output 432, small machine learning model n 424 can process a feature vector to produce model n output 444, and so on.
At benchmark selection 452, one or more benchmarks can be selected for evaluating the model outputs 428 through 444. Benchmark selection 452 can designate the model output from the large machine learning model 408 (e.g., model output 428) as ground truth and compare the model outputs from the small machine learning models 412 through 424 (e.g., model outputs 432 through 444) against model output 428.
In some examples, benchmark selection can determine the benchmark based on the data type of the model outputs 428 through 444. For example, a benchmark for a classifier can be an accuracy metric or error metric that evaluates outputs according to Boolean values (e.g., true/false or correct/incorrect). A benchmark for a machine learning model that produces numerical output (e.g., a depth estimation machine learning model, which may output a depth map or an inverse depth map) can be a loss function (e.g., one that determines the difference between a control value and the output). Benchmark selection 452 can use one or more benchmarks when evaluating the outputs 428 through 444.
In some examples, for outputs that include depth maps, benchmark selection 452 can use the weighted human disagreement rate, the mean absolute relative error, a robust function loss, or similar functions. The weighted human disagreement rate uses equal weights (e.g., set to 1) and, for each pixel of an output depth map, identifies whether the pixel is closer or farther than the corresponding pixel of the ground-truth model output 428. Each pixel of the model output can be replaced with a 0 (indicating that the pixel of the model output is closer than the corresponding pixel of the ground-truth depth map) or a 1 (indicating that the pixel of the model output is farther than the corresponding pixel of the ground-truth depth map). The distribution of 0s and 1s can be used to evaluate how far the model output depth map deviates from the ground truth. The mean absolute relative error can evaluate the error as

$$\mathrm{MARE} = \frac{1}{M} \sum_{i=1}^{M} \frac{\lvert d_i - d_i^{*} \rvert}{d_i^{*}}$$

where $d_i$ corresponds to the value of pixel $i$ of the depth map being evaluated, $d_i^{*}$ corresponds to the ground-truth value of pixel $i$ of the ground-truth depth map 428, and $M$ corresponds to the total number of pixels of the depth map.

A robust function loss can likewise be evaluated over the pixels of the depth map, for example as $\frac{1}{M} \sum_{i=1}^{M} \rho\left(d_i - d_i^{*}\right)$, where $d$ denotes the predicted disparity of a given pixel of the model output, $d^{*}$ denotes the ground-truth disparity of the pixel from the ground-truth model output, $M$ denotes the number of pixels in the depth map, and $\rho$ is a robust penalty function.
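A minimal numpy sketch of two of the depth benchmarks just described is shown below. It assumes depth maps are same-shaped arrays of strictly positive values, and the per-pixel closer/farther coding is a simplified, equal-weight reading of the weighted human disagreement rate rather than the patent's exact formulation.

```python
import numpy as np

def mean_absolute_relative_error(predicted, ground_truth):
    # MARE = (1/M) * sum(|d_i - d_i*| / d_i*) over all M pixels;
    # assumes strictly positive ground-truth values.
    return float(np.mean(np.abs(predicted - ground_truth) / ground_truth))

def disagreement_rate(predicted, ground_truth):
    # Code each pixel 1 if the prediction is farther than the ground truth,
    # 0 if closer; the mean of the codes summarizes the deviation.
    codes = (predicted > ground_truth).astype(np.float64)
    return float(codes.mean())
```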
In other examples, any accuracy metric, loss function, error rate, or similar function can be used to evaluate the model outputs relative to the ground-truth model output. Examples of accuracy metrics and/or loss functions include, but are not limited to, accuracy, precision, area under the curve, logarithmic loss, F1 score, weighted human disagreement rate, cross entropy, mean absolute relative error, mean square error, or similar functions.
Benchmark selection 452 can then identify the model output 432 through 444 with the highest accuracy or lowest loss. Model selection 456 can then select, from the small machine learning models 412 through 424, the small machine learning model corresponding to the identified model output. The selected small machine learning model can be used to process the remainder of the particular data set.
FIG. 5 shows a flowchart of an exemplary process for model selection for monocular depth estimation according to aspects of the present disclosure. Monocular depth estimation can be performed by one or more machine learning models (e.g., deep neural networks or similar models) of the kind used for various computer vision operations (e.g., but not limited to, classification, semantic segmentation, object detection, instance segmentation, depth estimation, etc.) and applications (e.g., autonomous driving for driverless cars, virtual reality, augmented reality, three-dimensional simulation, target acquisition, etc.). The model selection process can be executed to select an efficient machine learning model based on the processing resources consumed by the selected machine learning model and the model's accuracy when processing a particular data set.
For example, at block 504, a computing device can receive multiple images. The multiple images can be independent images (unrelated to the other images among the multiple images), images extracted from video frames, images extracted from a video clip, or the like. The images can be received from a camera, a content delivery network, a client device, another computing device, a server, and so on. Alternatively, the computing device can receive the images by extracting them from a video clip stored in the computing device's memory or from a live camera stream. For a live camera stream, images can be received continuously as they are captured by the camera. Because additional images of the live camera stream will be received over time, the computing device is considered to have received the multiple images once the first image from the live camera stream is received.
At block 508, the computing device can select one or more images from the multiple images. For example, the computing device can sample the multiple images to derive the one or more images. For a live camera stream, the one or more images can correspond to the one or more images received from the live camera stream. In some examples, the computing device can sample the multiple images randomly. In other examples, the computing device can select the one or more images from the multiple images according to one or more parameters. The one or more parameters can be based on the quantity and/or order of the images among the multiple images, characteristics of the images (e.g., pixel values such as average red, green, or blue values and/or pixel brightness values), metadata associated with the multiple images, combinations thereof, or similar data. For example, the computing device can sample the multiple images by selecting images uniformly over the distribution of the multiple images based on the number of images to be included in the sample (e.g., the first image and the last image when two images are sampled from the multiple images).
At block 512, the computing device can process the one or more images using a first machine learning model to produce a first prediction result. For example, the computing device can generate a feature vector from the one or more images. The feature vector can be passed as input to the first machine learning model, which can process the feature vector and output the first prediction result. The first machine learning model can be a large machine learning model (e.g., a machine learning model with a number of parameters and/or layers greater than a threshold). In some examples, the computing device can label the first prediction result as pseudo ground truth, usable for comparing the outputs of other machine learning models against the first prediction result.
At block 516, the computing device can process the one or more images using multiple machine learning models to produce a second prediction result (e.g., using a first of the multiple machine learning models), a third prediction result (e.g., using a second of the multiple machine learning models), and so on. The multiple machine learning models can be small machine learning models. A small machine learning model can include a number of parameters and/or layers less than a threshold. The first machine learning model can include more parameters and/or layers than the multiple machine learning models. The multiple machine learning models can be produced by compressing the first machine learning model (e.g., using pruning, quantization, knowledge distillation, low-rank factorization, etc.) before, during, or after the first machine learning model is trained. In some examples, the computing device determines whether to train the multiple machine learning models. The computing device can determine to train one or more of the multiple machine learning models using the same training data as was used to train the first machine learning model, training data similar to that used to train the first machine learning model, or training data different from that used to train the first machine learning model. Alternatively, the multiple machine learning models can be defined and trained independently (e.g., separately from the first machine learning model). In these examples, the multiple machine learning models can be of the same, similar, or different types as the first machine learning model (e.g., different models, different parameters, different types of layers, different algorithms, different training processes or iterations, etc.).
The computing device can input to the multiple machine learning models the same or similar feature vectors (derived from the one or more images) as were passed as input to the first machine learning model, to produce the second prediction result, the third prediction result, and so on. In other examples, the computing device can customize the feature vectors for the multiple machine learning models. Because the multiple machine learning models have fewer parameters and/or layers, those machine learning models may accept fewer features in their input feature vectors. The computing device can compress the feature vectors (e.g., using any of the aforementioned compression techniques) to reduce the number of input features.
At block 520, the computing device can select a second machine learning model from the multiple machine learning models based on a comparison of the first prediction result with the second prediction result, the third prediction result, and so on. In some examples, the computing device can use one or more accuracy metrics and/or loss functions to compare the second prediction result, the third prediction result, etc. against the first prediction result. For example, a depth map can represent each pixel of an image as the distance (e.g., a real number) between the camera and the location in the environment that the pixel represents. The computing device can perform a pixel-by-pixel comparison of each distance value of the second prediction result against each corresponding pixel of the first prediction result (e.g., treated as ground truth for purposes of the comparison). A loss function can be used to compare the second prediction result with the first prediction result, the third prediction result with the first prediction result, and so on. Examples of loss functions include, but are not limited to, adaptive robust loss, mean square error, mean absolute error, cross entropy, weighted human disagreement rate (WHDR), combinations thereof, or similar functions. The computing device can select, from the multiple machine learning models, the machine learning model with the highest accuracy metric, the lowest error rate, the lowest loss, etc. to process the multiple images.
At block 524, the computing device can process the multiple images using the second machine learning model (e.g., the machine learning model selected at block 520 with the highest accuracy or lowest loss). In some examples, the computing device can process each of the multiple images. In other examples, the computing device can process a portion of the multiple images by sampling them (e.g., every nth image). Returning to the monocular depth estimation example, the computing device can use the second machine learning model and the multiple images to produce a sequence of depth maps. Because the second machine learning model is a small machine learning model, monocular depth estimation can be performed in near real time (e.g., using a live camera stream, dynamically or procedurally generated images from a video game, etc.).
In some examples, the model selection process can be repeated to ensure that the second machine learning model is still the most efficient machine learning model for processing the multiple images. The model selection process can be re-executed at regular intervals, upon detecting an event, upon detecting user input, after a predetermined number of iterations of the second machine learning model, upon detecting a change in one or more characteristics of the multiple images (e.g., a change in average pixel value), under a combination of the above, or in similar situations. The computing device can continuously ensure that the most efficient machine learning model (e.g., the machine learning model with the highest accuracy, lowest error rate, lowest loss, etc. when processing the multiple images) is used to process the multiple images.
The model selection process can be applied to a wide variety of machine learning models to determine an efficient way to process disparate data sets. In this way, the techniques described herein can be applied to deep neural networks (as described above) as well as any other type of machine learning model or data set.
FIG. 6 shows an exemplary computing device according to aspects of the present disclosure. For example, computing device 600 can implement any of the systems or methods described herein. In some examples, computing device 600 can be a component of, or included within, a media device. The components of computing device 600 are shown in electrical communication with each other using a connection 606, such as a bus. The exemplary computing device 600 includes a processor (e.g., a CPU, processor, or the like) 604 and the connection 606 (e.g., a bus or the like), which is configured to couple components of computing device 600 (e.g., but not limited to, memory 620, read only memory (ROM) 618, random access memory (RAM) 616, and/or storage device 608) to the processor 604.
Computing device 600 can include a cache 602 of high-speed memory connected directly with, in close proximity to, or integrated within the processor 604. Computing device 600 can copy data from the memory 620 and/or the storage device 608 to the cache 602 for faster access by the processor 604. In this way, the cache 602 can provide a performance boost that avoids delays while the processor 604 waits for data. Alternatively, the processor 604 can access data directly from the memory 620, the ROM 618, the RAM 616, and/or the storage device 608. The memory 620 can include multiple types of homogeneous or heterogeneous memory (e.g., but not limited to, magnetic, optical, solid-state memory, etc.).
The storage device 608 can include one or more non-transitory computer-readable media, such as volatile and/or non-volatile memory. Non-transitory computer-readable media can store instructions and/or data accessible by computing device 600. Non-transitory computer-readable media can include, but are not limited to, magnetic cassettes, hard-disk drives (HDD), flash memory, solid-state memory devices, digital versatile disks, cartridges, optical disks, random access memory (RAM) 616, read only memory (ROM) 618, combinations thereof, or the like.
The storage device 608 can store one or more services (e.g., service 1 610, service 2 612, and service 3 614) executable by the processor 604 and/or other electronic hardware. The one or more services include instructions executable by the processor 604 to: perform operations such as any of the techniques, steps, processes, blocks, and/or operations described herein; control the operations of devices in communication with computing device 600; control the operations of the processor 604 and/or any special-purpose processors; combinations thereof; or the like. The processor 604 can be a system on a chip (SOC) that includes one or more cores or processors, a bus, memory, a clock, memory controllers, a cache, other processor components, and/or the like. A multi-core processor can be symmetric or asymmetric.
Computing device 600 can include one or more input devices 622, which can represent any number of input mechanisms, such as a microphone, a touch screen for graphical input, a keyboard, a mouse, motion input, speech, a media device, sensors, combinations thereof, or the like. Computing device 600 can include one or more output devices 624 that output data to a user. Such output devices 624 can include, but are not limited to, a media device, a projector, a television, speakers, combinations thereof, or the like. In some examples, a multimodal computing device can enable a user to provide multiple types of input to communicate with computing device 600. A communication interface 626 can be configured to manage user input and computing device output. The communication interface 626 can also be configured to manage communications with remote devices (e.g., establishing connections, receiving/transmitting communications, etc.) via one or more communication protocols and/or one or more communication media (e.g., wired, wireless, etc.).
Computing device 600 is not limited to the components shown in FIG. 6. Computing device 600 can include other components not shown and/or can omit components that are shown.
The following examples illustrate various aspects of the present disclosure. As used below, any reference to a series of examples is to be understood as a reference to each of those examples separately (e.g., "Examples 1 to 4" is to be understood as "Example 1, Example 2, Example 3, or Example 4").
Example 1 is a computer-implemented method that includes: receiving multiple images; selecting one or more images from the multiple images; processing the one or more images using a first machine learning model to produce a first prediction result; processing the one or more images using multiple machine learning models to produce at least a second prediction result and a third prediction result, wherein the first machine learning model is larger than the multiple machine learning models; selecting a second machine learning model from the multiple machine learning models based on a comparison of the first prediction result with the at least second and third prediction results; and processing the multiple images using the second machine learning model.
Example 2 is the computer-implemented method of any of Example 1 and Examples 3 to 8, wherein selecting the second machine learning model from the multiple machine learning models includes: producing a first accuracy value for the first prediction result, a second accuracy value for the second prediction result, and a third accuracy value for the third prediction result; and comparing the first, second, and third accuracy values, wherein the second machine learning model is selected based on the second accuracy value being higher than the first and third accuracy values.
Example 3 is the computer-implemented method of any of Examples 1 to 2 and Examples 4 to 8, wherein selecting the second machine learning model includes: producing, using a loss function, a first value for the first prediction result, a second value for the second prediction result, and a third value for the third prediction result; and comparing the first, second, and third values, wherein the second machine learning model is selected based on the second value being higher than the first and third values.
Example 4 is the computer-implemented method of any of Examples 1 to 3 and Examples 5 to 8, wherein the second machine learning model is configured to produce a depth estimation map for an image of the multiple images.
Example 5 is the computer-implemented method of any of Examples 1 to 4 and Examples 6 to 8, wherein the second machine learning model is configured to perform semantic segmentation on an image of the multiple images.
Example 6 is the computer-implemented method of any of Examples 1 to 5 and Examples 7 to 8, wherein the second machine learning model is configured to perform instance segmentation using an image of the multiple images.
Example 7 is the computer-implemented method of any of Examples 1 to 6 and Example 8, wherein the multiple machine learning models are deep neural networks.
Example 8 is the computer-implemented method of any of Examples 1 to 7, wherein the first machine learning model includes more layers than each of the multiple machine learning models.
Example 9 is a system that includes: one or more processors; and a non-transitory computer-readable medium storing instructions that, when executed by the one or more processors, cause the one or more processors to perform the operations of any of Examples 1 to 8.
Example 10 is a non-transitory computer-readable medium storing instructions that, when executed by one or more processors, cause the one or more processors to perform the operations of any of Examples 1 to 8.
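The following Python sketch illustrates one way the selection procedure of Examples 1 to 3 could be realized. It is a minimal sketch under stated assumptions: the names select_small_model, reference_model, and candidate_models are invented for illustration, and the mean-absolute-error discrepancy (lowest value wins) merely stands in for the accuracy comparison of Example 2 or the loss-function comparison of Example 3; none of these names or metrics are defined by this disclosure.

```python
# Illustrative sketch only; the names and the MAE discrepancy below are
# assumptions made for this example, not APIs defined by the disclosure.
from typing import Callable, List, Sequence

import numpy as np

Model = Callable[[np.ndarray], np.ndarray]  # maps an image to a prediction


def select_small_model(
    images: Sequence[np.ndarray],
    reference_model: Model,          # the first (large) machine learning model
    candidate_models: List[Model],   # the plurality of (small) machine learning models
    num_probe_images: int = 4,
) -> Model:
    # Select one or more images from the plurality of images (Example 1).
    probes = list(images)[:num_probe_images]

    # The large model's outputs serve as the reference
    # ("first prediction result").
    references = [reference_model(img) for img in probes]

    # Score each small model by how closely its prediction results agree
    # with the large model's; here a lower mean absolute error is better.
    def discrepancy(model: Model) -> float:
        return float(np.mean([
            np.abs(model(img) - ref).mean()
            for img, ref in zip(probes, references)
        ]))

    scores = [discrepancy(m) for m in candidate_models]

    # The best-agreeing small model becomes the "second machine learning
    # model" of Example 1.
    return candidate_models[int(np.argmin(scores))]
```

The selected small model would then process the full image set, for instance producing per-image depth estimation maps as in Example 4, e.g. `depth_maps = [selected(img) for img in images]`.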
The term "computer-readable medium" includes, but is not limited to, portable or non-portable storage devices, optical storage devices, and various other media capable of storing, containing, or carrying instructions and/or data. A computer-readable medium may include a non-transitory medium in which data can be stored in a form that does not include carrier waves and/or electronic signals. Examples of non-transitory media include, but are not limited to, a magnetic disk or tape, an optical storage medium such as a compact disc (CD) or digital versatile disc (DVD), flash memory, or a memory or memory device. A computer-readable medium may have stored thereon code and/or machine-executable instructions that may represent a procedure, a function, a subprogram, a program, a routine, a subroutine, a module, a software package, a class, or any combination of instructions, data structures, or program statements. A code segment may be coupled to another code segment or a hardware circuit by passing and/or receiving information, data, arguments, parameters, or memory contents. Information, arguments, parameters, data, and the like may be passed, forwarded, or transmitted via any suitable means, including memory sharing, message passing, token passing, and network transmission.
Portions of this specification describe examples in terms of algorithms and symbolic representations of operations on information. These operations, although described functionally, computationally, or logically, may be implemented by computer programs, equivalent circuits, microcode, and the like. Furthermore, arrangements of operations may be referred to as modules without loss of generality. The described operations and their associated modules may be implemented in software, firmware, hardware, or any combination thereof.
Any of the steps, operations, or processes described herein may be performed or implemented with one or more hardware or software modules, alone or in combination with other devices. In some examples, a software module may be implemented with a computer-readable medium storing computer program code, which can be executed by a processor to perform any or all of the described steps, operations, or processes.
Some examples may relate to an apparatus or system for performing any or all of the described steps, operations, or processes. The apparatus or system may be specially constructed for the required purposes, and/or it may include a general-purpose computing device selectively activated or reconfigured by a computer program stored in the memory of the computing device. Such a memory may be or include a non-transitory, tangible computer-readable storage medium, or any type of medium suitable for storing electronic instructions, which may be coupled to a bus. Furthermore, any computing system referred to in the specification may include a single processor or multiple processors.
Although the subject matter has been described in detail with respect to specific examples, it will be appreciated that those skilled in the art, upon attaining an understanding of the foregoing, may readily produce alterations to, variations of, and equivalents to such embodiments. Numerous specific details are set forth herein to provide a thorough understanding of the claimed subject matter. However, those skilled in the art will understand that the claimed subject matter may be practiced without these specific details. In other instances, methods, apparatuses, or systems known to one of ordinary skill in the art have not been described in detail so as not to obscure the claimed subject matter. Accordingly, the present disclosure has been presented for purposes of example rather than limitation, and does not preclude the inclusion of such modifications, variations, and/or additions to the subject matter as would be readily apparent to one of ordinary skill in the art.
For clarity of explanation, in some instances the present disclosure may be presented as including individual functional blocks comprising devices, device components, steps or routines in a method embodied in software, or combinations of hardware and software. Additional functional blocks beyond those shown in the figures and/or described herein may also be used. For example, circuits, systems, networks, processes, and other components may be shown as components in block diagram form in order not to obscure the embodiments in unnecessary detail. In other instances, well-known circuits, processes, algorithms, structures, and techniques may be shown without unnecessary detail in order to avoid obscuring the embodiments.
Individual examples may be described herein as a process or method, which may be depicted as a flowchart, a flow diagram, a data flow diagram, a structure diagram, or a block diagram. Although a flowchart may describe the operations as a sequential process, many of the operations can be performed in parallel or concurrently. In addition, the order of the operations may be rearranged. A process is terminated when its operations are completed, but it may have additional steps not shown. A process may correspond to a method, a function, a procedure, a subroutine, a subprogram, and so on. When a process corresponds to a function, its termination may correspond to a return of the function to the calling function or the main function.
Processes and methods according to the above examples can be implemented using computer-executable instructions that are stored in or otherwise available from a computer-readable medium. Such instructions can include, for example, instructions and data that cause or otherwise configure a general-purpose computer, a special-purpose computer, or a processing device to perform a certain function or group of functions. Portions of the computer resources used can be accessed over a network. The computer-executable instructions may be, for example, binaries or intermediate-format instructions such as assembly language, firmware, or source code.
Devices implementing the methods and systems described herein can include hardware, software, firmware, middleware, microcode, hardware description languages, or any combination thereof, and can take any of a variety of form factors. When implemented in software, firmware, middleware, or microcode, the program code or code segments to perform the necessary tasks (e.g., a computer program product) may be stored in a computer-readable or machine-readable medium. The program code may be executed by a processor, which may include one or more processors, such as, but not limited to, one or more digital signal processors (DSPs), general-purpose microprocessors, application-specific integrated circuits (ASICs), field-programmable gate arrays (FPGAs), or other equivalent integrated or discrete logic circuitry. Such a processor may be configured to perform any of the techniques described in this disclosure. The processor may be a microprocessor, a conventional processor, a controller, a microcontroller, a state machine, or the like. A processor may also be implemented as a combination of computing components (e.g., a combination of a DSP and a microprocessor, a plurality of microprocessors, one or more microprocessors in conjunction with a DSP core, or any other such configuration). Accordingly, the term "processor" as used herein may refer to any of the foregoing structures, any combination of the foregoing structures, or any other structure or apparatus suitable for implementation of the techniques described herein. The functionality described herein may also be implemented in peripherals or add-in cards. As a further example, such functionality may also be implemented on a circuit board among different chips or different processes executing in a single device.
In the foregoing description, aspects of the disclosure are described with reference to specific examples thereof, but those skilled in the art will recognize that the disclosure is not limited thereto. Thus, while illustrative examples of the disclosure have been described in detail herein, it is to be understood that the inventive concepts may be otherwise variously embodied and employed, and that the appended claims are intended to be construed to include such variations. Various features and aspects of the above-described disclosure may be used individually or in any combination. Further, examples can be utilized in any number of environments and applications beyond those described herein without departing from the broader spirit and scope of the disclosure. The disclosure and figures are, accordingly, to be regarded as illustrative rather than restrictive.
The various illustrative logical blocks, modules, circuits, and algorithm steps described in connection with the embodiments disclosed herein may be implemented as electronic hardware, computer software, firmware, or combinations thereof. To clearly illustrate this interchangeability of hardware and software, various illustrative components, blocks, modules, circuits, and steps have been described above generally in terms of their functionality. Whether such functionality is implemented as hardware or software depends upon the particular application and the design constraints imposed on the overall system. Skilled artisans may implement the described functionality in varying ways for each particular application, but such implementation decisions should not be interpreted as causing a departure from the scope of the present application.
Unless specifically stated otherwise, it is to be understood that throughout this specification discussions utilizing terms such as "processing," "computing," "calculating," "determining," and "identifying" or the like refer to actions or processes of a computing device, such as one or more computers or a similar electronic computing device or devices, that manipulate or transform data represented as physical electronic or magnetic quantities within the memories, registers, or other information storage devices, transmission devices, or media devices of the computing platform. The use of "adapted to" or "configured to" herein is meant as open and inclusive language that does not foreclose devices adapted to or configured to perform additional tasks or steps. Additionally, the use of "based on" is meant to be open and inclusive, in that a process, step, calculation, or other action "based on" one or more recited conditions or values may, in practice, be based on additional conditions or values beyond those recited. Headings, lists, and numbering included herein are for ease of explanation only and are not meant to be limiting.
The foregoing detailed description of the technology has been presented for purposes of illustration and description. It is not intended to be exhaustive or to limit the technology to the precise form disclosed. Many modifications and variations are possible in light of the above teaching. The described embodiments were chosen in order to best explain the principles of the technology and its practical application, and to enable others skilled in the art to utilize the technology in various embodiments and with various modifications as are suited to the particular use contemplated. It is intended that the scope of the technology be defined by the claims.
104, 104-1, 104-2, 104-3, 104-n, 600: computing device
108: central processing unit (CPU)
112, 620: memory
116: input/output interface
120: network interface
124: data processor
128, 208: network
132: device
136: model selector
140: feature extractor
144: training data
148, 408: large machine learning model
152, 156, 412, 416, 420, 424: small machine learning model
204: client device
212: content delivery network
404: input sample
428, 432, 436, 440, 444: model output
452: benchmark selection
456: model selection
504, 508, 512, 516, 520, 524: blocks
602: cache memory
604: processor
606: connection
608: storage device
610: service 1
612: service 2
614: service 3
616: random access memory (RAM)
618: read-only memory (ROM)
622: input device
624: output device
626: communication interface
FIG. 1 illustrates a block diagram of an exemplary system for selecting a machine learning model configured to process disparate data sets, according to aspects of the present disclosure. FIG. 2 illustrates an exemplary three-dimensional content representation system according to aspects of the present disclosure. FIG. 3 illustrates a block diagram of an exemplary distributed data processing network according to aspects of the present disclosure. FIG. 4 illustrates a block diagram of an exemplary model selection process for selecting a small machine learning model to process a data set, according to aspects of the present disclosure. FIG. 5 illustrates a flowchart of an exemplary process of model selection for monocular depth estimation according to aspects of the present disclosure. FIG. 6 illustrates an exemplary computing device architecture of an exemplary computing device that can implement the various techniques described herein, according to aspects of the present disclosure.
104: computing device; 204: client device; 208: network; 212: content delivery network
Claims (18)
Priority Applications (1)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| TW112123704A TWI871681B (en) | 2023-06-26 | 2023-06-26 | System and method for presenting three-dimensional content and three-dimensional content calculation apparatus |
Applications Claiming Priority (1)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| TW112123704A TWI871681B (en) | 2023-06-26 | 2023-06-26 | System and method for presenting three-dimensional content and three-dimensional content calculation apparatus |
Publications (2)
| Publication Number | Publication Date |
|---|---|
| TW202501320A TW202501320A (en) | 2025-01-01 |
| TWI871681B true TWI871681B (en) | 2025-02-01 |
Family
ID=95152432
Family Applications (1)
| Application Number | Title | Priority Date | Filing Date |
|---|---|---|---|
| TW112123704A TWI871681B (en) | 2023-06-26 | 2023-06-26 | System and method for presenting three-dimensional content and three-dimensional content calculation apparatus |
Country Status (1)
| Country | Link |
|---|---|
| TW (1) | TWI871681B (en) |
Citations (4)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| TW202145084A (en) * | 2020-05-15 | 2021-12-01 | 鴻海精密工業股份有限公司 | Training method, device for machine learning model and electronic device |
| CN114998716A (en) * | 2022-04-18 | 2022-09-02 | 阿里巴巴达摩院(杭州)科技有限公司 | Image processing method, device, equipment and storage medium |
| CN115699204A (en) * | 2020-05-11 | 2023-02-03 | 豪夫迈·罗氏有限公司 | Clinical predictor based on multiple machine learning models |
| US20230198855A1 (en) * | 2021-12-17 | 2023-06-22 | Hughes Network Systems, Llc | Deploying and updating machine learning models over a communication network |
- 2023-06-26: TW application TW112123704A filed; granted as patent TWI871681B (status: active)
Patent Citations (4)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| CN115699204A (en) * | 2020-05-11 | 2023-02-03 | 豪夫迈·罗氏有限公司 | Clinical predictor based on multiple machine learning models |
| TW202145084A (en) * | 2020-05-15 | 2021-12-01 | 鴻海精密工業股份有限公司 | Training method, device for machine learning model and electronic device |
| US20230198855A1 (en) * | 2021-12-17 | 2023-06-22 | Hughes Network Systems, Llc | Deploying and updating machine learning models over a communication network |
| CN114998716A (en) * | 2022-04-18 | 2022-09-02 | 阿里巴巴达摩院(杭州)科技有限公司 | Image processing method, device, equipment and storage medium |
Also Published As
| Publication number | Publication date |
|---|---|
| TW202501320A (en) | 2025-01-01 |
Similar Documents
| Publication | Publication Date | Title |
|---|---|---|
| WO2023282847A1 (en) | Detecting objects in a video using attention models | |
| WO2019242222A1 (en) | Method and device for use in generating information | |
| US20210295096A1 (en) | System and method for machine learning based video quality assessment | |
| CN113705293B (en) | Image scene recognition method, device, equipment and readable storage medium | |
| JP2015536094A (en) | Video scene detection | |
| WO2020000876A1 (en) | Model generating method and device | |
| CN110856037A (en) | Video cover determination method and device, electronic equipment and readable storage medium | |
| US20220108427A1 (en) | Method and an electronic device for detecting and removing artifacts/degradations in media | |
| JP7714311B2 (en) | Reducing Bandwidth Consumption with Generative Adversarial Networks | |
| US11282179B2 (en) | System and method for machine learning based video quality assessment | |
| CN113033677A (en) | Video classification method and device, electronic equipment and storage medium | |
| TWI879031B (en) | Distributed data processing system and distributed data processing method | |
| CN106688015B (en) | Handles parameters for operations on blocks when decoding images | |
| US12205299B2 (en) | Video matting | |
| CN114299074A (en) | A video segmentation method, device, equipment and storage medium | |
| WO2019127940A1 (en) | Video classification model training method, device, storage medium, and electronic device | |
| JP7799046B2 (en) | Feature map encoding and decoding method and apparatus | |
| CN111899239A (en) | Image processing method and device | |
| TWI871681B (en) | System and method for presenting three-dimensional content and three-dimensional content calculation apparatus | |
| CN114513653A (en) | Video processing method, device, equipment, computer program product and storage medium | |
| US20240404259A1 (en) | System and method for presenting three-dimensional content and three-dimensional content calculation apparatus | |
| WO2021127963A1 (en) | Image content classification | |
| CN113392269A (en) | Video classification method, device, server and computer readable storage medium | |
| CN116310974B (en) | Time sequence action detection method and device, electronic equipment and storage medium | |
| US20250392336A1 (en) | Dynamic bit inversion for reduced overhead dc-balanced coding |