TWI911362B - Method for reducing network bandwidth required for image streaming by using artificial intelligence processing module
- Publication number
- TWI911362B TW110148954A
- Authority
- TW
- Taiwan
- Prior art keywords
- training
- images
- resolution
- image
- weighting parameters
- Prior art date
Description
The present invention relates to a method for reducing the network bandwidth required for video streaming by using an artificial intelligence (AI) processing module. In particular, the server first reduces the resolution of the images to be transmitted before sending them over the network to the client, and the client then uses a pre-trained AI processing module to restore the received images to high resolution, thereby reducing the network bandwidth required for video streaming.
In recent years, online games have become increasingly popular worldwide. With the development of cloud-based computing systems and technologies, a cloud technology has also been developed that provides online gaming services by having a server stream game content as video over the network to players.
Traditionally, such cloud-based online game services have the server perform almost all of the computation. That is, when providing an online game service, a dedicated application runs on the server to generate a virtual 3D (three-dimensional) environment containing many 3D objects, some of which can be controlled or moved by the player. Based on the player's control input, the server renders these 3D objects and the virtual 3D environment into a 2D (two-dimensional) game frame for display on the player's device. The server then encodes and compresses the rendered frames into a 2D video stream and transmits it over the network to the player's device. The player's device only needs to decode and "play" the received 2D video stream, without performing any 3D rendering. However, this kind of cloud-based online game service still has several problems, such as the high server load when 3D rendering is performed for a large number of players simultaneously, the degradation of image quality caused by the encoding, compression, and streaming processes, and the large amount of communication bandwidth consumed by transmitting the 2D video stream over the network.
One known way to address the degradation of image quality is to increase, on the server side, the resolution of the original images generated by the game application and to raise the bitrate used to transmit them, i.e., to lower the compression ratio used when the server encodes the original images into the 2D video stream. Obviously, however, doing so significantly increases both the server load and the bandwidth consumption because of the higher resolution and transmission bitrate. For example, with the frame rate and compression ratio held constant, raising the resolution of the original images generated by the server-side game application from 720p to 1080p increases both the server's computational load and the required network bitrate by a factor of 2.25. Conversely, any attempt to reduce the server load or the network bandwidth consumption sacrifices the visual quality of the game images. Obtaining perfect image quality and economical bandwidth consumption at the same time therefore becomes a dilemma.
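As a quick check of the 2.25 factor: with the frame rate and compression ratio fixed, the required bitrate scales with the pixel count per frame, so

```latex
\frac{1920 \times 1080}{1280 \times 720} = \frac{2{,}073{,}600}{921{,}600} = 2.25
```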
Another way to address this problem is to reduce the resolution of the original images generated by the server-side game application, to encode the original images into the 2D video stream at a higher compression ratio, or both. The bandwidth consumed by transmitting the 2D video stream over the network is thereby reduced, although the visual quality of the game images is again sacrificed. At the same time, an image enhancement technique is applied on the client device: once the 2D video stream is received, the client device decodes it and uses the enhancement technique to improve the visual quality of the images. Histogram equalization (HE) is one of the most commonly used methods for improving image contrast because of its simplicity and efficiency. However, HE can cause excessive contrast enhancement and feature loss, resulting in an unnatural appearance and a loss of detail in the processed images. Moreover, not only HE but all other image enhancement techniques known in the art face the same dilemma: they all try to use a single algorithm to process images with completely different content, which is not feasible. Take cloud-based online game services as an example: the content of the original images generated by the server changes significantly as the game scene changes. A city scene may produce original images containing many objects with simple, clear outlines and colors that differ but belong to roughly the same color family. A dark cave scene fills the original images with monotonous, low-tone, low-chroma colors and irregular but unremarkable landscape contours. A lush garden scene, in turn, produces original images containing many vivid, brightly colored objects with detailed and complex outlines. Clearly, no single conventional enhancement technique can deliver equally good enhancement across such diverse scenes with completely different content.
Furthermore, another drawback of these conventional image enhancement techniques is that, although their mathematical formulas can improve visual properties such as contrast, sharpness, and saturation, the formulas and their parameters are entirely unrelated to the original images generated by the server. The enhancement process therefore never makes the enhanced images visually closer to their corresponding original images, and client-side players consequently cannot fully enjoy the visual quality of the original images produced by the server-side game application.
Therefore, the main objective of the present invention is to provide a method for reducing the network bandwidth required for video streaming by using an artificial intelligence (AI) processing module. The server first reduces the resolution of the images to be transmitted and then sends the low-resolution images to the client over the network, thereby reducing the network bandwidth required for streaming. The client then uses a pre-trained AI processing module to restore the received low-resolution images to high-resolution images, enjoying the dual advantages of high image quality and low network bandwidth consumption.
To achieve the above objective, an embodiment of the method of the present invention for reducing the network bandwidth required for video streaming by using an artificial intelligence processing module includes:

Step (A): executing a first application in a server to generate a plurality of source images corresponding to a plurality of original images; the source images have a first resolution and are encoded and compressed by an encoder in the server to generate a plurality of corresponding encoded images;

Step (B): executing a second application in a client device remote from the server; the second application is associated with and cooperates with the first application;

Step (C): the client device connects to the server via a network and receives, as a video stream over the network, the encoded images generated by the server;

Step (D): the client device decodes the encoded images into a plurality of corresponding decoded images and uses an artificial intelligence (AI) processing module to raise the resolution of the decoded images, generating a plurality of corresponding high-resolution images; the high-resolution images have a second resolution, the second resolution is higher than the first resolution, and the resolution of the original images equals the second resolution;

Step (E): the client device sequentially outputs the high-resolution images to a screen as the output images to be played.

The AI processing module processes the decoded images using at least one mathematical formula and a plurality of weighting parameters obtained in advance by analyzing the differences between the decoded images and the corresponding original images; the resulting high-resolution images therefore have the same resolution as the corresponding original images, higher than the resolution of the source images. The at least one mathematical formula and the weighting parameters of the AI processing module are defined in advance by a training procedure executed by an artificial neural network module in a training server.
Preferably, the encoded images described in step (A) are generated by the following steps: executing the first application in the server to generate the original images, which have the second resolution; using a resolution reduction procedure to lower the resolution of the original images to the first resolution, obtaining the corresponding source images; and using the encoder to encode the source images, obtaining the corresponding encoded images.
Preferably, the server includes an AI encoding module, and the encoded images described in step (A) are generated by the following steps: executing the first application in the server to generate the original images, which have the second resolution; and using the AI encoding module to lower the resolution of the original images to obtain the corresponding source images and to encode the source images to obtain the corresponding encoded images. The AI encoding module contains at least one preset AI encoding formula, and the at least one AI encoding formula contains a plurality of preset encoding weighting parameters.
Preferably, the at least one mathematical formula of the AI processing module includes a first preset AI formula and a second preset AI formula; the first preset AI formula includes a plurality of first weighting parameters, and the second preset AI formula includes a plurality of second weighting parameters. The first preset AI formula, together with the first weighting parameters, is used to raise image resolution, so that the resolution of images processed by it is raised from the first resolution to the second resolution. The second preset AI formula, together with the second weighting parameters, is used to enhance image quality, so that images processed by it have higher quality than the decoded images and quality closer to that of the original images.
Preferably, after the client device decodes the received encoded images into the corresponding decoded images, it processes the decoded images in one of the following two ways:

Way 1: the client device first processes the decoded images with the first preset AI formula and the first weighting parameters to generate corresponding resolution-raised images having the second resolution; the client device then processes the resolution-raised images with the second preset AI formula and the second weighting parameters to generate the high-resolution images, which have high image quality and the second resolution;

Way 2: the client device first processes the decoded images with the second preset AI formula and the second weighting parameters to generate corresponding quality-enhanced images with high image quality; the client device then processes the quality-enhanced images with the first preset AI formula and the first weighting parameters to generate the high-resolution images, which have the second resolution and high image quality. Both orderings are sketched below.
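A minimal sketch of the two orderings, assuming `upscale_net` stands for the first preset AI formula with its first weighting parameters and `enhance_net` for the second preset AI formula with its second weighting parameters; both names are illustrative (e.g., trained PyTorch modules), not terms from the patent:

```python
def way_one(decoded, upscale_net, enhance_net):
    # Raise resolution first, then enhance quality at the second resolution.
    return enhance_net(upscale_net(decoded))

def way_two(decoded, upscale_net, enhance_net):
    # Enhance quality at the first resolution, then raise resolution.
    return upscale_net(enhance_net(decoded))
```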
1, 501, 701: Server
2, 21, 22, 23, 502, 702: Client device
3: Base station
30: Router
4: Network
100, 200: Application (App)
101, 201: Memory
102: Encoding
103: Streaming
104: Network device
105: Artificial neural network module
106: Neural network
107: Decoding module
108: Comparison and training module
202: Network module
203: Decoding module
204: AI enhancement module
205: Output module
301-308, 400-466, 711-723, 7161-7229: Steps
The preferred embodiments of the present invention are described in conjunction with the following drawings, in which:

FIG. 1 schematically illustrates the system of the present invention for reducing the network bandwidth required for video streaming by using an artificial intelligence processing module;

FIG. 2 is a schematic diagram of an embodiment of the system architecture of the present invention for reducing the network bandwidth required for video streaming by using an artificial intelligence processing module;

FIG. 3 is a schematic diagram of a first embodiment of the method of the present invention for processing a video stream by using an artificial intelligence processing module;

FIG. 4 is a schematic diagram of a first embodiment of the training procedure of the artificial neural network module 105 of the present invention;

FIG. 5 is a schematic diagram of a second embodiment of the training procedure of the artificial neural network module 105 of the present invention;

FIG. 6 is a schematic diagram of a third embodiment of the training procedure of the artificial neural network module 105 of the present invention;

FIG. 7 is a schematic diagram of an embodiment of the training procedure of the discriminator shown in FIG. 6;
FIG. 8 discloses an embodiment of the training process of the neural network of the present invention, in which the decoded images are in YUV420 format and the output images are RGB or YUV444;
FIG. 9 is a schematic diagram of an embodiment of the procedure of the present invention for processing decoded images in YUV420 format;

FIG. 10 is a schematic diagram of another embodiment of the procedure of the present invention for processing decoded images in YUV420 format;

FIG. 11A is a schematic diagram of a second embodiment of the method of the present invention for reducing the network bandwidth required for video streaming by using an artificial intelligence processing module;

FIG. 11B is a schematic diagram of a third embodiment of the method of the present invention for reducing the network bandwidth required for video streaming by using an artificial intelligence processing module;

FIG. 12A is a schematic diagram of a fourth embodiment of the method of the present invention for reducing the network bandwidth required for video streaming by using an artificial intelligence processing module;

FIG. 12B is a schematic diagram of a fifth embodiment of the method of the present invention for reducing the network bandwidth required for video streaming by using an artificial intelligence processing module;

FIG. 13 is a schematic diagram of an embodiment of how the first preset AI formula and the first weighting parameters of the AI processing module of the present invention are trained;

FIG. 14A is a schematic diagram of a sixth embodiment of the method of the present invention for reducing the network bandwidth required for video streaming by using an artificial intelligence processing module;

FIG. 14B is a schematic diagram of a seventh embodiment of the method of the present invention for reducing the network bandwidth required for video streaming by using an artificial intelligence processing module;

FIG. 15 is a schematic diagram of an embodiment of how the first preset AI formula, the second preset AI formula, the first weighting parameters, and the second weighting parameters of the AI processing module of the present invention are trained;

FIG. 16 is a schematic diagram of an eighth embodiment of the method of the present invention for reducing the network bandwidth required for video streaming by using an artificial intelligence processing module;

FIG. 17 is a schematic diagram of an embodiment of how the AI encoding formula, the AI decoding formula, the first preset AI formula, the second preset AI formula, the first weighting parameters, and the second weighting parameters of the artificial neural network module of the present invention are trained.
The present invention relates to a method for reducing the network bandwidth required for video streaming by using an artificial intelligence processing module. The server first reduces the resolution of the images to be transmitted and then sends the low-resolution images to the client over the network, thereby reducing the network bandwidth required for streaming. The client then uses a pre-trained artificial intelligence (AI) processing module to restore the received low-resolution images to high-resolution images, so that the dual advantages of high image quality and low network bandwidth consumption can be enjoyed at the same time.
One application of the present invention is cloud-based online games, in which a player uses a client device connected to a server over a network to play a game provided by the server. The server responds to commands entered by the player and generates the corresponding video images. For example, a player may issue a movement command on the client device; the command is transmitted to the server over the network, the server computes an image according to the command, and the image is sent back and played on the client device. In many games, the server generates 2D images containing a number of 3D-rendered objects within the visible range.
Please refer to FIG. 1, which schematically illustrates the system of the present invention for reducing the network bandwidth required for video streaming by using an artificial intelligence processing module. A server 1 provides a service through an application executed on the server 1; the service may be, but is not limited to, a cloud-based online game service. A plurality of client devices 21, 22, 23 can connect to (log in to) the server 1 via a network 4 to use the service provided by the application running on the server 1. In this embodiment, the network 4 is the Internet, and the client devices 21, 22, 23 may be any type of network-connectable electronic device, such as, but not limited to, a smartphone 21, a digital tablet, a notebook computer 22, a desktop computer 23, a game console, or even a smart TV. Some client devices 21, 22 connect to the network 4 wirelessly through a wireless communication base station 3 or a wireless router 30, while others connect to the network 4 by wire through a network router or network hub. The application running on the server 1 generates a virtual 3D environment containing a plurality of 3D objects; some of the 3D objects can be moved or destroyed according to user operations, while others cannot. In a preferred embodiment, a separate instance of the application runs for each client device; that is, each application instance provides the service to only one client device, but a plurality of application instances can run simultaneously on the server 1 to serve a plurality of client devices. The client devices 21, 22, 23 connect to the server 1 via the network 4 to receive the frames generated by the application, each containing at least some of the 3D objects. The architecture and functions of the system of the present invention are detailed with reference to FIG. 2 and its description.
FIG. 2 is a schematic diagram of an embodiment of the system architecture of the present invention. An application (App) 100, stored in a memory 101 and executed on the server 1 (typically a 3D game program), generates 3D rendering results consisting of a series of original images. Encoding 102 and streaming 103 are an encoding module and a streaming module, respectively, which accept the original images generated by the application 100 and encode and stream them into a 2D video stream. The 2D video stream is then transmitted through the server's network device 104 over the network 4 to the remote client device 2. Each client device 2 has an application 200 pre-installed; the application 200 is stored in the memory 201 of the client device 2 and is associated with and cooperates with the application 100 on the server 1. The application 200 of the client device 2 establishes a connection with the application 100 of the server 1 and receives the encoded 2D video stream from the server 1 through the network module 202. The encoded 2D video stream is then decoded by the decoding module 203 to produce decoded images. Because of the encoding, streaming, and decoding processes, the quality of the decoded images is obviously much worse than that of the original images. The AI processing module 204 built into the client device 2 can enhance the decoded images to produce corresponding enhanced images. The AI processing module 204 processes the decoded images using at least one mathematical formula obtained by analyzing the differences between the decoded images and the corresponding original images; the resulting enhanced images are therefore visually closer to the original images than the decoded images are. The enhanced images are then output (played) on the screen (display panel) of the client device 2 through the output module 205. In the present invention, the mathematical formula used by the AI processing module 204 of the client device 2 is defined by a training procedure executed by an artificial neural network module 105 on the server 1. The artificial neural network module 105 resides in the server 1 and includes an artificial neural network 106, a decoding module 107, and a comparison and training module 108. Embodiments of the training procedure of the artificial neural network module 105 of the present invention are detailed later.
FIG. 3 is a schematic diagram of a first embodiment of the method of the present invention for processing a video stream by using an artificial intelligence processing module. Using the system and architecture of the present invention shown in FIG. 2 and FIG. 3, the method generally includes the following steps:
Step 301: a first application is executed in a server. The first application generates a plurality of original images according to at least one command (step 302). The original images are then reduced in resolution by a resolution reduction module in the server (step 3021) and encoded and compressed by an encoder in the server (step 303) to produce a plurality of encoded images. The encoded images are then transmitted to the client device over the network as a 2D video stream (step 304). Because the images have already been reduced in resolution before transmission, the network bandwidth required to transmit the video stream is reduced accordingly.
A second application is executed in a client device remote from the server (step 305). The second application is associated with and cooperates with the first application, so that a user can operate the client device to generate and send commands to the server and enjoy the service provided by the server's first application. The client device transmits the commands to the server over the network and then receives, over the network, the encoded images generated by the server in response to the commands. The client device decodes the encoded images (step 306) into a plurality of decoded images and uses an AI processing module (step 307) to enhance the decoded images, producing a plurality of enhanced images. The AI processing module processes the decoded images using at least one mathematical formula obtained in advance by analyzing the differences between the decoded images and the corresponding original images; the enhanced images are therefore visually closer to the original images than the decoded images are. The client device then outputs the enhanced images (step 308) to its screen (display panel) as the output images to be played.
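A minimal sketch of this client-side flow (steps 305-308), assuming `decode`, `ai_module`, and `display` are supplied callables (for example, a hardware video-decoder wrapper and a trained PyTorch module); the names are illustrative, not from the patent:

```python
def client_playback_loop(encoded_frames, decode, ai_module, display):
    """Decode each received frame, enhance it, and play it on the screen."""
    for frame in encoded_frames:
        decoded = decode(frame)        # step 306: decoded image
        enhanced = ai_module(decoded)  # step 307: pre-trained AI processing module
        display(enhanced)              # step 308: output image on the screen
```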
In the present invention, the at least one mathematical formula used by the AI processing module in the client device includes a plurality of weighted parameters. The weighting parameters are related to the differences between the decoded images and the corresponding original images and are defined by a training procedure executed by an artificial neural network module in the server. In one embodiment of the present invention, the weighting parameters are pre-stored in the client device. In another embodiment, the weighting parameters are downloaded from the server into the client device only when the client device executes the second application.
In one embodiment of the present invention, the content of the original images generated by the server varies drastically between game scenes. For example, a city scene may produce original images containing many objects with simple, clear outlines and colors that differ but belong to roughly the same color family; a dark cave scene fills the original images with monotonous, low-tone, low-chroma colors and irregular but unremarkable landscape contours; and a lush garden scene produces original images containing many vivid, brightly colored objects with detailed and complex outlines. The method of the present invention uses several different sets of weighting parameters to match these different game scenes, so that the quality of the output images enhanced by the same AI enhancement module can be kept at a high and stable level even when the content of the original images changes drastically.
Preferably, the original images generated by the first application can be divided into a plurality of scene-modes, each scene containing a plurality of the original images. The weighting parameters are likewise divided into a plurality of sets, each set containing a plurality of weighting parameters and corresponding to one of the scenes. The decoded images corresponding to original images of different scenes are enhanced by the same AI processing module using the set of weighting parameters that corresponds to the current scene. In one embodiment of the present invention, all the sets of weighting parameters are pre-stored in the client device; whenever the scene changes, the set of weighting parameters corresponding to the new scene is applied in the AI processing module to produce the enhanced images. In another embodiment, all the sets of weighting parameters are stored on the server; whenever the scene changes, the set corresponding to the new scene is transmitted from the server to the client device and then applied in the AI processing module to produce the enhanced images.
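A minimal sketch of per-scene weight switching, assuming one PyTorch module whose weights are swapped in from pre-stored state dicts; the file names and scene labels are illustrative, not from the patent:

```python
import torch

# Hypothetical pre-stored weight sets, one per scene-mode.
scene_weights = {
    "city":   "weights_city.pt",
    "cave":   "weights_cave.pt",
    "garden": "weights_garden.pt",
}

def switch_scene(ai_module: torch.nn.Module, scene_id: str) -> None:
    # Load the weight set matching the new scene into the same network.
    state = torch.load(scene_weights[scene_id], map_location="cpu")
    ai_module.load_state_dict(state)
```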
FIG. 4 is a schematic diagram of a first embodiment of the training procedure of the artificial neural network module 105 of the present invention. In the present invention, the mathematical formula used by the AI processing module 204 of the client device 2 is trained and defined by a training procedure executed by the artificial neural network module 105 in the server 1. The training procedure includes the following steps:
Step 400: the first application is executed in a training mode to generate a plurality of training original images (step 401), and the training original images are reduced in resolution (step 4011);

Step 402: the reduced-resolution training original images are encoded by the encoder into a plurality of training encoded images;

Step 403: the training encoded images are decoded by a training decoder in the server into a plurality of training decoded images;

Step 404: the artificial neural network module receives the training decoded images and processes them one by one using at least one training mathematical formula to generate a plurality of training output images (step 405); the at least one training mathematical formula contains a plurality of training weighting parameters; and

Step 406: the comparison and training module compares, one by one, the difference between each training output image and its corresponding training original image and adjusts the training weighting parameters of the at least one training mathematical formula accordingly; the training weighting parameters are adjusted so as to minimize the difference between the training output images and the corresponding training original images. Each time the training weighting parameters are adjusted, the adjusted parameters are fed back into the at least one training mathematical formula for processing the next training decoded image in step 404. After a predetermined number of comparisons between training output images and their corresponding training original images, and a predetermined number of adjustments of the training weighting parameters, the training weighting parameters obtained at the end of training (step 407) are extracted and applied in the AI processing module of the client device as the weighting parameters of its mathematical formula.
In the first embodiment of the present invention, a training decoded image is input into the artificial neural network module to generate the corresponding training output image. The training output image and the corresponding training original image are then compared to compute a difference value. A mathematical optimization algorithm such as the Adam algorithm, stochastic gradient descent (SGD), or root mean square propagation (RMSProp) is then used to learn the weighting parameters of the artificial neural network (commonly called the weights w and biases b) so as to make the difference value as small as possible, whereby the training output images become closer to their corresponding training original images. Different methods can be used to compute the difference value (or an approximation of it) to suit different needs, for example: mean square error (MSE), L1 regularization (using absolute value error), peak signal-to-noise ratio (PSNR), structural similarity (SSIM), generative adversarial network loss (GAN loss), and/or other methods. In the first embodiment, the following methods are used to compute the difference value: (1) a weighted average of MSE, L1, and GAN loss; (2) MSE; (3) GAN loss while simultaneously training the discriminator; (4) a weighted average of MSE and edge MSE. More details of the training procedure are described below.
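A minimal sketch of one such update step in PyTorch, assuming `net` is the artificial neural network and using a weighted average of MSE and L1 as the difference value (the GAN term is omitted for brevity); the function and weight names are illustrative, not from the patent:

```python
import torch
import torch.nn.functional as F

def train_step(net, optimizer, decoded, original, w_mse=1.0, w_l1=0.1):
    """One adjustment of the training weighting parameters (steps 404-406)."""
    output = net(decoded)                           # training output image
    loss = (w_mse * F.mse_loss(output, original)    # difference value as a
            + w_l1 * F.l1_loss(output, original))   # weighted MSE + L1 average
    optimizer.zero_grad()
    loss.backward()                                 # gradients w.r.t. weights w, biases b
    optimizer.step()                                # e.g. an Adam update
    return loss.item()

# Typical setup: optimizer = torch.optim.Adam(net.parameters(), lr=1e-4)
```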
FIG. 5 is a schematic diagram of a second embodiment of the training procedure of the artificial neural network module 105 of the present invention. In the present invention, the training procedure of the second embodiment includes the following steps:
Step 410: the first application is executed in a training mode to generate a plurality of training original images (step 411), whose color format is RGB, and the training original images are reduced in resolution (step 4111);

Step 412: the reduced-resolution training original images are encoded by the encoder into a plurality of training encoded images;

Step 413: the training encoded images are decoded by the training decoder in the server into a plurality of training decoded images;

Step 414: in this second embodiment, when the training decoded images and the training output images have the same color format (both RGB here), a residual network module, also called a convolutional neural network (CNN), can be used in the artificial neural network module; the output of the residual network module for a given training decoded image is summed up with that training decoded image (step 415), and the sum of the two is output as the training output image (step 416); and

Step 417: the comparison and training module compares, one by one, the difference between each training output image and its corresponding training original image (computing the difference value) and adjusts the training weighting parameters of the at least one training mathematical formula accordingly; the training weighting parameters are adjusted so as to minimize the difference between the training output images and the corresponding training original images. Each time the training weighting parameters are adjusted, the adjusted parameters are fed back into the artificial neural network for processing the next training decoded image in step 414. After a predetermined number of comparisons and a predetermined number of adjustments, the training weighting parameters obtained at the end of training (step 418) are extracted and applied in the AI processing module of the client device as the weighting parameters of its mathematical formula.
FIG. 6 is a schematic diagram of a third embodiment of the training procedure of the artificial neural network module 105 of the present invention. In the third embodiment, the comparison and training module uses a discriminator to compare the differences between the training output images and the corresponding training original images and adjusts the training weighting parameters accordingly. The training procedure of the third embodiment includes the following steps:
Step 420: the first application is executed in a training mode to generate a plurality of training original images (step 421), where the training original images have n channels, n being a positive integer greater than 2, and the training original images are reduced in resolution (step 4211);

Step 422: the reduced-resolution training original images are encoded by the encoder into a plurality of training encoded images;

Step 423: the training encoded images are decoded by the training decoder in the server into a plurality of training decoded images, each having m channels, m being a positive integer greater than 2; and

Step 424: the artificial neural network module receives the training decoded images (m channels) and processes them one by one using at least one training mathematical formula to generate a plurality of training output images (n channels) (step 425); the at least one training mathematical formula contains a plurality of training weighting parameters; each training output image (n channels) is combined with its corresponding training decoded image (m channels) (step 426) to produce a plurality of training combined images (with m+n channels); these training combined images are then fed back to a discriminator (step 427) to assess the quality of the training output images, thereby training the artificial neural network.
FIG. 7 is a schematic diagram of an embodiment of the training procedure of the discriminator shown in FIG. 6. The training procedure of the discriminator includes the following steps:
Step 430: the first application is executed in a training mode to generate a plurality of training original images (step 431), where the training original images have n channels, n being a positive integer greater than 2, and the training original images are reduced in resolution (step 4311);

Step 432: the reduced-resolution training original images are encoded by the encoder into a plurality of training encoded images;

Step 433: the training encoded images are decoded by the training decoder in the server into a plurality of training decoded images, each having m channels, m being a positive integer greater than 2; and

Step 434: the artificial neural network module receives the training decoded images and processes each training decoded image (m channels) one by one using at least one training mathematical formula to generate a plurality of training output images (step 435); the at least one training mathematical formula contains a plurality of training weighting parameters; the training output images have n channels;

Step 436: each n-channel training output image is combined with its corresponding m-channel training decoded image to produce a plurality of false samples with m+n channels, and each n-channel training original image is combined with its corresponding m-channel training decoded image to produce a plurality of true samples with m+n channels (step 437); and

Step 438: the (m+n)-channel false samples and the (m+n)-channel true samples are fed back to the discriminator of the comparison and training module to train the discriminator's ability to detect and distinguish the false samples from the true samples.
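A minimal sketch of how the (m+n)-channel samples can be assembled by channel-wise concatenation in PyTorch, assuming the tensors share the same spatial size; the tensor names are illustrative, not from the patent:

```python
import torch

def make_discriminator_samples(decoded_m, output_n, original_n):
    """Build one false and one true sample (steps 436-437).

    decoded_m:  (N, m, H, W) training decoded image
    output_n:   (N, n, H, W) training output image
    original_n: (N, n, H, W) training original image
    """
    false_sample = torch.cat([decoded_m, output_n], dim=1)    # (N, m+n, H, W)
    true_sample = torch.cat([decoded_m, original_n], dim=1)   # (N, m+n, H, W)
    return false_sample, true_sample
```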
After the artificial neural network 105 (shown in FIG. 2) has been properly trained on the server 1, the resulting weighting parameters (weights w, biases b) are applied in the AI processing module 204 in the client device. The AI processing module 204 and its associated weighting parameters (weights w, biases b) are stored in the client device 2. Thereafter, whenever the client device receives and decodes the encoded images contained in the 2D video stream from the server, each encoded image is processed by the AI processing module to produce an enhanced image, and the client device plays the enhanced images as output images on its screen. The neural network can learn to enhance the color, brightness, and detail of the images. Because some details of the original images are damaged or lost during encoding and streaming, a properly trained neural network can repair these damaged or lost details. In the embodiments of the present invention, the neural network of the AI processing module needs the following information to operate:
Related functions and parameters:

X: the input image;

Conv2d(X,a,b,c,d,w,b): a 2D convolution with bias applied to X, with number of output channels a (amount of output channels = a), kernel size b (kernel_size = b), stride c (stride = c), and padding size d; the trained weighting parameters are the kernel w and the bias b;

Conv2dTranspose(X,a,b,c,w,b): a 2D transpose convolution with bias applied to X, with number of output channels a (amount of output channels = a), kernel size b (kernel_size = b), stride c (stride = c), and cropping size d; the trained weighting parameters are the kernel w and the bias b;

σ(X): a nonlinear activation function applied to X;

uint8(x): clamps the floating-point value x to the range 0 to 255 (inclusive) and converts it to unsigned int8 by truncation (rounding down);

R(X,w): residual blocks applied to X, consisting of many conv2d and batchnorm layers, each with its own weighting parameters to be trained (for more background, see: https://stats.stackexchange.com/questions/246928/what-exactly-is-a-residual-learning-block-in-the-context-of-deep-residual-networ).
Because the input and output images may have different color formats, such as RGB, YUV420, and YUV444, the cases of input and output images with different color formats are discussed below.
Case 1: the original image is RGB and the output image is also RGB.
This case is the simplest because both the input and output images are RGB images. To increase processing speed, a relatively large kernel size is used (e.g., 8x8 with stride = 4 in the convolution and transpose-convolution structures) to accelerate computation enough to handle the high resolution of Full HD images. A residual network is used in this case to make convergence easier and more stable.
Related functions and parameters:

X: the input image, in RGB format, with each color channel in unsigned int8 format;

X2 = (X - 128)/128;

Y = uint8((Conv2dTranspose(σ(Conv2d(X2,a,b,c,d,w_1,b_1)),w_2,b_2) + X2)*128 + 128);

w_1 is a matrix of size b*b*3*a; b_1 is a vector of size a;

w_2 is a matrix of size b*b*3*a; b_2 is a vector of size 3;

The parameters used include:

The resolution of X is 1280x720;
a=128, b=10, c=5, d=0, σ = leaky relu with alpha = 0.2;
a=128, b=9, c=5, d=4, σ = leaky relu with alpha = 0.2;
a=128, b=8, c=4, d=0, σ = leaky relu with alpha = 0.2;
If the client device has a faster processing speed, the following formula can be used instead:

Y = uint8((Conv2dTranspose(R(σ(Conv2d(X2,a,b,c,d,w_1,b_1)),w_R),w_2,b_2) + X2)*128 + 128);

w_1 is a matrix of size b*b*3*a; b_1 is a vector of size a;

w_2 is a matrix of size b*b*3*a; b_2 is a vector of size 3;

where R denotes residual blocks with n layers, which contain many neural network layers, each with its own weighting parameters to be trained, collectively denoted w_R;

The parameters used include:

a=128, b=8, c=4, d=0, σ = leaky relu with alpha = 0.2; n=2;
a=128, b=8, c=4, d=0, σ = leaky relu with alpha = 0.2; n=6.
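A minimal PyTorch sketch of the first formula above (the variant without the residual blocks R), using the stated parameters a=128, b=8, c=4, d=0 and leaky ReLU with alpha=0.2; this is one illustrative reading of the patent's notation, not a verified reference implementation:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class EnhanceRGB(nn.Module):
    """Y = uint8((Conv2dTranspose(sigma(Conv2d(X2))) + X2) * 128 + 128)."""

    def __init__(self, a=128, b=8, c=4):
        super().__init__()
        self.conv = nn.Conv2d(3, a, kernel_size=b, stride=c)              # w_1, b_1
        self.deconv = nn.ConvTranspose2d(a, 3, kernel_size=b, stride=c)   # w_2, b_2

    def forward(self, x):
        # x: (N, 3, H, W) uint8 RGB; X2 normalizes it to roughly [-1, 1)
        x2 = (x.float() - 128.0) / 128.0
        y = self.deconv(F.leaky_relu(self.conv(x2), negative_slope=0.2)) + x2
        # uint8(): clamp to [0, 255] and truncate
        return torch.clamp(y * 128.0 + 128.0, 0.0, 255.0).to(torch.uint8)

# e.g. EnhanceRGB()(torch.randint(0, 256, (1, 3, 720, 1280), dtype=torch.uint8))
```

With a 1280x720 input, the convolution yields a 319x179 feature map and the transpose convolution restores 1280x720, so the residual addition with X2 is shape-compatible.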
Case 2: the original image is YUV420 and the output image is RGB or YUV444.
If the input original image is YUV420 and the output image is RGB or YUV444, the residual network cannot be applied directly to this case because the input and output images differ in resolution and format. The method of the present invention first decodes the YUV420 input image and then uses another neural network (called network A, where N=3) to process the decoded image and obtain an image in RGB or YUV444 format (called X2). This X2 image is then fed into the neural network (the residual network) described in Case 1 for training. In addition, the same training method is applied to network A by comparing the differences between X2 and the original image, thereby training network A.
X_y is the Y plane of the YUV420-format input image, in unsigned int8 format;

X_uv is the UV planes of the YUV420-format input image, in unsigned int8 format;

X2_y = (X_y - 128)/128;

X2_uv = (X_uv - 128)/128;

X2 = Conv2d(X2_y,3,e,1,w_y,b_y) + Conv2dTranspose(X2_uv,3,f,2,w_uv,b_uv);

w_y is a matrix of size e*e*1*3; b_y is a vector of size 3;

w_uv is a matrix of size f*f*3*2; b_uv is a vector of size 3;
The above is the first embodiment of network A (neural network A).
最後,用於輸出該輸出影像的數學式和前述第一種情況當輸入與輸出影像都是RGB格式時所使用的數學式相同: Finally, the mathematical formula used to output the image is the same as the formula used in the first case mentioned above when both the input and output images are in RGB format:
Y=uint8((Conv2dTranspose(σ(Conv2d(X2,a,b,c,d,w_1,b_1)),w_2,b_2)+X2)*128+128);
w_1是一矩陣其大小是b*b*3*a;b_1是一向量其大小為a; w_1 is a matrix of size b*b*3*a; b_1 is a vector of size a;
w_2是一矩陣其大小是b*b*3*a;b_2是一向量其大小為3; w_2 is a matrix of size b*b*3*a; b_2 is a vector of size 3;
所使用的參數也同樣和前述當輸入與輸出影像都是RGB格式時所使用的參數相同: The parameters used are the same as those used previously when both the input and output images are in RGB format:
X的解析度是1280x720; The resolution of X is 1280x720;
a=128,b=8,c=4,d=0,e=1,f=2,σ=leaky relu with alpha=0.2。 a=128,b=8,c=4,d=0,e=1,f=2,σ=leaky relu with alpha=0.2.
請參閱圖八，其揭露了本發明之神經網路的訓練過程的一實施例，其中，原圖影像是YUV420、且輸出影像是RGB或YUV444。該神經網路的訓練過程包括以下步驟: Please refer to Figure 8, which illustrates an embodiment of the training process of the neural network of this invention, wherein the original image is YUV420 and the output image is RGB or YUV444. The training process of this neural network includes the following steps:
步驟440:在一訓練模式中執行該第一應用程式以產生複數個訓練原圖影像，其中，該些訓練原圖影像是RGB或YUV444格式;並且，對該些訓練原圖影像進行解析度降低處理(步驟4401); Step 440: Execute the first application in a training mode to generate a plurality of training source images, wherein the training source images are in RGB or YUV444 format; and perform resolution reduction processing on the training source images (Step 4401);
步驟441:將降低解析度後的該些訓練原圖影像藉由該編碼器編碼成為複數個訓練編碼影像; Step 441: Encode the reduced-resolution original training images into multiple encoded training images using the encoder;
步驟442:藉由伺服器中的訓練解碼器將該些訓練編碼影像解碼成為複數個訓練解碼影像;其中該訓練解碼影像是YUV420格式; Step 442: The training encoded images are decoded into a plurality of training decoded images using a training decoder on the server; wherein the training decoded images are in YUV420 format;
步驟443:該人工神經網路模組包括一第一神經網路以及一第二神經網路;該第一神經網路(也稱為A網路)接受該些訓練解碼影像並使用至少一訓練數學運算式來逐一處理該訓練解碼影像(YUV420)以供產生複數個第一輸出影像(也稱為X2;如步驟444)，其具有和該訓練原圖影像相同的編碼格式;該至少一訓練數學運算式包含複數個訓練加權參數; Step 443: The artificial neural network module includes a first neural network and a second neural network; the first neural network (also referred to as network A) receives the training decoded images and processes each training decoded image (YUV420) using at least one training mathematical operation to generate a plurality of first output images (also referred to as X2; see step 444) having the same encoding format as the original training images; the at least one training mathematical operation contains a plurality of training weighting parameters;
步驟445:該第二神經網路是一卷積神經網路(Convolutional Neural Network;簡稱CNN);該第二神經網路(也稱為CNN網路)接受該第一輸出影像X2並使用至少一訓練數學運算式來逐一處理該第一輸出影像X2以供產生複數個第二輸出影像;該至少一訓練數學運算式包含複數個訓練加權參數;接著,該第一輸出影像X2和該第二輸出影像兩者被相加(步驟446)以產生該訓練輸出影像(步驟447); Step 445: The second neural network is a Convolutional Neural Network (CNN); the second neural network (also called a CNN network) receives the first output image X2 and processes the first output image X2 one by one using at least one training mathematical operation to generate a plurality of second output images; the at least one training mathematical operation contains a plurality of training weighting parameters; then, the first output image X2 and the second output image are added together (step 446) to generate the training output image (step 447);
該比較與訓練模組包含一第一比較器及一第二比較器;於步驟448中，該第一比較器比較該第一輸出影像X2與其相對應之訓練原圖影像之間的差異以供訓練該第一神經網路;於步驟449中，該第二比較器比較該訓練輸出影像與其相對應之訓練原圖影像之間的差異以供訓練該第二神經網路。 The comparison and training module includes a first comparator and a second comparator. In step 448, the first comparator compares the difference between the first output image X2 and its corresponding training original image for training the first neural network. In step 449, the second comparator compares the difference between the training output image and its corresponding training original image for training the second neural network.
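In outline, one training iteration of this two-comparator scheme could look as follows; net_a, net_cnn, the optimizers and the MSE criterion are assumptions for illustration, and detaching X2 before the second network keeps the two comparators independent, matching steps 448 and 449:

```python
import torch.nn.functional as F

def train_step(net_a, net_cnn, opt_a, opt_cnn, decoded_yuv420, target):
    # target: the corresponding training original image (RGB or YUV444), scaled to [-1, 1]
    x2 = net_a(decoded_yuv420)                 # steps 443/444: first output image X2
    out = x2.detach() + net_cnn(x2.detach())   # steps 445-447: residual add -> training output
    loss_a = F.mse_loss(x2, target)            # step 448: first comparator trains network A
    loss_cnn = F.mse_loss(out, target)         # step 449: second comparator trains the CNN
    opt_a.zero_grad(); loss_a.backward(); opt_a.step()
    opt_cnn.zero_grad(); loss_cnn.backward(); opt_cnn.step()
    return loss_a.item(), loss_cnn.item()
```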
圖九是本發明處理具YUV420格式之解碼後的影像的程序的一實施例示意圖。本發明處理具YUV420格式之解碼後的影像的程序包括: Figure 9 is a schematic diagram of an embodiment of the procedure for processing images decoded in YUV420 format according to the present invention. The procedure for processing images decoded in YUV420 format according to the present invention includes:
步驟451:該第一神經網路接受並處理具YUV420顏色格式之訓練解碼影像的步驟包括: Step 451: The steps for the first neural network to accept and process training decoded images in YUV420 color format include:
步驟452:提取該訓練解碼影像中的Y-part資料，由具標準大小(原大小)的該神經網路來處理該訓練解碼影像的Y-part資料以產生具N通道的Y-part輸出資料(例如:步伐值Stride=1於卷積中;如步驟454); Step 452: Extract the Y-part data from the training decoded image; the Y-part data is processed by the neural network at standard (original) size to generate N-channel Y-part output data (e.g., stride=1 in the convolution; see step 454);
步驟453:提取該訓練解碼影像中的UV-part資料，由具兩倍放大的神經網路來處理該訓練解碼影像的UV-part資料以產生具N通道的UV-part輸出資料(例如:步伐值Stride=2於轉置卷積中;如步驟455); Step 453: Extract the UV-part data from the training decoded image; the UV-part data is processed by the neural network with 2x upsampling to generate N-channel UV-part output data (e.g., stride=2 in the transposed convolution; see step 455);
步驟456:將該Y-part輸出資料與該UV-part輸出資料相加以產生該訓練輸出影像(步驟457)。 Step 456: Combine the Y-part output data with the UV-part output data to generate the training output image (Step 457).
第三種情況:原圖影像是YUV420、且輸出影像是YUV444,以另一種更快的方式處理。 The third scenario: The original image is YUV420, and the output image is YUV444, which is processed in a faster way.
如果輸入影像是YUV420、且輸出影像是YUV444,則除了前述的方法以外,還有另一種實施該第一神經網路(A網路)的方式,其是具有更快速度的特例。具YUV420格式的解碼後的影像首先利用第一神經網路(A網路)將其轉換為YUV444格式的影像(亦稱為X2);之後,X2被送入前述的神經網路(殘差網路)進行訓練。並且,相同的訓練方式也實施在A網路來比較X2與原圖影像之間的差異,藉以訓練A網路。 If the input image is YUV420 and the output image is YUV444, there is another, faster, way to implement the first neural network (A-network) in addition to the methods described above. The decoded YUV420 image is first converted to YUV444 format (also known as X2) using the first neural network (A-network); then, X2 is fed into the aforementioned neural network (residual network) for training. Furthermore, the same training method is implemented on the A-network to compare the differences between X2 and the original image, thereby training the A-network.
X_y是具YUV420格式之輸入影像的Y，其格式為unsigned int8; X_y is the Y (luma) plane of the YUV420-format input image, in unsigned int8 format;
X_uv是具YUV420格式之輸入影像的uv，其格式為unsigned int8; X_uv is the UV (chroma) data of the YUV420-format input image, in unsigned int8 format;
X2_y=(float(X_y)-128)/128;
X2_uv=(float(X_uv)-128)/128;
X3_uv=Conv2dTranspose(X2_uv,2,2,2,w_uv,b_uv);
w_uv是一矩陣其大小是2*2*2*2;b_uv是一向量其大小為2; w_uv is a matrix of size 2*2*2*2; b_uv is a vector of size 2;
X2=concat(X2_y,X3_uv); X2 = concat(X2_y, X3_uv);
以上所述是A網路(神經網路編號A)的另一實施例，其中的“concat”函數是依循通道的方向連接該輸入; The above is another embodiment of network A (neural network number A), where the “concat” function concatenates the inputs along the channel dimension;
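A sketch of this faster variant: the only learned component is the 2x chroma upsampler, and torch.cat plays the role of the “concat” function (random values stand in for the trained w_uv, b_uv):

```python
import torch
import torch.nn.functional as F

def a_network_fast(x_y, x_uv, w_uv, b_uv):
    x2_y = (x_y.float() - 128.0) / 128.0                       # (1, 1, H, W)
    x2_uv = (x_uv.float() - 128.0) / 128.0                     # (1, 2, H/2, W/2)
    x3_uv = F.conv_transpose2d(x2_uv, w_uv, b_uv, stride=2)    # -> (1, 2, H, W)
    return torch.cat([x2_y, x3_uv], dim=1)                     # concat along channels -> YUV444

w_uv = torch.randn(2, 2, 2, 2) * 0.02   # the 2*2*2*2 matrix w_uv
b_uv = torch.zeros(2)
```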
最後,用於輸出該輸出影像的數學式和前述第一種情況當輸入與輸出影像都是RGB格式時所使用的數學式相同: Finally, the mathematical formula used to output the image is the same as the formula used in the first case mentioned above when both the input and output images are in RGB format:
Y=uint8((Conv2dTranspose(σ(Conv2d(X2,a,b,c,d,w_1,b_1)),w_2,b_2)+X2)*128+128);
w_1是一矩陣其大小是b*b*3*a;b_1是一向量其大小為a; w_1 is a matrix of size b*b*3*a; b_1 is a vector of size a;
w_2是一矩陣其大小是b*b*3*a;b_2是一向量其大小為3; w_2 is a matrix of size b*b*3*a; b_2 is a vector of size 3;
所使用的參數也同樣和前述當輸入與輸出影像都是RGB格式時所使用的參數相同: The parameters used are the same as those used previously when both the input and output images are in RGB format:
X的解析度是1280x720; The resolution of X is 1280x720;
a=128,b=10,c=5,d=0,σ=leaky relu with alpha=0.2; a=128,b=10,c=5,d=0,σ=leaky relu with alpha=0.2;
a=128,b=9,c=5,d=4,σ=leaky relu with alpha=0.2; a=128,b=9,c=5,d=4,σ=leaky relu with alpha=0.2;
a=128,b=8,c=4,d=0,σ=leaky relu with alpha=0.2。 a=128,b=8,c=4,d=0,σ=leaky relu with alpha=0.2.
圖十是本發明處理具YUV420格式之解碼後的影像的程序的另一實施例示意圖。如圖十所示本發明處理具YUV420格式之解碼後的影像的程序包括: Figure 10 is a schematic diagram of another embodiment of the procedure for processing images decoded in YUV420 format according to the present invention. As shown in Figure 10, the procedure for processing images decoded in YUV420 format according to the present invention includes:
步驟461:該第一神經網路藉由以下步驟來接受並處理具YUV420顏色格式之訓練解碼影像,其中,該訓練解碼影像包括N通道,且N是大於2的正整數; Step 461: The first neural network receives and processes a training decoded image in YUV420 color format using the following steps, wherein the training decoded image includes N channels, and N is a positive integer greater than 2;
步驟462:提取該訓練解碼影像中的Y-part資料以產生Y-part輸出資料; Step 462: Extract the Y-part data from the training decoded image to generate the Y-part output data;
步驟463:提取該訓練解碼影像中的UV-part資料，並使用具兩倍放大的第一神經網路來處理該訓練解碼影像的UV-part資料以產生具N-1通道的UV-part輸出資料(例如:步伐值Stride=2於轉置卷積中;如步驟464); Step 463: Extract the UV-part data from the training decoded image and process it with the first neural network with 2x upsampling to generate UV-part output data with N-1 channels (e.g., stride=2 in the transposed convolution; see step 464);
步驟465:以合併函數Concat(concatenates)處理該Y-part資料及該UV-part資料以產生該訓練輸出影像(步驟466)。 Step 465: Process the Y-part and UV-part data using the concat(concatenates) function to generate the training output image (Step 466).
第四種情況:原圖影像是YUV420、且輸出影像也是YUV420。 The fourth scenario: The original image is YUV420, and the output image is also YUV420.
如果輸入影像是YUV420、且輸出影像也是YUV420,則處理方式將類似前述RGB-to-RGB的方式。然而,由於輸入格式和輸出格式不同,所以不同的卷積方法會應用在不同通道上。例如,當神經網路的核心大小為8x8、步伐值stride為4來處理影像的Y-part時,則該神經網路可改成核心大小為4x4、步伐值stride為2來處理影像的UV-part。 If both the input and output images are YUV420, the processing will be similar to the aforementioned RGB-to-RGB approach. However, due to the different input and output formats, different convolution methods will be applied to different channels. For example, when processing the Y-part of an image using a neural network with a kernel size of 8x8 and a stride of 4, the neural network can be modified to have a kernel size of 4x4 and a stride of 2 to process the UV-part of the image.
X_y是具YUV420格式之輸入影像的Y，其格式為unsigned int8; X_y is the Y (luma) plane of the YUV420-format input image, in unsigned int8 format;
X_uv是具YUV420格式之輸入影像的uv，其格式為unsigned int8; X_uv is the UV (chroma) data of the YUV420-format input image, in unsigned int8 format;
X2_y=(float(X_y)-128)/128;
X2_uv=(float(X_uv)-128)/128;
X3=σ(Conv2d(X2_y,a,b,c,w_y,b_y)+Conv2d(X2_uv,a,b/2,c/2,w_uv,b_uv));
w_y是一矩陣其大小是b*b*1*a;b_y是一向量其大小為a; w_y is a matrix of size b*b*1*a; b_y is a vector of size a;
w_uv是一矩陣其大小是(b/2)*(b/2)*2*a;b_uv是一向量其大小為a; w_uv is a matrix of size (b/2)*(b/2)*2*a; b_uv is a vector of size a;
X4_y=Conv2dTranspose(X3,1,b,c,w_1,b_1)+X2_y;
X4_uv=Conv2dTranspose(X3,2,b/2,c/2,w_2,b_2)+X2_uv;
w_1是一矩陣其大小是b*b*1*a;b_1是一向量其大小為1; w_1 is a matrix of size b*b*1*a; b_1 is a vector of size 1;
w_2是一矩陣其大小是(b/2)*(b/2)*2*a;b_2是一向量其大小為2; w_2 is a matrix with size (b/2)*(b/2)*2*a; b_2 is a vector with size 2;
以上所述是A網路(神經網路編號A)的另一實施例; The above is another embodiment of network A (neural network number A);
最後輸出: Final output:
Y_y=uint8(X4_y*128+128);
Y_uv=uint8(X4_uv*128+128);
使用的參數: Parameters used:
a=128,b=8,c=4,d=0,e=2,f=2,σ=leaky relu with alpha=0.2。 a=128,b=8,c=4,d=0,e=2,f=2,σ=leaky relu with alpha=0.2.
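A hedged PyTorch sketch of this YUV420-to-YUV420 case with a=128, b=8, c=4: the Y path uses the full kernel and stride, the UV path the halved ones, and both feed the shared feature map X3 (weight shapes follow PyTorch layout; random values stand in for the trained weights):

```python
import torch
import torch.nn.functional as F

def enhance_yuv420(x_y, x_uv, p, c=4):
    y = (x_y.float() - 128.0) / 128.0     # (1, 1, H, W) luma
    uv = (x_uv.float() - 128.0) / 128.0   # (1, 2, H/2, W/2) chroma
    # shared feature map X3: full-stride conv on Y plus half-kernel/half-stride conv on UV
    x3 = F.leaky_relu(F.conv2d(y, p['w_y'], p['b_y'], stride=c)
                      + F.conv2d(uv, p['w_uv'], p['b_uv'], stride=c // 2), 0.2)
    y4 = F.conv_transpose2d(x3, p['w_1'], p['b_1'], stride=c) + y         # X4_y
    uv4 = F.conv_transpose2d(x3, p['w_2'], p['b_2'], stride=c // 2) + uv  # X4_uv
    to_u8 = lambda t: (t * 128.0 + 128.0).clamp(0, 255).to(torch.uint8)
    return to_u8(y4), to_u8(uv4)          # Y_y, Y_uv

a, b = 128, 8
p = {'w_y': torch.randn(a, 1, b, b) * 0.02,            'b_y': torch.zeros(a),
     'w_uv': torch.randn(a, 2, b // 2, b // 2) * 0.02, 'b_uv': torch.zeros(a),
     'w_1': torch.randn(a, 1, b, b) * 0.02,            'b_1': torch.zeros(1),
     'w_2': torch.randn(a, 2, b // 2, b // 2) * 0.02,  'b_2': torch.zeros(2)}
```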
本發明所使用的參數的詳細說明如下: The parameters used in this invention are described in detail below:
訓練參數: Training parameters:
該些加權參數的初始值是根據高斯分布(Gaussian distribution)，mean=0、stddev=0.02; The weighting parameters are initialized from a Gaussian distribution with mean=0 and stddev=0.02;
在訓練過程中使用Adam演算法,學習率learning rate=1e-4,beta1=0.9; The Adam algorithm was used during training, with a learning rate of 1e-4 and beta1 of 0.9.
微批次大小mini batch size=1; Mini batch size = 1;
主要差異函數(primary error function)是: The primary error function is:
100*(L2+L2e)+λ*L1+γ*D+α*Lg;
所使用的參數其標準值為: The standard values for the parameters used are:
λ=0,γ=0,α=0; λ=0, γ=0, α=0;
λ=0,γ=0,α=100; λ=0, γ=0, α=100;
λ=0,γ=1,α=0; λ=0, γ=1, α=0;
λ=10,γ=0,α=0; λ=10, γ=0, α=0;
λ=10,γ=0,α=100; λ=10, γ=0, α=100;
λ=10,γ=1,α=0; λ=10, γ=1, α=0;
其中: where:
L2=mean((T-Y)^2);其中mean是指平均值，T是訓練標的; L2=mean((T-Y)^2); where mean denotes the average and T is the training target;
L1=mean(|T-Y|);其中mean是指平均值，T是訓練標的; L1=mean(|T-Y|); where mean denotes the average and T is the training target;
D是生成對抗網路損失(GAN loss),使用一般GAN訓練方法來訓練鑑別器(Discriminator)以分辨(X,Y)與(X,T); D is the Generative Adversarial Network (GAN) loss, which uses general GAN training methods to train the discriminator to distinguish between (X,Y) and (X,T).
Lg的數學式是: The mathematical expression for Lg is:
對於WxH的影像而言, For WxH images,
Y_dx(i,j)=Y(i+1,j)-Y(i,j), 0<=i<W-1, 0<=j<H
T_dx(i,j)=T(i+1,j)-T(i,j), 0<=i<W-1, 0<=j<H
Y_dy(i,j)=Y(i,j+1)-Y(i,j), 0<=i<W, 0<=j<H-1
T_dy(i,j)=T(i,j+1)-T(i,j), 0<=i<W, 0<=j<H-1
Lg=mean((T_dx-Y_dx)^2)+mean((T_dy-Y_dy)^2)
在RGB模式下,所述的訓練標的T就是RGB遊戲影像的原始原圖影像; In RGB mode, the training target T is the original image of the RGB game video;
在YUV444模式下,訓練標的T就是RGB遊戲影像的原始原圖影像; In YUV444 mode, the training target T is the original image of the RGB game video;
在RGB->RGB、以及YUV420->YUV420模式下,L2e=0; In RGB->RGB and YUV420->YUV420 modes, L2e=0;
在YUV420->RGB及YUV420->YUV444模式下, In YUV420->RGB and YUV420->YUV444 modes,
L2e=mean((T-X2)^2)。 L2e=mean((T-X2)^2).
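The primary error function can be written directly from the definitions above; a sketch assuming Y, T and X2 are tensors already scaled to [-1, 1] and that the GAN term D is computed separately:

```python
import torch

def primary_error(Y, T, X2=None, lam=10.0, gamma=0.0, alpha=100.0, D=0.0):
    """100*(L2+L2e) + lambda*L1 + gamma*D + alpha*Lg."""
    L2 = torch.mean((T - Y) ** 2)
    L2e = torch.mean((T - X2) ** 2) if X2 is not None else 0.0  # YUV420->RGB/YUV444 modes only
    L1 = torch.mean(torch.abs(T - Y))
    # Lg: squared error of the horizontal and vertical finite differences
    Lg = (torch.mean((torch.diff(T, dim=-1) - torch.diff(Y, dim=-1)) ** 2)
          + torch.mean((torch.diff(T, dim=-2) - torch.diff(Y, dim=-2)) ** 2))
    return 100.0 * (L2 + L2e) + lam * L1 + gamma * D + alpha * Lg
```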
由上述說明可知,本發明的方法具有以下優點: As can be seen from the above description, the method of this invention has the following advantages:
能根據具有不同內容的各種影像隨時保持對神經網絡的訓練,以便對不同的影像內容執行不同的增強效果;例如,對於具有卡通風格、現實風格或不同場景等的影像,不同的加權參數w、b可以預先存儲在客戶端裝置中、或者動態下載到客戶端裝置中; It can continuously train the neural network based on various images with different content, so as to apply different enhancement effects to different image content; for example, for images with cartoon style, realistic style, or different scenes, different weighting parameters w and b can be pre-stored in the client device or dynamically downloaded to the client device;
關於判斷原圖影像應屬於哪種模式的方式,伺服器端的神經網絡可以自動判定原圖影像的模式,並將此類訊息傳輸到客戶端裝置;因為原圖影像的內容具有一致性,所以這種判定過程可以由伺服器週期性地執行,例如每秒一次;可是,在另一實施例中,判定影像模式的過程也可以由客戶端裝置週期性地執行,例如每數秒執行一次,視客戶端裝置的運算能力而定; Regarding the method for determining the mode of the original image, the server-side neural network can automatically determine the mode of the original image and transmit this information to the client device. Because the content of the original image is consistent, this determination process can be performed periodically by the server, for example, once per second. However, in another embodiment, the process of determining the image mode can also be performed periodically by the client device, for example, once every few seconds, depending on the computing power of the client device.
訓練是根據真實視頻影像進行，可以實際測量增強的提高程度；例如，當使用本發明的方法來增強解析度為1280x720和比特率(bitrate)為3000的視頻影像時，類似場景的PSNR值可以增加1.5~2.2dB左右，此可證明本發明的方法確實能真實地提高輸出影像的品質，並使輸出影像在視覺上更接近原圖影像的品質；並且，本發明不同於那些眾所周知的影像增強技術，它們只能增加輸出影像的對比度、平滑和濾色，而無法像本發明般使輸出影像在視覺上更接近於原圖影像; The training is conducted using real video footage, allowing for the actual measurement of the enhancement improvement. For example, when using the method of this invention to enhance a 1280x720 resolution video with a bitrate of 3000, the PSNR value of a similar scene can increase by approximately 1.5 to 2.2 dB. This proves that the method of this invention can indeed effectively improve the quality of the output image and make it visually closer to the quality of the original image. Furthermore, this invention differs from well-known image enhancement techniques, which can only increase the contrast, smoothness, and color filtering of the output image, but cannot make the output image visually closer to the original image like this invention.
藉由神經網絡演算法的簡化模型,並利用大核心、大步伐值,使神經網絡的分辨率迅速降低,且模型的處理速度可以大大提高;即使是計算能力有限的客戶端裝置也可以達到60fps和HD解析度的輸出影像的目標;以及 By employing a simplified model of the neural network algorithm and utilizing large cores and large stride values, the resolution of the neural network is rapidly reduced, while the processing speed of the model can be significantly improved; even client devices with limited computing power can achieve the goal of 60fps and HD resolution output video; and
藉由將顏色格式(YUV420和RGB)的轉換工作帶入神經網絡、 並利用UV通道的分辨率低於Y通道的優勢,將UV通道的步伐值設置為Y通道的一半,可提高神經網絡的計算速度。 By incorporating color format conversion (YUV420 and RGB) into the neural network, and leveraging the lower resolution of the UV channel compared to the Y channel by setting the UV channel's step value to half that of the Y channel, the neural network's computational speed can be improved.
圖十一A是本發明利用人工智慧處理模組來降低影像串流所需網路頻寬的方法的第二實施例示意圖,其包括下列步驟: Figure 11A is a schematic diagram of a second embodiment of the method of the present invention for reducing the network bandwidth required for video streaming using an artificial intelligence processing module, which includes the following steps:
步驟711:在一伺服器701中執行一第一應用程式。該第一應用程式依據至少一指令來產生具高解析度的複數個原圖影像。這些原圖影像的解析度可以是4K解析度或更高(以下亦稱為第二解析度)。所述的至少一指令是由客戶端裝置702所產生並透過網路傳送給伺服器701。 Step 711: A first application is executed on a server 701. This first application generates a plurality of high-resolution original images according to at least one instruction. The resolution of these original images may be 4K or higher (hereinafter also referred to as a second resolution). The at least one instruction is generated by a client device 702 and transmitted to the server 701 via a network.
步驟712:於伺服器701中使用一習知的抽樣法來降低原圖影像的解析度以獲得具低解析度(例如1080i、720p或更低,以下亦稱為第一解析度)的來源影像,第一解析度是低於第二解析度。 Step 712: On server 701, a known sampling method is used to reduce the resolution of the original image to obtain a source image with a low resolution (e.g., 1080i, 720p, or lower, hereinafter referred to as the first resolution). The first resolution is lower than the second resolution.
步驟713:於伺服器701中使用一編碼器對該些來源影像進行編碼與壓縮以產生相對應的複數個編碼後影像。 Step 713: On server 701, an encoder is used to encode and compress the source images to generate a corresponding plurality of encoded images.
步驟714:由伺服器701依據來自客戶端裝置702的指令,將這些編碼後影像以一2D影像串流(步驟304)的型式經由網路傳送給客戶端裝置702。由於影像在被傳送給客戶端裝置已經先被降低解析度,所以傳送影像串流所需的網路頻寬也因此降低。 Step 714: Server 701, based on instructions from client device 702, transmits the encoded images to client device 702 via the network as a 2D video stream (Step 304). Since the images are downscaled before being transmitted to the client device, the network bandwidth required for transmitting the video stream is also reduced.
步驟715:客戶端裝置702接受這些編碼後影像並將其解碼成相對應的複數個解碼後影像。 Step 715: Client device 702 receives these encoded images and decodes them into a corresponding plurality of decoded images.
於本發明中,客戶端裝置702包含一AI處理模組,其包括預設的至少一數學運算式。該至少一數學運算式包括複數個加權參數。該些加權參數是藉由一訓練伺服器的一人工神經網路模組的一訓練模式來預先定義。於客戶端裝置702執行一第二應用程式,其是和第一應用程式相關聯及配合,以供使用者操作客戶端裝置702來產生該指令。客戶端裝置702透過網路將指令傳送給伺服器701、以及自伺服器接收依據該指令所產生的編碼後影像。 In this invention, client device 702 includes an AI processing module comprising at least one preset mathematical expression. The at least one mathematical expression includes a plurality of weighted parameters. These weighted parameters are predefined using a training mode of an artificial neural network module of a training server. A second application, associated with and cooperating with the first application, is executed on client device 702 to allow a user to operate client device 702 to generate the instruction. Client device 702 transmits the instruction to server 701 via a network and receives from the server the encoded image generated according to the instruction.
於本實施例中,該至少一數學運算式包括一第一預設的AI運算式以及一第二預設的AI運算式。該第一預設的AI運算式包括複數個第一加權參數。該第二預設的AI運算式包括複數個第二加權參數。該第一預設的AI運算式搭配複數個該第一加權參數可用於提高影像的解析度,藉此,由 該第一預設的AI運算式搭配複數個該第一加權參數所處理過的影像的解析度可以由該第一解析度提高到該第二解析度。該第二預設的AI運算式搭配複數個該第二加權參數可用於增強影像的品質,藉此,由該第二預設的AI運算式搭配複數個該第二加權參數所處理過的影像的品質比該解碼後影像的品質更高、且更接近於原圖影像的品質。 In this embodiment, the at least one mathematical operation includes a first preset AI operation and a second preset AI operation. The first preset AI operation includes a plurality of first weighting parameters. The second preset AI operation includes a plurality of second weighting parameters. The first preset AI operation, combined with the plurality of the first weighting parameters, can be used to improve the resolution of the image, thereby increasing the resolution of the image processed by the first preset AI operation with the plurality of the first weighting parameters from the first resolution to the second resolution. The second preset AI operation, combined with the plurality of the second weighting parameters, can be used to enhance the image quality, thereby improving the quality of the image processed by the second preset AI operation with the plurality of the second weighting parameters to a higher quality than the decoded image and closer to the quality of the original image.
步驟716:當該客戶端裝置702將所接收到的複數個該編碼後影像解碼成相對應的複數個解碼後影像以後,該客戶端裝置先使用該第一預設的AI運算式及複數個該第一加權參數來處理複數個該解碼後影像以產生相對應的具第二解析度的複數個解析度提升影像。接著,於步驟717中,該客戶端裝置702使用該第二預設的AI運算式及複數個該第二加權參數來處理複數個該解析度提升影像以產生具高影像品質且具該第二解析度的複數個該高解析度影像。之後,如步驟718,客戶端裝置702將該些高解析度影像做為輸出影像並輸出至螢幕(顯示器)上。 Step 716: After the client device 702 decodes the received plurality of encoded images into a corresponding plurality of decoded images, the client device first uses the first preset AI formula and a plurality of the first weighting parameters to process the plurality of decoded images to generate a corresponding plurality of resolution-upgraded images with a second resolution. Next, in step 717, the client device 702 uses the second preset AI formula and a plurality of the second weighting parameters to process the plurality of resolution-upgraded images to generate a plurality of high-resolution images with high image quality and the second resolution. Then, as in step 718, the client device 702 outputs these high-resolution images as output images to the screen (display).
其中,該第一預設的AI運算式的第一加權參數是藉由分析具低解析度之來源影像與相對應之原圖影像之間的差異的方式來預先定義,以使得該解析度提升影像在視覺上更接近於原圖影像而非來源影像。並且,該第二預設的AI運算式的第二加權參數是藉由分析該解碼後影像與相對應之原圖影像之間的差異的方式來預先定義,以使得該高解析度影像在視覺上更接近於原圖影像而非解碼後影像。 The first weighting parameter of the first preset AI algorithm is predefined by analyzing the difference between the low-resolution source image and the corresponding original image, so that the upgraded image visually resembles the original image rather than the source image. Similarly, the second weighting parameter of the second preset AI algorithm is predefined by analyzing the difference between the decoded image and the corresponding original image, so that the high-resolution image visually resembles the original image rather than the decoded image.
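In outline, the client side of this embodiment behaves as sketched below; decoder, upscale, enhance and display are hypothetical placeholders for the codec decoder, the two preset AI formulas with their trained weighting parameters, and the screen output:

```python
def client_pipeline(encoded_frames, decoder, upscale, enhance, display):
    for enc in encoded_frames:
        dec = decoder(enc)    # step 715: decode at the first resolution (e.g. 720p)
        up = upscale(dec)     # step 716: first preset AI formula raises to the second resolution
        out = enhance(up)     # step 717: second preset AI formula restores image quality
        display(out)          # step 718: output to the screen
```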
圖十一B是本發明利用人工智慧處理模組來降低影像串流所需網路頻寬的方法的第三實施例示意圖。由於圖十一B所示大部分步驟都和圖十一A所示相同,所以相同或類似的步驟將給予相同的編號且不贅述其細節。於圖十一B所示實施例中,執行於伺服器701的第一應用程式產生具第一解析度的複數來源影像(步驟719),換言之,伺服器701會直接產生低解析度的來源影像,所以無須另執行降低解析度程序。之後,這些來源影像會依據相同於圖十一A所述之步驟713~718被處理。由於伺服器701是直接產生低解析度的來源影像,其所需消耗的運算資源比產生高解析度原圖影像更低;所以,除了如同圖十一A所述實施例具有的節省網路頻寬的好處外,圖十一B所示的實施例還兼具有能夠節省伺服器的運算資源的優勢。 Figure 11B is a schematic diagram of a third embodiment of the method of reducing the network bandwidth required for video streaming using an artificial intelligence processing module. Since most of the steps shown in Figure 11B are the same as those shown in Figure 11A, identical or similar steps will be given the same number and their details will not be elaborated. In the embodiment shown in Figure 11B, the first application running on server 701 generates a plurality of source images with a first resolution (step 719). In other words, server 701 directly generates low-resolution source images, so there is no need to execute a separate resolution reduction procedure. These source images are then processed according to steps 713-718 as described in Figure 11A. Since server 701 directly generates low-resolution source images, it consumes fewer computing resources than generating high-resolution original images. Therefore, in addition to the network bandwidth saving advantage of the embodiment described in Figure 11A, the embodiment shown in Figure 11B also has the advantage of saving server computing resources.
圖十二A是本發明利用人工智慧處理模組來降低影像串流所需 網路頻寬的方法的第四實施例示意圖。由於圖十二A所示大部分步驟都和圖十一A與圖十一B所示相同,所以相同或類似的步驟將給予相同的編號且不贅述其細節。於圖十二A所示實施例中,在伺服器701中執行的第一應用程式產生具第二解析度的複數個原圖影像(步驟711)。這些原圖影像接著被降低解析度處理,成為相對應具有第一解析度的複數個來源影像(步驟712)。之後,這些來源影像被編碼(步驟713)成編碼後影像並傳送給客戶端裝置702(步驟714)。客戶端裝置702把接收到的編碼後影像進行解碼成為解碼後影像(步驟715)。然後,於圖十二A所示實施例的步驟717中,該客戶端裝置702先使用該第二預設的AI運算式及複數個該第二加權參數來處理複數個該解碼後影像以產生具高影像品質但解析度仍為第一解析度的複數個品質提升影像。接著,該客戶端裝置702使用該第一預設的AI運算式及複數個該第一加權參數來處理複數個該品質提升影像以產生具該第二解析度且具高影像品質的複數個該高解析度影像。之後,如步驟718,客戶端裝置702將該些高解析度影像做為輸出影像並輸出至螢幕上。 Figure 12A is a schematic diagram of a fourth embodiment of the method of reducing the network bandwidth required for video streaming using an artificial intelligence processing module. Since most of the steps shown in Figure 12A are the same as those shown in Figures 11A and 11B, identical or similar steps will be given the same number and their details will not be elaborated. In the embodiment shown in Figure 12A, a first application running on server 701 generates a plurality of original images with a second resolution (step 711). These original images are then processed to reduce their resolution, becoming a plurality of corresponding source images with a first resolution (step 712). Afterwards, these source images are encoded (step 713) into encoded images and transmitted to client device 702 (step 714). Client device 702 decodes the received encoded image into a decoded image (step 715). Then, in step 717 of the embodiment shown in FIG12A, client device 702 first processes the plurality of decoded images using the second preset AI formula and a plurality of the second weighting parameters to generate a plurality of quality-enhanced images with high image quality but still at the first resolution. Next, client device 702 processes the plurality of quality-enhanced images using the first preset AI formula and a plurality of the first weighting parameters to generate a plurality of high-resolution images with the second resolution and high image quality. Then, as in step 718, client device 702 outputs these high-resolution images as output images to the screen.
圖十二B是本發明利用人工智慧處理模組來降低影像串流所需網路頻寬的方法的第五實施例示意圖。由於圖十二B所示大部分步驟都和圖十二A及圖十一B所示相同,所以相同或類似的步驟將給予相同的編號且不贅述其細節。於圖十二B所示實施例中,執行於伺服器701的第一應用程式產生具第一解析度的複數來源影像(步驟719),換言之,伺服器701會直接產生低解析度的來源影像,所以無須另執行降低解析度程序。之後,這些來源影像會依據相同於圖十二A所述之步驟713~718被處理。 Figure 12B is a schematic diagram of a fifth embodiment of the method of reducing the network bandwidth required for video streaming using an artificial intelligence processing module. Since most of the steps shown in Figure 12B are the same as those shown in Figures 12A and 11B, identical or similar steps will be given the same number and their details will not be elaborated. In the embodiment shown in Figure 12B, the first application running on server 701 generates a plurality of source images with a first resolution (step 719). In other words, server 701 directly generates low-resolution source images, so there is no need to execute a separate resolution reduction procedure. These source images are then processed according to steps 713-718 as described in Figure 12A.
圖十三是本發明所述AI處理模組的第一預設的AI運算式及第一加權參數的訓練方式的一實施例示意圖。於本發明中,客戶端裝置702內的AI處理模組中的第一預設的AI運算式及複數第一加權參數是藉由在該訓練伺服器上執行一人工神經網路的訓練程序所預先定義。當訓練完成後,第一預設的AI運算式及複數第一加權參數會被應用在客戶端裝置702的AI處理模組中以執行如圖十一A、十一B、十二A、十二B所示之步驟716所述的AI提升解析度步驟。在訓練伺服器中訓練第一預設的AI運算式及複數第一加權參數的步驟包括: Figure 13 is a schematic diagram of an embodiment of the training method for the first preset AI operation and the first weighted parameters of the AI processing module described in this invention. In this invention, the first preset AI operation and the complex first weighted parameters in the AI processing module within the client device 702 are predefined by executing a training program for an artificial neural network on the training server. After training is completed, the first preset AI operation and the complex first weighted parameters are applied to the AI processing module of the client device 702 to perform the AI resolution enhancement step as shown in step 716 of Figures 11A, 11B, 12A, and 12B. The steps for training the first preset AI operation and the complex first weighted parameters on the training server include:
步驟7161:在該訓練伺服器中啟用一訓練模式以產生複數個訓 練原圖影像(步驟7162);複數個該訓練原圖影像具有該第二解析度(高解析度)。 Step 7161: Enable a training mode on the training server to generate a plurality of training source images (Step 7162); the plurality of training source images have the second resolution (high resolution).
步驟7163:執行一解析度降低程序,將複數個該訓練原圖影像的解析度由該第二解析度降低至該第一解析度,以產生具第一解析度的複數個訓練低解析度影像(步驟7164)。 Step 7163: Perform a resolution reduction procedure to reduce the resolution of a plurality of the training source images from the second resolution to the first resolution, thereby generating a plurality of low-resolution training images with the first resolution (Step 7164).
步驟7165:由該人工神經網路模組接受並使用一第一訓練運算式來逐一處理複數個該訓練低解析度影像以產生相對應之具該第二解析度的複數個訓練輸出影像(步驟7166);該第一訓練運算式具有複數個第一訓練加權參數。 Step 7165: The artificial neural network module receives and uses a first training operation to process a plurality of the training low-resolution images one by one to generate a plurality of corresponding training output images with the second resolution (Step 7166); the first training operation has a plurality of first training weighting parameters.
步驟7167:使用一比較模組來逐一比較複數個該訓練輸出影像和相對應的複數個該訓練原圖影像之間的差異,並據以調整該第一訓練運算式的該些第一訓練加權參數。該些第一訓練加權參數會被調整成可讓該訓練輸出影像與相對應之該訓練原圖影像之間的差異最小化。每一次當該些第一訓練加權參數被調整後,調整後的該些第一訓練加權參數就會被回饋給該第一訓練運算式以供處理下一個該訓練低解析度影像。在進行過預定數量的該訓練輸出影像與相對應之該訓練原圖影像的比較、以及預定次數的複數個該第一訓練加權參數的調整程序後,最後所得到的該些第一訓練加權參數會被應用在客戶端裝置702的AI處理模組內來作為其至少一該數學運算式的複數個該加權參數,以執行如圖十一A、十一B、十二A、十二B所示之步驟716所述的AI提升解析度步驟。 Step 7167: Use a comparison module to compare the differences between a plurality of the training output images and the corresponding plurality of the training source images one by one, and adjust the first training weighting parameters of the first training algorithm accordingly. These first training weighting parameters are adjusted to minimize the differences between the training output images and the corresponding training source images. Each time the first training weighting parameters are adjusted, the adjusted first training weighting parameters are fed back to the first training algorithm for processing the next training low-resolution image. After comparing a predetermined number of the training output images with their corresponding original training images, and adjusting a predetermined number of the plurality of the first training weighting parameters, the resulting first training weighting parameters are applied in the AI processing module of the client device 702 as a plurality of weighting parameters for at least one mathematical expression to perform the AI resolution enhancement step described in step 716 as shown in Figures 11A, 11B, 12A, and 12B.
於本實施例中,對於客戶端裝置702之AI處理模組的第二預設的AI運算式及第二加權參數的訓練方式是和圖四、圖五、或圖六所示之前述人工神經網路模組105的訓練方式相同。當訓練完成後,所得到的該些第二訓練加權參數會被應用在客戶端裝置702的AI處理模組內來作為其至少一該數學運算式的複數個該加權參數,以執行如圖十一A、十一B、十二A、十二B所示之步驟717所述的AI增強影像品質的步驟。 In this embodiment, the training method for the second preset AI formula and the second weighting parameters of the AI processing module of the client device 702 is the same as the training method for the aforementioned artificial neural network module 105 shown in Figures 4, 5, or 6. After training is completed, the obtained second training weighting parameters are applied within the AI processing module of the client device 702 as a plurality of weighting parameters for at least one of the mathematical formulas to perform the AI-enhanced image quality steps described in step 717 as shown in Figures 11A, 11B, 12A, and 12B.
圖十四A是本發明利用人工智慧處理模組來降低影像串流所需網路頻寬的方法的第六實施例示意圖。由於圖十四A所示大部分步驟都和圖十一A所示相同,所以相同或類似的步驟將給予相同的編號且不贅述其細節。於圖十四A所示實施例中,在伺服器701中執行的第一應用程式產生具 第二解析度的複數個原圖影像(步驟711)。這些原圖影像接著被降低解析度處理,成為相對應具有第一解析度的複數個來源影像(步驟712)。之後,這些來源影像被編碼(步驟713)成編碼後影像並傳送給客戶端裝置702(步驟714)。客戶端裝置702把接收到的編碼後影像進行解碼成為解碼後影像(步驟715)。於本實施例中,該第一預設的AI運算式、該第二預設的AI運算式、複數個該第一加權參數、以及複數個該第二加權參數全部都包含在該客戶端裝置702的同一個該AI處理模組內,以供把複數個該解碼後影像直接處理成具高影像品質且具該第二解析度的複數個該高解析度影像。所以,於步驟720中,客戶端裝置702的AI處理模組接受並使用該第一預設的AI運算式、該第二預設的AI運算式、複數個該第一加權參數、以及複數個該第二加權參數來處理該些解碼後影像以產生相對應具第二解析度的複數個該高解析度影像。之後,如步驟718,客戶端裝置702將該些高解析度影像做為輸出影像並輸出至螢幕上。 Figure 14A is a schematic diagram of a sixth embodiment of the method of reducing the network bandwidth required for video streaming using an artificial intelligence processing module. Since most of the steps shown in Figure 14A are the same as those shown in Figure 11A, identical or similar steps will be given the same number and their details will not be elaborated. In the embodiment shown in Figure 14A, a first application running on server 701 generates a plurality of original images with a second resolution (step 711). These original images are then processed to reduce their resolution, becoming a plurality of corresponding source images with a first resolution (step 712). Afterwards, these source images are encoded (step 713) into encoded images and transmitted to client device 702 (step 714). The client device 702 decodes the received encoded image into a decoded image (step 715). In this embodiment, the first preset AI formula, the second preset AI formula, the plurality of first weighting parameters, and the plurality of second weighting parameters are all included in the same AI processing module of the client device 702, so as to directly process the plurality of decoded images into a plurality of high-resolution images with high image quality and the second resolution. Therefore, in step 720, the AI processing module of the client device 702 accepts and uses the first preset AI formula, the second preset AI formula, the plurality of first weighting parameters, and the plurality of second weighting parameters to process the decoded images to generate a plurality of high-resolution images with corresponding second resolution. Then, as in step 718, client device 702 outputs these high-resolution images as output images to the screen.
圖十四B是本發明利用人工智慧處理模組來降低影像串流所需網路頻寬的方法的第七實施例示意圖。由於圖十四B所示大部分步驟都和圖十四A及圖十一B所示相同,所以相同或類似的步驟將給予相同的編號且不贅述其細節。於圖十四B所示實施例中,執行於伺服器701的第一應用程式產生具第一解析度的複數來源影像(步驟719),換言之,伺服器701會直接產生低解析度的來源影像,所以無須另執行降低解析度程序。之後,這些來源影像會依據相同於圖十四A所述之步驟713、714、715、720及718被處理。 Figure 14B is a schematic diagram of a seventh embodiment of the method of reducing the network bandwidth required for video streaming using an artificial intelligence processing module. Since most of the steps shown in Figure 14B are the same as those shown in Figures 14A and 11B, identical or similar steps will be given the same number and their details will not be elaborated. In the embodiment shown in Figure 14B, the first application running on server 701 generates a plurality of source images with a first resolution (step 719). In other words, server 701 directly generates low-resolution source images, so there is no need to execute a separate resolution reduction procedure. These source images are then processed according to steps 713, 714, 715, 720, and 718 as described in Figure 14A.
圖十五是本發明所述AI處理模組的第一預設的AI運算式、第二預設的AI運算式、第一加權參數及第二加權參數的訓練方式的一實施例示意圖。於本發明中,客戶端裝置702內的AI處理模組中的第一預設的AI運算式、第二預設的AI運算式、第一加權參數及第二加權參數是藉由在該訓練伺服器上執行一人工神經網路的訓練程序所預先定義。當訓練完成後,第一預設的AI運算式、第二預設的AI運算式、第一加權參數及第二加權參數被應用在客戶端裝置702的AI處理模組中以執行如圖十四A及圖十四B所示之步驟720所述的AI提升解析度+增強的步驟。在訓練伺服器中訓練第一預設的AI運算式、第二預設的AI運算式、第一加權參數及第二加權參數的步驟包括: Figure 15 is an embodiment of the training method for the first preset AI algorithm, the second preset AI algorithm, the first weighting parameter, and the second weighting parameter of the AI processing module described in this invention. In this invention, the first preset AI algorithm, the second preset AI algorithm, the first weighting parameter, and the second weighting parameter in the AI processing module within the client device 702 are predefined by executing a training program for an artificial neural network on the training server. After training is completed, the first preset AI algorithm, the second preset AI algorithm, the first weighting parameter, and the second weighting parameter are applied in the AI processing module of the client device 702 to perform the AI resolution enhancement step as shown in step 720 of Figures 14A and 14B. The steps for training the first preset AI algorithm, the second preset AI algorithm, the first weighted parameter, and the second weighted parameter on the training server include:
步驟7201:在該訓練伺服器中啟用一訓練模式以產生複數個訓練原圖影像(步驟7202);複數個該訓練原圖影像具有該第二解析度(高解析度)。 Step 7201: Enable a training mode on the training server to generate a plurality of training source images (Step 7202); the plurality of training source images have the second resolution (high resolution).
步驟7203:執行一解析度降低程序,將複數個該訓練原圖影像的解析度由該第二解析度降低至該第一解析度,以產生具該第一解析度的複數個訓練低解析度影像(步驟7204)。 Step 7203: Perform a resolution reduction procedure to reduce the resolution of a plurality of the training source images from the second resolution to the first resolution, thereby generating a plurality of low-resolution training images with the first resolution (Step 7204).
步驟7205:執行一編碼程序,藉由訓練伺服器內的一編碼器來把複數個該訓練低解析度影像編碼成相對應的複數個訓練編碼後影像。 Step 7205: Execute an encoding procedure using an encoder within the training server to encode a plurality of the training low-resolution images into a corresponding plurality of trained encoded images.
步驟7206:執行一解碼程序,藉由訓練伺服器內的一解碼器來把複數個該訓練編碼後影像解碼成相對應的複數個訓練解碼後影像;複數個該訓練解碼後影像具有該第一解析度。 Step 7206: Execute a decoding procedure using a decoder within the training server to decode the plurality of trained encoded images into a corresponding plurality of trained decoded images; the plurality of trained decoded images have the first resolution.
步驟7207:由該人工神經網路模組接受並使用一第一訓練運算式以及一第二訓練運算式來逐一處理複數個該訓練解碼後影像以產生相對應之具該第二解析度的複數個訓練輸出影像(步驟7208)。該第一訓練運算式具有複數個第一訓練加權參數。該第二訓練運算式具有複數個第二訓練加權參數。 Step 7207: The artificial neural network module receives and uses a first training operation and a second training operation to process a plurality of the trained and decoded images one by one to generate a plurality of corresponding trained output images with the second resolution (Step 7208). The first training operation has a plurality of first training weighting parameters. The second training operation has a plurality of second training weighting parameters.
步驟7209:使用一比較模組來逐一比較複數個該訓練輸出影像和相對應的複數個該訓練原圖影像之間的差異,並據以調整該第一訓練運算式的該些第一訓練加權參數以及該第二訓練運算式的該些第二訓練加權參數。該些第一訓練加權參數以及該些第二訓練加權參數會被調整成可讓該訓練輸出影像與相對應之該訓練原圖影像之間的差異最小化。每一次當該些第一訓練加權參數以及該些第二訓練加權參數被調整後,調整後的該些第一訓練加權參數以及該些第二訓練加權參數就會被回饋給該第一訓練運算式以及該第二訓練運算式以供處理下一個該訓練低解析度影像。在進行過預定數量的該訓練輸出影像與相對應之該訓練原圖影像的比較、以及預定次數的複數個該第一訓練加權參數以及該些第二訓練加權參數的調整程序後,最後所得到的該些第一訓練加權參數以及該些第二訓練加權參數會被應用在該客戶端裝置的該AI處理模組內來作為其至少一該數學運算式所包含的該第一訓練運算式以及該第二訓練運算式的加權參數,以供執行如圖十四A及圖十四B中的步驟720所述的AI提升解析度+增強影像品質的步 驟。 Step 7209: Use a comparison module to compare the differences between a plurality of the training output images and the corresponding plurality of the training source images one by one, and adjust the first training weighting parameters of the first training equation and the second training weighting parameters of the second training equation accordingly. The first training weighting parameters and the second training weighting parameters are adjusted to minimize the differences between the training output images and the corresponding training source images. Each time the first training weighting parameters and the second training weighting parameters are adjusted, the adjusted first training weighting parameters and the second training weighting parameters are fed back to the first training algorithm and the second training algorithm to process the next low-resolution training image. After comparing a predetermined number of training output images with their corresponding original training images, and adjusting a predetermined number of the first and second training weighting parameters, the final first and second training weighting parameters are applied within the AI processing module of the client device as weighting parameters for at least one mathematical operation, specifically the first and second training operations, to execute step 720 of Figures 14A and 14B, which involves AI resolution enhancement and image quality improvement.
於本發明的一較佳實施例中,客戶端裝置702的AI處理模組僅包含單一組的AI運算式及複數個加權參數,其是藉由如圖十五所述步驟7201至7209的方式來進行訓練,所以也可以提供如圖十四A及圖十四B中的步驟720所述的「AI提升解析度+增強影像品質」的合併功能。 In a preferred embodiment of the present invention, the AI processing module of the client device 702 includes only a single set of AI formulas and a plurality of weighting parameters, which are trained via steps 7201 to 7209 as described in Figure 15. Therefore, it can also provide the combined function of "AI-enhanced resolution + improved image quality" as described in step 720 of Figures 14A and 14B.
圖十六是本發明利用人工智慧處理模組來降低影像串流所需網路頻寬的方法的第八實施例示意圖。由於圖十六所示大部分步驟都和圖十四A及圖十四B所示相同，所以相同或類似的步驟將給予相同的編號且不贅述其細節。於圖十六所示實施例中，伺服器701更包含一AI編碼模組。執行於伺服器701的第一應用程式依據指令產生具第二解析度的複數原圖影像(步驟721)。接著，於步驟722中，伺服器701使用該AI編碼模組來將複數個該原圖影像進行降低解析度以獲得相對應的複數個該來源影像、以及將複數個該來源影像進行編碼以獲得相對應的複數個該編碼後影像。該AI編碼模組包含預設的至少一AI編碼運算式；該至少一AI編碼運算式包含預設的複數個編碼加權參數。然後，編碼後影像以影像串流的方式傳送給客戶端裝置702(步驟714)。於本實施例中，客戶端裝置702的AI處理模組更包括一AI解碼運算式以供將所接收之編碼後影像解碼成為相對應的解碼後影像。換言之，該AI解碼運算式、該第一預設的AI運算式、該第二預設的AI運算式、複數個該第一加權參數、以及複數個該第二加權參數全部都包含在該客戶端裝置702的同一個該AI處理模組內，以供把接收到的複數個該編碼後影像直接處理成解碼後且具高影像品質且具該第二解析度的複數個該高解析度影像。所以，於步驟723中，客戶端裝置702的AI處理模組接受並使用該AI解碼運算式、該第一預設的AI運算式、該第二預設的AI運算式、複數個該第一加權參數、以及複數個該第二加權參數來處理該些編碼後影像以直接產生相對應具第二解析度的複數個該高解析度影像。之後，如步驟718，客戶端裝置702將該些高解析度影像做為輸出影像並輸出至螢幕上。 Figure 16 is a schematic diagram of an eighth embodiment of the method of reducing the network bandwidth required for video streaming using an artificial intelligence processing module. Since most of the steps shown in Figure 16 are the same as those shown in Figures 14A and 14B, identical or similar steps will be given the same designation and their details will not be elaborated. In the embodiment shown in Figure 16, server 701 further includes an AI encoding module. A first application running on server 701 generates a plurality of original images with a second resolution according to instructions (step 721). Then, in step 722, server 701 uses the AI encoding module to reduce the resolution of the plurality of original images to obtain a corresponding plurality of source images, and to encode the plurality of source images to obtain a corresponding plurality of encoded images. The AI encoding module includes at least one preset AI encoding formula; the at least one AI encoding formula includes a plurality of preset encoding weighting parameters. Then, the encoded images are transmitted to the client device 702 via video streaming (step 714). In this embodiment, the AI processing module of the client device 702 further includes an AI decoding formula for decoding the received encoded images into corresponding decoded images. In other words, the AI decoding formula, the first preset AI formula, the second preset AI formula, the plurality of first weighting parameters, and the plurality of second weighting parameters are all contained within the same AI processing module of the client device 702, so as to directly process the received plurality of encoded images into a plurality of decoded high-resolution images with high image quality and the second resolution. Therefore, in step 723, the AI processing module of the client device 702 receives and uses the AI decoding formula, the first preset AI formula, the second preset AI formula, the plurality of first weighting parameters, and the plurality of second weighting parameters to process the encoded images to directly generate a corresponding plurality of high-resolution images with the second resolution. Then, as in step 718, client device 702 outputs these high-resolution images as output images to the screen.
圖十七是本發明所述人工神經網路模組的該AI編碼運算式、該AI解碼運算式、第一預設的AI運算式、第二預設的AI運算式、第一加權參數及第二加權參數的訓練方式的一實施例示意圖。於本發明中,伺服器701內的AI編碼運算式及其加權參數、以及客戶端裝置702內的AI處理模組中的 AI解碼運算式及其加權參數、第一預設的AI運算式、第二預設的AI運算式、第一加權參數及第二加權參數都是藉由在該訓練伺服器上執行人工神經網路的訓練程序所預先定義。當訓練完成後,AI編碼運算式及其加權參數會被應用於伺服器701的AI編碼模組中以供執行圖十六所示的步驟722(AI編碼步驟);同時,AI解碼運算式及其加權參數、第一預設的AI運算式、第二預設的AI運算式、第一加權參數及第二加權參數被應用在客戶端裝置702的AI處理模組中以執行如圖十六所示之步驟723(合併處理AI解碼+提升解析度+增強影像品質的步驟)。在訓練伺服器中訓練該AI編碼運算式及其加權參數、該AI解碼運算式及其加權參數、第一預設的AI運算式、第二預設的AI運算式、第一加權參數及第二加權參數的步驟包括: Figure 17 is an embodiment of the training method for the AI encoding operation, AI decoding operation, first preset AI operation, second preset AI operation, first weighting parameter, and second weighting parameter of the artificial neural network module described in this invention. In this invention, the AI encoding operation and its weighting parameter in server 701, and the AI decoding operation and its weighting parameter, first preset AI operation, second preset AI operation, first weighting parameter, and second weighting parameter in the AI processing module in client device 702 are all predefined by executing the artificial neural network training program on the training server. After training is complete, the AI encoding formula and its weighting parameters are applied to the AI encoding module of server 701 to execute step 722 (AI encoding step) shown in Figure 16; simultaneously, the AI decoding formula and its weighting parameters, the first preset AI formula, the second preset AI formula, the first weighting parameter, and the second weighting parameter are applied to the AI processing module of client device 702 to execute step 723 (combined processing of AI decoding + resolution enhancement + image quality enhancement) as shown in Figure 16. The steps for training the AI encoding formula and its weighting parameters, the AI decoding formula and its weighting parameters, the first preset AI formula, the second preset AI formula, the first weighting parameter, and the second weighting parameter in the training server include:
步驟7221:在該訓練伺服器中啟用一訓練模式並在訓練模式中執行第一應用程式以產生複數個訓練原圖影像(步驟7222);複數個該訓練原圖影像具有該第二解析度(高解析度)。 Step 7221: Enable a training mode on the training server and execute the first application within the training mode to generate a plurality of training original image images (Step 7222); the plurality of training original image images have the second resolution (high resolution).
步驟7223:執行一解析度降低程序,將複數個該訓練原圖影像的解析度由該第二解析度降低至該第一解析度,以產生具該第一解析度的複數個訓練低解析度影像(步驟7224)。 Step 7223: Perform a resolution reduction procedure to reduce the resolution of a plurality of the training source images from the second resolution to the first resolution, thereby generating a plurality of low-resolution training images with the first resolution (Step 7224).
步驟7225:使用一第一人工神經網路模組來接受並使用一訓練編碼運算式來逐一處理複數個該訓練低解析度影像以產生相對應之具該第一解析度的複數個訓練編碼影像(步驟7226);該訓練編碼運算式具有複數個訓練編碼加權參數。 Step 7225: A first artificial neural network module is used to receive and process a plurality of the training low-resolution images one by one using a training coding algorithm to generate a plurality of corresponding training coded images with the first resolution (Step 7226); the training coding algorithm has a plurality of training coding weighting parameters.
步驟7227:使用一第二人工神經網路模組來接受並使用一訓練解碼運算式來逐一處理複數個該訓練編碼影像以產生相對應之具該第二解析度的複數個訓練輸出影像(步驟7228);該訓練解碼運算式具有複數個訓練解碼加權參數。 Step 7227: A second artificial neural network module is used to receive and process a plurality of the trained encoded images one by one using a training decoding algorithm to generate a plurality of corresponding trained output images with the second resolution (Step 7228); the training decoding algorithm has a plurality of training decoding weighting parameters.
步驟7229:使用一比較模組來逐一比較複數個該訓練輸出影像和相對應的複數個該訓練原圖影像之間的差異,並據以調整該訓練編碼運算式的該些訓練編碼加權參數以及該訓練解碼運算式的該些訓練解碼加權參數。該些訓練編碼加權參數以及該些訓練解碼加權參數會被調整成可讓該訓練輸出影像與相對應之該訓練原圖影像之間的差異最小化。每一次當該些訓練編碼加權參數以及該些訓練解碼加權參數被調整後,調整後的該 些訓練編碼加權參數以及該些訓練解碼加權參數就會分別被回饋給該訓練編碼運算式以及該訓練解碼運算式以供處理下一個該訓練低解析度影像。於步驟7220中,在進行過預定數量的該訓練輸出影像與相對應之該訓練原圖影像的比較、以及預定次數的複數個該訓練編碼加權參數以及該些訓練解碼加權參數的調整程序後,最後所得到的該些訓練編碼加權參數會被應用在該伺服器的該AI編碼模組的AI編碼運算式中;並且,所得到的該些訓練解碼加權參數會被應用在該客戶端裝置的該AI處理模組的至少一該數學運算式中。藉此,該伺服器的AI編碼模組可以如圖十六之步驟722般合併地處理將原圖影像的解析度降低與編碼的程序;並且,該客戶端裝置的該AI處理模組可以如圖十六之步驟723般對所接收到的該編碼後影像合併性地執行解碼、解析度提升以及影像品質增強的程序。 Step 7229: Use a comparison module to compare the differences between a plurality of the training output images and the corresponding plurality of the training source images one by one, and adjust the training encoding weighting parameters of the training encoding operation and the training decoding weighting parameters of the training decoding operation accordingly. The training encoding weighting parameters and the training decoding weighting parameters are adjusted to minimize the differences between the training output images and the corresponding training source images. Each time the training encoding weighting parameters and the training decoding weighting parameters are adjusted, the adjusted training encoding weighting parameters and the training decoding weighting parameters are fed back to the training encoding operation and the training decoding operation respectively to process the next training low-resolution image. In step 7220, after comparing a predetermined number of training output images with the corresponding training original images, and adjusting a predetermined number of training encoding weighting parameters and training decoding weighting parameters, the final training encoding weighting parameters are applied to the AI encoding formula of the AI encoding module of the server; and the final training decoding weighting parameters are applied to at least one mathematical formula of the AI processing module of the client device. Therefore, the server's AI encoding module can combine the processes of reducing the resolution of the original image and encoding it, as shown in step 722 of Figure 16; and the client device's AI processing module can combine the processes of decoding, resolution enhancement, and image quality enhancement on the received encoded image, as shown in step 723 of Figure 16.
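A minimal sketch of one iteration of this end-to-end training (steps 7225-7229), with net_enc standing for the first artificial neural network module (AI encoding) and net_dec for the second (decoding, resolution increase and quality enhancement); the single optimizer and the MSE criterion are illustrative assumptions:

```python
import torch.nn.functional as F

def joint_train_step(net_enc, net_dec, opt, train_low_res, train_original):
    code = net_enc(train_low_res)            # steps 7225/7226: training encoded image
    out = net_dec(code)                      # steps 7227/7228: second-resolution training output
    loss = F.mse_loss(out, train_original)   # step 7229: difference to the training original image
    opt.zero_grad(); loss.backward(); opt.step()   # adjusted weights feed the next image
    return loss.item()
```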
於一較佳實施例中,客戶端裝置702的AI處理模組僅包含單一組的AI運算式及複數個加權參數,其是藉由如圖十七所述步驟7221至7229的方式來進行訓練,所以也可以提供如圖十六中的步驟723所述的「AI解碼+AI提升解析度+AI增強影像品質」的合併功能。 In a preferred embodiment, the AI processing module of client device 702 contains only a single set of AI formulas and a plurality of weighting parameters, which are trained via steps 7221 to 7229 as described in Figure 17. Therefore, it can also provide the combined function of "AI decoding + AI resolution enhancement + AI image quality enhancement" as described in step 723 of Figure 16.
在本發明的一個實施例中,可以使用以下現有技術中的任一種人工神經網絡技術作為第一人工神經網絡模組在伺服器中執行AI編碼步驟:自動編碼器(Autoencoder;簡稱AE)、去噪自動編碼器(Denoising Autoencoder;簡稱DAE)、變分自編碼器(Variational autoencoder;簡稱VAE)和矢量量化變分自編碼器(Vector-Quantized Variational Autoencoder;簡稱VQ-VAE)。用於在客戶端裝置中用於執行AI解碼、AI提升解析度和AI增強影像品質的第二人工神經網絡模組可以選自以下現有的人工神經網絡技術:SRCNN、EDSR、RCAN、EnhanceNet、SRGAN和ESRGAN。 In one embodiment of the present invention, any of the following existing artificial neural network techniques can be used as the first artificial neural network module to perform AI encoding steps on the server: Autoencoder (AE), Denoising Autoencoder (DAE), Variational Autoencoder (VAE), and Vector-Quantized Variational Autoencoder (VQ-VAE). The second artificial neural network module used in the client device to perform AI decoding, AI resolution enhancement, and AI image quality enhancement can be selected from the following existing artificial neural network techniques: SRCNN, EDSR, RCAN, EnhanceNet, SRGAN, and ESRGAN.
於一較佳實施例中,複數個該原圖影像可以是三維(three-dimensional;簡稱3D)影像;每一個3D影像分別包含以並排方式組合在一個圖像幀中的左眼視圖和右眼視圖。因此,在客戶端裝置產生的相對應的輸出影像也會是3D影像。 In a preferred embodiment, the plurality of original images may be three-dimensional (3D) images; each 3D image contains a left-eye view and a right-eye view, respectively, combined side-by-side in a single image frame. Therefore, the corresponding output image generated by the client device will also be a 3D image.
於一較佳實施例中,根據本發明利用人工智慧處理模組來降低影像串流所需網路頻寬的系統還可以應用於機器人的遠程控制系統。本發明的伺服器可以是機器人,其包括運動模組、攝像頭模組、通訊模組和控 制模組。本發明的客戶端裝置可以是包括控制器模組和顯示器的機器人控制設備。機器人通過網際網路或其他無線通信技術與控制設備遠程連接。控制器模組可由使用者操作以向機器人發送控制指令,從而遠程控制和操作機器人的運動和動作。機器人的攝像頭模組包括雙眼影像擷取模組,以獲取3D影像(左眼視圖和右眼視圖並排組合在一個圖像幀中)。依據從控制設備接收到的控制指令,機器人可以進行移動和其他動作,也可以拍攝機器人周圍環境的3D影像,然後將這些3D影像發送回控制設備並顯示在顯示器上。通過使用本發明的方法,客戶端裝置(控制設備)可以配備預先訓練過的AI處理模組;藉此,機器人的雙眼影像擷取模組只需要拍攝少量數據的低解析度影像並消耗較少的網路頻寬快速地傳送到客戶端裝置,然後客戶端裝置可以使用AI處理模組恢復3D影像的高解析度以及高影像品質。此外,由於機器人是拍攝及處理低解析度影像,其所需消耗的運算資源相對更低而且更省電,可以延長機器人的遠端作業時間。 In a preferred embodiment, the system according to the invention, which utilizes an artificial intelligence processing module to reduce the network bandwidth required for video streaming, can also be applied to a remote control system for robots. The server of the invention can be a robot, comprising a motion module, a camera module, a communication module, and a control module. The client device of the invention can be a robot control device including a controller module and a display. The robot is remotely connected to the control device via the Internet or other wireless communication technologies. The controller module can be operated by a user to send control commands to the robot, thereby remotely controlling and manipulating the robot's movement and actions. The robot's camera module includes a binocular image capture module to acquire 3D images (left-eye and right-eye views combined side-by-side in a single image frame). Based on control commands received from the control device, the robot can move and perform other actions, and can also capture 3D images of its surrounding environment, then send these 3D images back to the control device and display them on a monitor. Using the method of this invention, the client device (control device) can be equipped with a pre-trained AI processing module; thereby, the robot's binocular image acquisition module only needs to capture a small amount of low-resolution image data and quickly transmit it to the client device with less network bandwidth. The client device can then use the AI processing module to restore the 3D images to high resolution and high image quality. Furthermore, because the robot captures and processes low-resolution images, its required computing resources are relatively lower and it is more power-efficient, which can extend the robot's remote operation time.
以上所述僅為本發明的較佳實施例,並非用來侷限本發明的可實施範圍。本發明的保護範圍應以申請專利範圍內容所載為準。任何熟習本項技術之人對於本發明的各種修改與改變,都可能未偏離本發明的發明精神與可實施範圍,而仍受到本發明的申請專利範圍內容所涵蓋。 The above description is merely a preferred embodiment of the present invention and is not intended to limit the scope of its implementation. The scope of protection of the present invention shall be determined by the contents of the patent application. Any modifications and alterations to the present invention made by those skilled in the art may not depart from the spirit and scope of the invention and shall still be covered by the contents of the patent application.
701:伺服器 701: Server
702:客戶端裝置 702: Client Device
711-718:步驟 711-718: Steps
Priority Applications (1)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| TW110148954A TWI911362B (en) | 2021-12-27 | Method for reducing network bandwidth required for image streaming by using artificial intelligence processing module |
Publications (2)
| Publication Number | Publication Date |
|---|---|
| TW202326526A TW202326526A (en) | 2023-07-01 |
| TWI911362B true TWI911362B (en) | 2026-01-11 |
Patent Citations (1)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| US20170070554A1 (en) | 2015-09-09 | 2017-03-09 | Vantrix Corporation | Method and system for flow-rate regulation in a content-controlled streaming network |