
TWI885862B - See-through display method and see-through display system - Google Patents


Info

Publication number
TWI885862B
Authority
TW
Taiwan
Prior art keywords
display
scene
image
position information
user
Prior art date
Application number
TW113114452A
Other languages
Chinese (zh)
Other versions
TW202542715A (en)
Inventor
佑 和
林士豪
徐文正
楊朝光
Original Assignee
宏碁股份有限公司 (Acer Incorporated)
Priority date
Filing date
Publication date
Application filed by 宏碁股份有限公司
Priority to TW113114452A
Application granted
Publication of TWI885862B
Publication of TW202542715A

Landscapes

  • Transforming Electric Information Into Light Information (AREA)
  • Controls And Circuits For Display Device (AREA)

Abstract

A see-through display method and a see-through display system are disclosed. The method includes the following steps. A user image is captured toward the front side of a display through a first image sensor, and a scene image is captured toward the rear side of the display through a second image sensor. User position information associated with a three-dimensional reference coordinate system is obtained according to the user image. Scene position information associated with the three-dimensional reference coordinate system is obtained according to the scene image. A viewing frustum is determined based on the user position information and an actual size of the display. A projection matrix of the viewing frustum is used to generate a display frame projected on the display plane of the display according to the scene position information. The display frame is output through the display to show the scene behind the display.

Description

See-through display method and see-through display system

The present invention relates to an image display technology, and in particular to a see-through display method and a see-through display system.

With the advancement of technology, augmented reality (AR) applications have become increasingly popular. The technology has not only made breakthroughs in entertainment but is also widely used in business, education, medicine, and other fields. As AR technology continues to mature and spread, people can blend virtual elements into the real world through AR glasses, smartphones, and various handheld or wearable electronic devices, providing users with rich interactive experiences. In general, AR technology will continue to change modern lifestyles and bring people greater convenience and richer experiences.

Generally speaking, a camera disposed on the back of a handheld electronic device can capture the real scene, and the AR frame displayed by the handheld electronic device includes the real scene image and virtual elements superimposed on it. Traditionally, because a handheld electronic device is mobile and its position relative to the user changes little, determining the AR frame based on user tracking results has been unnecessary. However, when AR technology is applied to a large display at a fixed position, ignoring the relative position between the user and the display causes the scene content in the AR frame to fail to meet the user's needs. For example, the user may be unable to view a scene object of interest through the AR frame on the display.

The present invention provides a see-through display method and a see-through display system that can effectively solve the above problems.

An exemplary embodiment of the present invention provides a see-through display method applicable to a see-through display system including a first image sensor, a second image sensor, and a display. The see-through display method includes: capturing a user image toward the front side of the display through the first image sensor, and capturing a scene image toward the rear side of the display through the second image sensor; obtaining user position information associated with a three-dimensional reference coordinate system according to the user image; obtaining scene position information associated with the three-dimensional reference coordinate system according to the scene image; determining a viewing frustum according to the user position information and the actual size of the display; generating, by using a projection matrix of the viewing frustum, a display frame projected onto the display plane of the display according to the scene position information; and outputting the display frame through the display to show the scene behind the display.

Another exemplary embodiment of the present invention provides a see-through display system including a first image sensor, a second image sensor, a display, and at least one processor. The processor is coupled to the first image sensor, the second image sensor, and the display, and is configured to: capture a user image toward the front side of the display through the first image sensor, and capture a scene image toward the rear side of the display through the second image sensor; obtain user position information associated with a three-dimensional reference coordinate system according to the user image; obtain scene position information associated with the three-dimensional reference coordinate system according to the scene image; determine a viewing frustum according to the user position information and the actual size of the display; generate, by using a projection matrix of the viewing frustum, a display frame projected onto the display plane of the display according to the scene position information; and output the display frame through the display to show the scene behind the display.

Based on the above, in the embodiments of the present invention, user position information and scene position information in the same three-dimensional reference coordinate system can be obtained from the user image and the scene image. The viewing frustum that determines the content of the display frame can be established from the user position information and the actual size of the display. As a result, the scene content of the display frame output by the display can change in response to the user's movement and remain well aligned with the actual scene around the display.

10, 70: See-through display system

110: First image sensor

120: Second image sensor

130: Display

140: Storage device

150: Processor

160: Depth sensor

F1: Display frame

Obj1: Scene object

U1: User

111, 112, 113: FOV

S1: Display plane

DP1~DP4: Vertices

VP1: User coordinate

51: Viewing frustum

S210~S260, S310~S330, S810~S870: Steps

Figure 1A to Figure 1C are schematic diagrams of a see-through display system according to an embodiment of the present invention.

Figure 2 is a flowchart of a see-through display method according to an embodiment of the present invention.

Figure 3 is a flowchart of obtaining scene position information according to an embodiment of the present invention.

Figure 4 is a schematic diagram of a three-dimensional reference coordinate system according to an embodiment of the present invention.

Figure 5 is a schematic diagram of determining a viewing frustum according to user position information according to an embodiment of the present invention.

Figure 6A and Figure 6B are schematic diagrams of display scenarios according to an embodiment of the present invention.

Figure 7 is a schematic diagram of a see-through display system according to an embodiment of the present invention.

Figure 8 is a flowchart of a see-through display method according to an embodiment of the present invention.

Some exemplary embodiments of the present disclosure will be described in detail below with reference to the accompanying drawings. Where the same reference numerals appear in different drawings, they are regarded as the same or similar elements. These exemplary embodiments are only a part of the present disclosure and do not cover all possible implementations. Rather, they merely serve as examples of the methods and systems within the scope of the claims of the present disclosure.

Figure 1A to Figure 1C are schematic diagrams of a see-through display system according to an embodiment of the present invention.

Referring to Figure 1A, in some embodiments, the see-through display system 10 may be implemented in an electronic device with image processing and data computing capabilities, such as a notebook computer, tablet computer, personal computer, server, game console, portable electronic device, desktop computer, or other electronic device. The see-through display system 10 includes a first image sensor 110, a second image sensor 120, a display 130, a storage device 140, and at least one processor 150.

The processor 150 is responsible for all or part of the operation of the see-through display system 10. For example, the processor 150 may include a central processing unit (CPU), a graphics processing unit (GPU) or another programmable general-purpose or special-purpose microprocessor, a digital signal processor (DSP), a programmable controller, an application-specific integrated circuit (ASIC), a programmable logic device (PLD), another similar device, or a combination of these devices. The number of processors 150 may be one or more, and the invention is not limited in this regard.

The storage device 140 is connected to the processor 150 and is used to temporarily or permanently store data, such as images, instructions, program code, and software modules. Specifically, the storage device 140 may include a volatile storage circuit, which stores data in a volatile manner and may include, for example, random access memory (RAM) or a similar volatile storage medium. Alternatively, the storage device 140 may include a non-volatile storage circuit, which stores data in a non-volatile manner and may include, for example, read-only memory (ROM), a solid state drive (SSD), and/or a traditional hard disk drive (HDD) or a similar non-volatile storage medium. The number of storage devices 140 may be one or more, and the invention is not limited in this regard.

The display 130 is, for example, a liquid crystal display (LCD), a light-emitting diode (LED) display, an organic light-emitting diode (OLED) display, or another type of display; the invention is not limited in this regard. In some embodiments, the display 130 may be a stereoscopic display that provides different images to the user's left and right eyes to present a stereoscopic visual effect, but the invention is not limited thereto. For example, the display 130 may be a glasses-free 3D display or a glasses-type 3D display.

The first image sensor 110 is used to capture images and includes a camera lens with a lens and a photosensitive element. The first image sensor 110 may be implemented, for example, as a camera module composed of the lens, the photosensitive element, and other components. The photosensitive element is, for example, a charge-coupled device (CCD), a complementary metal-oxide-semiconductor (CMOS) element, or another element; the invention is not limited in this regard. From another point of view, the first image sensor 110 may be an RGB image sensor.

The second image sensor 120 is used to capture images and includes a camera lens with a lens and a photosensitive element. The second image sensor 120 may be implemented, for example, as a camera module composed of the lens, the photosensitive element, and other components. The photosensitive element is, for example, a charge-coupled device, a complementary metal-oxide-semiconductor element, or another element; the invention is not limited in this regard. From another point of view, the second image sensor 120 may be an RGB image sensor.

Referring to Figure 1B, in an embodiment of the present invention, the user U1 is located on the front side of the display 130, and the first image sensor 110 captures a user image toward the front side of the display 130. According to eye tracking or face tracking techniques well known to those skilled in the art, this user image can be used to track the eye position information or face position information of the user U1. The second image sensor 120 captures a scene image toward the rear side of the display 130. Accordingly, the processor 150 can obtain the user position information of the user U1 from the user image captured by the first image sensor 110, and can obtain scene position information, such as the spatial position information of the scene object Obj1, from the scene image captured by the second image sensor 120.

In an embodiment of the present invention, the processor 150 can determine the display frame F1 according to the user position information of the user U1. The display frame F1 shows the scene behind the display 130, and the displayed scene can be substantially aligned with the actual surrounding scene. As shown in Figure 1B, although the user U1 cannot directly see the scene object Obj1 blocked by the display 130, the display frame F1 can include an image of the scene object Obj1, thereby realizing a see-through display function. Notably, when the user U1 moves, the scene content blocked by the display 130 changes accordingly. The displayed scene of the display frame F1 also changes in response to the movement of the user U1, so that it remains substantially aligned with the actual surrounding scene.

Referring to Figure 1C, in an embodiment of the present invention, the see-through display system 10 may be a notebook computer. The first image sensor 110 may be a front camera module disposed above the display plane of the display 130, and the second image sensor 120 may be a rear camera module disposed on the top cover of the notebook computer. When the user U1 operates the notebook computer, the display 130 of the notebook computer can show the scene content blocked by the notebook computer according to the current position of the user U1.

In some embodiments, the field of view (FOV) 111 of the first image sensor 110 is smaller than the FOV 112 of the second image sensor 120 to ensure that the second image sensor 120 captures sufficient scene content. In some embodiments, the lens of the second image sensor 120 may be implemented as a fisheye lens or a wide-angle lens with a large FOV.

In addition, in some embodiments, the range of the scene shown by the display 130 can be determined according to the user position information of the user U1 and the actual size of the display 130. More specifically, treating the user U1 as a virtual camera, the processor 150 can determine the FOV 113 and the viewing frustum of this virtual camera according to the user position information of the user U1 and the actual size of the display 130.

Figure 2 is a flowchart of a see-through display method according to an embodiment of the present invention, and the method of Figure 2 can be implemented by the see-through display system 10 of Figure 1A. Here, the user can view the scene content behind the display 130 through the display 130 of the see-through display system 10.

In step S210, the processor 150 captures a user image toward the front side of the display 130 through the first image sensor 110, and captures a scene image toward the rear side of the display 130 through the second image sensor 120. Specifically, the first image sensor 110 shoots toward the user viewing the display 130, and the second image sensor 120 shoots toward the actual scene behind the display 130.

In step S220, the processor 150 obtains user position information associated with a three-dimensional reference coordinate system according to the user image. It should be noted that the three-dimensional reference coordinate system is defined based on the display plane of the display 130. In some embodiments, the user position information may include distance information between the user and the display 130. The processor 150 may estimate this distance from the face size, interpupillary distance, or other facial features in the user image. Alternatively, in other embodiments, the user position information may include the user's three-dimensional coordinates in the three-dimensional coordinate system. The processor 150 can convert the user's image coordinates in the user image into world coordinates in a world coordinate system (for example, the three-dimensional reference coordinate system established based on the display 130) according to the intrinsic parameters and extrinsic parameters of the first image sensor 110.
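As a concrete illustration of the distance estimate from facial features described above, the following sketch applies the pinhole similar-triangles relation distance = focal length × real size / image size to the interpupillary distance. The focal length (1000 px) and average interpupillary distance (63 mm) are illustrative assumptions, not values from this disclosure.

```python
# Hypothetical values for illustration only: a camera with a 1000-px focal
# length and an assumed average interpupillary distance (IPD) of 63 mm.
def estimate_user_distance(focal_length_px: float,
                           ipd_mm: float,
                           ipd_px: float) -> float:
    """Estimate the user-to-display distance in mm by similar triangles:
    distance = focal_length * real_size / image_size."""
    return focal_length_px * ipd_mm / ipd_px

# Pupils appear 90 px apart in the user image:
d = estimate_user_distance(1000.0, 63.0, 90.0)
print(round(d))  # 700 (mm)
```

Halving the measured pixel distance doubles the estimated distance, which is consistent with the observation that the face appears smaller as the user moves away from the display.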

In some embodiments, the first coordinate axis and the second coordinate axis of the three-dimensional reference coordinate system are parallel to the display plane of the display 130, and the origin of the coordinate system is a reference point on the display plane. For example, the origin may be the center point of the display plane. The X axis and the Y axis of the coordinate system lie on the display plane; that is, the display plane is the plane Z=0 in the three-dimensional reference coordinate system.

In some embodiments, the first image sensor 110 may also work with at least one depth sensor (not shown) or distance sensor (not shown) to perform image recognition and positioning of the user, so as to obtain the user's three-dimensional coordinates in the three-dimensional reference coordinate system.

In step S230, the processor 150 obtains scene position information associated with the three-dimensional reference coordinate system according to the scene image. In some embodiments, the scene position information may include three-dimensional scene coordinates in the three-dimensional coordinate system. The processor 150 can convert the image coordinates in the scene image into world coordinates in a world coordinate system (for example, the three-dimensional reference coordinate system established based on the display 130) according to the intrinsic parameters and extrinsic parameters of the second image sensor 120. In some embodiments, the image coordinates in the scene image subjected to coordinate conversion may be sampled from the grid nodes of a three-dimensional mesh.

In some embodiments, Figure 3 is a flowchart of obtaining scene position information according to an embodiment of the present invention. Referring to Figure 3, in step S310, the processor 150 establishes a three-dimensional reference coordinate system based on the display plane of the display 130.

For example, Figure 4 is a schematic diagram of the three-dimensional reference coordinate system according to an embodiment of the present invention. Referring to Figure 4, the processor 150 may set the origin (0,0,0) of the coordinate system at the center point of the display plane S1. The X axis of the coordinate system is the horizontal display axis of the display 130, and the Y axis is the vertical display axis of the display 130. The Z axis passes through the origin (0,0,0) and is perpendicular to the display plane S1.

In step S320, the processor 150 determines the extrinsic parameters of the second image sensor 120 according to the spatial position relationship between the second image sensor 120 and the display 130. The extrinsic parameters describe the position and orientation of the second image sensor 120 and the transformation between the second image sensor 120 and the world coordinate system. These extrinsic parameters are usually used to define the spatial pose (position and orientation) of the second image sensor 120, so that points in the camera coordinate system can be mapped to the world coordinate system, or points in the world coordinate system can be mapped to the camera coordinate system.

In some embodiments, the processor 150 may define the three-dimensional reference coordinate system based on the display plane of the display 130 and use it as the world coordinate system. Under this condition, according to the spatial position relationship between the second image sensor 120 and the display 130, the processor 150 can obtain the coordinate position of the second image sensor 120 in the three-dimensional reference coordinate system. Other extrinsic parameters of the second image sensor 120, such as the shooting direction, can be obtained through a camera calibration procedure.

In step S330, the processor 150 performs coordinate conversion on the scene pixel coordinates in the scene image according to the intrinsic parameters and extrinsic parameters of the second image sensor 120, so as to obtain the scene position information associated with the three-dimensional reference coordinate system. Specifically, the processor 150 can perform coordinate conversion based on the following formula (1) to convert the scene pixel coordinates in the scene image into three-dimensional scene coordinates in the three-dimensional reference coordinate system. In some embodiments, the processor 150 may convert the image coordinates of the grid nodes of the mesh of the scene image into three-dimensional scene coordinates.

$$
s\begin{bmatrix} u \\ v \\ 1 \end{bmatrix}
=
\begin{bmatrix} f_x & 0 & c_x \\ 0 & f_y & c_y \\ 0 & 0 & 1 \end{bmatrix}
\begin{bmatrix} R & T \end{bmatrix}
\begin{bmatrix} X \\ Y \\ Z \\ 1 \end{bmatrix}
\tag{1}
$$

where $(u,v)$ denotes the image coordinates, $(X,Y,Z)$ denotes the world coordinates, $s$ is a scale factor, the 3×3 matrix of $f_x$, $f_y$, $c_x$, $c_y$ is the intrinsic parameter matrix of the second image sensor 120, and $[R \mid T]$ is the extrinsic parameter matrix of the second image sensor 120. The extrinsic parameter matrix includes a rotation matrix $R$ and a translation vector $T$, and represents the position and shooting direction of the second image sensor 120 in the world coordinate system (that is, the three-dimensional reference coordinate system). In this way, the processor 150 can convert multiple image coordinates in the scene image into three-dimensional scene coordinates in the three-dimensional reference coordinate system through coordinate conversion.

In some embodiments, the second image sensor 120 may include a fisheye lens or a wide-angle lens for capturing the scene image. Before obtaining the scene position information associated with the three-dimensional reference coordinate system through coordinate conversion, the processor 150 may perform a distortion correction process on the scene image. In other words, the processor 150 can correct the image distortion of the fisheye image or wide-angle image. Specifically, when the second image sensor 120 captures the scene image through a wide-angle lens, the processor 150 can perform the distortion correction process using formula (2). When the second image sensor 120 captures the scene image through a fisheye lens, the processor 150 can perform the distortion correction process using formula (3).

$$
\begin{aligned}
x_d &= x\left(1 + k_1 r^2 + k_2 r^4 + k_3 r^6\right) + 2 p_1 x y + p_2\left(r^2 + 2x^2\right) \\
y_d &= y\left(1 + k_1 r^2 + k_2 r^4 + k_3 r^6\right) + p_1\left(r^2 + 2y^2\right) + 2 p_2 x y
\end{aligned}
\tag{2}
$$

$$
\theta_d = \theta\left(1 + k_1 \theta^2 + k_2 \theta^4 + k_3 \theta^6 + k_4 \theta^8\right),\qquad
x_d = \frac{\theta_d}{r}\, x,\qquad y_d = \frac{\theta_d}{r}\, y
\tag{3}
$$

where $r = \sqrt{x^2 + y^2}$, $\theta = \tan^{-1} r$, $k_n$ are the radial distortion coefficients, and $p_n$ are the tangential distortion coefficients.
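Formulas (2) and (3) can be sketched directly on normalized image coordinates (x, y). The model forms below follow the standard radial/tangential and equidistant-fisheye conventions that the stated parameters (θ = tan⁻¹ r, radial coefficients k_n, tangential coefficients p_n) correspond to; the coefficient values are placeholders.

```python
import math

def distort_wide_angle(x, y, k=(0.0, 0.0, 0.0), p=(0.0, 0.0)):
    """Radial + tangential model of formula (2) on normalized coordinates."""
    r2 = x * x + y * y
    radial = 1 + k[0] * r2 + k[1] * r2**2 + k[2] * r2**3
    xd = x * radial + 2 * p[0] * x * y + p[1] * (r2 + 2 * x * x)
    yd = y * radial + p[0] * (r2 + 2 * y * y) + 2 * p[1] * x * y
    return xd, yd

def distort_fisheye(x, y, k=(0.0, 0.0, 0.0, 0.0)):
    """Equidistant fisheye model of formula (3), with theta = atan(r)."""
    r = math.hypot(x, y)
    if r == 0.0:
        return 0.0, 0.0
    theta = math.atan(r)
    theta_d = theta * (1 + k[0] * theta**2 + k[1] * theta**4
                         + k[2] * theta**6 + k[3] * theta**8)
    return (theta_d / r) * x, (theta_d / r) * y

# With all coefficients zero, the wide-angle model reduces to the identity:
print(distort_wide_angle(0.1, 0.2))
```

The correction applied by the processor 150 is the inverse of these mappings; in practice it is usually computed numerically from calibrated coefficients.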

Returning to Figure 2, in step S240, the processor 150 determines a viewing frustum according to the user position information and the actual size of the display. The viewing frustum is a geometric volume representing the area visible to a camera, and may also be called a projection frustum. It consists of six planes: the near plane, the far plane, the left plane, the right plane, the top plane, and the bottom plane. These planes define the area the camera can see. In other words, the viewing frustum can be used to determine the partial scene image extracted from the scene image.

In an embodiment of the present invention, the processor 150 may set the user's coordinates in the three-dimensional reference coordinate system as the coordinate position of a virtual camera, and determine the viewing frustum based on these user coordinates. In some embodiments, the viewing frustum changes in response to changes in the user position information; that is, when the user moves, the viewing frustum changes accordingly.

於一些實施例中，使用者位置資訊可包括關聯於立體參考座標系的一使用者座標，視錐體是透過連接使用者座標與顯示器130之顯示平面的多個頂點而獲取。也就是說，視錐體的左平面、右平面、頂平面和底平面是根據顯示器130的顯示範圍而決定。舉例而言，圖5是根據本發明實施例的根據使用者位置資訊決定視錐體的示意圖。請參照圖5，顯示器130的顯示平面S1包括頂點DP1、DP2、DP3、DP4。在根據使用者影像獲取立體參考座標系中的使用者座標VP1之後，處理器150可連接使用者座標VP1與顯示平面S1的多個頂點DP1、DP2、DP3、DP4，而獲取視錐體51。處理器150可依據預設距離來設置視錐體51的遠平面與近平面。可知的，視口(viewport)的寬高比(aspect ratio)為顯示器130的寬高比。 In some embodiments, the user position information may include a user coordinate associated with the stereo reference coordinate system, and the viewing frustum is obtained by connecting the user coordinate with multiple vertices of the display plane of the display 130. That is, the left plane, right plane, top plane, and bottom plane of the viewing frustum are determined according to the display range of the display 130. For example, FIG. 5 is a schematic diagram of determining the viewing frustum according to the user position information according to an embodiment of the present invention. Referring to FIG. 5, the display plane S1 of the display 130 includes vertices DP1, DP2, DP3, and DP4. After obtaining the user coordinate VP1 in the stereo reference coordinate system according to the user image, the processor 150 can connect the user coordinate VP1 with the vertices DP1, DP2, DP3, and DP4 of the display plane S1 to obtain the viewing frustum 51. The processor 150 can set the far plane and the near plane of the viewing frustum 51 according to preset distances. Understandably, the aspect ratio of the viewport is the aspect ratio of the display 130.
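The construction above — connecting the user coordinate VP1 to the display-plane vertices DP1 through DP4 — can be sketched as computing off-axis frustum bounds. This is a minimal sketch assuming the display plane lies at Z = 0 of the reference frame and is centered on the origin (the embodiment only requires the origin to be some reference point on the display plane); the function name and units are illustrative.

```python
def frustum_bounds(eye, display_w, display_h, near):
    """Off-axis frustum bounds (left, right, bottom, top) at the near plane for
    a display plane at Z = 0 centered on the origin, with the eye (user
    coordinate) at eye = (ex, ey, ez), ez > 0."""
    ex, ey, ez = eye
    scale = near / ez                        # project display edges onto the near plane
    left = (-display_w / 2 - ex) * scale
    right = (display_w / 2 - ex) * scale
    bottom = (-display_h / 2 - ey) * scale
    top = (display_h / 2 - ey) * scale
    return left, right, bottom, top
```

When the eye sits off-center, the bounds become asymmetric, which is exactly why the off-center perspective matrix of the later formula (5) is needed.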

在根據顯示器130的實際尺寸與使用者座標決定視錐體之後，處理器150可推導出投影矩陣的參數。具體來說，視錐體的各個平面定義了投影矩陣中的參數，比如視角、視口寬高比、近平面和遠平面距離等。這些參數決定了投影矩陣的數值。於一些實施例中，此投影矩陣可為偏移透視矩陣(off-center perspective matrix)。像是，處理器150獲取的投影矩陣P可表徵為公式(5)。 After determining the viewing frustum according to the actual size of the display 130 and the user coordinate, the processor 150 can derive the parameters of the projection matrix. Specifically, the planes of the viewing frustum define the parameters of the projection matrix, such as the field of view, the viewport aspect ratio, and the near- and far-plane distances. These parameters determine the values of the projection matrix. In some embodiments, the projection matrix may be an off-center perspective matrix. For example, the projection matrix P obtained by the processor 150 can be expressed as formula (5).

Figure 113114452-A0305-12-0013-8
其中，near代表近平面與使用者座標之間的距離，far代表遠平面與使用者座標之間的距離，right代表顯示器130的右顯示邊界的X座標，left代表顯示器130的左顯示邊界的X座標，top代表顯示器130的上顯示邊界的Y座標，bottom代表顯示器130的下顯示邊界的Y座標。 where near denotes the distance between the near plane and the user coordinate, far denotes the distance between the far plane and the user coordinate, right denotes the X coordinate of the right display boundary of the display 130, left denotes the X coordinate of the left display boundary of the display 130, top denotes the Y coordinate of the upper display boundary of the display 130, and bottom denotes the Y coordinate of the lower display boundary of the display 130.
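Formula (5) appears only as an image here. The conventional OpenGL-style off-center (asymmetric) perspective matrix built from the same six quantities has the following form; this is the standard formulation and may differ from the patent's exact formula (5) in sign or axis conventions.

```python
import numpy as np

def off_center_perspective(left, right, bottom, top, near, far):
    """Conventional OpenGL-style off-center (asymmetric) perspective matrix;
    the eye looks down -Z, and the near plane maps to NDC z = -1."""
    return np.array([
        [2 * near / (right - left), 0.0, (right + left) / (right - left), 0.0],
        [0.0, 2 * near / (top - bottom), (top + bottom) / (top - bottom), 0.0],
        [0.0, 0.0, -(far + near) / (far - near), -2 * far * near / (far - near)],
        [0.0, 0.0, -1.0, 0.0],
    ])
```

For a symmetric frustum the third-column offsets vanish and the matrix reduces to the familiar centered perspective projection.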

於步驟S250，處理器150利用視錐體的投影矩陣，根據場景位置資訊產生投影於顯示器130的顯示平面的顯示圖幀。具體來說，處理器150可將立體參考座標系中的多個三維場景座標的四維齊次座標(x,y,z,1)與投影矩陣P相乘，以將這些場景座標映射到視口(亦即顯示平面)上對應的屏幕座標。上述場景座標可以是多個網格節點於立體參考座標系中的立體座標。換言之，場景影像中的局部場景影像可經由投影矩陣而投影至顯示平面而產生顯示圖幀。 In step S250, the processor 150 uses the projection matrix of the viewing frustum to generate a display frame projected on the display plane of the display 130 according to the scene position information. Specifically, the processor 150 may multiply the four-dimensional homogeneous coordinates (x, y, z, 1) of multiple three-dimensional scene coordinates in the stereo reference coordinate system by the projection matrix P, so as to map these scene coordinates to the corresponding screen coordinates on the viewport (i.e., the display plane). The scene coordinates may be the three-dimensional coordinates of multiple grid nodes in the stereo reference coordinate system. In other words, the local scene image in the scene image can be projected onto the display plane via the projection matrix to generate the display frame.
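The mapping in step S250 — multiplying homogeneous coordinates (x, y, z, 1) by P, perspective-dividing, then mapping to the viewport — can be sketched as follows; the viewport size and the Y-flip for screen space are illustrative assumptions.

```python
import numpy as np

def project_to_screen(P, points_xyz, viewport_w, viewport_h):
    """Map 3D scene coordinates to pixel coordinates: append w = 1, multiply by
    the projection matrix, perspective-divide, then map NDC [-1, 1] to pixels."""
    n = len(points_xyz)
    homog = np.hstack([points_xyz, np.ones((n, 1))])  # (x, y, z, 1) rows
    clip = homog @ P.T                                # same as P @ v for each point
    ndc = clip[:, :3] / clip[:, 3:4]                  # perspective divide
    sx = (ndc[:, 0] + 1.0) * 0.5 * viewport_w
    sy = (1.0 - ndc[:, 1]) * 0.5 * viewport_h         # flip Y for screen space
    return np.stack([sx, sy], axis=1)
```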

於步驟S260,處理器150透過顯示器130輸出顯示圖幀,以顯示出顯示器130後側的場景。具體來說,由於場景影像中投影至顯示器130之顯示平面上的投影範圍是根據使用者位置資訊與顯示器130的實際尺寸而定,因此顯示器130所輸出之顯示圖幀不僅可呈現出顯示器130後側的場景,且顯示圖幀中的場景影像可對齊顯示器130周遭的實際場景。此外,反應於使用者移動,顯示器130的顯示圖幀的場景內容也將對應改變。 In step S260, the processor 150 outputs a display frame through the display 130 to display the scene behind the display 130. Specifically, since the projection range of the scene image projected onto the display plane of the display 130 is determined according to the user's position information and the actual size of the display 130, the display frame output by the display 130 can not only present the scene behind the display 130, but also the scene image in the display frame can be aligned with the actual scene around the display 130. In addition, in response to the user's movement, the scene content of the display frame of the display 130 will also change accordingly.

舉例而言，圖6A與圖6B是根據本發明實施例的顯示場景的情境示意圖。請參照圖6A，當使用者U1位於第一位置，顯示器130的顯示圖幀可包括顯示器130後側的場景內容。像是，被顯示器130遮蔽的容器將可顯示於顯示器130中。請參照圖6B，當使用者從第一位置移動到第二位置，基於使用者位置決定的視錐體將對應改變。於是，視錐體所擷取到場景內容也將改變，致使顯示器130所顯示的場景內容將對應調整。於一些實施例中，處理器150還可將上述顯示圖幀作為AR畫面的背景，以提供一AR功能或AR應用。 For example, FIG. 6A and FIG. 6B are schematic diagrams of displayed scenes according to an embodiment of the present invention. Referring to FIG. 6A, when the user U1 is at the first position, the display frame of the display 130 may include the scene content behind the display 130. For example, a container blocked by the display 130 may be displayed in the display 130. Referring to FIG. 6B, when the user moves from the first position to the second position, the viewing frustum determined based on the user's position will change accordingly. Therefore, the scene content captured by the viewing frustum will also change, causing the scene content displayed by the display 130 to be adjusted accordingly. In some embodiments, the processor 150 may also use the above display frame as the background of an AR image to provide an AR function or AR application.

圖7是根據本發明實施例的穿透式顯示系統的示意圖。請參照圖7，穿透式顯示系統70可包括第一影像感測器110、第二影像感測器120、顯示器130、儲存裝置140、至少一處理器150，以及深度感測器160。與圖1實施例不同的是，穿透式顯示系統70還可包括用以感測場景深度信息的深度感測器160。深度感測器160可以利用主動式深度感測技術或被動式深度感測技術來實現。主動式深度感測技術可藉由主動發出光源、紅外線、超音波、雷射等作為訊號搭配時差測距技術來計算深度信息。被動式深度感測技術可以藉由兩個影像感測器以不同視角擷取其前方的兩張影像，以利用兩張影像的視差來計算深度信息。 FIG. 7 is a schematic diagram of a see-through display system according to an embodiment of the present invention. Referring to FIG. 7, the see-through display system 70 may include a first image sensor 110, a second image sensor 120, a display 130, a storage device 140, at least one processor 150, and a depth sensor 160. Different from the embodiment of FIG. 1, the see-through display system 70 may further include the depth sensor 160 for sensing scene depth information. The depth sensor 160 may be implemented using active depth sensing technology or passive depth sensing technology. Active depth sensing technology may calculate depth information by actively emitting light, infrared, ultrasound, laser, etc. as signals in combination with time-of-flight ranging. Passive depth sensing technology may use two image sensors to capture two images in front of them from different viewing angles, and calculate depth information from the parallax between the two images.

圖8是根據本發明實施例的穿透式顯示方法的流程圖,而圖8的方法流程可由圖7的穿透式顯示系統70來實現。在此,使用者可透過穿透式顯示系統70的顯示器130來觀看顯示器130後側的場景內容。 FIG8 is a flow chart of a transparent display method according to an embodiment of the present invention, and the method flow of FIG8 can be implemented by the transparent display system 70 of FIG7. Here, the user can view the scene content behind the display 130 through the display 130 of the transparent display system 70.

於本實施例中,場景影像的場景位置資訊包括關聯於立體參考座標系的三維網格場景資訊。三維網格場景資訊中網格節點於立體參考座標系中的Z座標值可根據場景深度而產生。 In this embodiment, the scene position information of the scene image includes three-dimensional grid scene information associated with a stereo reference coordinate system. The Z coordinate value of the grid node in the three-dimensional grid scene information in the stereo reference coordinate system can be generated according to the scene depth.

於步驟S810，處理器150透過第一影像感測器110朝顯示器130的前側擷取使用者影像，並透過第二影像感測器120朝顯示器130的後側擷取場景影像。於步驟S820，處理器150根據使用者影像獲取關聯於立體參考座標系的使用者位置資訊。此些步驟可參照前述實施例的說明，於此不再次贅述。 In step S810, the processor 150 captures a user image toward the front side of the display 130 through the first image sensor 110, and captures a scene image toward the back side of the display 130 through the second image sensor 120. In step S820, the processor 150 obtains user position information associated with the stereo reference coordinate system based on the user image. For these steps, reference may be made to the description of the aforementioned embodiments, which will not be repeated here.

於步驟S830，處理器150獲取對應於場景影像的深度信息。舉例來說，深度圖中每個像素(或位置)的值可指示場景影像中對應像素(或位置)的深度值。處理器150可將深度圖作為對應於場景影像的深度信息。 In step S830, the processor 150 obtains depth information corresponding to the scene image. For example, the value of each pixel (or position) in the depth map may indicate the depth value of the corresponding pixel (or position) in the scene image. The processor 150 may use the depth map as the depth information corresponding to the scene image.

於一些實施例中,處理器150可利用深度感測器160獲取對應於場景影像的深度信息。或者,於另一些實施例中,處理器150可對場景影像執行影像預處理操作,以産生滿足深度學習模型的輸入要求的經調整場景影像。處理器150可透過深度學習模型來分析經調整場景影像獲得深度信息。 In some embodiments, the processor 150 may use the depth sensor 160 to obtain depth information corresponding to the scene image. Alternatively, in other embodiments, the processor 150 may perform image preprocessing operations on the scene image to generate an adjusted scene image that meets the input requirements of the deep learning model. The processor 150 may analyze the adjusted scene image through the deep learning model to obtain depth information.

在一些實施例中，儲存裝置140可儲存有深度學習模型。深度學習模型基於例如卷積神經網絡(Convolutional Neural Network,CNN)或類神經網絡等神經網絡結構來實施。深度學習模型用以估計(即，預測)場景影像中每個像素(或位置)的深度。此外，處理器150可對場景影像執行影像預處理操作，以產生滿足深度學習模型的輸入要求的經調整場景影像。舉例來說，在影像預處理操作中，處理器150可調整場景影像的大小和/或轉換場景影像的格式以產生經調整場景影像。處理器150透過深度學習模型來分析經調整場景影像，以獲得對應於場景影像的深度信息。舉例來說，處理器150可將經調整場景影像輸入到深度學習模型，然後接收深度學習模型關於經調整場景影像的輸出深度圖。 In some embodiments, the storage device 140 may store a deep learning model. The deep learning model is implemented based on a neural network structure such as a convolutional neural network (CNN) or a similar neural network. The deep learning model is used to estimate (i.e., predict) the depth of each pixel (or position) in the scene image. In addition, the processor 150 may perform an image preprocessing operation on the scene image to generate an adjusted scene image that meets the input requirements of the deep learning model. For example, in the image preprocessing operation, the processor 150 may resize the scene image and/or convert the format of the scene image to generate the adjusted scene image. The processor 150 analyzes the adjusted scene image through the deep learning model to obtain the depth information corresponding to the scene image. For example, the processor 150 may input the adjusted scene image into the deep learning model, and then receive the depth map output by the deep learning model for the adjusted scene image.
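The preprocessing described above can be sketched as resizing to the model's expected input size and normalizing pixel values. The nearest-neighbor resize, the 256×256 default, and the [0, 1] scaling below are illustrative assumptions; actual input requirements depend on the specific deep learning model.

```python
import numpy as np

def preprocess_for_depth_model(image, target_h=256, target_w=256):
    """Illustrative preprocessing: nearest-neighbor resize to the model's input
    size, then scale 8-bit pixel values to [0, 1]."""
    h, w = image.shape[:2]
    rows = np.arange(target_h) * h // target_h   # nearest source row per output row
    cols = np.arange(target_w) * w // target_w   # nearest source column per output column
    resized = image[rows][:, cols]
    return resized.astype(np.float32) / 255.0
```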

於一些實施例中，處理器150可判斷用以感測場景深度信息的深度感測器160是否可用，或者判斷用以估計場景深度信息的深度學習模型是否可用。當深度感測器160可用或深度學習模型可用，處理器150可獲取場景影像的深度信息。 In some embodiments, the processor 150 may determine whether the depth sensor 160 for sensing scene depth information is available, or determine whether the deep learning model for estimating scene depth information is available. When the depth sensor 160 is available or the deep learning model is available, the processor 150 may obtain the depth information of the scene image.

當深度感測器160與深度學習模型其中至少一者是可用的,於步驟S840,處理器150根據深度信息與場景影像,產生關聯於立體參考座標系的三維網格場景資訊。在一實施例中,三維網格的網格節點在立體參考座標系的Z軸方向上的高度可視為三維網格節點的深度。否則,當深度感測器160與深度學習模型都不可用,處理器150可產生Z軸方向上的高度都相同之平面的網格資訊。 When at least one of the depth sensor 160 and the deep learning model is available, in step S840, the processor 150 generates three-dimensional grid scene information associated with the stereo reference coordinate system based on the depth information and the scene image. In one embodiment, the height of the grid node of the three-dimensional grid in the Z-axis direction of the stereo reference coordinate system can be regarded as the depth of the three-dimensional grid node. Otherwise, when neither the depth sensor 160 nor the deep learning model is available, the processor 150 can generate grid information of a plane with the same height in the Z-axis direction.
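Sampling a depth map into grid nodes whose Z values come from the scene depth, as described above, can be sketched as follows. Coordinates here are pixel-indexed for illustration; mapping them into the stereo reference coordinate system would additionally use the sensor parameters of step S230. The planar fallback (all heights equal) corresponds to passing a constant depth map.

```python
import numpy as np

def depth_to_mesh_nodes(depth_map, grid_step=16):
    """Sample a regular grid over the depth map; each node becomes (x, y, z)
    with z taken from the depth value at that pixel."""
    h, w = depth_map.shape
    ys, xs = np.mgrid[0:h:grid_step, 0:w:grid_step]
    zs = depth_map[ys, xs]
    return np.stack([xs.ravel(), ys.ravel(), zs.ravel()], axis=1)
```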

於步驟S850a，處理器150根據使用者位置資訊中的右眼位置資訊與顯示器130的實際尺寸，決定第一視錐體。於步驟S850b，處理器150根據使用者位置資訊中的左眼位置資訊與顯示器的實際尺寸，決定第二視錐體。處理器150決定視錐體的詳細操作可參照前述實施例的說明。須特別說明的是，於一些實施例中，處理器150可根據使用者影像分別定位使用者的右眼的右眼位置資訊與左眼的左眼位置資訊。右眼位置資訊與左眼位置資訊可分別為立體參考座標系中的右眼座標與左眼座標。於是，處理器150可分別根據右眼座標與左眼座標，以及顯示器130的實際尺寸來決定第一視錐體與第二視錐體。可知的，由於右眼座標不同於左眼座標，因此第一視錐體與第二視錐體所擷取的場景內容也會有所差異。 In step S850a, the processor 150 determines the first viewing frustum according to the right-eye position information in the user position information and the actual size of the display 130. In step S850b, the processor 150 determines the second viewing frustum according to the left-eye position information in the user position information and the actual size of the display. For the detailed operation of determining a viewing frustum, reference may be made to the description of the aforementioned embodiments. It should be particularly noted that, in some embodiments, the processor 150 may locate the right-eye position information of the user's right eye and the left-eye position information of the user's left eye from the user image, respectively. The right-eye position information and the left-eye position information may respectively be the right-eye coordinate and the left-eye coordinate in the stereo reference coordinate system. Thus, the processor 150 may determine the first viewing frustum and the second viewing frustum according to the right-eye coordinate and the left-eye coordinate, respectively, together with the actual size of the display 130. Understandably, since the right-eye coordinate differs from the left-eye coordinate, the scene content captured by the first viewing frustum and the second viewing frustum will also differ.

於步驟S860a,處理器150利用第一視錐體的投影矩陣,根據場景位置資訊產生投影於顯示器130的顯示平面的右眼顯示圖幀。詳細來說,處理器150可將第一視錐體所擷取之場景影像中的三維網格節點投影至顯示器130的顯示平面上,以渲染出右眼顯示圖幀。 In step S860a, the processor 150 uses the projection matrix of the first viewing cone to generate a right-eye display frame projected on the display plane of the display 130 according to the scene position information. Specifically, the processor 150 may project the three-dimensional grid nodes in the scene image captured by the first viewing cone onto the display plane of the display 130 to render the right-eye display frame.

於步驟S860b,處理器150利用第二視錐體的投影矩陣,根據場景位置資訊產生投影於顯示器130的顯示平面的左眼顯示圖幀。詳細來說,處理器150可將第二視錐體所擷取之場景影像中的三維網格節點投影至顯示器130的顯示平面上,以渲染出左眼顯示圖幀。 In step S860b, the processor 150 uses the projection matrix of the second viewing cone to generate a left-eye display frame projected on the display plane of the display 130 according to the scene position information. Specifically, the processor 150 may project the three-dimensional grid nodes in the scene image captured by the second viewing cone onto the display plane of the display 130 to render the left-eye display frame.

須說明的是，處理器150是利用深度估測技術或深度感測技術而獲取三維網格節點的場景深度，再將具備深度信息的三維網格節點投影至顯示器130的顯示平面上。因此，顯示器130所呈現之場景內容可更為精確，不會因為缺乏深度信息而導致場景物件的感知位置出現不當偏移。 It should be noted that the processor 150 obtains the scene depth of the three-dimensional grid nodes using depth estimation technology or depth sensing technology, and then projects the three-dimensional grid nodes with depth information onto the display plane of the display 130. Therefore, the scene content presented by the display 130 can be more accurate, and the perceived positions of scene objects will not be improperly offset due to a lack of depth information.

於步驟S870，處理器150透過顯示器130輸出左眼顯示圖幀與右眼顯示圖幀，以顯示顯示器130的後側的場景。於一些實施例中，當顯示器130為裸視3D顯示器，處理器150可對左眼顯示圖幀與右眼顯示圖幀進行影像編織處理，以同步交錯顯示左眼顯示圖幀與右眼顯示圖幀。當顯示器130為眼鏡式3D顯示器，處理器150可控制顯示器130交替地顯示左眼顯示圖幀與右眼顯示圖幀。如此一來，使用者可感受到立體視覺效果。 In step S870, the processor 150 outputs the left-eye display frame and the right-eye display frame through the display 130 to display the scene behind the display 130. In some embodiments, when the display 130 is an autostereoscopic (glasses-free) 3D display, the processor 150 may perform image weaving processing on the left-eye display frame and the right-eye display frame so as to display them in a synchronously interleaved manner. When the display 130 is a glasses-type 3D display, the processor 150 may control the display 130 to alternately display the left-eye display frame and the right-eye display frame. In this way, the user can experience a stereoscopic visual effect.
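The image-weaving step for an autostereoscopic display can be sketched as a simple row interleave; real lenticular or parallax-barrier panels use panel-specific subpixel layouts, so this layout is an illustrative assumption only.

```python
import numpy as np

def weave_row_interleaved(left, right):
    """Even rows from the left-eye frame, odd rows from the right-eye frame."""
    assert left.shape == right.shape
    out = left.copy()
    out[1::2] = right[1::2]
    return out
```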

基於上述內容,於本發明實施例中,可根據使用者影像與場景影像獲取同一立體參考座標系下的使用者位置資訊與場景位置資訊。用以決定顯示圖幀之顯示內容的視錐體可基於使用者位置資訊與顯示器的實際尺寸而定。於是,顯示器所輸出的顯示圖幀的顯示場景內容可反應於使用者移動而改變,並與顯示器周遭的實際場景達到良好的對齊。此外,當使用場景深度信息將場景影像的內容投影至顯示平面,顯示器所呈現之場景內容可更為貼近實際場景。 Based on the above, in the embodiment of the present invention, the user position information and the scene position information in the same stereo reference coordinate system can be obtained based on the user image and the scene image. The viewing cone used to determine the display content of the display frame can be based on the user position information and the actual size of the display. Therefore, the display scene content of the display frame output by the display can change in response to the user's movement and achieve good alignment with the actual scene around the display. In addition, when the scene depth information is used to project the content of the scene image onto the display plane, the scene content presented by the display can be closer to the actual scene.

S210~S260:步驟 S210~S260: Steps

Claims (20)

一種穿透式顯示方法,適用於包括第一影像感測器、第二影像感測器與顯示器的穿透式顯示系統,包括: 透過所述第一影像感測器朝所述顯示器的前側擷取一使用者影像,並透過所述第二影像感測器朝所述顯示器的後側擷取一場景影像; 根據所述使用者影像獲取關聯於一立體參考座標系的使用者位置資訊; 根據所述場景影像獲取關聯於所述立體參考座標系的場景位置資訊; 根據所述使用者位置資訊與所述顯示器的實際尺寸,決定一視錐體(viewing frustum); 利用所述視錐體的投影矩陣,根據所述場景位置資訊產生投影於所述顯示器的顯示平面的顯示圖幀;以及 透過所述顯示器輸出所述顯示圖幀,以顯示所述顯示器的後側的場景。 A method for transparent display, applicable to a transparent display system including a first image sensor, a second image sensor and a display, comprising: Capturing a user image toward the front side of the display through the first image sensor, and capturing a scene image toward the back side of the display through the second image sensor; Obtaining user position information associated with a stereo reference coordinate system based on the user image; Obtaining scene position information associated with the stereo reference coordinate system based on the scene image; Determining a viewing frustum based on the user position information and the actual size of the display; Using the projection matrix of the viewing frustum, generating a display frame projected on the display plane of the display according to the scene position information; and The display frame is outputted through the display to display the scene behind the display. 如請求項1所述的穿透式顯示方法,其中所述第二影像感測器包括一魚眼鏡頭或一廣角鏡頭,而所述魚眼鏡頭或所述廣角鏡頭用以擷取所述場景影像, 其中在根據所述場景影像獲取關聯於所述立體參考座標系的所述場景位置資訊的步驟之前,所述方法更包括: 對所述場景影像進行一變形校正處理。 As described in claim 1, the second image sensor includes a fisheye lens or a wide-angle lens, and the fisheye lens or the wide-angle lens is used to capture the scene image. Before the step of obtaining the scene position information associated with the stereo reference coordinate system based on the scene image, the method further includes: performing a deformation correction process on the scene image. 
如請求項1所述的穿透式顯示方法,其中所述立體參考座標系的第一座標軸與第二座標軸平行於所述顯示器的顯示平面,且所述立體參考座標系的原點為所述顯示平面上的一參考點。A transparent display method as described in claim 1, wherein the first coordinate axis and the second coordinate axis of the three-dimensional reference coordinate system are parallel to the display plane of the display, and the origin of the three-dimensional reference coordinate system is a reference point on the display plane. 如請求項1所述的穿透式顯示方法,其中所述視錐體反應於所述使用者位置資訊改變而變動。A transparent display method as described in claim 1, wherein the viewing cone changes in response to changes in the user's location information. 如請求項1所述的穿透式顯示方法,其中根據所述場景影像獲取關聯於所述立體參考座標系的所述場景位置資訊的步驟包括: 基於所述顯示器的所述顯示平面,建立所述立體參考座標系; 根據所述第二影像感測器與所述顯示器之間的空間位置關係,決定所述第二影像感測器的外部參數;以及 根據所述第二影像感測器的內部參數與所述外部參數對所述場景影像中的場景像素座標進行座標轉換,以獲取關聯於所述立體參考座標系的所述場景位置資訊。 The transparent display method as described in claim 1, wherein the step of obtaining the scene position information associated with the stereo reference coordinate system according to the scene image comprises: Establishing the stereo reference coordinate system based on the display plane of the display; Determining the external parameters of the second image sensor according to the spatial position relationship between the second image sensor and the display; and Performing coordinate conversion on the scene pixel coordinates in the scene image according to the internal parameters of the second image sensor and the external parameters to obtain the scene position information associated with the stereo reference coordinate system. 如請求項1所述的穿透式顯示方法,其中所述使用者位置資訊包括關聯於所述立體參考座標系的一使用者座標,所述視錐體是透過連接所述使用者座標與所述顯示平面的多個頂點而獲取。A transparent display method as described in claim 1, wherein the user position information includes a user coordinate associated with the stereo reference coordinate system, and the viewing cone is obtained by connecting the user coordinate and multiple vertices of the display plane. 
如請求項1所述的穿透式顯示方法,其中所述場景位置資訊包括關聯於所述立體參考座標系的三維網格場景資訊,而根據所述場景影像獲取關聯於所述立體參考座標系的所述場景位置資訊的步驟包括: 獲取對應於所述場景影像的深度信息;以及 根據所述深度信息與所述場景影像,產生關聯於所述立體參考座標系的所述三維網格場景資訊。 As described in claim 1, the scene position information includes three-dimensional grid scene information associated with the stereo reference coordinate system, and the step of obtaining the scene position information associated with the stereo reference coordinate system based on the scene image includes: Obtaining depth information corresponding to the scene image; and Generating the three-dimensional grid scene information associated with the stereo reference coordinate system based on the depth information and the scene image. 如請求項7所述的穿透式顯示方法,其中獲取對應於所述場景影像的所述深度信息的步驟包括: 對所述場景影像執行影像預處理操作,以産生滿足深度學習模型的輸入要求的經調整場景影像;以及 透過所述深度學習模型來分析所述經調整場景影像獲得所述深度信息。 In the transparent display method as described in claim 7, the step of obtaining the depth information corresponding to the scene image includes: performing image preprocessing operations on the scene image to generate an adjusted scene image that meets the input requirements of the deep learning model; and analyzing the adjusted scene image through the deep learning model to obtain the depth information. 如請求項7所述的穿透式顯示方法,其中獲取對應於所述場景影像的所述深度信息的步驟包括: 利用一深度感測器獲取對應於所述場景影像的所述深度信息。 In the transparent display method as described in claim 7, the step of obtaining the depth information corresponding to the scene image includes: Using a depth sensor to obtain the depth information corresponding to the scene image. 
如請求項1所述的穿透式顯示方法,其中所述顯示器包括提供右眼顯示圖幀與左眼顯示圖幀的一立體顯示器,而根據所述使用者位置資訊與所述顯示器的實際尺寸,決定所述視錐體的步驟包括: 根據所述使用者位置資訊中的右眼位置資訊與所述顯示器的實際尺寸,決定第一視錐體;以及 根據所述使用者位置資訊中的左眼位置資訊與所述顯示器的實際尺寸,決定第二視錐體, 其中利用所述視錐體的所述投影矩陣,根據所述場景位置資訊產生投影於所述顯示器的所述顯示平面的所述顯示圖幀的步驟包括: 利用所述第一視錐體的投影矩陣,根據所述場景位置資訊產生投影於所述顯示器的所述顯示平面的所述右眼顯示圖幀;以及 利用所述第二視錐體的投影矩陣,根據所述場景位置資訊產生投影於所述顯示器的所述顯示平面的所述左眼顯示圖幀。 The transparent display method as described in claim 1, wherein the display includes a stereoscopic display that provides right-eye display frames and left-eye display frames, and the step of determining the viewing cone according to the user position information and the actual size of the display includes: Determining a first viewing cone according to the right eye position information in the user position information and the actual size of the display; and Determining a second viewing cone according to the left eye position information in the user position information and the actual size of the display, wherein using the projection matrix of the viewing cone, the step of generating the display frame projected on the display plane of the display according to the scene position information includes: Using the projection matrix of the first viewing cone, the right eye display frame is generated according to the scene position information and projected on the display plane of the display; and Using the projection matrix of the second viewing cone, the left eye display frame is generated according to the scene position information and projected on the display plane of the display. 
一種穿透式顯示系統,包括: 第一影像感測器; 第二影像感測器; 顯示器;以及 至少一處理器,耦接至所述第一影像感測器、所述第二影像感測器和所述顯示器, 其中所述至少一處理器用以: 透過所述第一影像感測器朝所述顯示器的前側擷取一使用者影像,並透過所述第二影像感測器朝所述顯示器的後側擷取一場景影像; 根據所述使用者影像獲取關聯於一立體參考座標系的使用者位置資訊; 根據對所述場景影像獲取關聯於所述立體參考座標系的場景位置資訊; 根據所述使用者位置資訊與所述顯示器的實際尺寸,決定一視錐體; 利用所述視錐體的投影矩陣,根據所述場景位置資訊産生投影於所述顯示器的顯示平面的顯示圖幀;以及 透過所述顯示器輸出所述顯示圖幀,以顯示所述顯示器的後側的場景。 A transparent display system comprises: a first image sensor; a second image sensor; a display; and at least one processor coupled to the first image sensor, the second image sensor and the display, wherein the at least one processor is used to: capture a user image toward the front side of the display through the first image sensor, and capture a scene image toward the back side of the display through the second image sensor; obtain user position information associated with a stereo reference coordinate system based on the user image; obtain scene position information associated with the stereo reference coordinate system based on the scene image; determine a viewing cone based on the user position information and the actual size of the display; Using the projection matrix of the viewing cone, a display frame projected on the display plane of the display is generated according to the scene position information; and outputting the display frame through the display to display the scene behind the display. 如請求項11所述的穿透式顯示系統,其中所述第二影像感測器包括一魚眼鏡頭或一廣角鏡頭,所述魚眼鏡頭或所述廣角鏡頭用以擷取所述場景影像,而所述至少一處理器用以: 對所述場景影像進行一變形校正處理。 A transparent display system as described in claim 11, wherein the second image sensor includes a fisheye lens or a wide-angle lens, the fisheye lens or the wide-angle lens is used to capture the scene image, and the at least one processor is used to: Perform a deformation correction process on the scene image. 
如請求項11所述的穿透式顯示系統,其中所述立體參考座標系的第一座標軸與第二座標軸平行於所述顯示器的顯示平面,且所述立體參考座標系的原點為所述顯示平面上的一參考點。A transmissive display system as described in claim 11, wherein the first coordinate axis and the second coordinate axis of the three-dimensional reference coordinate system are parallel to the display plane of the display, and the origin of the three-dimensional reference coordinate system is a reference point on the display plane. 如請求項11所述的穿透式顯示系統,其中所述視錐體反應於所述使用者位置資訊改變而變動。A see-through display system as described in claim 11, wherein the viewing cone changes in response to changes in the user's location information. 如請求項11所述的穿透式顯示系統,其中所述至少一處理器用以: 基於所述顯示器建立所述立體參考座標系; 根據所述第二影像感測器與所述顯示器的空間位置關係,決定所述第二影像感測器的外部參數;以及 根據所述第二影像感測器的內部參數與所述外部參數對所述場景影像中的場景像素座標進行座標轉換,以獲取關聯於所述立體參考座標系的所述場景位置資訊。 A transparent display system as described in claim 11, wherein the at least one processor is used to: establish the stereo reference coordinate system based on the display; determine the external parameters of the second image sensor according to the spatial position relationship between the second image sensor and the display; and perform coordinate conversion on the scene pixel coordinates in the scene image according to the internal parameters of the second image sensor and the external parameters to obtain the scene position information associated with the stereo reference coordinate system. 如請求項11所述的穿透式顯示系統,其中所述使用者位置資訊包括關聯於所述立體參考座標系的一使用者座標,所述視錐體是透過連接所述使用者座標與所述顯示平面的多個頂點而獲取。A see-through display system as described in claim 11, wherein the user position information includes a user coordinate associated with the stereo reference coordinate system, and the viewing cone is obtained by connecting the user coordinate and multiple vertices of the display plane. 
如請求項11所述的穿透式顯示系統,其中所述場景位置資訊包括關聯於所述立體參考座標系的一三維網格場景資訊,所述至少一處理器用以: 獲取對應於所述場景影像的深度信息;以及 根據所述深度信息與所述場景影像,產生關聯於所述立體參考座標系的所述三維網格場景資訊。 A see-through display system as described in claim 11, wherein the scene position information includes a three-dimensional grid scene information associated with the stereo reference coordinate system, and the at least one processor is used to: obtain depth information corresponding to the scene image; and generate the three-dimensional grid scene information associated with the stereo reference coordinate system based on the depth information and the scene image. 如請求項17所述的穿透式顯示系統,其中所述至少一處理器用以: 對所述場景影像執行影像預處理操作,以産生滿足深度學習模型的輸入要求的經調整場景影像;以及 透過所述深度學習模型來分析所述經調整場景影像獲得所述深度信息。 A see-through display system as described in claim 17, wherein the at least one processor is used to: perform image preprocessing operations on the scene image to generate an adjusted scene image that meets the input requirements of the deep learning model; and analyze the adjusted scene image through the deep learning model to obtain the depth information. 如請求項17所述的穿透式顯示系統,更包括一深度感測器,所述至少一處理器連接所述深度感測器並用以: 利用所述深度感測器獲取對應於所述場景影像的所述深度信息。 The transparent display system as described in claim 17 further includes a depth sensor, and the at least one processor is connected to the depth sensor and is used to: Utilize the depth sensor to obtain the depth information corresponding to the scene image. 
如請求項11所述的穿透式顯示系統,其中所述顯示器包括提供右眼顯示圖幀與左眼顯示圖幀的一立體顯示器,而所述至少一處理器用以: 根據所述使用者位置資訊中的右眼位置資訊與所述顯示器的實際尺寸,決定第一視錐體;以及 根據所述使用者位置資訊中的左眼位置資訊與所述顯示器的實際尺寸,決定第二視錐體, 其中所述至少一處理器用以: 利用所述第一視錐體的投影矩陣,根據所述場景位置資訊產生投影於所述顯示器的所述顯示平面的所述右眼顯示圖幀;以及 利用所述第二視錐體的投影矩陣,根據所述場景位置資訊產生投影於所述顯示器的所述顯示平面的所述左眼顯示圖幀。 A transparent display system as described in claim 11, wherein the display includes a stereoscopic display that provides right-eye display frames and left-eye display frames, and the at least one processor is used to: Determine a first viewing cone based on the right-eye position information in the user position information and the actual size of the display; and Determine a second viewing cone based on the left-eye position information in the user position information and the actual size of the display, wherein the at least one processor is used to: Generate the right-eye display frame projected on the display plane of the display based on the scene position information using the projection matrix of the first viewing cone; and Generate the left-eye display frame projected on the display plane of the display based on the scene position information using the projection matrix of the second viewing cone.
TW113114452A 2024-04-18 2024-04-18 See-through display method and see-through display system TWI885862B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
TW113114452A TWI885862B (en) 2024-04-18 2024-04-18 See-through display method and see-through display system

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
TW113114452A TWI885862B (en) 2024-04-18 2024-04-18 See-through display method and see-through display system

Publications (2)

Publication Number Publication Date
TWI885862B true TWI885862B (en) 2025-06-01
TW202542715A TW202542715A (en) 2025-11-01

Family

ID=97227274

Family Applications (1)

Application Number Title Priority Date Filing Date
TW113114452A TWI885862B (en) 2024-04-18 2024-04-18 See-through display method and see-through display system

Country Status (1)

Country Link
TW (1) TWI885862B (en)

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
TW367048U (en) * 1992-05-29 1999-08-11 Sega Enterprises Kk Head-up display device
US20100125812A1 (en) * 2008-11-17 2010-05-20 Honeywell International Inc. Method and apparatus for marking a position of a real world object in a see-through display
US20110107212A1 (en) * 2009-11-05 2011-05-05 Pantech Co., Ltd. Terminal and method for providing see-through input
US20150317831A1 (en) * 2014-05-01 2015-11-05 Michael John Ebstyne Transitions between body-locked and world-locked augmented reality
US9417692B2 (en) * 2012-06-29 2016-08-16 Microsoft Technology Licensing, Llc Deep augmented reality tags for mixed reality
TW202026699A (en) * 2019-01-03 2020-07-16 宏碁股份有限公司 Video see-through head mounted display and control method thereof

Also Published As

Publication number Publication date
TW202542715A (en) 2025-11-01

Similar Documents

Publication Publication Date Title
US11960639B2 (en) Virtual 3D methods, systems and software
JP7804725B2 (en) Neural Blending for Novel View Synthesis
US12095973B2 (en) Method and system of image processing with multi-object multi-view association
CN108307675B (en) Multi-baseline camera array system architecture for depth enhancement in VR/AR applications
US9813693B1 (en) Accounting for perspective effects in images
JP2017174125A (en) Information processing apparatus, information processing system, and information processing method
CN105210093A (en) Apparatus, system and method for capturing and displaying appearance
US20230152883A1 (en) Scene processing for holographic displays
JP6768933B2 (en) Information processing equipment, information processing system, and image processing method
US20230122149A1 (en) Asymmetric communication system with viewer position indications
US10244227B2 (en) Capture and render of virtual reality content employing a light field camera array
US20190281280A1 (en) Parallax Display using Head-Tracking and Light-Field Display
CN106919246A (en) The display methods and device of a kind of application interface
JP2018033107A (en) Video distribution device and distribution method
TWI613904B (en) Stereo image generating method and electronic device using the same
TWM577511U (en) Single-sensor double-lens application module
US11187914B2 (en) Mirror-based scene cameras
WO2024050280A1 (en) Dual camera tracking system
US20180143523A1 (en) Spherical omnipolar imaging
CN115760560A (en) Depth information acquisition method and device, equipment and storage medium
TWI885862B (en) See-through display method and see-through display system
CN111836030A (en) Interactive image processing method, device and media using deep engine
US20250322615A1 (en) See-through display method and see-through display system
CN107426522B (en) Video method and system based on virtual reality equipment
CN120833720A (en) Transmissive display method and transmissive display system