TWI884034B - Panorama-based three-dimensional scene reconstruction system and method therefor - Google Patents
Panorama-based three-dimensional scene reconstruction system and method therefor
- Publication number
- TWI884034B
- Authority
- TW
- Taiwan
- Prior art keywords
- image
- path
- panorama
- scene
- stereoscopic
- Prior art date
Landscapes
- Image Processing (AREA)
Abstract
Description
The specification discloses a technique for building a 3D mesh, and in particular a panorama-based 3D scene reconstruction system and method that reconstruct a mesh from consecutive panoramas and let a user move along the capture path.
3D mesh reconstruction is used to rebuild meshes of indoor or outdoor scenes. In most approaches, the sensors of a scanner sweep a target environment to build a point cloud describing the scene and the objects in it, and a mesh reconstruction algorithm then turns the point cloud into a corresponding 3D mesh. The reconstructed mesh supports virtual reality (VR) and augmented reality (AR) applications such as guided tours and map navigation.
The conventional workflow first scans the environment with a scanner to capture images from every azimuth, forming a point cloud that describes multiple data points in a space. A point cloud is a discrete set of data points, each holding the coordinates and image data of a point in space, and optionally depth information. Mesh reconstruction is then performed on the point cloud: an algorithm rebuilds the points into a mesh which, combined with the image data, yields a virtual reality (VR) or augmented reality (AR) scene.
Matterport, for example, is a known technique for building stereoscopic imagery of indoor or outdoor scenes. Its reconstruction process uses a scanner (a phone, a 360-degree camera, or similar) to capture an environment image at each shooting point. Because Matterport captures from fixed positions, the user must pick "fixed stations" within the scene and scan the environment at each one to obtain its panorama; the number of stations a user can move between in the resulting 3D scene therefore depends on how much data was captured. The result is a 3D model of the scene usable in VR or AR applications.
After scanning multiple stations in a scene, the user can browse the 3D space described by the resulting model, starting from an initial station and moving toward the next target station. Matterport switches between stations using the panoramas of the two stations involved. Because the two panoramas are swapped instantaneously, however, the user experiences a discontinuous image transition when moving between stations, and the imagery becomes visibly distorted.
To remedy the discontinuous transitions that arise when conventional techniques build a 3D scene from panoramas of multiple discrete stations, this disclosure proposes a panorama-based 3D scene reconstruction system and method. They resolve the discontinuity and distortion seen when switching between stations, let the user move freely within the 3D space, and improve the experience of browsing imagery in that space.
According to an embodiment of the panorama-based 3D scene reconstruction method, multiple images of a scene are first captured and stitched into panoramas of the scene, and semantic segmentation and depth prediction are performed on every panorama along the paths in the scene. The path pixels obtained from the segmentation result and the path depths obtained from the depth prediction then yield a point cloud for each path, and all point clouds are aligned. To find the path contours, each path's point cloud is projected onto a planar image, from which the contour edges of each path are derived. Triangulation is performed along those contour edges to produce a planar path mesh, and finally the scene's walls and/or ceiling are erected along the planar path mesh to build a 3D mesh of the scene.
Preferably, the images of the scene are captured by a photographic system composed of one or more fisheye lenses.
Further, fisheye correction may be applied to the images captured by the photographic system: every pixel of each image taken with the fisheye lens or lenses is projected onto a spherical coordinate system, and every pixel on the sphere is then projected onto an equirectangular coordinate system, completing the correction.
Further, adjacent images may be seam-cut with a graph-cut algorithm that finds the lowest-cost seam between them, after which a multi-band blending algorithm iteratively fuses the adjacent images into a single image, forming the panorama.
Using the point cloud of each path, simultaneous localization and mapping (SLAM) estimates the spatial coordinates and rotation angles of the photographic system that captured the images; the point clouds are then transformed from the camera coordinate system to the world coordinate system so that all point clouds align into one complete, smoother path point cloud.
Further, each panorama may be semantically segmented with a pre-trained hierarchical multi-scale attention model, which predicts over the panorama using the classes annotated in a dataset of numerous high-resolution images. The segmentation result yields the contours of the paths within the scene.
Further, deriving each path's contour edges from the planar image includes building a mask image matching the point cloud's extent, projecting the point cloud's pixels onto the mask, analyzing the mask for the rough contour edges of the path, and then filling gaps along those edges to obtain the final contour edges of the path.
Moreover, after the 3D mesh is built, the panoramas can be projected onto it as its texture, with the panorama switched as the viewer moves.
For a fuller understanding of the features and technical content of the present invention, refer to the following detailed description and drawings. The drawings are provided for reference and illustration only and are not intended to limit the invention.
The following specific embodiments illustrate the implementation of the present invention; those skilled in the art can appreciate its advantages and effects from this disclosure. The invention may be practiced or applied through other embodiments, and the details herein may be modified in various ways for different viewpoints and applications without departing from its concept. The drawings are simplified schematics and are not drawn to actual scale. The following embodiments describe the related technical content in further detail, but the disclosure does not limit the scope of protection of the invention.
It should be understood that although terms such as "first", "second", and "third" may be used herein to describe various elements or signals, those elements and signals are not limited by these terms, which serve mainly to distinguish one element or signal from another. In addition, the term "or" as used herein may, as appropriate, include any one of, or any combination of, the associated listed items.
This disclosure concerns a panorama-based 3D scene reconstruction system and method, one purpose of which is to eliminate the image distortion caused by conventional discontinuous panorama switching, so that the system lets users move and browse smoothly within the 3D space and improves that experience.
FIG. 1 is a schematic diagram of an embodiment of the system architecture for panorama-based 3D scene reconstruction.
The panorama-based 3D scene reconstruction system is implemented mainly on a computer system. The image capture device 11 in the figure shoots multiple images of an indoor scene, and the image processing device 100 then performs image correction, image alignment, image stitching, mesh reconstruction, semantic segmentation, depth prediction, point cloud building, texturing, and related steps to build a 3D model of the space; on completion the model can be output to the storage device 13 for storage and used by other applications. The system is realized by the image capture device 11 and the image processing device 100, which may be two independent devices or two interconnected subsystems within one device that, together with software, carry out the panorama-based 3D scene reconstruction method.
The image capture device 11 may be a 360-degree camera, implemented as a photographic system composed of one or more fisheye lenses. It shoots multiple images at one or more positions in a scene; the images are stitched into a panorama, from which an algorithm reconstructs a 3D mesh. In one embodiment, the image capture device 11 is moved along the scene's roads or a system-defined route while shooting images in multiple directions, capturing the scene's information along the way. The 360-degree camera composed of one or more fisheye lenses can take several images at each position. Besides performing the subsequent processing on a photographic system with its own computing capability, in the illustrated embodiment the images enter the image processing device 100 through its input unit 103 and are processed by the processing unit 101; once the 3D mesh is complete, it may be stored on media inside the image processing device 100 or output through the output unit 105 to the storage device 13.
Note that when multiple images are taken with one or more fisheye lenses, adjacent images may contain overlapping regions, which can cause misalignment. Whether the images come from a single moving fisheye lens or from several fisheye lenses, a stitching procedure that properly handles the overlapping regions can produce a single panorama covering the scene.
For the functions running in the image processing device, refer to FIG. 2, a diagram of an embodiment of the functional modules that carry out the panorama-based 3D scene reconstruction method. These modules are realized by the computer system's hardware and software working together; the steps may also be followed in the method flowchart of FIG. 3.
The system includes an image input unit 21 (the image capture device 11 of FIG. 1), which uses a photographic system composed of one or more fisheye lenses, or any 360-degree camera, to shoot multiple images of a scene (step S301). Where adjacent images overlap, a panorama image stitching unit 23 can seam-cut each pair of adjacent images. One implementation uses a Graph-Cut algorithm for the seam cutting; see the embodiment flowchart of FIG. 4. The main steps are to find the overlapping region of two adjacent images, locate a lowest-cost seam within it, and then fuse the adjacent images into one image with a multiband blending algorithm. Repeating these steps yields a panorama fusing the multiple images; several panoramas may be needed to describe one scene. For a seam-cutting embodiment, see FIG. 7.
The panoramas are then passed to a mesh reconstruction module 20, where a semantic segmentation unit 201 segments each of the scene's panoramas to obtain the pixels of all paths within the scene, and a depth estimation unit 203 performs depth prediction to estimate the depth of each path (step S303).
The semantic segmentation embodiment is described as follows.
To reconstruct the 3D mesh, the contours of the paths within the scene must be derived. The approach is to semantically segment the panoramas, first identifying the class of every pixel for classification. In one embodiment, the hierarchical multi-scale attention mechanism for semantic segmentation proposed by Andrew Tao et al. segments all panoramas with its pre-trained model. The training dataset may be Mapillary Vistas, a large-scale street-scene dataset containing numerous high-resolution color images annotated with many classes. The panoramas are fed to this model for prediction, producing per-pixel classifications and the semantic segmentation result.
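As an illustration of this step, the sketch below extracts a path mask from a panorama with an off-the-shelf pre-trained segmentation network. It is a minimal stand-in, not the patent's model: torchvision's DeepLabV3 is used here only because its weights are readily available, and its class set differs from Mapillary Vistas, so `PATH_CLASS_ID` is a hypothetical index that would have to match the "road" class of whatever model is actually used.

```python
import numpy as np
import torch
from torchvision.models.segmentation import deeplabv3_resnet50

PATH_CLASS_ID = 13  # hypothetical: must match the "road" class of the model in use

model = deeplabv3_resnet50(weights="DEFAULT").eval()

def path_pixels(panorama_rgb: np.ndarray) -> np.ndarray:
    """Return a boolean H x W mask marking pixels classified as path/road."""
    x = torch.from_numpy(panorama_rgb).permute(2, 0, 1).float().unsqueeze(0) / 255.0
    mean = torch.tensor([0.485, 0.456, 0.406]).view(3, 1, 1)  # ImageNet stats
    std = torch.tensor([0.229, 0.224, 0.225]).view(3, 1, 1)
    x = (x - mean) / std
    with torch.no_grad():
        logits = model(x)["out"]          # (1, num_classes, H, W)
    labels = logits.argmax(dim=1)[0]      # per-pixel class indices
    return (labels == PATH_CLASS_ID).numpy()
```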
The depth prediction embodiment is described as follows.
After the path pixels are obtained from the panoramas, depth prediction is performed on each panorama. The embodiment follows the wavelet-decomposition-based single-image depth prediction proposed by Michael et al. to estimate the depth map, using the ResNet-18 pre-trained model it provides; the model's training dataset consists of a large number of calibrated stereo videos, on which basis the panorama's depth can be predicted.
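A sketch of the inference flow for this step is shown below. For availability it loads the MiDaS small model from torch.hub as a stand-in for the wavelet-based ResNet-18 model the embodiment cites; note that the predicted values are relative (inverse-depth-like) and still need the scale handling described later.

```python
import torch

# Stand-in monocular depth model; the embodiment uses a wavelet-based ResNet-18.
midas = torch.hub.load("intel-isl/MiDaS", "MiDaS_small").eval()
transform = torch.hub.load("intel-isl/MiDaS", "transforms").small_transform

def predict_depth(panorama_rgb):
    """Predict a relative depth map for an H x W x 3 uint8 RGB panorama."""
    batch = transform(panorama_rgb)           # resize + normalize to model input
    with torch.no_grad():
        prediction = midas(batch)             # (1, h, w) inverse-relative depth
        depth = torch.nn.functional.interpolate(
            prediction.unsqueeze(1),
            size=panorama_rgb.shape[:2],      # back to panorama resolution
            mode="bicubic",
            align_corners=False,
        ).squeeze()
    return depth.numpy()
```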
Afterwards, a point cloud building unit 207 builds the point cloud of each path from the per-path pixels given by the semantic segmentation and the path depths given by the depth prediction (step S305).
Note that when the point cloud is built from the segmentation and depth predictions, all pixels belonging to paths (roads of all kinds) across all panoramas of the scene are collected, so the true contour of the paths can serve as the bottom of the mesh during reconstruction, and only path pixels are lifted into the 3D point cloud using the depth at the corresponding depth-map position. A point cloud built from a depth map, however, is rough and uneven, so the position (height) of the photographic system can first be estimated and the point cloud resampled accordingly to obtain a smoother cloud.
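The lifting of path pixels into a camera-space point cloud can be sketched as below. It assumes an equirectangular panorama with a y-up convention: a pixel's column fixes its azimuth and its row fixes its elevation on the unit sphere, and the predicted depth scales the resulting ray.

```python
import numpy as np

def lift_path_pixels(depth: np.ndarray, path_mask: np.ndarray) -> np.ndarray:
    """Back-project masked equirectangular pixels to 3D camera-space points.

    depth: (H, W) per-pixel depth; path_mask: (H, W) bool, True on path pixels.
    """
    H, W = depth.shape
    v, u = np.nonzero(path_mask)
    lon = (u / W) * 2.0 * np.pi - np.pi       # azimuth in [-pi, pi)
    lat = np.pi / 2.0 - (v / H) * np.pi       # elevation in [-pi/2, pi/2]
    d = depth[v, u]
    x = d * np.cos(lat) * np.sin(lon)
    y = d * np.sin(lat)
    z = d * np.cos(lat) * np.cos(lon)
    return np.stack([x, y, z], axis=1)        # (N, 3) camera-space points
```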
According to the embodiment, before the point clouds are aligned, the precise position at which the photographic system captured each image must be known. A simultaneous localization and mapping (SLAM) unit 205 estimates, via SLAM, the position corresponding to each captured image, described by spatial coordinates and rotation angles; in one embodiment the OpenVSLAM algorithm is used. Because SLAM's scale unit may not match the scale unit of the predicted depth, in the illustrated embodiment a SLAM scale estimation unit 209 computes the SLAM scaling factor and rescales the SLAM path proportionally.
Only after this rescaling, which brings the SLAM scale into agreement with the real-world scale, will the point clouds align correctly. In the point cloud alignment step, a point cloud alignment unit 211 takes each path's point cloud together with the photographic system's spatial coordinates and rotation angles estimated by the SLAM unit 205, for example the capture device's height and heading when shooting within the scene. The point clouds are then transformed from the camera coordinate system to the world coordinate system so that all point clouds align, finally yielding one complete, smoother path point cloud (step S307).
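The camera-to-world transform for one frame reduces to a rotation, a scaled translation, and concatenation across frames. A minimal sketch, assuming SLAM reports a rotation matrix and translation per frame and that the scale factor from the SLAM scale estimation has already been computed:

```python
import numpy as np

def align_point_clouds(clouds, rotations, translations, slam_scale):
    """Transform per-frame camera-space clouds into one world-space cloud.

    clouds: list of (N_i, 3) arrays; rotations: list of (3, 3) matrices;
    translations: list of (3,) vectors in SLAM units; slam_scale converts
    SLAM units into the metric units of the depth-derived clouds.
    """
    aligned = [
        (R @ pts.T).T + slam_scale * t
        for pts, R, t in zip(clouds, rotations, translations)
    ]
    return np.vstack(aligned)
```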
Once the point clouds of all paths in the scene are fully built, the path point clouds are meshed. A contour finding unit 213 projects the point cloud onto a planar image, from which the rough contour edges of the path can be analyzed (step S309); see the flow of FIG. 8 for a related embodiment. A triangulation unit 215 then triangulates with the contour edges as constraints; one embodiment uses the Constrained Delaunay Triangulation algorithm, producing a planar path mesh (step S311).
Given the planar path mesh, a 3D mesh building unit 217, taking an indoor scene as an example, erects the scene's walls along the edges of the planar path mesh and/or adds the ceiling (step S313), completing the 3D mesh reconstruction (step S315).
After the 3D mesh is built, the other elements obtained by shooting or scanning the scene can be handled by a texture projection unit 219, which projects the current panorama onto the 3D mesh model according to the photographic system's position, texturing the mesh. When playing back a video of the reconstructed 3D scene, this final texturing step projects the current panorama onto the model as a projective texture according to the current position, and the texture is obtained by switching through the panorama video.
Once the 3D mesh is reconstructed, the 3D scene realized by the mesh model can serve various applications 25. For example, navigation arrows, randomly wandering non-player characters (NPCs), and interactive 3D objects can be placed in the scene, letting the user move freely forward or backward along the path, rotate a full 360 degrees to observe the surroundings, and interact with objects in the scene at any time.
For the image seam-cutting step above, refer to the embodiment flowchart of FIG. 4.
In one embodiment, when a photographic system composed of one or more fisheye lenses shoots a scene to obtain multiple images, fisheye correction is performed first (step S401). It consists of two main steps; see FIG. 5 for a schematic of the embodiment. FIG. 5(A) shows a fisheye image 50 in which each pixel is denoted P(x', y'). As in FIG. 5(B), each pixel P(x', y') is projected onto a spherical coordinate system and re-expressed as a 3D coordinate P(x, y, z). Then, as in FIG. 5(C), each pixel's 3D coordinate is projected from the sphere onto an equirectangular coordinate system, with each pixel described by a coordinate P(x'', y''), completing the fisheye correction.
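The two projections can be folded into a single inverse mapping that computes, for each equirectangular output pixel, the fisheye pixel it samples. A sketch, assuming an equidistant fisheye model (r = f * theta) with the optical axis along +z and the image circle filling the frame:

```python
import numpy as np
import cv2

def fisheye_to_equirect(img, fov_deg=180.0, out_w=1024, out_h=512):
    """Warp an equidistant fisheye image to an equirectangular image."""
    h, w = img.shape[:2]
    cx, cy = w / 2.0, h / 2.0
    f = (w / 2.0) / np.radians(fov_deg / 2.0)   # equidistant model: r = f * theta

    # Direction on the unit sphere for every output pixel (longitude, latitude).
    u, v = np.meshgrid(np.arange(out_w), np.arange(out_h))
    lon = (u / out_w) * 2 * np.pi - np.pi
    lat = np.pi / 2 - (v / out_h) * np.pi
    x = np.cos(lat) * np.sin(lon)
    y = np.sin(lat)
    z = np.cos(lat) * np.cos(lon)

    # Project each direction back into the fisheye image; directions outside
    # the lens field of view fall off the image and sample the black border.
    theta = np.arccos(np.clip(z, -1.0, 1.0))    # angle from the optical axis
    phi = np.arctan2(y, x)
    r = f * theta
    map_x = (cx + r * np.cos(phi)).astype(np.float32)
    map_y = (cy + r * np.sin(phi)).astype(np.float32)
    return cv2.remap(img, map_x, map_y, cv2.INTER_LINEAR,
                     borderMode=cv2.BORDER_CONSTANT)
```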
With the corrected images in hand, image alignment follows (step S403). In one embodiment, the Natural Image Stitching with the Global Similarity Prior (NISwGSP) algorithm proposed by Yu-Sheng Chen et al. performs the image alignment and mesh deformation.
For example, FIG. 6 shows an implementation that stitches four images: four corrected images are obtained, namely the first image 61, second image 62, third image 63, and fourth image 64. Because all images must be stitched into a 360-degree panorama, the first image 61 must be aligned with the fourth image 64 during stitching. In the illustrated method, the first image 61 is split into left and right halves, a first left image 611 and a first right image 612; the left half, the first left image 611, is treated as a fifth image and stitched to the fourth image 64. That is, with the first left image 611 as input, the NISwGSP algorithm performs alignment and cropping, finally outputting five aligned and warped images.
Next, color correction is applied to the images (step S405). Images shot by different lenses may differ in color because of lighting angles or other factors, leaving the stitched result visibly discontinuous, so color correction is needed.
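One simple form of color correction, not necessarily the one used in the embodiment, is gain compensation: scale one image's channels so its mean color inside the overlap matches its neighbor's.

```python
import numpy as np

def gain_compensate(img_a, img_b, overlap_mask):
    """Match img_b's mean color in the overlap to img_a's via per-channel gains.

    img_a, img_b: (H, W, 3) uint8 images; overlap_mask: (H, W) bool mask of
    the shared overlap region in both images' common frame.
    """
    mean_a = img_a[overlap_mask].mean(axis=0)      # (3,) mean RGB over overlap
    mean_b = img_b[overlap_mask].mean(axis=0)
    gain = mean_a / np.maximum(mean_b, 1e-6)
    out = img_b.astype(np.float32) * gain
    return np.clip(out, 0, 255).astype(np.uint8)
```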
Once image alignment is complete, the images are fused into a single 360-degree panorama. In many cases, however, parallax between the fisheye lenses prevents the warped images from aligning perfectly, so performing image blending directly on the overlapping regions often leaves visible artifacts. To address this, according to the embodiment the images are seam-cut: the best seam is found and each image is copied to its side of the seam, eliminating the artifacts.
When seam-cutting adjacent images, according to the embodiment a graph-cut algorithm can perform the seam cutting (step S407). FIG. 7 shows an embodiment of the seam-cutting flow for two adjacent images, a first image I1 and a second image I2. Their overlapping region 70 is found, and within it a lowest-cost seam 701 is derived that lets the images fuse together better. With the lowest-cost seam 701 established, the left image 71 and right image 72 of FIG. 7 are obtained, and a multiband blending algorithm fuses the adjacent images into a single image 73 (step S409). The seam-cutting flow of FIG. 7 is repeated until all images are fused into one panorama.
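As a simplified stand-in for the graph-cut step, the sketch below finds a top-to-bottom seam through the overlap by dynamic programming, minimizing the color difference the seam crosses; the embodiment's graph-cut formulation optimizes a similar cost but over arbitrary cut shapes.

```python
import numpy as np

def lowest_cost_seam(overlap_a, overlap_b):
    """Return, per row, the column of a top-to-bottom seam minimizing the
    accumulated color difference between two overlapping image regions."""
    diff = np.abs(overlap_a.astype(np.float32) - overlap_b.astype(np.float32))
    cost = diff.sum(axis=2)                 # (H, W) per-pixel mismatch
    H, W = cost.shape
    acc = cost.copy()
    for y in range(1, H):                   # accumulate minimal path cost
        left = np.roll(acc[y - 1], 1)
        left[0] = np.inf
        right = np.roll(acc[y - 1], -1)
        right[-1] = np.inf
        acc[y] += np.minimum(np.minimum(left, acc[y - 1]), right)
    seam = np.empty(H, dtype=int)           # backtrack from the cheapest end
    seam[-1] = int(acc[-1].argmin())
    for y in range(H - 2, -1, -1):
        x = seam[y + 1]
        lo, hi = max(x - 1, 0), min(x + 2, W)
        seam[y] = lo + int(acc[y, lo:hi].argmin())
    return seam
```

Pixels left of the seam would then be taken from the first image and pixels right of it from the second, before the multiband blending described above.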
For step S309 of FIG. 3, projecting the point cloud onto a planar image and analyzing the rough contour edges of the path from it, an implementation may follow the embodiment flow of FIG. 8.
Once the point cloud is built, its path contour is analyzed. First, a mask image of corresponding size is created from the extent of the point cloud (step S801); this mask represents the outline shape of the point cloud, from which the actual contour of the path can be analyzed. The point cloud is then projected onto this two-dimensional mask image (step S803) by converting the 3D coordinates of its points to the mask's planar coordinates; the conversion simply takes the x and y values of each point's coordinate (x, y, z) as its coordinates on the mask image.
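A sketch of steps S801 and S803, rasterizing the point cloud's (x, y) coordinates into an occupancy mask; the per-pixel resolution is an assumed parameter that trades mask size against contour detail:

```python
import numpy as np

def point_cloud_to_mask(points: np.ndarray, resolution: float = 0.05):
    """Rasterize the (x, y) coordinates of an (N, 3) point cloud into a
    binary mask image; `resolution` is the world size of one mask pixel."""
    xy = points[:, :2]
    origin = xy.min(axis=0)                      # shift cloud into the image
    px = ((xy - origin) / resolution).astype(int)
    h = px[:, 1].max() + 1
    w = px[:, 0].max() + 1
    mask = np.zeros((h, w), dtype=np.uint8)
    mask[px[:, 1], px[:, 0]] = 255               # mark occupied pixels
    return mask, origin
```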
Thus, once every point is converted into the mask image's coordinate system, the mask can be analyzed for the rough contour edges of the path (step S805); see the schematic of mask image 90 in FIG. 9(A). Before extracting the contour edges, however, broken gaps can be seen at some positions on the mask, such as gap 901 in FIG. 9(A), and these need to be filled (step S807). The gaps can be filled by applying Gaussian blur to the mask image 90 and then dilating and eroding the image.
This produces the gap-filled mask image 90' of FIG. 9(B), in which the gaps have been filled (filled gap 901'), so that later processing is not affected by them and the contour structure is kept from becoming overly complex.
The mask image can then be analyzed, as in FIG. 9(C), to obtain the path's contour edge image 90'' (step S809); the resulting contour edges serve as the floor shape for rebuilding the 3D mesh and for reconstructing the planar mesh.
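Steps S807 and S809 map directly onto standard OpenCV operations; a sketch with assumed kernel and threshold sizes:

```python
import cv2

def path_contour(mask):
    """Fill gaps in the path mask (blur + dilate + erode) and extract the
    outer contour edges; kernel and threshold values are illustrative."""
    blurred = cv2.GaussianBlur(mask, (9, 9), 0)
    _, binary = cv2.threshold(blurred, 32, 255, cv2.THRESH_BINARY)
    kernel = cv2.getStructuringElement(cv2.MORPH_ELLIPSE, (7, 7))
    filled = cv2.erode(cv2.dilate(binary, kernel), kernel)   # close the gaps
    contours, _ = cv2.findContours(filled, cv2.RETR_EXTERNAL,
                                   cv2.CHAIN_APPROX_SIMPLE)
    return filled, contours
```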
With the planar path's mesh obtained, the contour edge points serve as constraint points and, as in the embodiment above, constrained Delaunay triangulation is applied. Because the triangulated mesh may contain triangles that are too large or too elongated, the triangles inside the path contour are then refined: oversized or sliver triangle meshes are subdivided with newly added vertices, yielding a triangle mesh of more uniform size.
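One possible realization of this step uses the `triangle` package (Python bindings for Shewchuk's Triangle library): the 'p' switch enforces the contour segments as constraints, 'q' imposes a minimum-angle quality bound that suppresses slivers, and 'a' caps triangle area, subdividing oversized triangles with new vertices. In this library, 'p' also discards faces reachable from outside the closed boundary, which overlaps with the exterior-face deletion described next.

```python
import numpy as np
import triangle  # pip install triangle

def triangulate_path(contour_xy: np.ndarray, max_area: float = 0.5):
    """Constrained Delaunay triangulation of a closed contour polyline.

    contour_xy: (N, 2) contour points in order; consecutive points (plus the
    last-to-first closure) become the constraint segments.
    """
    n = len(contour_xy)
    segments = np.column_stack([np.arange(n), (np.arange(n) + 1) % n])
    mesh = triangle.triangulate(
        {"vertices": contour_xy.astype(float), "segments": segments},
        f"pqa{max_area}",
    )
    return mesh["vertices"], mesh["triangles"]
```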
Note that because the triangulated mesh takes the shape of a convex hull, many triangle faces lie outside the contour; these exterior faces must be found and deleted, finally producing a planar mesh of the complete road, which is then converted to a 3D mesh. Walls and/or a ceiling are finally erected along the mesh edges to complete the 3D mesh reconstruction.
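Erecting the walls reduces to duplicating the boundary vertices at an assumed wall height and connecting each boundary edge with two triangles. A minimal sketch, assuming z-up floor vertices and boundary edges given as vertex-index pairs:

```python
import numpy as np

def extrude_walls(floor_xy: np.ndarray, boundary_edges, height: float = 3.0):
    """Build wall geometry along the planar mesh boundary.

    floor_xy: (N, 2) floor vertices; boundary_edges: iterable of (i, j)
    vertex-index pairs along the contour; height is an assumed wall height.
    Returns (vertices, faces) with floor vertices first, top copies after.
    """
    n = len(floor_xy)
    floor = np.column_stack([floor_xy, np.zeros(n)])
    top = floor + np.array([0.0, 0.0, height])
    vertices = np.vstack([floor, top])
    faces = []
    for i, j in boundary_edges:
        faces.append([i, j, j + n])       # lower triangle of the wall quad
        faces.append([i, j + n, i + n])   # upper triangle
    return vertices, np.asarray(faces)
```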
Finally, at the application level, the mesh model's texture is projected according to the current position. Notably, the texture is a panorama, and a panoramic image covers a 360-degree horizontal and 180-degree vertical field of view, so every vertex of the mesh model can be mapped to a corresponding texture coordinate: only the polar and azimuth angles between a vertex and the current camera need be computed, and the texture coordinate follows from those two angles.
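The vertex-to-texture mapping described here amounts to converting each vertex's direction from the current camera into spherical angles and normalizing them into [0, 1]. A sketch, assuming a y-up frame matching the back-projection above:

```python
import numpy as np

def panorama_uv(vertices: np.ndarray, cam_pos: np.ndarray) -> np.ndarray:
    """Map (N, 3) mesh vertices to equirectangular UV texture coordinates
    as seen from cam_pos: azimuth -> u, polar angle -> v."""
    d = vertices - cam_pos
    r = np.maximum(np.linalg.norm(d, axis=1), 1e-9)       # guard zero length
    azimuth = np.arctan2(d[:, 0], d[:, 2])                # in [-pi, pi]
    polar = np.arccos(np.clip(d[:, 1] / r, -1.0, 1.0))    # 0 = straight up
    u = (azimuth + np.pi) / (2.0 * np.pi)
    v = polar / np.pi
    return np.column_stack([u, v])
```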
The disclosed method solves the severe texture distortion that conventional techniques exhibit when moving through a 3D scene with discontinuous texture switches: when the user interacts in the 3D scene built on the mesh, textures on the mesh are switched by playing a video. At run time, the user controls movement and panorama switching in the 3D scene with input devices such as a keyboard or mouse, moving forward or backward along the path computed by SLAM while the textures switch in step. That is, the system plays the panorama video forward as the user moves forward and backward as the user moves back, eliminating the distortion the switching process would otherwise cause.
In summary, the panorama-based 3D scene reconstruction system and method of the above embodiments take panoramas as input and build a 3D mesh from them, solving the image distortion caused by discontinuous imagery. Further, after the mesh is built, texturing while browsing the 3D imagery is obtained by switching through the panorama video; the proposed system can also add interactive objects to the 3D scene, letting the user move freely along the route, rotate the camera, and interact with the 3D objects.
The foregoing discloses only preferred feasible embodiments of the present invention and does not thereby limit the scope of its claims; all equivalent technical changes made using the contents of this specification and drawings fall within the scope of the claims of the present invention.
11: image capture device
100: image processing device
101: processing unit
103: input unit
105: output unit
13: storage device
21: image input unit
23: panorama image stitching unit
20: mesh reconstruction module
201: semantic segmentation unit
203: depth estimation unit
205: simultaneous localization and mapping (SLAM) unit
207: point cloud building unit
209: SLAM scale estimation unit
211: point cloud alignment unit
213: contour finding unit
215: triangulation unit
217: 3D mesh building unit
219: texture projection unit
25: applications
50: fisheye image
61: first image
62: second image
63: third image
64: fourth image
611: first left image
612: first right image
I1: first image
I2: second image
70: overlapping region
701: lowest-cost seam
71: left image
72: right image
73: fused image
90: mask image
901: gap
90': gap-filled mask image
901': filled gap
90'': contour edge image
S301~S315: flow of reconstructing a 3D scene from panoramas
S401~S409: flow of image stitching
S801~S809: flow of analyzing the path contour
FIG. 1 is a schematic diagram of the system architecture for panorama-based 3D scene reconstruction;

FIG. 2 is a diagram of an embodiment of the functional modules running the panorama-based 3D scene reconstruction method;

FIG. 3 is a flowchart of an embodiment of the panorama-based 3D scene reconstruction method;

FIG. 4 is a flowchart of an embodiment of image stitching;

FIG. 5(A)(B)(C) are diagrams of an embodiment of fisheye correction;

FIG. 6 is a diagram of an implementation example of stitching four images;

FIG. 7 is a diagram of an embodiment of the seam-cutting flow;

FIG. 8 is a flowchart of an embodiment of analyzing the path contour; and

FIG. 9(A)(B)(C) are diagrams of an embodiment of analyzing the path contour.
Claims (16)
Priority Applications (1)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| TW113126362A | 2024-07-15 | 2024-07-15 | Panorama-based three-dimensional scene reconstruction system and method therefor |
Applications Claiming Priority (1)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| TW113126362A | 2024-07-15 | 2024-07-15 | Panorama-based three-dimensional scene reconstruction system and method therefor |
Publications (2)
| Publication Number | Publication Date |
|---|---|
| TWI884034B true TWI884034B (en) | 2025-05-11 |
| TW202605762A TW202605762A (en) | 2026-02-01 |
Family
ID=96582098
Family Applications (1)
| Application Number | Title | Priority Date | Filing Date |
|---|---|---|---|
| TW113126362A TWI884034B (en) | 2024-07-15 | 2024-07-15 | Panorama-based three-dimensional scene reconstruction system and method therefor |
Country Status (1)
| Country | Link |
|---|---|
| TW (1) | TWI884034B (en) |
Citations (4)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| CN115136203A (en) * | 2020-09-14 | 2022-09-30 | 辉达公司 | Generate labels for synthetic images using one or more neural networks |
| EP3695384B1 (en) * | 2017-10-11 | 2023-09-20 | Alibaba Group Holding Limited | Point cloud meshing method, apparatus, device and computer storage media |
| TWI830363B (en) * | 2022-05-19 | 2024-01-21 | 鈺立微電子股份有限公司 | Sensing device for providing three dimensional information |
| EP4310786A1 (en) * | 2022-07-14 | 2024-01-24 | Dassault Systemes Simulia Corp. | Automatic creation of three-dimensional (3d) variable resolution region geometries |