
TWI867867B - Panoramic video generation system and method - Google Patents


Info

Publication number
TWI867867B
TWI867867B
Authority
TW
Taiwan
Prior art keywords
data
time
panoramic video
user devices
module
Prior art date
Application number
TW112145951A
Other languages
Chinese (zh)
Other versions
TW202522960A (en)
Inventor
王紀先
楊善美
楊雅如
Original Assignee
台灣大哥大股份有限公司
Priority date
Filing date
Publication date
Application filed by 台灣大哥大股份有限公司
Priority to TW112145951A
Application granted
Publication of TWI867867B
Publication of TW202522960A


Abstract

The present invention provides a panoramic video generation system comprising: a first module that obtains first time data from multiple user devices, each equipped with a camera unit, and provides second time data to bring the devices' times into agreement; a second module that generates first data, namely a plurality of feature points associated with a target, from either a predetermined image or a first image provided by a user device; a third module that generates second data, associated with each user device's position relative to the target, from the devices' location information; and a fourth module that generates operation data from the first data and the second data and provides the operation data to each user device, so that a panoramic video is generated from the signal sources captured by the devices. The operation data is an indication of the camera unit's focal length and direction with respect to the target. According to the concept of the present invention, a real-time streaming panoramic video generation technique can be realized.

Description

Panoramic video generation system and method

A panoramic video generation system and method, in particular one that stitches streaming video captured by multiple user devices (such as mobile phones) into a panoramic video.

Panoramic video is a video format that offers viewers an all-around visual experience. Panoramic videos typically capture a 360-degree horizontal and 180-degree vertical field of view, allowing viewers to freely rotate the viewing angle during playback, as if they were present at the scene. The format is commonly used in virtual reality (VR) and augmented reality (AR) applications to provide a more immersive and interactive experience.

Producing panoramic video usually involves special photography and editing techniques to capture visual information from the entire surroundings. Photographers typically use multiple cameras or a 360-degree camera to capture the footage, then merge it into a single panoramic video with professional software. When watching a panoramic video, viewers can use a mouse, a handheld device, or a virtual reality headset to change the viewing angle and explore the scene, creating a sense of presence.

Although this production process enriches the viewing experience, it also has a series of drawbacks and challenges, such as: (1) High cost: producing multi-view video requires multiple cameras, which raises hardware and equipment costs. Each camera must be configured independently, and purchasing, maintaining, and operating them requires a substantial budget. In addition, post-production of multi-view video requires professional software and personnel, which further increases costs. (2) Demanding shooting conditions: every camera needs good shooting conditions, including lighting, position, and angle. The photographer must spend extra time and effort ensuring that each camera is in its best shooting state, which can increase the difficulty and uncertainty of the shoot. (3) Complex camera configuration: deploying multiple cameras requires careful planning; the photographer must ensure they do not interfere with one another and can capture the required footage. This may require additional equipment such as mounts and stabilizers to guarantee camera stability and accuracy. (4) Long post-production time: a great deal of time is needed to edit and composite the material shot by the different cameras, which increases post-production cost and can delay the production schedule. (5) Professional skill requirements: post-producing multi-view video requires specialized skills such as video editing and compositing, which may mean hiring professionals with the relevant skills or investing time in learning these techniques.

In summary, although panoramic video offers a richer viewing experience, its production faces high costs, technical challenges, and time costs, as well as difficulties with timing differences, system stability, and hardware complexity. A new panoramic video generation technique capable of real-time streaming is therefore needed to make this kind of video easier to produce.

To achieve the above purpose, the present invention provides a panoramic video generation system comprising: a first module that obtains first time data from multiple user devices and provides second time data to bring the devices' times into agreement, wherein each user device has a camera unit, the first time data is the system time of the user device, and the second time data is the system time of the system; a second module that generates first data from either a predetermined image or a first image provided by a user device, wherein the first data is a plurality of feature points associated with a target and the predetermined image is an image in which the target has already been captured; a third module that generates second data from the location information of the user devices, the second data being associated with each user device's position relative to the target; and a fourth module that generates operation data from the first data and the second data and provides the operation data to each user device, so that a panoramic video is generated from the signal sources captured by the user devices, wherein the operation data is an indication of the camera unit's focal length and direction with respect to the target.

In one embodiment, the first module includes a registration unit that uses Zenoh Auto Discovery to register the multiple user devices with the system.

In one embodiment, the first module further includes a time acquisition unit for obtaining the plurality of first time data.

In one embodiment, the first module further includes a time adjustment unit that adjusts the time in an application on each user device according to the second time data, thereby ensuring that the timestamps of the multiple user devices are consistent.

In one embodiment, the second module computes the first data using Visual SLAM.

In one embodiment, the third module generates the second data using Visual SLAM together with Zenoh Protocol Peer to Peer.

In one embodiment, the system further includes a fifth module for removing duplicate signal sources captured by at least two of the multiple user devices.

To achieve the above purpose, the present invention also provides a panoramic video generation method comprising: a time calibration step that obtains first time data from multiple user devices and provides second time data to bring the devices' times into agreement, wherein each user device has a camera unit, the first time data is the system time of the user device, and the second time data is the system time of the system; a feature point generation step that generates first data from either a predetermined image or a first image provided by a user device, wherein the first data is a plurality of feature points associated with a target and the predetermined image is an image in which the target has already been captured; a relative position confirmation step that generates second data from the location information of the user devices, the second data being associated with each user device's position relative to the target; a panoramic video shooting step that generates operation data from the first data and the second data and provides the operation data to each user device, wherein the operation data is an indication of the camera unit's focal length and direction with respect to the target; and a panoramic video generation step that generates a panoramic video from the signal sources captured by the user devices.

In one embodiment, the time calibration step further includes: a registration step that uses Zenoh Auto Discovery to register the multiple user devices; a time acquisition step for obtaining the plurality of first time data; and a time adjustment step that adjusts the time in an application on each user device according to the second time data, thereby ensuring that the timestamps of the multiple user devices are consistent, wherein the second time data includes time data provided by a system time service or a network time service.

In one embodiment, the feature point generation step is implemented with Visual SLAM.

In one embodiment, the relative position confirmation step is implemented with Visual SLAM together with Zenoh Protocol Peer to Peer.

In one embodiment, an optimization step is further included between the panoramic video shooting step and the panoramic video generation step, for removing identical signal sources captured by at least two of the multiple user devices.

100: System
110: First module
111: Registration unit
112: Time acquisition unit
113: Time adjustment unit
120: Second module
130: Third module
140: Fourth module
150: Fifth module
300: User device
S31: Time calibration step
S311: Registration step
S312: Time acquisition step
S313: Time adjustment step
S32: Feature point generation step
S33: Relative position confirmation step
S34: Panoramic video shooting step
S35: Optimization step
S36: Panoramic video generation step

Figure 1 shows the system architecture of the panoramic video generation system of the present invention.

Figure 2 shows a flowchart of the panoramic video generation method of the present invention.

Figure 3 shows a flowchart of the time calibration step of the present invention.

Figure 4 shows a schematic diagram of the Zenoh Protocol Peer to Peer communication technique of the present invention.

Please refer to Figure 1, the system architecture diagram of the panoramic video generation system of the present invention, for a brief description of each module's function. The system 100 of the present invention generates a panoramic video from multiple user devices 300, each of which has a camera unit for capturing images of a target. A user device 300 may be, for example, a mobile phone, tablet, or other mobile device with video recording capability, but is not limited to these. The target is defined as the subject that the multiple user devices 300 intend to film, for example a concert venue or a stadium.

The system 100 includes a first module 110, a second module 120, a third module 130, a fourth module 140, and a fifth module 150. The first module 110 is configured to obtain first time data from the multiple user devices 300 and to provide second time data that brings the devices' times into agreement. More specifically, the first module 110 further includes a registration unit 111, a time acquisition unit 112, and a time adjustment unit 113. The registration unit 111 is configured to use Zenoh Auto Discovery to register the multiple user devices with the system. The time acquisition unit 112 is configured to obtain the first time data from each user device 300, where the first time data is defined as the system time of the user device 300. The time adjustment unit 113 is configured to adjust the time in an application on each user device according to the second time data, thereby ensuring that the timestamps of the user devices are consistent, so that the signal sources captured by the devices share the same time base. Different user devices 300 can then shoot video from different angles at the same moment, allowing the system 100 to perform video stitching. The second time data is defined as the system time of the system 100, for example time data provided by a system time service or a network time service.

In the system 100 of the present invention, the second module 120 is configured to generate first data from either a predetermined image associated with a target or a first image provided by a user device 300, where the first data is a plurality of feature points associated with the target. The third module 130 is configured to generate second data from the location information of the multiple user devices 300, the second data being associated with each user device's position relative to the target. The fourth module 140 is configured to generate operation data from the first data and the second data and to provide the operation data to each user device, so that a panoramic video is generated from the signal sources captured by the devices; the operation data is an indication of the camera unit's focal length and direction with respect to the target. The fifth module 150 is configured to remove duplicate signal sources captured by at least two of the user devices 300.

The technical means of panoramic video generation are described in detail below; please refer to the flowchart of the panoramic video generation method in Figure 2 and the flowchart of the time calibration step in Figure 3, together with Figure 1. The system time of a user device 300 may be inaccurate for many reasons: the user may not have set the time manually, or the device may have moved between time zones. To ensure that the time on all user devices 300 is consistent, the devices' system time services or a network time service (for example NTP, the Network Time Protocol) can be used, which makes the subsequent video stitching straightforward. Therefore, in the time calibration step S31, the first module 110 obtains first time data from the multiple user devices and provides second time data to bring the devices' times into agreement. More specifically, the time calibration step S31 includes a registration step S311, a time acquisition step S312, and a time adjustment step S313. In the registration step S311, the registration unit 111 uses Zenoh (Zero Overhead Pub/Sub, Store/Forward, and Query) so that multiple user devices 300 automatically register themselves and establish a communication connection with the system 100. Zenoh is an open-source protocol and toolkit for building real-time data distribution systems; it provides automatic discovery and communication capabilities and is used to establish data channels for sharing data between devices, for example letting the applications on different user devices 300 establish communication connections so they can talk to one another and share data. In a specific embodiment, Zenoh comprises the zenoh client and zenoh Auto Discovery: the zenoh client handles communication between an application and the zenoh distribution system, implementing data distribution, storage, and query, while zenoh Auto Discovery helps build a dynamic, adaptive zenoh system in which nodes can join or leave automatically without manual configuration or management.
In a preferred embodiment of the present invention, the registration unit 111 further uses WASM technology to run the application (that is, Zenoh) in a web browser, thereby avoiding review by the App Store or the Google Play store. Specifically, Zenoh is compiled into a WebAssembly module (a binary file containing low-level code that can run in a browser), and the generated module is integrated, loaded, and run with the corresponding WebAssembly API; Zenoh can be compiled from at least Rust or C, but is not limited to these. In the time acquisition step S312, the time acquisition unit 112 obtains the first time data of each user device 300 through the application on the device (for example from a system time service or a network time service), thereby identifying which devices' first time data is inconsistent with the system's second time data. Then, in the time adjustment step S313, the time adjustment unit 113 adjusts the time in the application on each user device 300 according to the second time data, ensuring that the timestamps of the user devices 300 are consistent. The time calibration step S31 guarantees that the timestamps used when stitching the video agree, overcoming the problem of videos that cannot be stitched because different user devices 300 keep different time.
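The clock alignment performed in steps S312 and S313 can be sketched as an NTP-style timestamp exchange between a device and the system. The sketch below is illustrative only: the function names are not from the patent, and the standard NTP offset formula is assumed.

```python
def clock_offset(t0, t1, t2, t3):
    """NTP-style clock offset estimate (seconds).

    t0: device send time, t3: device receive time (device clock);
    t1: system receive time, t2: system send time (system clock).
    Positive offset means the system clock is ahead of the device clock.
    """
    return ((t1 - t0) + (t2 - t3)) / 2.0


def align_timestamp(device_ts, offset):
    """Map a device-local timestamp onto the system clock."""
    return device_ts + offset


# Example: device clock runs 2.5 s behind the system; one-way delay 0.1 s.
offset = clock_offset(100.0, 102.6, 102.6, 100.2)
```

A frame stamped 50.0 on that device would then be relabeled `align_timestamp(50.0, offset)` = 52.5 in system time before stitching.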

In the feature point generation step S32 of Figure 2, the second module 120 generates first data from either a predetermined image associated with a target or a first image provided by a user device 300; the first data is a plurality of feature points associated with the target, and the predetermined image is an image in which the target has already been captured. Specifically, the second module 120 computes the first data with a Visual SLAM (Visual Simultaneous Localization and Mapping) algorithm. Visual SLAM is a technique for simultaneous localization and map building that determines the position of a user device 300 and the targets in its environment from the first image captured by the device's camera unit or sensors. In a preferred embodiment, to improve the accuracy of the Visual SLAM algorithm, a feature point map is created as the first data; the feature points in the map, which may be landmarks, markers, or other easily recognizable points, can then be used for subsequent position estimation. In the best embodiment of the present invention, the second module 120 stores multiple images associated with a target, for example in a database, so that the feature point map can be computed in advance, accelerating the localization of each user device 300 relative to the feature points.
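Matching a device frame against a prebuilt feature point map, as in step S32, is commonly done by comparing binary feature descriptors with Hamming distance. A minimal sketch follows, with toy 8-bit integers standing in for real ORB/BRIEF descriptors; the names and threshold are illustrative and not part of the patent.

```python
def hamming(a, b):
    """Number of differing bits between two binary descriptors."""
    return bin(a ^ b).count("1")


def match_features(map_desc, frame_desc, max_dist=10):
    """Greedy nearest-neighbour matching of frame descriptors to map
    descriptors; returns (frame_index, map_index) pairs whose best
    Hamming distance is within max_dist."""
    matches = []
    for i, d in enumerate(frame_desc):
        best_j, best_dist = min(
            ((j, hamming(d, m)) for j, m in enumerate(map_desc)),
            key=lambda jd: jd[1],
        )
        if best_dist <= max_dist:
            matches.append((i, best_j))
    return matches


# One frame descriptor that is 1 bit away from map point 0.
matches = match_features([0b1010, 0b11110000], [0b1011], max_dist=3)
```

In a real pipeline the matched pairs feed the Visual SLAM pose estimator; this sketch only shows the descriptor association step.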

In the relative position confirmation step S33, the third module 130 generates second data from the location information of the multiple user devices; the second data is associated with each user device's position relative to the target. Specifically, the third module 130 combines Zenoh Protocol Peer to Peer (Zenoh P2P) with Visual SLAM to provide: (1) data transmission, since Zenoh P2P allows real-time data transfer between different user devices 300, for example of the aforementioned first data; and (2) real-time localization, since Zenoh P2P transmission enables faster and more accurate computation of the second data, defined as the real-time position of a user device 300 relative to the target. In the present invention, the main function of Zenoh P2P is to support data communication and data sharing in a distributed system: it connects the user devices 300 over the network, allowing them to exchange data with one another, and is used together with Visual SLAM.
As shown in the Zenoh Protocol Peer to Peer schematic in Figure 4, Zenoh P2P can share relative position information between different user devices 300 or viewers, determining the distribution of the devices' camera units or sensors and thereby locating the camera units in three-dimensional space more accurately. In addition, Zenoh P2P can coordinate data sharing between user devices 300. For example, each device's camera unit can transmit image data or sensor measurements in real time and share them with the Visual SLAM algorithm, so that the data reaches wherever it is needed efficiently.
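The position sharing described above can be illustrated with a minimal in-process publish/subscribe bus. This is only a stand-in for a real Zenoh session: the `Bus` class, the `device/pose` key, and the message shape are all invented for illustration.

```python
class Bus:
    """Minimal in-process pub/sub bus standing in for a Zenoh session."""

    def __init__(self):
        self.subs = {}

    def subscribe(self, key, callback):
        self.subs.setdefault(key, []).append(callback)

    def publish(self, key, value):
        for callback in self.subs.get(key, []):
            callback(value)


# The system subscribes to pose updates; each device publishes its own.
positions = {}
bus = Bus()
bus.subscribe("device/pose",
              lambda msg: positions.update({msg["id"]: msg["xyz"]}))
bus.publish("device/pose", {"id": "phone-1", "xyz": (1.0, 0.0, 2.0)})
bus.publish("device/pose", {"id": "phone-2", "xyz": (-1.0, 0.5, 2.0)})
```

With every peer publishing to the same key, any node (or the stitching server) ends up with the full map of device positions, which is what the third module needs to compute the second data.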

Next, in the panoramic video shooting step S34, the fourth module 140 generates operation data from the first data and the second data and provides it to each user device 300, where the operation data is an indication of the camera unit's focal length and direction with respect to the target. More specifically, the fourth module 140 determines each user device's position within the feature point map from the first and second data, computes operation data describing the orientation, angle, and focal length with which each device 300 should shoot the panoramic video, and delivers it to the application on the device to guide the user in shooting and generating the video signal sources. For example, the application on each user device 300 provides a user interface; when multiple users open the application, the system 100 computes the operation data and sends it to the application, and the users can follow the interface's prompts to adjust the angle, orientation, and focal length of their devices while capturing the video signal sources.
In a preferred embodiment, the operation data is generated in real time, and the application can be set to operate automatically, performing feature point detection and tracking on the fly; the user device can then automatically adjust the camera unit's angle, orientation, and focal length based on the live operation data, for example by adjusting camera parameters, cropping, zooming, or performing other image processing operations, but not limited to these. In other embodiments, if videos at several related focal lengths must be processed as a cluster, Zenoh P2P can coordinate data sharing and collaboration between the processing nodes of the user devices 300, so that the nodes cooperate to process videos of different focal lengths and optimize processing performance.
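One plausible way to derive the direction portion of the operation data is to compute the azimuth and elevation from a device's position to the target, assuming all positions share one Cartesian coordinate frame. The function name and output fields below are illustrative, not the patent's actual format.

```python
import math

def aim_instruction(device_xyz, target_xyz):
    """Azimuth/elevation (degrees) and distance from a device to the
    target, in a shared Cartesian frame (illustrative assumption)."""
    dx, dy, dz = (t - d for t, d in zip(target_xyz, device_xyz))
    distance = math.sqrt(dx * dx + dy * dy + dz * dz)
    return {
        "azimuth_deg": math.degrees(math.atan2(dy, dx)),
        "elevation_deg": math.degrees(math.asin(dz / distance)),
        "distance_m": distance,
    }


# A device at the origin aiming at a target 3 m east, 4 m north.
guide = aim_instruction((0.0, 0.0, 0.0), (3.0, 4.0, 0.0))
```

The distance term could additionally drive a focal length suggestion (farther devices zoom in more), which is the other half of the operation data.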

In the panoramic video generation step S36, the fourth module 140 then generates the panoramic video from the signal sources captured by the multiple user devices in the panoramic video shooting step S34. In a preferred embodiment, the fourth module 140 may be a Video Streaming Server with a real-time video stitching function: it stitches together signal sources bearing the same timestamp as they arrive from the user devices, generating the panoramic video in real time. In a preferred embodiment, the fourth module is deployed in an edge computing framework, for example Fog05, to reduce the amount of computation performed at a centralized remote location (such as the "cloud") and thereby minimize the traffic that must pass between remote clients and servers.
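Stitching signal sources "bearing the same timestamp" amounts to bucketing incoming frames from all devices into common time windows before each stitch. A minimal sketch, assuming a 30 fps stitching window; the function and variable names are illustrative.

```python
from collections import defaultdict

def group_frames(frames, window=1.0 / 30.0):
    """Bucket (device_id, timestamp, frame) tuples into stitching groups:
    frames whose calibrated timestamps round to the same window index
    are stitched together."""
    buckets = defaultdict(list)
    for device_id, ts, frame in frames:
        buckets[round(ts / window)].append((device_id, frame))
    return [buckets[k] for k in sorted(buckets)]


# Two devices, two capture instants roughly one frame period apart.
frames = [("a", 0.000, "a0"), ("b", 0.010, "b0"),
          ("a", 0.034, "a1"), ("b", 0.035, "b1")]
groups = group_frames(frames)
```

Each returned group holds one frame per device for a single output frame of the panorama, which is why the clock calibration of step S31 matters: without it the buckets would mix frames from different moments.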

In other embodiments of the present invention, an optimization step S35 is further included between the panoramic video shooting step S34 and the panoramic video generation step S36, in which the fifth module 150 is configured to remove identical signal sources captured by at least two of the plurality of user devices. Specifically, since several user devices 300 may have substantially the same angle and orientation, they may capture the same footage. Therefore, after the panoramic video shooting step S34, the fifth module 150 of the present invention receives the plurality of signal sources captured by the user devices 300 and filters out the duplicate signal sources through AI or other algorithms, so as to optimize network bandwidth and computing performance, thereby reducing the amount of computation needed for panoramic video generation and accelerating it. In a preferred embodiment of the present invention, the fifth module 150 may be deployed on a cloud computing server such as GCP (Google Cloud Platform) or AWS (Amazon Web Services). Such cloud computing platforms provide a variety of services including computing, storage, databases, artificial intelligence, machine learning, and networking; GCP also operates global data centers to ensure high availability and low latency, and developers and enterprises can use it to build and run applications.
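The patent does not fix a particular duplicate-detection algorithm ("AI or other algorithms"). As one hedged possibility, near-duplicate sources could be filtered with a perceptual average hash compared by Hamming distance; the tiny pixel lists and the threshold below are purely illustrative:

```python
def average_hash(pixels):
    """Tiny average-hash: pixels is a flat list of grayscale values.
    Bit i is 1 when pixel i is above the mean brightness."""
    mean = sum(pixels) / len(pixels)
    return tuple(1 if p > mean else 0 for p in pixels)

def hamming(h1, h2):
    """Number of differing bits between two equal-length hashes."""
    return sum(a != b for a, b in zip(h1, h2))

def filter_duplicate_sources(sources, max_distance=2):
    """sources: {device_id: pixel list}. Keeps the first source in each
    near-duplicate group (hash distance <= max_distance) and drops the rest."""
    kept = {}
    for device_id, pixels in sources.items():
        h = average_hash(pixels)
        if all(hamming(h, kh) > max_distance for kh in kept.values()):
            kept[device_id] = h
    return list(kept)

sources = {
    "dev_a": [10, 200, 10, 200],   # pattern X
    "dev_b": [12, 198, 11, 201],   # near-identical to dev_a -> dropped
    "dev_c": [200, 10, 200, 10],   # inverted pattern -> kept
}
kept = filter_duplicate_sources(sources)
```

A production system would hash downsampled video frames rather than raw pixel lists, but the grouping logic is the same.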

In summary, the system and method of the present invention combine Visual SLAM technology with Zenoh Protocol peer-to-peer communication to accurately locate the positions of multiple user devices relative to the target for capturing multiple images, and use an edge computing platform to accelerate the generation of real-time panoramic video, thereby optimizing video quality and the viewing experience.
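To make the direction indication in the operation data concrete: once Visual SLAM has estimated each device's position relative to the target, the direction component can be derived from plane geometry. The sketch below assumes 2D positions and a bearing measured in degrees counter-clockwise from the +x axis; it is an illustration, not the claimed computation:

```python
import math

def operation_direction(device_xy, target_xy):
    """Bearing (degrees, 0 = +x axis, counter-clockwise) from a device's
    estimated position to the target. Stands in for the direction part
    of the operation data derived from the SLAM-estimated poses."""
    dx = target_xy[0] - device_xy[0]
    dy = target_xy[1] - device_xy[1]
    return math.degrees(math.atan2(dy, dx))

# A device due west of the target must aim along +x (0 degrees).
angle = operation_direction((-5.0, 0.0), (0.0, 0.0))
```

The same idea extends to 3D with an additional elevation angle, and the focal-length component would follow from the device-to-target distance.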

100: System

110: First module

111: Registration unit

112: Time acquisition unit

113: Time adjustment unit

120: Second module

130: Third module

140: Fourth module

150: Fifth module

300: User device

Claims (13)

1. A panoramic video generation system, comprising: a first module for obtaining first time data of a plurality of user devices and providing second time data to synchronize the times of the plurality of user devices, wherein each of the user devices has a camera unit and an application, the first time data is the system time of the user device, and the second time data is the system time of the system; a second module for generating first data from either a predetermined image or a first image provided by the user device, the first data being a plurality of feature points associated with a target, and the predetermined image being an image in which the target has already been captured; a third module for generating second data based on position information of the plurality of user devices, the second data being related to the position of the user device relative to the target; and a fourth module for generating operation data according to the first data and the second data and providing the operation data to the application of each of the user devices, thereby allowing a panoramic video to be generated from the signal sources captured by users operating the applications based on the operation data, wherein the operation data is an indication related to the focal length and direction of the camera unit with respect to the target. 2. The panoramic video generation system of claim 1, wherein the first module comprises a registration unit that uses Zenoh Auto Discovery technology to implement registration of the plurality of user devices with the system. 3. The panoramic video generation system of claim 2, wherein the first module further comprises a time acquisition unit for acquiring the plurality of first time data. 4. The panoramic video generation system of claim 3, wherein the first module further comprises a time adjustment unit that adjusts the time in an application of each user device according to the second time data, thereby ensuring that the timestamps of the plurality of user devices are consistent. 5. The panoramic video generation system of claim 1, wherein the second module uses Visual SLAM technology to compute the first data. 6. The panoramic video generation system of claim 1, wherein the third module uses Visual SLAM technology and Zenoh Protocol peer-to-peer technology to generate the second data. 7. The panoramic video generation system of claim 1, further comprising a fifth module for removing signal sources captured by at least two of the plurality of user devices.
如請求項1所述之全景影片生成系統,其中所述第一模組包含一註冊單元,利用Zenoh Auto Discovery技術,以實現來自多個所述使用者裝置向所述系統的註冊。 The panoramic video generation system as described in claim 1, wherein the first module includes a registration unit, using Zenoh Auto Discovery technology to implement registration from multiple user devices to the system. 如請求項2所述之全景影片生成系統,其中所述第一模組進一步包含一時間取得單元,用於獲取多個所述第一時間資料。 A panoramic video generation system as described in claim 2, wherein the first module further comprises a time acquisition unit for acquiring a plurality of the first time data. 如請求項3所述之全景影片生成系統,其中所述第一模組進一步包含一時間調整單元,根據所述第二時間資料調整所述使用者裝置中一應用程式中的時間,從而確定多個所述使用者裝置的時間戳是一致的。 The panoramic video generation system as described in claim 3, wherein the first module further comprises a time adjustment unit, which adjusts the time in an application in the user device according to the second time data, thereby determining that the timestamps of multiple user devices are consistent. 如請求項1所述之全景影片生成系統,其中所述第二模組係利用Visual SLAM技術來計算出所述第一資料。 The panoramic video generation system as described in claim 1, wherein the second module uses Visual SLAM technology to calculate the first data. 如請求項1所述之全景影片生成系統,其中所述第三模組係利用Visual SLAM技術以及Zenoh Protocol Peer to Peer技術生成所述第二資料。 The panoramic video generation system as described in claim 1, wherein the third module generates the second data using Visual SLAM technology and Zenoh Protocol Peer to Peer technology. 如請求項1所述之全景影片生成系統,更包含一第五模組,用於去除多個所述使用者裝置中至少兩者所拍攝的訊號源。 The panoramic video generation system as described in claim 1 further includes a fifth module for removing the signal source captured by at least two of the plurality of user devices. 
8. A panoramic video generation method, comprising: a time calibration step of obtaining first time data of a plurality of user devices and providing second time data to synchronize the times of the plurality of user devices, wherein each of the user devices has a camera unit and an application, the first time data is the system time of the user device, and the second time data is the system time of the system; a feature point generation step of generating first data from either a predetermined image or a first image provided by the user device, the first data being a plurality of feature points associated with a target, and the predetermined image being an image in which the target has already been captured; a relative position confirmation step of generating second data based on position information of the plurality of user devices, the second data being related to the position of the user device relative to the target; a panoramic video shooting step of generating operation data according to the first data and the second data and providing the operation data to the application of each of the user devices, thereby allowing a user to operate the application based on the operation data to capture a signal source, wherein the operation data is an indication related to the focal length and direction of the camera unit with respect to the target; and a panoramic video generation step of generating a panoramic video from the signal sources captured by the plurality of user devices. 9. The panoramic video generation method of claim 8, wherein the time calibration step further comprises a registration step that uses Zenoh Auto Discovery technology to implement registration of the plurality of user devices. 10. The panoramic video generation method of claim 9, wherein the time calibration step further comprises a time acquisition step for acquiring the plurality of first time data, and a time adjustment step of adjusting the time in the application of each user device according to the second time data, thereby ensuring that the timestamps of the plurality of user devices are consistent, wherein the second time data comprises time data provided by a system time service or a network time service. 11. The panoramic video generation method of claim 8, wherein the feature point generation step is implemented using Visual SLAM technology.
12. The panoramic video generation method of claim 8, wherein the relative position confirmation step is implemented using Visual SLAM technology and Zenoh Protocol peer-to-peer technology. 13. The panoramic video generation method of claim 8, further comprising an optimization step between the panoramic video shooting step and the panoramic video generation step, for removing identical signal sources captured by at least two of the plurality of user devices.
TW112145951A 2023-11-28 2023-11-28 Panoramic video generation system and method TWI867867B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
TW112145951A TWI867867B (en) 2023-11-28 2023-11-28 Panoramic video generation system and method


Publications (2)

Publication Number Publication Date
TWI867867B true TWI867867B (en) 2024-12-21
TW202522960A TW202522960A (en) 2025-06-01

Family

ID=94769717

Family Applications (1)

Application Number Title Priority Date Filing Date
TW112145951A TWI867867B (en) 2023-11-28 2023-11-28 Panoramic video generation system and method

Country Status (1)

Country Link
TW (1) TWI867867B (en)

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20160219117A1 (en) * 2012-09-07 2016-07-28 Avigilon Corporation Security device capability discovery and device selection
US20180191955A1 (en) * 2015-07-31 2018-07-05 Kadinche Corporation Moving picture reproducing device, moving picture reproducing method, moving picture reproducing program, moving picture reproducing system, and moving picture transmission device
US20190180783A1 (en) * 2014-05-29 2019-06-13 Jaunt Inc. Camera array including camera modules
CN110853073A (en) * 2018-07-25 2020-02-28 北京三星通信技术研究有限公司 Method, device, equipment and system for determining attention point and information processing method


