
GB2635208A - Shared split rendering for extended reality


Info

Publication number
GB2635208A
Authority
GB
United Kingdom
Prior art keywords
point
media content
devices
user
encoded
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
GB2316886.7A
Other versions
GB202316886D0 (en)
Inventor
Biatek Thibaud
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Nokia Technologies Oy
Original Assignee
Nokia Technologies Oy
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Nokia Technologies Oy filed Critical Nokia Technologies Oy
Priority to GB2316886.7A
Publication of GB202316886D0
Priority to PCT/FI2024/050442
Publication of GB2635208A


Classifications

    • GPHYSICS
    • G06COMPUTING OR CALCULATING; COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T19/00Manipulating 3D models or images for computer graphics
    • G06T19/006Mixed reality
    • GPHYSICS
    • G06COMPUTING OR CALCULATING; COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00Indexing scheme for image analysis or image enhancement
    • G06T2207/30Subject of image; Context of image processing
    • G06T2207/30244Camera pose
    • GPHYSICS
    • G06COMPUTING OR CALCULATING; COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2219/00Indexing scheme for manipulating 3D models or images for computer graphics
    • G06T2219/024Multi-user, collaborative environment

Landscapes

  • Engineering & Computer Science (AREA)
  • Computer Graphics (AREA)
  • Computer Hardware Design (AREA)
  • General Engineering & Computer Science (AREA)
  • Software Systems (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • Image Generation (AREA)
  • Information Transfer Between Computers (AREA)

Abstract

An apparatus/method is described comprising a means for analyzing data relating to coordinates within a three-dimensional space in which a plurality of extended reality devices are located, a means for generating one or more groups of devices from among the plurality of extended reality devices and means for generating one or more shared media content for each group of the one or more groups of devices. Shared split rendering for extended reality may be provided as may a means for evaluating the pose and distribution of each extended reality device. A second invention is claimed relating to the reception, encoding and assembly of shared media content.

Description

TITLE:
SHARED SPLIT RENDERING FOR EXTENDED REALITY
TECHNICAL FIELD:
[0001] Some example embodiments may generally relate to mobile or wireless telecommunication technology and systems, such as Long Term Evolution (LTE) or fifth generation (5G) new radio (NR) access technology, or beyond 5G, or sixth generation (6G) access technology, or other communications systems. For example, certain example embodiments may relate to shared split rendering for extended reality.
BACKGROUND:
[0002] Examples of mobile or wireless telecommunication technology and systems may include the Universal Mobile Telecommunications System (UMTS) Terrestrial Radio Access Network (UTRAN), Long Term Evolution (LTE) Evolved UTRAN (E-UTRAN), LTE-Advanced (LTE-A), MulteFire, LTE-A Pro, fifth generation (5G) radio access technology or new radio (NR) access technology and/or sixth generation (6G) radio access technology. Fifth generation (5G) and sixth generation (6G) wireless systems refer to the next generation (NG) of radio systems and network architecture. 5G and 6G network technology is mostly based on new radio (NR) technology, but the 5G/6G (or NG) network can also build on E-UTRAN radio. It is estimated that NR may provide bitrates on the order of 10-20 Gbit/s or higher and may support at least enhanced mobile broadband (eMBB) and ultra-reliable low-latency communication (URLLC) as well as massive machine-type communication (mMTC). NR is expected to deliver extreme broadband and ultra-robust, low-latency connectivity and massive networking to support the Internet of Things (IoT).
SUMMARY:
[0003] Various example embodiments may provide an apparatus including means for analyzing data relating to coordinates within a three-dimensional space in which a plurality of extended reality devices are located and means for generating one or more groups of devices from among the plurality of extended reality devices. The apparatus may also include means for generating one or more shared media content for each group of the one or more groups of devices.
[0004] Certain example embodiments may provide an apparatus including means for transmitting data related to a shared session identifier of the apparatus and at least one other extended reality device and means for receiving one or more encoded shared media content along with one or more user-specific media content. The apparatus may also include means for assembling a data stream by merging the one or more encoded shared media content and the one or more user-specific media content and means for inputting the assembled data stream into a data decoder for displaying the assembled data stream.
[0005] Some example embodiments may provide a method including analyzing data relating to coordinates within a three-dimensional space in which a plurality of extended reality devices are located and generating one or more groups of devices from among the plurality of extended reality devices. The method may also include generating one or more shared media content for each group of the one or more groups of devices.
[0006] Various example embodiments may provide a method including transmitting data related to a shared session identifier of an apparatus and at least one other extended reality device and receiving one or more encoded shared media content along with one or more user-specific media content. The method may also include assembling a data stream by merging the one or more encoded shared media content and the one or more user-specific media content and inputting the assembled data stream into a data decoder for displaying the assembled data stream.
BRIEF DESCRIPTION OF THE DRAWINGS:
[0007] For proper understanding of example embodiments, reference should be made to the accompanying drawings, as follows:
[0008] FIG. 1 illustrates an example scenario for split-rendering;
[0009] FIG. 2 illustrates an example of a signal diagram for initiating a split rendering setup, according to certain example embodiments;
[0010] FIG. 3 illustrates an example of a signal diagram for establishing a split rendering session, according to various example embodiments;
[0011] FIG. 4A illustrates an example of a signal diagram for a shared split rendering session, according to some example embodiments;
[0012] FIG. 4B illustrates an example of a continuation of the signal diagram of FIG. 4A, according to some example embodiments;
[0013] FIG. 5A illustrates an example of rendered viewports for a three-dimensional object, according to various example embodiments;
[0014] FIG. 5B illustrates another example of rendered viewports for a three-dimensional object, according to certain example embodiments;
[0015] FIG. 6A illustrates an example experimental simulation, according to some example embodiments;
[0016] FIG. 6B illustrates example performance metrics of the simulation of FIG. 6A, according to various example embodiments;
[0017] FIG. 7A illustrates another example experimental simulation, according to certain example embodiments;
[0018] FIG. 7B illustrates example performance metrics of the simulation of FIG. 7A, according to some example embodiments;
[0019] FIG. 8A illustrates a further example experimental simulation, according to certain example embodiments;
[0020] FIG. 8B illustrates example performance metrics of the simulation of FIG. 8A, according to various example embodiments;
[0021] FIG. 9A illustrates an example experimental simulation, according to some example embodiments;
[0022] FIG. 9B illustrates example performance metrics of the simulation of FIG. 9A, according to various example embodiments;
[0023] FIG. 10A illustrates another example experimental simulation, according to certain example embodiments;
[0024] FIG. 10B illustrates example performance metrics of the simulation of FIG. 10A, according to various example embodiments;
[0025] FIG. 11A illustrates a further example experimental simulation, according to some example embodiments;
[0026] FIG. 11B illustrates example performance metrics of the simulation of FIG. 11A, according to various example embodiments;
[0027] FIG. 12A illustrates an example experimental simulation, according to some example embodiments;
[0028] FIG. 12B illustrates example performance metrics of the simulation of FIG. 12A, according to certain example embodiments;
[0029] FIG. 13A illustrates another example experimental simulation, according to some example embodiments;
[0030] FIG. 13B illustrates example performance metrics of the simulation of FIG. 13A, according to various example embodiments;
[0031] FIG. 14 illustrates an example of a flow diagram of a method, according to some example embodiments;
[0032] FIG. 15 illustrates an example of a flow diagram of another method, according to certain example embodiments; and
[0033] FIG. 16 illustrates a set of apparatuses, according to various example embodiments.
DETAILED DESCRIPTION:
[0034] It will be readily understood that the components of certain example embodiments, as generally described and illustrated in the figures herein, may be arranged and designed in a wide variety of different configurations. The following is a detailed description of some example embodiments of systems, methods, apparatuses, and non-transitory computer program products for shared split rendering for extended reality. Although the devices discussed below and shown in the figures refer to 5G/6G or Next Generation NodeB (gNB) network entities and/or devices and user equipment (UE) devices, this disclosure is not limited to only network entities/devices and UEs.
[0035] It may be readily understood that the components of certain example embodiments, as generally described and illustrated in the figures herein, may be arranged and designed in a wide variety of different configurations. Different reference designations from multiple figures may be used out of sequence in the description, to refer to a same element to illustrate their features or functions. If desired, the different functions or procedures discussed herein may be performed in a different order and/or concurrently with each other. Furthermore, if desired, one or more of the described functions or procedures may be optional or may be combined. As such, the following description should be considered as illustrative of the principles and teachings of certain example embodiments, and not in limitation thereof.
[0036] 3rd Generation Partnership Project (3GPP) has defined specifications for immersive reality technologies. Immersive technologies extend the reality experienced by a person by merging a virtual world with the real world or by creating a fully immersive experience. For example, immersive technologies may include augmented reality (AR) in which virtual objects are overlaid on the real world, virtual reality (VR) in which a person is fully immersed in a virtual environment, and mixed reality (MR) in which virtual and real-world objects may interact with each other in real-time. The term extended reality (XR) may be used to refer to all these immersive technologies. In XR, audio and/or video data may need to be transmitted and/or received in periodic or aperiodic occasions while remaining within the bandwidth limits of the local network in which the XR device is located.
[0037] It is an aim of 3GPP to optimize the usage of collaborative XR applications where multiple users, equipped with XR devices, such as, for example, XR headsets, may share the same experience and view similar or the same content. One example usage of a shared XR experience among multiple users using XR devices may be a classroom in which students use XR devices for a shared or common XR simulation.
[0038] FIG. 1 illustrates an example scenario in which multiple classrooms on a university campus are equipped with XR devices of students in the classrooms. In this example, the university campus is equipped with XR-enabled amphitheaters. A teacher may use XR to display additional virtual content to the students to enhance teaching. It may be preferable for the XR headsets to be lightweight and affordable, which may be achieved by, for example, offloading costly and energy-consuming processes to a server. For example, media rendering may be offloaded to a server, such as a cloud server or other type of server on the standalone non-public network (S-NPN), by a process known as split-rendering.
[0039] Split-rendering may require high-throughput and low-latency connectivity. If the number of sessions in parallel is too high, congestion in the local network (e.g., S-NPN) may cause service interruption and/or low quality of experience (QoE). Split-rendering may also require a higher available computational power on the cloud server, which may result in a higher infrastructure cost. In the example of FIG. 1, multiple students located in close proximity to each other in the amphitheater may request that the server provide the same or similar media rendering because the XR devices of the students have a similar position or pose.
[0040] In the FIG. 1 scenario, two types of content may be distributed. First, offline generated content may be sent from a streaming server depending on user position in the room, and second, a live three-dimensional scene may require split-rendering to be done on the server. In both cases, delivering the content may be performed by leveraging stereoscopic encoding. However, in that case, it is expected that proximate users' views and poses may be correlated, which may not be exploited with legacy stereoscopic encoding. Thus, an additional level of optimization may be desirable by leveraging an additional and common base layer view that would be used as a common reference for proximate users. The users may send their own pose that would be used to generate left and right views that may be encoded with a common reference view. This additional base layer reference view may save a substantial bitrate in every user's viewport coding and may be sent with multicast/broadcast mechanisms to, for example, absorb the additional bitrate used for transmission. New video codecs may have the potential to efficiently leverage a common base view and may be able to save bitrate in the overall transmission. This may be achieved by using a multiview codec capable of handling three views, such as, for example, the common view and the two left and right stereo views. The positive impact on streaming may be evaluated by the transmission resource savings achieved. The anchor may be a full 3D high efficiency video coding (HEVC)/3D-AVC simulcast delivery of the rendered video streams. The configuration compared against simulcast uses a multicast reference view as a reference to code the left and right eye views generated for each user, which may be sent with unicast. This may be evaluated for several types of user clustering (e.g., in terms of number of users, distance, cluster size).
[0041] It is desirable for 3GPP specifications to be able to exploit correlations and redundancies between multiple users using split-rendering architectures. Various example embodiments may provide technological advantages to support one or more procedures for exploiting correlations, redundancies, and/or dependencies to optimize delivery of shared XR experiences. Certain example embodiments may provide for optimizing coding and delivery of split-rendering sessions for XR implementations.
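For a rough sense of the potential gain, the following Python sketch compares the aggregate bitrate of simulcast delivery against delivery with one multicast base view and predictively coded per-user views. All numbers (cluster size, per-view bitrate, dependent-view saving) are illustrative assumptions, not figures from this disclosure.

```python
# Illustrative comparison of simulcast vs. shared-base-view delivery for one
# cluster of users. All numbers are assumptions for illustration only.

def simulcast_bitrate(num_users: int, view_bitrate_mbps: float) -> float:
    """Each user receives an independently coded left and right view."""
    return num_users * 2 * view_bitrate_mbps

def shared_base_bitrate(num_users: int, view_bitrate_mbps: float,
                        dependent_saving: float) -> float:
    """One common base view is multicast once; each user's left/right views are
    coded predictively against it, saving `dependent_saving` of their bitrate."""
    base = view_bitrate_mbps                                      # sent once over PTM
    per_user = 2 * view_bitrate_mbps * (1.0 - dependent_saving)   # sent over PTP
    return base + num_users * per_user

if __name__ == "__main__":
    users, rate, saving = 9, 20.0, 0.20   # assumed 3x3 cluster, 20 Mbit/s per view, 20% gain
    a = simulcast_bitrate(users, rate)
    b = shared_base_bitrate(users, rate, saving)
    print(f"simulcast: {a:.0f} Mbit/s, shared base view: {b:.0f} Mbit/s "
          f"({100 * (a - b) / a:.1f}% less)")
```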
[0042] Various example embodiments may provide that multiple XR split-rendering sessions may be performed in parallel and each session may have its own allocated resources, communication pipe, encoder, and/or packager. Some example embodiments may leverage correlations between multiple users and/or XR devices used by the users to jointly optimize the split-rendering sessions and reduce the bandwidth required by N split-rendering sessions by sharing portions of the split-rendering workflow.
[0043] FIG. 2 illustrates an example of a signal diagram for initiating a split rendering setup, according to various example embodiments. Certain example embodiments may provide a configuration of, for example, an application 201, a media session handler/device 202, a split-rendering client (SRC) 203, a split-rendering server (SRS) 204, a real-time communication application function (RTC AF/SRF) 205, and an application service provider 206. In certain example embodiments, at 210, the application service provider 206 may request the SRF 205 to provision a split management session, and at 215, the application service provider 206 may request the SRF 205 to attach the newly created shared split management session to existing sessions already running for a shared XR service.
[0044] At 220, the shared split management session may be announced to the application 201 as part of service access information, and at 225, the application 201 may request a split of client media functions from the SRC 203. At 230, the SRC 203 may query the media session handler/device 202 about media capabilities of the SRC 203, and at 235, the SRC 203 may send, to the SRS 204, initial viewing conditions of the SRC 203, including, for example, a pose and/or position in three-dimensional space. At 240, the SRS 204 may evaluate, based on the initial viewing conditions sent by the SRC 203, different potential split sharing options. For example, the SRS 204 may evaluate that the SRC 203 may reuse at least a portion of already provisioned sessions or may evaluate which existing dataflows may be reused (e.g., already generated views, or audio track, or metadata, that would currently be sent over multi-cast broadcast services (MBS)). At 245, the SRC 203 and the SRS 204 may negotiate for acceptable capabilities for the SRC 203 and may agree on the split sharing option, which may include existing reusable dataflows, assets, media, and/or bitstreams already available through MBS.
[0045] At 250, the SRS 204 may start a split-rendering process, and at 255, the SRC 203 may establish a shared session, such as a WebRTC session. At 260, the SRC 203 may subscribe to available MBS sessions used to deliver common dataflow, assets, media and/or bitstreams, and at 265, the SRC 203 may inform the application 201 that the split-rendering on edge is operating. At 270, the SRC 203 may send uplink metadata, such as, for example, pose and/or action information, and at 275, the SRS 204 may send the rendered asset or media to the SRC 203. At 280, the SRS 204 may send the shared media/assets to the SRC 203 with MBS.
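The following Python sketch illustrates, with purely hypothetical field names, the kind of information that may be exchanged at 235-245 (initial viewing conditions from the SRC, and the split sharing option proposed by the SRS); it is not a normative message format.

```python
# Hypothetical data structures for the information exchanged at steps 235-245.
# Field names and default values are illustrative assumptions only.
from dataclasses import dataclass, field
from typing import List, Optional, Tuple

@dataclass
class ViewingConditions:
    """Sent by the SRC at step 235: initial position/pose in the 3D space."""
    position: Tuple[float, float, float]
    orientation: Tuple[float, float, float, float]   # pose as a quaternion
    shared_session_id: Optional[str] = None           # set when joining a shared setup

@dataclass
class SplitSharingOption:
    """Proposed by the SRS at step 240 and agreed with the SRC at step 245."""
    reusable_flows: List[str] = field(default_factory=list)   # e.g. existing MBS dataflow IDs
    common_view_codec: str = "MV-HEVC"                          # assumed multiview codec
    unicast_ports: List[int] = field(default_factory=list)     # per-user PTP streams
```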
[0046] FIG. 3 illustrates an example of a signal diagram for establishing a split rendering session, according to various example embodiments. Certain example embodiments may provide a configuration of, for example, an XR runtime application 301, a presentation engine 302 of an SRC, an XR source management function 303 of the SRC, a media access function 304 of the SRC, and an SRS 305. Certain example embodiments may provide one or more procedures in which, at 310, the presentation engine 302 may discover the SRS 305 and may set up a connection to the SRS 305. The presentation engine 302 may provide information about its rendering capabilities and the XR runtime configuration, such as, for example, an OpenXR configuration. If a shared split-rendering session is established, this session may provide session-relevant parameters to enable optimization (e.g., shared session ID, scene ID, and/or the like). At 315, the SRS 305 may respond by creating a description of the split rendering output and input the SRS 305 expects to receive from the SRC. If a shared split-rendering session is established, the SRS 305 may provide information on existing shared dataflows, assets, media, and/or bitstreams sent with MBS to be reused.
[0047] At 320, the presentation engine 302 may request the buffer streams from the media access function 304, which in turn may establish a connection to the SRS 305 to stream, for example, a pose, and may retrieve split rendering buffers. At 325, the XR source management function 303 may retrieve pose and user input from the XR runtime 301, and at 330, the XR source management function 303 may share information such as, for example, the pose predictions and user input actions with the SRS 305. At 335, the SRS 305 may use the shared information to render a frame, and at 340, the SRS 305 may encode the rendered frame and may send it to the XR source management function 303. If a shared split-rendering session is established, the rendered frame may be comprised of a common shared view and, for example, additional user-specific information, such as, for example, additional views and/or auxiliary data. At 345, the media access function 304 may decode the encoded frame and may process the buffer frame, and at 350, the media access function 304 may transmit the decoded or raw buffer frames for display to the presentation engine 302 of the SRC and/or to the XR runtime application 301. At 355, the XR runtime application 301 may compose and render the frame.
[0048] Certain example embodiments may provide a data model for the split-rendering configuration. For example, Table 1 below provides definitions for a split-rendering configuration resource.
Name | Type | Cardinality | Description
name | String | 1..1 | A name for this split rendering configuration.
status | Boolean | 1..1 | Indicates whether this split rendering configuration is active.
edgeResourceConfigurationId | ResourceId | 0..1 | The identifier of the edge resource configuration that will be used for sessions of this split rendering configuration.
policyTemplateId | ResourceId | 1..1 | The identifier of the policy template that will be applied to the sessions of this split rendering configuration.
sharedStatus | Boolean | 1..1 | Indicates whether the split rendering session is part of a shared setup or not.
sharedParameters | Object | 0..1 | Describes the sharing parameters for the session. This includes other sessions' ID and status, and existing common flows or clusters of sessions that can be joined.
configuration | Object | 1..1 | Describes the split-rendering configuration currently used by the SRS. (Editor's Note: The syntax and semantics of this element are TBD.)
TABLE 1
[0049] For example, some example embodiments may provide a sharedStatus parameter which may indicate whether a split-rendering session is part of a shared setup. As a further example, certain example embodiments may provide a sharedParameters parameter which may describe the sharing parameters for the session. The sharing parameters may include other sessions' IDs and statuses, and existing common flows or clusters of sessions that can be joined.
[0050] Some example embodiments may provide a policy template for standalone split-rendering. A QoS specification may include, for example, at least one or more of two configurations for left and right eye buffer streams, one optional configuration for a depth buffer stream, and one configuration for an audio stream. Standalone split-rendering may use WebRTC for the real-time transport of the rendered media, which may apply real-time transport protocol (RTP) restrictions for WebRTC. Certain example embodiments may also, or alternatively, provide a policy template for shared split-rendering. A QoS specification may include, for example, at least one or more of one configuration for a shared view, two configurations for left and right eye buffer streams, one optional configuration for a depth buffer stream, one configuration for a shared audio stream, and one configuration for an audio stream. For shared split-rendering, the MBS may deliver at least a common portion of the rendered data or media, including, for example, a common rendered view or audio. The WebRTC for user-specific additional data (e.g., additional views) may apply RTP restrictions for WebRTC.
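As a non-normative illustration of Table 1, a split-rendering configuration resource might be populated as in the following Python sketch. All values are assumptions, and the configuration element remains TBD as noted in the table.

```python
# A minimal sketch of a split-rendering configuration resource following the
# fields in Table 1. Values are illustrative assumptions; the syntax of the
# `configuration` element is marked TBD in the table above.
shared_split_rendering_config = {
    "name": "amphitheater-shared-session",
    "status": True,                                   # configuration is active
    "edgeResourceConfigurationId": "edge-cfg-001",    # optional (0..1)
    "policyTemplateId": "policy-shared-xr",
    "sharedStatus": True,                             # part of a shared setup
    "sharedParameters": {                             # optional (0..1)
        "otherSessions": [{"id": "session-42", "status": "active"}],
        "commonFlows": ["mbs-flow-7"],                # existing common flows to join
        "clusters": ["cluster-a"],
    },
    "configuration": {},                              # syntax and semantics TBD
}
```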
[0051] FIGs. 4A and 4B illustrate an example of signal diagrams for one or more procedures for a shared split rendering session, according to various example embodiments. Certain example embodiments may provide a configuration of, for example, a plurality of XR devices, such as a first XR device 401, an i-th XR device 402, and/or an n-th XR device 403. The configuration may also include, for example, a network manager 404, a server 405, and an application 406. Certain example embodiments may provide one or more procedures in which, at 410, the application 406 may announce a shared XR split-rendering session to all devices on a connected network, including, for example, the first XR device 401, the i-th XR device 402, the n-th XR device 403, the network manager 404, and the server 405. At 411-413, the first XR device 401, the i-th XR device 402, and the n-th XR device 403 may join the shared split-rendering session and may provide information on their capabilities, such as, for example, capabilities for display, decoding, rendering, and/or the like.
[0052] At 414, the application 406 may provision resources on a local server, such as one or more local edge servers, to handle as many users as may be registered. The application 406 may use any method for allocation of the edge servers or may allow an MNO to set up the edge servers to run the split-rendering processes. At 415, the shared split-rendering session information may be shared among all of the devices (e.g., the first XR device 401, the i-th XR device 402, the n-th XR device 403, the network manager 404, and the application 406), including, for example, relevant network IP addresses, split-rendering server IP addresses, and one or more associated ports. At 416-418, each of the first XR device 401, the i-th XR device 402, and the n-th XR device 403 may send initial configuration information to the server 405, such as their position in three-dimensional space, current speed, and/or current pose.
[0053] At 419, the server 405 may pre-configure the split-rendering session.
The server 405 may be pre-configured by clustering each device in multiple pools of users and identifying commonalities which may be shared. Then, the server 405 may instantiate the required renderers, encoders and packagers and may decide which generated media flows may be sent with PTM and PTP, to which pool of users, and may configure the network manager 404 accordingly.
At 420, the server 405 may send to all the devices (e.g., the first XR device 401, the i-th XR device 402, the n-th XR device 403, and the network manager 404) information about the pool of users they each belong to, and which common and user-specific data each device may need to receive. At 421-423, the first XR device 401, the i-th XR device 402, and the n-th XR device 403 may establish transport connections in order to receive common and user-specific dataflows, which may include, for example, webRTC session establishments, and subscribe to a corresponding multicast group to receive common data.
[0054] At 424-426, the first XR device 401, the i-th XR device 402, and the n-th XR device 403 may send their pose information and user actions to the server 405. At 427, the server 405 may perform rendering for each pool of users for the requested poses and may generate the appropriate number of common and user-specific media dataflows. At 428, the server 405 may perform encoding and packaging of common and user-specific media flows, including, for example, video multi-view encoding. At 429, the common media dataflows may be sent with PTM to the different pool of users, such as, for example, through their subscribed multicast group. At 430-432, the first XR device 401, the i-th XR device 402, and the n-th XR device 403 may each receive their user-specific media dataflows using PTP, such as, for example, through their own webRTC unicast sessions. At 433, the first XR device 401, the i-th XR device 402, and the n-th XR device 403 may assemble, decode and display the media dataflows received via PTM and PTP.
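The per-frame server behavior at 427-432 may be summarized by the following Python sketch; the rendering, encoding, and transport helpers are trivial stand-ins (returning placeholder byte strings and printing transport actions), not an actual split-rendering server API.

```python
# A minimal sketch of the per-frame server loop at steps 427-432. The helper
# functions are stand-ins, not a real renderer, encoder, or transport stack.

def render_view(scene, pose) -> bytes:
    return f"view@{pose}".encode()             # stand-in for a rendered frame

def encode_multiview(base: bytes, dependents: dict) -> tuple:
    # Stand-in for multiview coding: one base layer plus per-user dependent layers.
    return b"base:" + base, {d: b"dep:" + v for d, v in dependents.items()}

def send_multicast(pool_id, data: bytes):       # 429: PTM delivery to the pool
    print(f"PTM -> pool {pool_id}: {len(data)} bytes")

def send_unicast(device, data: bytes):          # 430-432: PTP delivery per device
    print(f"PTP -> device {device}: {len(data)} bytes")

def serve_frame(pools: dict, poses: dict, scene=None):
    for pool_id, devices in pools.items():
        reference_pose = poses[devices[0]]      # e.g. a central user or a virtual view
        common = render_view(scene, reference_pose)                # 427
        users = {d: render_view(scene, poses[d]) for d in devices}
        base_bs, user_bs = encode_multiview(common, users)         # 428
        send_multicast(pool_id, base_bs)
        for device, bs in user_bs.items():
            send_unicast(device, bs)

serve_frame({"A": ["ue1", "ue2"]}, {"ue1": (0, 0, 0), "ue2": (1, 0, 0)})
```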
[0055] Various example embodiments may provide procedures for reducing the bandwidth used to distribute multi-user split-rendering sessions. The procedures may include, for example, analyzing a space, such as a three-dimensional space, using two-dimensional or three-dimensional coordinates, and evaluating information related to the space, such as a density and/or pose similarity. Each user may have an XR device, such as, for example, an XR headset. Users (e.g., XR devices of users) may be grouped or clustered based on the evaluation of the information related to the space.
[0056] A single reference content or media, such as, for example, a single reference view, may be generated for each cluster or group. The reference content/media may be encoded as multiview reference media/content. For example, users' views may be encoded with multiview coding, with the reference view of the cluster as a base layer. The reference media (e.g., views) may be sent to the XR devices of the users using one or more point to multipoint (PTM) data streams and one or more user-specific media (e.g., views) may be sent with point to point (PTP) data streams. Each XR device or user may subscribe to a PTM stream and may receive its PTP stream in parallel.
[0057] A bitstream may be assembled by merging PTM and PTP sub-streams, which may then be fed into a decoder. The decoder may output multiple media, such as, for example, three views, in which only the left and right views may be sent for display in the XR device and a baseline view may not be displayed. By using a common close reference media (e.g., view) to encode the left/right view, bandwidth savings may be achieved since the encoding benefits from enhanced predictors. Efficiency may be enhanced by, for example, introducing more than one reference. However, in this example, three views may be a more practical implementation in terms of encoding and decoding complexity.
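A minimal client-side sketch of this assembly step is shown below; the decoder and display calls are placeholders rather than an actual XR runtime or codec API.

```python
# A minimal client-side sketch of paragraph [0057]: merge the PTM (shared base)
# and PTP (user-specific) sub-streams into one bitstream, decode three views,
# and keep only the left/right views for display. All functions are stand-ins.

def assemble_bitstream(ptm_chunks: list, ptp_chunks: list) -> bytes:
    # Interleaving would follow the codec's layer/access-unit rules; here the
    # base-layer (PTM) data is simply placed before the dependent (PTP) data.
    return b"".join(ptm_chunks) + b"".join(ptp_chunks)

def decode_multiview(bitstream: bytes) -> dict:
    # Stand-in for a multiview decoder that outputs base, left, and right views.
    return {"base": b"...", "left": b"L", "right": b"R"}

def present(views: dict):
    # Only the stereo pair is displayed; the base reference view is discarded.
    print(f"display left={views['left']!r} right={views['right']!r}")

bitstream = assemble_bitstream([b"base-AU"], [b"left-AU", b"right-AU"])
present(decode_multiview(bitstream))
```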
[0058] Simulations and experimental configurations have been performed according to various example embodiments. In the examples discussed herein, example encoding technologies may be used. The anchor may be the MV-HEVC reference software HTM 16.30 used in stereoscopic mode, with one view (left) encoded as an I picture and the other one (right) as P. The simulations may employ the MV-HEVC reference software HTM 16.30 in a 3-view configuration, where the baseline view may be the common shared view and the other views may be the stereoscopic views. The common view may be encoded as an I picture and the two views as P, referring to this common view. For example, four bitrate points may be generated for each configuration, using quantization parameters (QP) = {25, 30, 35, 40}. A +3 delta QP may be applied to the P-frames, and I-frames may be encoded with the base QP. For each user, a bitrate savings Bjontegaard delta rate (BD-RATE) may be computed by taking, as a distortion metric, the average Y PSNR of the stereoscopic views, and, as bitrate, the sum of the stereoscopic data rates. If the common view is not selected as a user view but as a "virtual" one, the additional data rate needed to carry the virtual view may be distributed among the proximate users to account for it in the BD-RATE computation.
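The BD-RATE metric referred to above may be computed as in the following sketch (cubic fit of log-bitrate over PSNR, integrated over the overlapping PSNR range); the rate/PSNR points in the example are made up and do not correspond to the reported simulations.

```python
# A sketch of the Bjontegaard delta-rate (BD-RATE) computation used as the
# savings metric. Requires numpy; the four rate points per configuration would
# come from the QP = {25, 30, 35, 40} encodings.
import numpy as np

def bd_rate(anchor_rate, anchor_psnr, test_rate, test_psnr) -> float:
    """Average bitrate difference of `test` vs `anchor` in percent
    (negative values mean bitrate savings)."""
    la, lt = np.log(anchor_rate), np.log(test_rate)
    pa = np.polyfit(anchor_psnr, la, 3)          # cubic fit of log-rate vs PSNR
    pt = np.polyfit(test_psnr, lt, 3)
    lo = max(min(anchor_psnr), min(test_psnr))   # overlapping PSNR range
    hi = min(max(anchor_psnr), max(test_psnr))
    int_a = np.polyval(np.polyint(pa), hi) - np.polyval(np.polyint(pa), lo)
    int_t = np.polyval(np.polyint(pt), hi) - np.polyval(np.polyint(pt), lo)
    avg_diff = (int_t - int_a) / (hi - lo)
    return (np.exp(avg_diff) - 1.0) * 100.0

# Example with made-up rate (kbit/s) / PSNR (dB) points for one user:
anchor = ([12000, 8000, 5200, 3400], [40.1, 38.2, 36.0, 33.9])
shared = ([9800, 6500, 4300, 2800], [40.0, 38.1, 36.0, 33.8])
print(f"BD-RATE: {bd_rate(*anchor, *shared):.2f}%")
```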
[0059] FIGs. 5A and 5B illustrate examples of rendered viewports for a three-dimensional object, according to certain example embodiments. FIGs. 5A and 5B illustrate rendered views that may be presented to an encoder from different positions in space. FIG. 5A may show the left and right views from a first position, such as (1,1), and FIG. 5B may show the left and right views from a second position, such as (5,8).
[0060] FIGs. 6A and 6B illustrate an example experimental simulation for a case with N=54 students, according to some example embodiments. FIG. 6A illustrates the example experimental simulation which may be applied for 54 students with XR devices separated by, for example, 1 meter each, with 1 meter of elevation per row. An object, such as a three-dimensional object, may be centered 2 meters away from the first row of students. The students (with XR devices) may be clustered or grouped three by three (3x3). The left-eye view of a central student/XR device may be used as a reference for coding the views of the surrounding students in the cluster/group.
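The layout and 3x3 grouping may be sketched as follows, assuming a 6-row by 9-seat grid (6 x 9 = 54 devices); the exact row/seat split is an assumption, as it is not stated here.

```python
# A sketch of the FIG. 6A layout and 3x3 clustering, assuming a 6 x 9 grid with
# 1 m seat spacing and 1 m of elevation per row (assumed arrangement).

def seat_positions(rows: int = 6, seats: int = 9, spacing: float = 1.0):
    """Device position (x, y, z): x along the row, y the row depth, z the elevation."""
    return {(r, s): (s * spacing, r * spacing, r * spacing)
            for r in range(rows) for s in range(seats)}

def cluster_3x3(rows: int = 6, seats: int = 9):
    """Group devices into 3x3 blocks and pick the central device of each block,
    whose left-eye view serves as the shared coding reference."""
    clusters = {}
    for r0 in range(0, rows, 3):
        for s0 in range(0, seats, 3):
            members = [(r, s) for r in range(r0, r0 + 3) for s in range(s0, s0 + 3)]
            clusters[(r0 // 3, s0 // 3)] = {"members": members,
                                            "reference": (r0 + 1, s0 + 1)}
    return clusters

positions = seat_positions()
for cid, c in cluster_3x3().items():
    print(cid, "reference device:", c["reference"], "at", positions[c["reference"]])
```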
[0061] FIG. 6B illustrates examples of various dataflows sent to users, based on the layout provided in FIG. 6A. A rendered video may be encoded and then packaged before being sent to the user using the dataflows. Common media streams may be sent by leveraging multicast transmission while user-specific media streams may be sent with unicast. For example, user 0.0 may receive the common media stream with multicast and its two additional views via unicast. All three media streams may be assembled and fed to a multiview decoder (MV-DEC) before a pose correction using asynchronous time warping (ATW) is applied and the media streams are displayed.
[0062] FIGs. 7A and 7B illustrate another example experimental simulation, according to certain example embodiments. FIG. 7A illustrates that the example experimental simulation may provide constant user density over clusters of 3x3. A reference view may be identified as the central user's left eye, which may be the closest available view to the surrounding users/XR devices. A common view may be delivered with multicast and other views may be distributed with unicast specifically for each user. Each XR device may receive both the multicast and unicast data streams and may reconstruct stereoscopic views based on the data streams. FIG. 7B illustrates example performance metrics of the shared split-rendering session for the clustering shown in FIG. 7A. On average, this configuration may enable an 18.25% reduction in bandwidth (e.g., bitrate) on the transmission link. It may be observed that a 0% saving is achieved for the central user's left-eye view, which is used as the reference; the encoding configuration does not change compared to simulcast for the central user.
[0063] FIGs. 8A and 8B illustrate a further example experimental simulation, according to certain example embodiments. FIG. 8A illustrates a clustering in which users and XR devices may be grouped or clustered into 1x3 clusters. FIG. 8B illustrates that the example of FIG. 8A may provide a performance metric in which a bandwidth (e.g., bitrate) may be reduced by 17.17%.
[0064] FIGs. 9A and 9B illustrate another example experimental simulation, according to certain example embodiments. FIG. 9A illustrates a clustering in which users and XR devices may be grouped into constant 2x3 clusters using an alternative clustering and space analysis. FIG. 9B illustrates that the example of FIG. 9A may provide a performance metric in which bandwidth (e.g., bitrate) may be reduced by 18.42%.
[0065] FIGs. 10A and 10B illustrate another example experimental simulation, according to certain example embodiments. FIG. 10A illustrates a clustering in which users and XR devices may be grouped or clustered according to a user's distance to the displayed object. The left/right view displacement may be reduced as the distance from the object increases. FIG. 10B illustrates that the example of FIG. 10A may provide a performance metric in which bandwidth (e.g., bitrate) may be reduced by 19.38% over the complete transmission.
[0066] In various example embodiments, a user used as a reference may be mobile; for example, the user may leave the room, move away, or disconnect, which may interrupt the sessions of other users and reduce the QoE for the surrounding users in the cluster or group. Hence, some example embodiments may deploy a virtual user/view.
[0067] FIGs. 11A and 11B illustrate an example experimental simulation using a virtual view, according to certain example embodiments. FIG. 11A illustrates a clustering in which users and XR devices may be grouped or clustered into 2x3 clusters or groups. The virtual view/user may be used as a reference and located at the center of the cluster. FIG. 11B illustrates that the example of FIG. 11A may provide a performance metric in which bandwidth (e.g., bitrate) may be reduced by 26.95% over the complete transmission. Simulating the virtual view/user may require additional bandwidth/bitrate, which may cause the reduction to be limited to 11.07%.
[0068] FIGs. 12A and 12B illustrate another example experimental simulation using a virtual view, according to certain example embodiments. FIG. 12A illustrates a clustering in which users and XR devices may be grouped or clustered into 3x3 clusters or groups. To minimize excess overhead, it may increase efficiency to align the virtual view/user with a position of a user. FIG. 12B illustrates that the example of FIG. 12A may provide a performance metric in which a bandwidth (e.g., bitrate) may be reduced by 89.74% for the aligned views/users. Simulating the virtual view/user may require additional bandwidth/bitrate, which may cause the reduction to be limited to 17.24% for the whole system.
[0069] FIGs. 13A and 13B illustrate another example experimental simulation using a virtual view, according to certain example embodiments. FIG. 13A illustrates a clustering in which users and XR devices may be grouped or clustered into 5x3 clusters or groups. In this example, a distance between users may be reduced to 0.5 meters and the elevation between rows may be reduced to 0.5 meters. With this configuration, the density of users may be increased, which may increase a correlation between proximate users' views. FIG. 13B illustrates that the example of FIG. 13A may provide a performance metric in which a bandwidth (e.g., bitrate) may be reduced by 22.50% when accounting for simulating the virtual view/user.
[0070] Various example embodiments may provide one or more clustering methods, for example, leveraging k-means clustering with the distance-to-virtual-scene vector as the criterion, or using pose similarity. Various other codecs or communication protocols, or combinations thereof, may be used. MV-HEVC may be advantageous as hardware HEVC encoders can be reused with a software update to support multi-layer decoding (e.g., multi-layer support only requires high-level syntax manipulation). As an example, 5G MBS may be used as a PTM transmission protocol.
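As one possible realization of such clustering, the following sketch runs a plain k-means over device-to-scene vectors and uses each centroid as the pose of a candidate virtual reference view; it is an illustration under assumed device positions, not the method mandated by any embodiment.

```python
# K-means over device-to-scene vectors; each centroid is a candidate virtual
# reference view for its group. Plain-Python k-means; in practice a library
# such as scikit-learn could be used instead.
import random

def kmeans(vectors, k, iters=50, seed=0):
    random.seed(seed)
    centroids = random.sample(vectors, k)
    for _ in range(iters):
        groups = [[] for _ in range(k)]
        for v in vectors:
            i = min(range(k), key=lambda c: sum((a - b) ** 2
                                                for a, b in zip(v, centroids[c])))
            groups[i].append(v)
        centroids = [tuple(sum(x) / len(g) for x in zip(*g)) if g else centroids[i]
                     for i, g in enumerate(groups)]
    return centroids, groups

# Assumed device positions (scene at the origin); the feature is the
# device-to-scene vector, as suggested by paragraph [0070].
scene = (0.0, 0.0, 0.0)
devices = [(x * 1.0, y * 1.0, 2.0 + y) for x in range(9) for y in range(6)]
to_scene = [tuple(s - d for s, d in zip(scene, dev)) for dev in devices]
virtual_views, groups = kmeans(to_scene, k=6)
print("candidate virtual reference positions:", virtual_views)
```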
[0071] FIG. 14 illustrates an example flow diagram of a method, according to certain example embodiments. In an example embodiment, the method of FIG. 14 may be performed by a network element or network entity in a 3GPP system, such as LTE or 5G-NR. For instance, in an example embodiment, the method of FIG. 14 may be performed by a server, such as SRS 204/305, similar to apparatus 1610 illustrated in FIG. 16.
[0072] According to various example embodiments, the method of FIG. 14 may include, at 1410, analyzing data relating to coordinates within a three-dimensional space in which a plurality of extended reality devices are located, and at 1420, generating one or more groups of devices from among the plurality of extended reality devices. The method may also include, at 1430, generating one or more shared media content for each group of the one or more groups of devices.
[0073] Some example embodiments may provide that the method further includes receiving the data related to the plurality of extended reality devices located within the three-dimensional space, and evaluating, based on the data, a distribution within the three-dimensional space and a pose of each of the plurality of extended reality devices. The method may also include encoding media content for each group of the one or more groups of devices and transmitting the encoded media content to the one or more groups of devices.
[0074] Certain example embodiments may provide that the method also includes encoding, as media content, multiple perspective views combined with a reference view as a base view for each group of the one or more groups of devices and transmitting the encoded multiple perspective views and the reference view. The encoded media content may be transmitted by one or more point-to-multipoint data streams along with one or more user-specific media content transmitted via one or more point-to-point data streams, or the encoded multiple perspective views and the reference view may be transmitted by one or more point-to-multipoint data streams along with one or more user-specific views transmitted via one or more point-to-point data streams. One of the one or more point-to-multipoint data streams and the point-to-point data stream may be transmitted in parallel to each of the plurality of extended reality devices.
[0075] FIG. 15 illustrates an example flow diagram of a method, according to certain example embodiments. In an example embodiment, the method of FIG. 15 may be performed by a user device or user equipment in a 3GPP system, such as LTE or 5G-NR. For instance, in an example embodiment, the method of FIG. 15 may be performed by a UE, XR device, or the like, such as SRC 203, similar to apparatus 1620 illustrated in FIG. 16.
[0076] According to various example embodiments, the method of FIG. 15 may include, at 1510, transmitting data related to a shared session identifier of the apparatus and at least one other extended reality device, and at 1520, receiving one or more encoded shared media content along with one or more user-specific media content. The method may also include, at 1530, assembling a data stream by merging the one or more encoded shared media content and the one or more user-specific media content, and at 1540, inputting the assembled data stream into a data decoder for displaying the assembled data stream.
[0077] Certain example embodiments may provide that the shared media content comprises multiple perspective views and a reference view. The one or more encoded shared media content may be transmitted by one or more point-to-multipoint data streams along with the one or more user-specific media content transmitted via one or more point-to-point data streams, or the encoded multiple perspective views and the reference view are received by one or more point-to-multipoint data streams and one or more user-specific views are received via a point-to-point data stream. The method may also include subscribing to the one or more point-to-multipoint data streams and the point-to-point data stream. The one or more encoded shared media content and the one or more user-specific media content may be received in parallel.
The method may further include displaying the assembled data stream.
[0078] FIG. 16 illustrates apparatuses 1610 and 1620 according to various example embodiments. In the various example embodiments, apparatus 1610 may be an element in a network or associated with the network, or a network entity, such as a server, S-NPN, or the like. SRS 204/305 may be examples of apparatus 1610 according to various example embodiments as discussed above. It should be noted that one of ordinary skill in the art would understand that apparatus 1610 may include components or features not shown in FIG. 16. Further, the apparatus 1620 may be a user device or other similar device connected to a network, such as a UE, XR device, XR headset, or the like.
The SRCs, such as SRC 203, may be examples of apparatus 1620 according to various example embodiments as discussed above. It should be noted that one of ordinary skill in the art would understand that apparatus 1620 may include components or features not shown in FIG. 16.
[0079] According to various example embodiments, the apparatuses 1610 and/or 1620 may include one or more processors, one or more computer-readable storage medium (for example, memory, storage, or the like), one or more radio access components (for example, a modem, a transceiver, or the like), and/or a user interface. In some example embodiments, apparatuses 1610 and/or 1620 may be configured to operate using one or more radio access technologies, such as GSM, LTE, LTE-A, NR, 5G, WLAN, WiFi, NB-IoT, Bluetooth, NFC, MulteFire, and/or any other radio access technologies.
[0080] As illustrated in the example of FIG. 16, apparatuses 1610 and/or 1620 may include or be coupled to processors 1612 and 1622, respectively, for processing information and executing instructions or operations. Processors 1612 and 1622 may be any type of general or specific purpose processor. In fact, processors 1612 and 1622 may include one or more of general-purpose computers, special purpose computers, microprocessors, digital signal processors (DSPs), field-programmable gate arrays (FPGAs), application-specific integrated circuits (ASICs), and processors based on a multi-core processor architecture, as examples. While a single processor 1612 (and 1622) for each of apparatuses 1610 and/or 1620 is shown in FIG. 16, multiple processors may be utilized according to other example embodiments. For example, it should be understood that, in certain example embodiments, apparatuses 1610 and/or 1620 may include two or more processors that may form a multiprocessor system (for example, in this case processors 1612 and 1622 may represent a multiprocessor) that may support multiprocessing. According to certain example embodiments, the multiprocessor system may be tightly coupled or loosely coupled to, for example, form a computer cluster.
[0081] Processors 1612 and 1622 may perform functions associated with the operation of apparatuses 1610 and/or 1620, respectively, including, as some examples, precoding of antenna gain/phase parameters, encoding and decoding of individual bits forming a communication message, formatting of information, and overall control of the apparatuses 1610 and/or 1620, including processes illustrated in FIGs. 2-15.
[0082] Apparatuses 1610 and/or 1620 may further include or be coupled to memory 1614 and/or 1624 (internal or external), respectively, which may be coupled to processors 1612 and/or 1622, respectively, for storing information and instructions that may be executed by processors 1612 and 1622. Memory 1614 (and memory 1624) may be one or more memories and of any type suitable to the local application environment, and may be implemented using any suitable volatile or nonvolatile data storage technology such as a semiconductor-based memory device, a magnetic memory device and system, an optical memory device and system, fixed memory, and/or removable memory. For example, memory 1614 (and memory 1624) can be comprised of any combination of random access memory (RAM), read only memory (ROM), static storage such as a magnetic or optical disk, hard disk drive (HDD), or any other type of non-transitory machine or computer readable media. The instructions stored in memory 1614 and memory 1624 may include program instructions or computer program code that, when executed by processors 1612 and 1622, enable the apparatuses 1610 and/or 1620 to perform tasks as described herein.
[0083] In certain example embodiments, apparatuses 1610 and/or 1620 may further include or be coupled to (internal or external) a drive or port that is configured to accept and read an external computer readable storage medium, such as an optical disc, USB drive, flash drive, or any other storage medium. For example, the external computer readable storage medium may store a computer program or software for execution by processors 1612 and 1622 and/or apparatuses 1610 and/or 1620 to perform any of the methods illustrated in FIGs. 2-15.
[0084] In some example embodiments, apparatuses 1610 and/or 1620 may also include or be coupled to one or more antennas 1615 and 1625, respectively, for receiving a downlink signal and for transmitting via an uplink from apparatuses 1610 and/or 1620. Apparatuses 1610 and/or 1620 may further include transceivers 1616 and 1626, respectively, configured to transmit and receive information. The transceivers 1616 and 1626 may also include a radio interface (for example, a modem) respectively coupled to the antennas 1615 and 1625. The radio interface may correspond to a plurality of radio access technologies including one or more of GSM, LTE, LTE-A, 5G, NR, WLAN, NB-IoT, Bluetooth, BT-LE, NFC, RFID, UWB, or the like. The radio interface may include other components, such as filters, converters (for example, digital-to-analog converters or the like), symbol demappers, signal shaping components, an Inverse Fast Fourier Transform (IFFT) module, or the like, to process symbols, such as OFDMA symbols, carried by a downlink or an uplink.
[0085] For instance, transceivers 1616 and 1626 may be respectively configured to modulate information on to a carrier waveform for transmission by the antenna(s) 1615 and 1625, and demodulate information received via the antenna(s) 1615 and 1625 for further processing by other elements of apparatuses 1610 and/or 1620. In other example embodiments, transceivers 1616 and 1626 may be capable of transmitting and receiving signals or data directly. Additionally or alternatively, in some example embodiments, apparatuses 1610 and/or 1620 may include an input and/or output device (I/O device). In certain example embodiments, apparatuses 1610 and/or 1620 may further include a user interface, such as a graphical user interface or touch screen.
[0086] In certain example embodiments, memory 1614 and memory 1624 may store software modules that provide functionality when executed by processors 1612 and 1622, respectively. The modules may include, for example, an operating system that provides operating system functionality for apparatuses 1610 and/or 1620. The memory may also store one or more functional modules, such as an application or program, to provide additional functionality for apparatuses 1610 and/or 1620. The components of apparatuses 1610 and/or 1620 may be implemented in hardware, or as any suitable combination of hardware and software. According to certain example embodiments, apparatus 1610 may optionally be configured to communicate with apparatus 1620 via a wireless or wired communications link 1630 according to any radio access technology, such as NR.
[0087] According to certain example embodiments, processors 1612 and 1622, and memory 1614 and 1624 may be included in or may form a part of processing circuitry or control circuitry. In addition, in some example embodiments, transceivers 1616 and 1626 may be included in or may form a part of transceiving circuitry.
[0088] For instance, in certain example embodiments, the apparatus 1610 may be controlled by the memory 1614 and the processor 1612 to analyze data relating to coordinates within a three-dimensional space in which a plurality of extended reality devices are located and generate one or more groups of devices from among the plurality of extended reality devices. The apparatus 1610 may also be controlled to generate one or more shared media content for each group of the one or more groups of devices.
[0089] In some example embodiments, the apparatus 1620 may be controlled by the memory 1624 and the processor 1622 to transmit data related to a shared session identifier of the apparatus and at least one other extended reality device and receive one or more encoded shared media content along with one or more user-specific media content. The apparatus 1620 may also be controlled to assemble a data stream by merging the one or more encoded shared media content and the one or more user-specific media content and input the assembled data stream into a data decoder for displaying the assembled data stream.
[0090] In some example embodiments, an apparatus (e.g., apparatus 1610 and/or apparatus 1620) may include means for performing a method, a process, or any of the variants discussed herein. Examples of the means may include one or more processors, memory, controllers, transmitters, receivers, and/or computer program code for causing the performance of the operations. [0091] Various example embodiments may be directed to an apparatus, such as apparatus 1610, that includes means for analyzing data relating to coordinates within a three-dimensional space in which a plurality of extended reality devices are located and means for generating one or more groups of devices from among the plurality of extended reality devices. The apparatus 1610 may also include means for generating one or more shared media content for each group of the one or more groups of devices.
[0092] Certain example embodiments may be directed to an apparatus, such as apparatus 1620, that includes means for transmitting data related to a shared session identifier of the apparatus and at least one other extended reality device and means for receiving one or more encoded shared media content along with one or more user-specific media content. The apparatus 1620 may also include means for assembling a data stream by merging the one or more encoded shared media content and the one or more user-specific media content and means for inputting the assembled data stream into a data decoder for displaying the assembled data stream.
[0093] As used herein, the term "circuitry" may refer to hardware-only circuitry implementations (for example, analog and/or digital circuitry), combinations of hardware circuits and software, combinations of analog and/or digital hardware circuits with software/firmware, any portions of hardware processor(s) with software, including digital signal processors, that work together to cause an apparatus (for example, apparatus 1610 and/or 1620) to perform various functions, and/or hardware circuit(s) and/or processor(s), or portions thereof, that use software for operation but where the software may not be present when it is not needed for operation. As a further example, as used herein, the term "circuitry" may also cover an implementation of merely a hardware circuit or processor or multiple processors, or portion of a hardware circuit or processor, and the accompanying software and/or firmware. The term circuitry may also cover, for example, a baseband integrated circuit in a server, cellular network node or device, or other computing or network device.
[0094] A computer program product may include one or more computer-executable components which, when the program is run, are configured to carry out some example embodiments. The one or more computer-executable components may be at least one software code or portions of it. Modifications and configurations required for implementing functionality of certain example embodiments may be performed as routine(s), which may be implemented as added or updated software routine(s). Software routine(s) may be downloaded into the apparatus.
[0095] As an example, software or a computer program code or portions of it may be in a source code form, object code form, or in some intermediate form, and it may be stored in some sort of carrier, distribution medium, or computer readable medium, which may be any entity or device capable of carrying the program. Such carriers may include a record medium, computer memory, read-only memory, photoelectrical and/or electrical carrier signal, telecommunications signal, and software distribution package, for example.
Depending on the processing power needed, the computer program may be executed in a single electronic digital computer or it may be distributed amongst a number of computers. The computer readable medium or computer readable storage medium may be a non-transitory medium.
[0096] In other example embodiments, the functionality may be performed by hardware or circuitry included in an apparatus (for example, apparatuses 1610 and/or 1620), for example through the use of an application specific integrated circuit (ASIC), a programmable gate array (PGA), a field programmable gate array (FPGA), or any other combination of hardware and software. In yet another example embodiment, the functionality may be implemented as a signal, a non-tangible means that can be carried by an electromagnetic signal downloaded from the Internet or other network.
[0097] According to certain example embodiments, an apparatus, such as a node, device, or a corresponding component, may be configured as circuitry, a computer or a microprocessor, such as single-chip computer element, or as a chipset, including at least a memory for providing storage capacity used for arithmetic operation and an operation processor for executing the arithmetic operation.
[0098] The features, structures, or characteristics of example embodiments described throughout this specification may be combined in any suitable manner in one or more example embodiments. For example, the usage of the phrases "certain embodiments," "an example embodiment," "some embodiments," or other similar language, throughout this specification refers to the fact that a particular feature, structure, or characteristic described in connection with an embodiment may be included in at least one embodiment. Thus, appearances of the phrases "in certain embodiments," "an example embodiment," "in some embodiments," "in other embodiments," or other similar language, throughout this specification do not necessarily refer to the same group of embodiments, and the described features, structures, or characteristics may be combined in any suitable manner in one or more example embodiments. Further, the terms "cell", "node", "gNB", or other similar language throughout this specification may be used interchangeably.
[0099] As used herein, "at least one of the following: <a list of two or more elements>" and "at least one of <a list of two or more elements>" and similar wording, where the list of two or more elements are joined by "and" or "or," mean at least any one of the elements, or at least any two or more of the elements, or at least all the elements.
[0100] One having ordinary skill in the art will readily understand that the disclosure as discussed above may be practiced with procedures in a different order, and/or with hardware elements in configurations which are different than those which are disclosed. Therefore, although the disclosure has been described based upon these example embodiments, certain modifications, variations, and alternative constructions would be apparent to those of skill in the art, while remaining within the spirit and scope of example embodiments. Although the above embodiments refer to 5G NR and LTE technology, the above embodiments may also apply to any other present or future 3GPP technology, such as LTE-advanced, and/or fourth generation (4G) and/or sixth generation (6G) technology.
[0101] Partial Glossary:
[0102] 3GPP 3rd Generation Partnership Project
[0103] 5G 5th Generation
[0104] 6G 6th Generation
[0105] AF Application Function
[0106] BD-RATE Bjontegaard Delta Rate
[0107] DL Downlink
[0108] EMBB Enhanced Mobile Broadband
[0109] gNB 5G or Next Generation NodeB
[0110] HEVC High Efficiency Video Coding
[0111] ID Identifier
[0112] LTE Long Term Evolution
[0113] MBS Multicast Broadcast Service
[0114] NR New Radio
[0115] PTM Point to Multipoint
[0116] PTP Point to Point
[0117] QoE Quality of Experience
[0118] QoS Quality of Service
[0119] RTC Real-Time Communication
[0120] RTP Real-Time Transport Protocol
[0121] SRC Split-Rendering Client
[0122] SRS Split-Rendering Server
[0123] UE User Equipment
[0124] UL Uplink
[0125] XR Extended Reality

Claims (26)

WE CLAIM: 1. An apparatus, comprising: means for analyzing data relating to coordinates within a three-dimensional space in which a plurality of extended reality devices are located; means for generating one or more groups of devices from among the plurality of extended reality devices; and means for generating one or more shared media content for each group of the one or more groups of devices.
2. The apparatus according to claim 1, further comprising: means for receiving the data related to the plurality of extended reality devices located within the three-dimensional space.
3. The apparatus according to claim 1, further comprising: means for evaluating, based on the data, a distribution within the three-dimensional space and a pose of each of the plurality of extended reality devices.
4. The apparatus according to any one of claims 1-3, further comprising: means for encoding media content for each group of the one or more groups of devices; and means for transmitting the encoded media content to the one or more groups of devices.
5. The apparatus according to any one of claims 1-4, further comprising: means for encoding, as media content, multiple perspective views combined with a reference view as a base view for each group of the one or more groups of devices; and means for transmitting the encoded multiple perspective views and the reference view.
6. The apparatus according to claim 4 or claim 5, wherein: the encoded media content is transmitted by one or more point-to-multipoint data streams along with one or more user-specific media content transmitted via one or more point-to-point data streams; or the encoded multiple perspective views and the reference view are transmitted by one or more point-to-multipoint data streams along with one or more user-specific views transmitted via one or more point-to-point data streams.
7. The apparatus according to claim 6, wherein one of the one or more point-to-multipoint data streams and the point-to-point data stream are transmitted in parallel to each of the plurality of extended reality devices.
8. An apparatus, comprising: means for transmitting data related to a shared session identifier of the apparatus and at least one other extended reality device; means for receiving one or more encoded shared media content along with one or more user-specific media content; means for assembling a data stream by merging the one or more encoded shared media content and the one or more user-specific media content; and means for inputting the assembled data stream into a data decoder for displaying the assembled data stream.
9. The apparatus according to claim 8, wherein the one or more shared media content comprises multiple perspective views and a reference view.
10. The apparatus according to claim 8 or claim 9, wherein: the one or more encoded shared media content is transmitted by one or more point-to-multipoint data streams along with the one or more user-specific media content transmitted via one or more point-to-point data streams; or the encoded multiple perspective views and the reference view are received by one or more point-to-multipoint data streams and one or more user-specific views are received via a point-to-point data stream.
11. The apparatus according to claim 10, further comprising: means for subscribing to the one or more point-to-multipoint data streams and the point-to-point data stream.
12. The apparatus according to any one of claims 8-11, wherein the one or more encoded shared media content and the one or more user-specific media content are received in parallel.
13. The apparatus according to any one of claims 8-12, further comprising: means for displaying the assembled data stream.
14. A method, comprising: analyzing data relating to coordinates within a three-dimensional space in which a plurality of extended reality devices are located; generating one or more groups of devices from among the plurality of extended reality devices; and generating one or more shared media content for each group of the one or more groups of devices.
15. The method according to claim 14, further comprising: receiving the data related to the plurality of extended reality devices located within the three-dimensional space.
16. The method according to claim 14, further comprising: evaluating, based on the data, a distribution within the three-dimensional space and a pose of each of the plurality of extended reality devices.
17. The method according to any one of claims 14-16, further comprising: encoding media content for each group of the one or more groups of devices; and transmitting the encoded media content to the one or more groups of devices.
18. The method according to any one of claims 14-17, further comprising: encoding, as media content, multiple perspective views combined with a reference view as a base view for each group of the one or more groups of devices; and transmitting the encoded multiple perspective views and the reference view.
19. The method according to claim 17 or claim 18, wherein: the encoded media content is transmitted by one or more point-to-multipoint data streams along with one or more user-specific media content transmitted via one or more point-to-point data streams; or the encoded multiple perspective views and the reference view are transmitted by one or more point-to-multipoint data streams along with one or more user-specific views transmitted via one or more point-to-point data streams.
20. The method according to claim 19, wherein one of the one or more point-to-multipoint data streams and the point-to-point data stream are transmitted in parallel to each of the plurality of extended reality devices.
21. A method, comprising: transmitting data related to a shared session identifier of an apparatus and at least one other extended reality device; receiving one or more encoded shared media content along with one or more user-specific media content; assembling a data stream by merging the one or more encoded shared media content and the one or more user-specific media content; and inputting the assembled data stream into a data decoder for displaying the assembled data stream.
22. The method according to claim 21, wherein the one or more shared media content comprises multiple perspective views and a reference view.
23. The method according to claim 21 or claim 22, wherein: the one or more encoded shared media content is transmitted by one or more point-to-multipoint data streams along with the one or more user-specific media content transmitted via one or more point-to-point data streams; or the encoded multiple perspective views and the reference view are received by one or more point-to-multipoint data streams and one or more user-specific views are received via a point-to-point data stream.
24. The method according to claim 23, further comprising: subscribing to the one or more point-to-multipoint data streams and the point-to-point data stream.
25. The method according to any one of claims 21-24, wherein the one or more encoded shared media content and the one or more user-specific media content are received in parallel.
26. The method according to any one of claims 21-25, further comprising: displaying the assembled data stream.
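The grouping recited in claims 1 and 14 above can likewise be pictured with a short, purely hypothetical sketch: devices are clustered by the proximity of their coordinates in the three-dimensional space, and each resulting group would then be served one shared media content, for example over a point-to-multipoint stream. The greedy distance-threshold clustering, the 2.0 unit radius, and every identifier below are assumptions made for illustration only and are not part of the claims.

from dataclasses import dataclass
from math import dist
from typing import Dict, List, Tuple


@dataclass
class XrDevice:
    device_id: str
    position: Tuple[float, float, float]   # coordinates within the shared 3D space


def group_devices(devices: List[XrDevice], radius: float = 2.0) -> List[List[XrDevice]]:
    # Greedy clustering: a device joins the first group whose seed device lies
    # within `radius`; otherwise it seeds a new group of its own.
    groups: List[List[XrDevice]] = []
    for device in devices:
        for group in groups:
            if dist(device.position, group[0].position) <= radius:
                group.append(device)
                break
        else:
            groups.append([device])
    return groups


def shared_content_per_group(groups: List[List[XrDevice]]) -> Dict[int, List[str]]:
    # Placeholder for generating one shared media content per group; here it
    # only records which devices would receive the same shared stream.
    return {index: [device.device_id for device in group] for index, group in enumerate(groups)}


# Toy usage: three nearby headsets end up in one group (one shared stream),
# while a distant headset forms its own group.
devices = [XrDevice("A", (0.0, 0.0, 0.0)), XrDevice("B", (1.0, 0.5, 0.0)),
           XrDevice("C", (0.5, 1.0, 0.2)), XrDevice("D", (10.0, 0.0, 0.0))]
print(shared_content_per_group(group_devices(devices)))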
GB2316886.7A 2023-11-03 2023-11-03 Shared split rendering for extended reality Pending GB2635208A (en)

Priority Applications (2)

Application Number Priority Date Filing Date Title
GB2316886.7A GB2635208A (en) 2023-11-03 2023-11-03 Shared split rendering for extended reality
PCT/FI2024/050442 WO2025093805A1 (en) 2023-11-03 2024-08-27 Shared split rendering for extended reality

Publications (2)

Publication Number Publication Date
GB202316886D0 GB202316886D0 (en) 2023-12-20
GB2635208A true GB2635208A (en) 2025-05-07

Country Status (2)

Country Link
GB (1) GB2635208A (en)
WO (1) WO2025093805A1 (en)

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20160188585A1 (en) * 2014-12-27 2016-06-30 Lenitra Durham Technologies for shared augmented reality presentations
WO2023087005A1 (en) * 2021-11-12 2023-05-19 Case Western Reserve University Systems, methods, and media for controlling shared extended reality presentations

Family Cites Families (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US10430147B2 (en) * 2017-04-17 2019-10-01 Intel Corporation Collaborative multi-user virtual reality
CN115918094A (en) * 2020-06-19 2023-04-04 索尼集团公司 Server device, terminal device, information processing system, and information processing method

Also Published As

Publication number Publication date
WO2025093805A1 (en) 2025-05-08
GB202316886D0 (en) 2023-12-20
